1
Nakai K, Wei L. Recent Advances in the Prediction of Subcellular Localization of Proteins and Related Topics. Frontiers in Bioinformatics 2022; 2:910531. [PMID: 36304291 PMCID: PMC9580943 DOI: 10.3389/fbinf.2022.910531] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2022] [Accepted: 04/25/2022] [Indexed: 11/13/2022] Open
Abstract
Prediction of subcellular localization of proteins from their amino acid sequences has a long history in bioinformatics and is still actively developing, incorporating the latest advances in machine learning and proteomics. Notably, deep learning-based methods for natural language processing have made great contributions. Here, we review recent advances in the field as well as its related fields, such as subcellular proteomics and the prediction/recognition of subcellular localization from image data.
Affiliation(s)
- Kenta Nakai
- Institute of Medical Science, The University of Tokyo, Minato-Ku, Japan
- *Correspondence: Kenta Nakai,
- Leyi Wei
- School of Software, Shandong University, Jinan, China
2
Rojas ML, Kubo MTK, Caetano‐Silva ME, Augusto PED. Ultrasound processing of fruits and vegetables, structural modification and impact on nutrient and bioactive compounds: a review. Int J Food Sci Technol 2021. [DOI: 10.1111/ijfs.15113] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/13/2022]
Affiliation(s)
- Meliza Lindsay Rojas
- Dirección de Investigación y Desarrollo Universidad Privada del Norte (UPN) Trujillo Peru
- Mirian T. K. Kubo
- Department of Agri‐food Industry, Food and Nutrition (LAN) Luiz de Queiroz College of Agriculture (ESALQ) University of São Paulo (USP) Piracicaba Brazil
- Pedro E. D. Augusto
- Department of Agri‐food Industry, Food and Nutrition (LAN) Luiz de Queiroz College of Agriculture (ESALQ) University of São Paulo (USP) Piracicaba Brazil
- Food and Nutrition Research Center (NAPAN) University of São Paulo (USP) São Paulo Brazil
3
Sahu SS, Loaiza CD, Kaundal R. Plant-mSubP: a computational framework for the prediction of single- and multi-target protein subcellular localization using integrated machine-learning approaches. AoB Plants 2020; 12:plz068. [PMID: 32528639 PMCID: PMC7274489 DOI: 10.1093/aobpla/plz068] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 04/28/2019] [Accepted: 10/11/2019] [Indexed: 05/18/2023]
Abstract
The subcellular localization of proteins is very important for characterizing their function in a cell. Accurate computational prediction of subcellular locations has been an active area of interest. Most of the work has focused on single-localization prediction. Only a few studies have addressed multi-target localization, and these have not achieved good accuracy so far; in plant sciences, very limited work has been done. Here we report the development of a novel tool, Plant-mSubP, which is based on integrated machine-learning approaches to efficiently predict subcellular localizations in plant proteomes. The proposed approach predicts with high accuracy 11 single localizations and three dual locations of the plant cell. Several features based on the composition and physicochemical properties of a protein, such as amino acid composition, pseudo amino acid composition, auto-correlation descriptors, quasi-sequence-order descriptors and hybrid features, are used to represent the protein. The performance of the proposed method has been assessed on a training set as well as an independent test set. Using the hybrid feature combining pseudo amino acid composition, N-Center-C terminal amino acid composition and dipeptide composition (PseAAC-NCC-DIPEP), overall accuracies of 81.97 %, 84.75 % and 87.88 % are achieved on the training data sets of single-label, combined single- and dual-label, and dual-label proteins, respectively. When tested on the independent data, accuracies of 64.36 %, 64.84 % and 81.08 % are achieved on the single-label, combined single- and dual-label, and dual-label proteins, respectively. The prediction models have been implemented on a web server available at http://bioinfo.usu.edu/Plant-mSubP/. The results indicate that the proposed approach is comparable to existing methods in single-localization prediction and outperforms all other existing tools for dual-label proteins. The prediction tool will be a useful resource for better annotation of various plant proteomes.
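As a rough illustration of the composition-based features named in this abstract, the following Python sketch computes amino acid composition, dipeptide composition, and an N-Center-C split composition for a single sequence. It is not the Plant-mSubP implementation: the function names, the 25-residue terminal segment lengths, and the way the features are concatenated are all assumptions made for illustration, and the pseudo amino acid composition component is omitted because it needs physicochemical property tables not given here.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def clean(seq):
    """Upper-case the sequence and drop non-standard residue symbols."""
    return "".join(aa for aa in seq.upper() if aa in AMINO_ACIDS)


def amino_acid_composition(seq):
    """Return 20 fractions, one per standard amino acid."""
    seq = clean(seq)
    counts = Counter(seq)
    total = len(seq)
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return 400 fractions, one per ordered pair of adjacent residues."""
    seq = clean(seq)
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = Counter(pairs)
    total = len(pairs)
    return [counts[a + b] / total if total else 0.0
            for a, b in product(AMINO_ACIDS, repeat=2)]


def ncc_composition(seq, n_len=25, c_len=25):
    """Composition of N-terminal, center and C-terminal segments (60 values).

    The segment lengths are hypothetical defaults, not values from the paper.
    """
    seq = clean(seq)
    c_start = max(n_len, len(seq) - c_len)  # avoid overlap on short sequences
    n_part, center, c_part = seq[:n_len], seq[n_len:c_start], seq[c_start:]
    return (amino_acid_composition(n_part)
            + amino_acid_composition(center)
            + amino_acid_composition(c_part))


def hybrid_features(seq):
    """Concatenate NCC (60) and dipeptide (400) features; PseAAC is omitted."""
    return ncc_composition(seq) + dipeptide_composition(seq)
```

A vector built this way has 460 entries per protein and could be passed to any standard classifier; which classifier and which normalization the original tool uses is not stated in the abstract.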
Affiliation(s)
- Sitanshu S Sahu
- Department of Electronics and Communication Engineering, Birla Institute of Technology, Mesra, Ranchi, India
- Cristian D Loaiza
- Department of Plants, Soils, and Climate/Center for Integrated BioSystems, College of Agriculture and Applied Sciences, Utah State University, Logan, UT, USA
- Rakesh Kaundal
- Department of Plants, Soils, and Climate/Center for Integrated BioSystems, College of Agriculture and Applied Sciences, Utah State University, Logan, UT, USA
- Bioinformatics Facility, Center for Integrated BioSystems, Utah State University, Logan, UT, USA
- Corresponding author’s e-mail address:
4
Kogay R, Neely TB, Birnbaum DP, Hankel CR, Shakya M, Zhaxybayeva O. Machine-Learning Classification Suggests That Many Alphaproteobacterial Prophages May Instead Be Gene Transfer Agents. Genome Biol Evol 2020; 11:2941-2953. [PMID: 31560374 PMCID: PMC6821227 DOI: 10.1093/gbe/evz206] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Accepted: 09/25/2019] [Indexed: 12/20/2022] Open
Abstract
Many of the sequenced bacterial and archaeal genomes encode regions of viral provenance. Yet, not all of these regions encode bona fide viruses. Gene transfer agents (GTAs) are thought to be former viruses that are now maintained in genomes of some bacteria and archaea and are hypothesized to enable exchange of DNA within bacterial populations. In Alphaproteobacteria, genes homologous to the "head-tail" gene cluster that encodes structural components of the Rhodobacter capsulatus GTA (RcGTA) are found in many taxa, even if they are only distantly related to Rhodobacter capsulatus. Yet, in most genomes available in GenBank RcGTA-like genes have annotations of typical viral proteins, and therefore are not easily distinguished from their viral homologs without additional analyses. Here, we report a "support vector machine" classifier that quickly and accurately distinguishes RcGTA-like genes from their viral homologs by capturing the differences in the amino acid composition of the encoded proteins. Our open-source classifier is implemented in Python and can be used to scan homologs of the RcGTA genes in newly sequenced genomes. The classifier can also be trained to identify other types of GTAs, or even to detect other elements of viral ancestry. Using the classifier trained on a manually curated set of homologous viruses and GTAs, we detected RcGTA-like "head-tail" gene clusters in 57.5% of the 1,423 examined alphaproteobacterial genomes. We also demonstrated that more than half of the in silico prophage predictions are instead likely to be GTAs, suggesting that in many alphaproteobacterial genomes the RcGTA-like elements remain unrecognized.
Affiliation(s)
- Roman Kogay
- Department of Biological Sciences, Dartmouth College, Hanover, New Hampshire
- Taylor B Neely
- Department of Biological Sciences, Dartmouth College, Hanover, New Hampshire
- Amazon.com Inc., Seattle, WA
- Daniel P Birnbaum
- Department of Biological Sciences, Dartmouth College, Hanover, New Hampshire
- School of Engineering and Applied Sciences, Harvard University, Cambridge, MA
- Camille R Hankel
- Department of Biological Sciences, Dartmouth College, Hanover, New Hampshire
- Department of Earth and Planetary Sciences, Harvard University, Cambridge, MA
- Migun Shakya
- Department of Biological Sciences, Dartmouth College, Hanover, New Hampshire
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM
- Olga Zhaxybayeva
- Department of Biological Sciences, Dartmouth College, Hanover, New Hampshire
- Department of Computer Science, Dartmouth College, Hanover, New Hampshire
5
Lande NV, Barua P, Gayen D, Kumar S, Chakraborty S, Chakraborty N. Proteomic dissection of the chloroplast: Moving beyond photosynthesis. J Proteomics 2019; 212:103542. [PMID: 31704367 DOI: 10.1016/j.jprot.2019.103542] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 06/26/2019] [Revised: 09/15/2019] [Accepted: 10/03/2019] [Indexed: 01/28/2023]
Abstract
The chloroplast, the photosynthetic machinery, converts photoenergy to ATP and NADPH, which power the production of carbohydrates from atmospheric CO2 and H2O. It also serves as a major production site of multivariate pro-defense molecules, and coordinates with other organelles for cell defense. The chloroplast harbors 30-50% of total cellular proteins, of which 80% are membrane residents and are difficult to solubilize. While proteome profiling has illuminated vast areas of biological protein space, a great deal of effort must be invested to understand the proteomic landscape of the chloroplast, which plays a central role in photosynthesis, energy metabolism and stress adaptation. Therefore, characterization of the chloroplast proteome would not only provide the foundation for future investigation of the expression and function of chloroplast proteins, but would also open up new avenues for modulation of plant productivity through synchronizing key chloroplastic components. In this review, we summarize the progress that has been made to build a new understanding of the chloroplast proteome and the implications of chloroplast dynamics in generating metabolic energy and modulating stress adaptation.
Affiliation(s)
- Nilesh Vikram Lande
- National Institute of Plant Genome Research, Jawaharlal Nehru University Campus, Aruna Asaf Ali Marg, New Delhi 110067, India
- Pragya Barua
- National Institute of Plant Genome Research, Jawaharlal Nehru University Campus, Aruna Asaf Ali Marg, New Delhi 110067, India
- Dipak Gayen
- National Institute of Plant Genome Research, Jawaharlal Nehru University Campus, Aruna Asaf Ali Marg, New Delhi 110067, India
- Sunil Kumar
- National Institute of Plant Genome Research, Jawaharlal Nehru University Campus, Aruna Asaf Ali Marg, New Delhi 110067, India
- Subhra Chakraborty
- National Institute of Plant Genome Research, Jawaharlal Nehru University Campus, Aruna Asaf Ali Marg, New Delhi 110067, India
- Niranjan Chakraborty
- National Institute of Plant Genome Research, Jawaharlal Nehru University Campus, Aruna Asaf Ali Marg, New Delhi 110067, India.
6
Füssy Z, Faitová T, Oborník M. Subcellular Compartments Interplay for Carbon and Nitrogen Allocation in Chromera velia and Vitrella brassicaformis. Genome Biol Evol 2019; 11:1765-1779. [PMID: 31192348 PMCID: PMC6668581 DOI: 10.1093/gbe/evz123] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Accepted: 06/10/2019] [Indexed: 12/20/2022] Open
Abstract
Endosymbioses necessitate functional cooperation of cellular compartments to avoid pathway redundancy and streamline the control of biological processes. To gain insight into the metabolic compartmentation in chromerids, phototrophic relatives to apicomplexan parasites, we prepared a reference set of proteins probably localized to mitochondria, cytosol, and the plastid, taking advantage of available genomic and transcriptomic data. Training of prediction algorithms with the reference set now allows a genome-wide analysis of protein localization in Chromera velia and Vitrella brassicaformis. We confirm that the chromerid plastids house enzymatic pathways needed for their maintenance and photosynthetic activity, but for carbon and nitrogen allocation, metabolite exchange is necessary with the cytosol and mitochondria. This indeed suggests that the regulatory mechanisms operate in the cytosol to control carbon metabolism based on the availability of both light and nutrients. We discuss that this arrangement is largely shared with apicomplexans and dinoflagellates, possibly stemming from a common ancestral metabolic architecture, and supports the mixotrophy of the chromerid algae.
Affiliation(s)
- Zoltán Füssy
- Faculty of Science, Department of Molecular Biology and Genetics, University of South Bohemia, České Budějovice, Czech Republic
- Department of Evolutionary Protistology, Institute of Parasitology, Biology Centre CAS, České Budějovice, Czech Republic
- Tereza Faitová
- Faculty of Science, Department of Molecular Biology and Genetics, University of South Bohemia, České Budějovice, Czech Republic
- Department of Evolutionary Protistology, Institute of Parasitology, Biology Centre CAS, České Budějovice, Czech Republic
- Faculty of Engineering and Natural Sciences, Department of Computer Science, Johannes Kepler University, Linz, Austria
- Miroslav Oborník
- Faculty of Science, Department of Molecular Biology and Genetics, University of South Bohemia, České Budějovice, Czech Republic
- Department of Evolutionary Protistology, Institute of Parasitology, Biology Centre CAS, České Budějovice, Czech Republic
7
SCMPSP: Prediction and characterization of photosynthetic proteins based on a scoring card method. BMC Bioinformatics 2015; 16 Suppl 1:S8. [PMID: 25708243 PMCID: PMC4331707 DOI: 10.1186/1471-2105-16-s1-s8] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Photosynthetic proteins (PSPs) greatly differ in their structure and function as they are involved in numerous subprocesses that take place inside an organelle called a chloroplast. Few studies predict PSPs from sequences because of the high variety of their sequences and structures. This work aims to predict and characterize PSPs by establishing datasets of PSP and non-PSP sequences and developing prediction methods. RESULTS A novel bioinformatics method of predicting and characterizing PSPs based on a scoring card method (SCMPSP) was used. First, a dataset consisting of 649 PSPs was established by using the Gene Ontology term GO:0015979, together with 649 non-PSPs from the SwissProt database, with sequence identity <= 25%. Several prediction methods are presented based on support vector machine (SVM), decision tree J48, Bayes, BLAST, and SCM. The SVM method using dipeptide features performed well and yielded a test accuracy of 72.31%. The SCMPSP method uses the estimated propensity scores of 400 dipeptides as PSPs and has a test accuracy of 71.54%, which is comparable to that of the SVM method. The derived propensity scores of 20 amino acids were further used to identify informative physicochemical properties for characterizing PSPs. The analytical results reveal the following four characteristics of PSPs: 1) PSPs favour hydrophobic side chain amino acids; 2) PSPs are composed of the amino acids prone to form helices in membrane environments; 3) PSPs have low interaction with water; and 4) PSPs prefer to be composed of the amino acids of electron-reactive side chains. CONCLUSIONS The SCMPSP method not only estimates the propensity of a sequence to be a PSP, it also discovers characteristics that further improve understanding of PSPs. The SCMPSP source code and the datasets used in this study are available at http://iclab.life.nctu.edu.tw/SCMPSP/.
8
LacSubPred: predicting subtypes of Laccases, an important lignin metabolism-related enzyme class, using in silico approaches. BMC Bioinformatics 2014; 15 Suppl 11:S15. [PMID: 25350584 PMCID: PMC4251044 DOI: 10.1186/1471-2105-15-s11-s15] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 11/18/2022] Open
Abstract
Background Laccases (E.C. 1.10.3.2) are multi-copper oxidases that have gained importance in many industries such as biofuels, pulp production, textile dye bleaching, bioremediation, and food production. Their usefulness stems from the ability to act on a diverse range of phenolic compounds such as o-/p-quinols, aminophenols, polyphenols, polyamines, aryl diamines, and aromatic thiols. Despite acting on a wide range of compounds as a family, individual Laccases often exhibit distinctive and varied substrate ranges. This is likely due to the involvement of Laccases in many metabolic roles across diverse taxa. Classification systems for multi-copper oxidases have been developed using multiple sequence alignments; however, these systems seem to largely follow species taxonomy rather than substrate ranges, enzyme properties, or specific function. It has been suggested that the roles and substrates of various Laccases are related to their optimal pH. This is consistent with the observation that fungal Laccases usually prefer acidic conditions, whereas plant and bacterial Laccases prefer basic conditions. Based on these observations, we hypothesize that a descriptor-based unsupervised learning system could generate a homology-independent classification system for better describing the functional properties of Laccases. Results In this study, we first utilized an unsupervised learning approach to develop a novel homology-independent Laccase classification system. Of the descriptors considered, physicochemical properties showed the best performance. Physicochemical properties divided the Laccases into twelve subtypes. Analysis of the clusters using a t-test revealed that the majority of the physicochemical descriptors had statistically significant differences between the classes. Feature selection identified the most important features as negatively charged residues, the peptide isoelectric point, and acidic or amidic residues.
Secondly, to allow for classification of new Laccases, a supervised learning system was developed from the clusters. The models showed high performance with an overall accuracy of 99.03%, error of 0.49%, MCC of 0.9367, precision of 94.20%, sensitivity of 94.20%, and specificity of 99.47% in a 5-fold cross-validation test. In an independent test, our models still provide a high accuracy of 97.98%, error rate of 1.02%, MCC of 0.8678, precision of 87.88%, sensitivity of 87.88% and specificity of 98.90%. Conclusion This study provides a useful classification system for better understanding of Laccases from the perspective of their physicochemical properties. We also developed a publicly available web tool for the characterization of Laccase protein sequences (http://lacsubpred.bioinfo.ucr.edu/). Finally, the programs used in the study are made available for researchers interested in applying the system to other enzyme classes (https://github.com/tweirick/SubClPred).
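For readers unfamiliar with the performance measures quoted in this abstract, the sketch below shows how accuracy, precision, sensitivity, specificity, and the Matthews correlation coefficient (MCC) are conventionally computed from the four cells of a binary confusion matrix. The function name and the example counts are hypothetical; the abstract does not say how the per-class results for the twelve subtypes were aggregated, so this shows only the standard two-class definitions.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard two-class metrics from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives. Ratios with a zero denominator are reported as 0.0.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # also called recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only (not data from the study).
example = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

Because MCC uses all four cells, it can be much lower than accuracy when the classes are imbalanced, which is one reason studies often report it alongside accuracy.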
9
Wren JD, Dozmorov MG, Burian D, Kaundal R, Perkins A, Perkins E, Kupfer DM, Springer GK. Proceedings of the 2013 MidSouth Computational Biology and Bioinformatics Society (MCBIOS) Conference. BMC Bioinformatics 2013; 14 Suppl 14:S1. [PMID: 24267415 PMCID: PMC3851158 DOI: 10.1186/1471-2105-14-s14-s1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022] Open