1
|
Veszelyi K, Czegle I, Varga V, Németh CE, Besztercei B, Margittai É. Subcellular Localization of Thioredoxin/Thioredoxin Reductase System-A Missing Link in Endoplasmic Reticulum Redox Balance. Int J Mol Sci 2024; 25:6647. [PMID: 38928353 PMCID: PMC11204020 DOI: 10.3390/ijms25126647] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2024] [Revised: 06/12/2024] [Accepted: 06/14/2024] [Indexed: 06/28/2024] Open
Abstract
The lumen of the endoplasmic reticulum (ER) is usually considered an oxidative environment; however, oxidized thiol-disulfides and reduced pyridine nucleotides coexist there, indicating that the ER lumen lacks components which connect the two systems. Here, we investigated the luminal presence of the thioredoxin (Trx)/thioredoxin reductase (TrxR) proteins, which are capable of linking the protein thiol and pyridine nucleotide pools in different compartments. The specific activity of TrxR was undetectable in the ER, whereas higher activities were measured in the cytoplasm and mitochondria. Western blot analysis detected none of the Trx/TrxR isoforms in the ER. Co-localization studies of various isoforms of Trx and TrxR with the ER marker Grp94 by immunofluorescent analysis further confirmed their absence from the lumen. Several in silico analysis tools also predicted a very low probability of luminal localization for each isoform. ER-targeted transient transfection of HeLa cells with Trx1 and TrxR1 significantly decreased cell viability and induced apoptotic cell death. In conclusion, the absence of this electron transfer chain may explain the uncoupling of the redox systems in the ER lumen, allowing the parallel presence of a reduced pyridine nucleotide pool and a probably oxidized protein pool necessary for cellular viability.
Affiliation(s)
- Krisztina Veszelyi
- Institute of Translational Medicine, Semmelweis University, H-1085 Budapest, Hungary
- Ibolya Czegle
- Department of Internal Medicine and Haematology, Semmelweis University, H-1085 Budapest, Hungary
- Viola Varga
- Institute of Translational Medicine, Semmelweis University, H-1085 Budapest, Hungary
- Csilla Emese Németh
- Institute of Biochemistry and Molecular Biology, Department of Molecular Biology, Semmelweis University, H-1085 Budapest, Hungary
- Balázs Besztercei
- Institute of Translational Medicine, Semmelweis University, H-1085 Budapest, Hungary
- Éva Margittai
- Institute of Translational Medicine, Semmelweis University, H-1085 Budapest, Hungary
|
2
|
Li X, Qian Y, Hu Y, Chen J, Yue H, Deng L. MSF-PFP: A Novel Multisource Feature Fusion Model for Protein Function Prediction. J Chem Inf Model 2024; 64:1502-1511. [PMID: 38413369 DOI: 10.1021/acs.jcim.3c01794] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/29/2024]
Abstract
Protein function prediction is essential for disease treatment and drug development; yet, traditional biological experimental methods are less efficient in annotating protein function, and existing automated methods fail to fully leverage protein multisource data. Here, we present MSF-PFP, a computational framework that fuses multisource data features to predict protein function with high accuracy. Our framework designs specific models for feature extraction based on the characteristics of various data sources, including a global-local-individual strategy for local location features. MSF-PFP then integrates extracted features through a multisource feature fusion model, ultimately categorizing protein functions. Experimental results demonstrate that MSF-PFP outperforms eight state-of-the-art models, achieving FMax scores of 0.542, 0.675, and 0.624 for the biological process (BP), molecular function (MF), and cellular component (CC), respectively. The source code and data set for MSF-PFP are available at https://swanhub.co/TianGua/MSF-PFP, facilitating further exploration and validation of the proposed framework. This study highlights the potential of multisource data fusion in enhancing protein function prediction, contributing to improved disease therapy and medication discovery strategies.
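The abstract does not define the FMax metric it reports. A common reading is the protein-centric maximum F-measure used in CAFA-style evaluations: for each score threshold, precision is averaged over proteins with at least one prediction, recall is averaged over all annotated proteins, and the best resulting F-measure is kept. The following is a minimal sketch under that assumption; the function name `fmax`, the threshold grid, and the toy terms and scores are hypothetical and are not taken from MSF-PFP.

```python
from typing import Dict, Set


def fmax(y_true: Dict[str, Set[str]],
         y_score: Dict[str, Dict[str, float]],
         steps: int = 100) -> float:
    """Return the best protein-centric F-measure over score thresholds.

    y_true  maps protein id -> set of true terms.
    y_score maps protein id -> {term: predicted score in [0, 1]}.
    """
    best = 0.0
    for i in range(1, steps + 1):
        t = i / steps
        precisions = []  # one entry per protein with >= 1 prediction at t
        recalls = []     # one entry per annotated protein
        for protein, true_terms in y_true.items():
            if not true_terms:
                continue  # unannotated proteins are skipped in this sketch
            scores = y_score.get(protein, {})
            predicted = {term for term, s in scores.items() if s >= t}
            hits = len(predicted & true_terms)
            if predicted:
                precisions.append(hits / len(predicted))
            recalls.append(hits / len(true_terms))
        if not precisions or not recalls:
            continue
        precision = sum(precisions) / len(precisions)
        recall = sum(recalls) / len(recalls)
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Toy example with made-up terms and scores (not data from the paper).
toy_true = {"P1": {"GO:A", "GO:B"}, "P2": {"GO:C"}}
toy_score = {
    "P1": {"GO:A": 0.9, "GO:B": 0.4, "GO:C": 0.2},
    "P2": {"GO:C": 0.8, "GO:A": 0.6},
}
print(round(fmax(toy_true, toy_score), 3))
```

Published evaluations usually also propagate predictions up the Gene Ontology hierarchy before scoring; that step is omitted here.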
Affiliation(s)
- Xinhui Li
- School of Software, Xinjiang University, Urumqi 830091, China
- Key Laboratory of Signal Detection and Processing in Xinjiang Uygur Autonomous Region, Xinjiang University, Urumqi 830046, China
- Key Laboratory of Software Engineering, Xinjiang University, Urumqi 830091, China
- Yurong Qian
- School of Software, Xinjiang University, Urumqi 830091, China
- Key Laboratory of Signal Detection and Processing in Xinjiang Uygur Autonomous Region, Xinjiang University, Urumqi 830046, China
- Key Laboratory of Software Engineering, Xinjiang University, Urumqi 830091, China
- Yue Hu
- School of Software, Xinjiang University, Urumqi 830091, China
- Key Laboratory of Signal Detection and Processing in Xinjiang Uygur Autonomous Region, Xinjiang University, Urumqi 830046, China
- Key Laboratory of Software Engineering, Xinjiang University, Urumqi 830091, China
- Jiaying Chen
- School of Software, Xinjiang University, Urumqi 830091, China
- Key Laboratory of Signal Detection and Processing in Xinjiang Uygur Autonomous Region, Xinjiang University, Urumqi 830046, China
- Key Laboratory of Software Engineering, Xinjiang University, Urumqi 830091, China
- Haitao Yue
- School of Future Technology, Xinjiang University, Urumqi 830017, China
- Laboratory of Synthetic Biology, School of Life Science and Technology, Xinjiang University, Urumqi 830017, China
- Lei Deng
- School of Software, Xinjiang University, Urumqi 830091, China
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
|
3
|
Nielsen H. Protein Sorting Prediction. Methods Mol Biol 2024; 2715:27-63. [PMID: 37930519 DOI: 10.1007/978-1-0716-3445-5_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2023]
Abstract
Many computational methods are available for predicting protein sorting in bacteria. When comparing them, it is important to know that they can be grouped into three fundamentally different approaches: signal-based, global property-based, and homology-based prediction. In this chapter, the strengths and drawbacks of each of these approaches are described through many examples of methods that predict secretion, integration into membranes, or subcellular locations in general. The aim of this chapter is to provide a user-level introduction to the field with a minimum of computational theory.
Affiliation(s)
- Henrik Nielsen
- Department of Health Technology, Technical University of Denmark, Lyngby, Denmark.
|
4
|
Kushwah AS, Dixit H, Upadhyay V, Yadav S, Verma SK, Prasad R. Elucidating the zinc-binding proteome of Fusarium oxysporum f. sp. lycopersici with particular emphasis on zinc-binding effector proteins. Arch Microbiol 2023; 205:298. [PMID: 37516670 DOI: 10.1007/s00203-023-03638-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Revised: 06/29/2023] [Accepted: 07/14/2023] [Indexed: 07/31/2023]
Abstract
Fusarium oxysporum f. sp. lycopersici is a soil-borne phytopathogenic species that causes vascular wilt disease in Solanum lycopersicum (tomato). The continuous competition for zinc between Fusarium and its host during infection makes zinc-binding proteins a hotspot for focused investigation. Zinc-binding effector proteins are pivotal during the infection process, working in conjunction with other essential proteins crucial for its biological activities. This work aims to identify and analyse zinc-binding proteins and zinc-binding effector candidates of Fusarium. We identified 346 putative zinc-binding proteins, of which 230 were zinc-binding effector candidates. The functional annotation, subcellular localization, and Gene Ontology analysis of these putative zinc-binding proteins revealed their probable role in a wide range of cellular and biological processes such as metabolism, gene expression, gene expression regulation, protein biosynthesis, protein folding, cell signalling, DNA repair, and RNA processing. Sixteen proteins were found to be putatively secretory in nature. Eleven of these were putative zinc-binding effector candidates that may be involved in pathogen-host interaction during infection. The information obtained here may improve our ability to design, screen, and apply zinc-metal ion-based antifungal agents to protect S. lycopersicum and control the vascular wilt caused by F. oxysporum.
Affiliation(s)
- Ankita Singh Kushwah
- Department of Biosciences and Bioengineering, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, 247667, India
- Himisha Dixit
- Centre for Computational Biology & Bioinformatics, Central University of Himachal Pradesh, Kangra, Himachal Pradesh, 176206, India
- Vipin Upadhyay
- Centre for Computational Biology & Bioinformatics, Central University of Himachal Pradesh, Kangra, Himachal Pradesh, 176206, India
- Siddharth Yadav
- Department of Computer Science and Engineering, Thapar Institute of Engineering & Technology, Patiala, Punjab, 147004, India
- Shailender Kumar Verma
- Centre for Computational Biology & Bioinformatics, Central University of Himachal Pradesh, Kangra, Himachal Pradesh, 176206, India
- Department of Environmental Studies, University of Delhi, New Delhi, Delhi, 110007, India
- Ramasare Prasad
- Department of Biosciences and Bioengineering, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, 247667, India.
|
5
|
Yang B, Zhang L, Xiang S, Chen H, Qu C, Lu K, Li J. Identification of Trehalose-6-Phosphate Synthase (TPS) Genes Associated with Both Source-/Sink-Related Yield Traits and Drought Response in Rapeseed (Brassica napus L.). Plants (Basel, Switzerland) 2023; 12:981. [PMID: 36903842 PMCID: PMC10005558 DOI: 10.3390/plants12050981] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Revised: 02/15/2023] [Accepted: 02/16/2023] [Indexed: 06/18/2023]
Abstract
Trehalose-6-phosphate synthase (TPS) is an important enzyme for the synthesis of Trehalose-6-phosphate (T6P). In addition to being a signaling regulator of carbon allocation that improves crop yields, T6P also plays essential roles in desiccation tolerance. However, comprehensive studies, such as evolutionary analysis, expression analysis, and functional classification of the TPS family in rapeseed (Brassica napus L.) are lacking. Here, we identified 35 BnTPSs, 14 BoTPSs, and 17 BrTPSs in cruciferous plants, which were classified into three subfamilies. Phylogenetic and syntenic analysis of TPS genes in four cruciferous species indicated that only gene elimination occurred during their evolution. Combined phylogenetic, protein property, and expression analysis of the 35 BnTPSs suggested that changes in gene structures might have led to changes in their expression profiles and further functional differentiation during their evolution. In addition, we analyzed one set of transcriptome data from Zhongshuang11 (ZS11) and two sets of data from extreme materials associated with source-/sink-related yield traits and the drought response. The expression levels of four BnTPSs (BnTPS6, BnTPS8, BnTPS9, and BnTPS11) increased sharply after drought stress, and three differentially expressed genes (BnTPS1, BnTPS5, and BnTPS9) exhibited variable expression patterns among source and sink tissues between yield-related materials. Our findings provide a reference for fundamental studies of TPSs in rapeseed and a framework for future functional research of the roles of BnTPSs in both yield and drought resistance.
Affiliation(s)
- Bo Yang
- Chongqing Rapeseed Engineering Research Center, College of Agronomy and Biotechnology, Southwest University, Chongqing 400716, China
- Liyuan Zhang
- Academy of Agricultural Sciences, Southwest University, Chongqing 400716, China
- Sirou Xiang
- Chongqing Rapeseed Engineering Research Center, College of Agronomy and Biotechnology, Southwest University, Chongqing 400716, China
- Huan Chen
- Chongqing Rapeseed Engineering Research Center, College of Agronomy and Biotechnology, Southwest University, Chongqing 400716, China
- Cunmin Qu
- Chongqing Rapeseed Engineering Research Center, College of Agronomy and Biotechnology, Southwest University, Chongqing 400716, China
- Academy of Agricultural Sciences, Southwest University, Chongqing 400716, China
- Kun Lu
- Chongqing Rapeseed Engineering Research Center, College of Agronomy and Biotechnology, Southwest University, Chongqing 400716, China
- Academy of Agricultural Sciences, Southwest University, Chongqing 400716, China
- Jiana Li
- Chongqing Rapeseed Engineering Research Center, College of Agronomy and Biotechnology, Southwest University, Chongqing 400716, China
|
6
|
Li Z, Gao E, Zhou J, Han W, Xu X, Gao X. Applications of deep learning in understanding gene regulation. Cell Reports Methods 2023; 3:100384. [PMID: 36814848 PMCID: PMC9939384 DOI: 10.1016/j.crmeth.2022.100384] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Indexed: 05/22/2023]
Abstract
Gene regulation is a central topic in cell biology. Advances in omics technologies and the accumulation of omics data have provided better opportunities for gene regulation studies than ever before. For this reason deep learning, as a data-driven predictive modeling approach, has been successfully applied to this field during the past decade. In this article, we aim to give a brief yet comprehensive overview of representative deep-learning methods for gene regulation. Specifically, we discuss and compare the design principles and datasets used by each method, creating a reference for researchers who wish to replicate or improve existing methods. We also discuss the common problems of existing approaches and prospectively introduce the emerging deep-learning paradigms that will potentially alleviate them. We hope that this article will provide a rich and up-to-date resource and shed light on future research directions in this area.
Affiliation(s)
- Zhongxiao Li
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Elva Gao
- The KAUST School, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Juexiao Zhou
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Wenkai Han
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Xiaopeng Xu
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Xin Gao
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
|
7
|
Mitra N, Dey S. Understanding the catalytic abilities of class IV sirtuin OsSRT1 and its linkage to the DNA repair system under stress conditions. Plant Science: An International Journal of Experimental Plant Biology 2022; 323:111398. [PMID: 35917976 DOI: 10.1016/j.plantsci.2022.111398] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/04/2022] [Revised: 07/04/2022] [Accepted: 07/24/2022] [Indexed: 06/15/2023]
Abstract
The roles of sirtuins in plants are slowly unraveling. For OsSRT1, only its H3K9Ac deacetylation has been reported. Here we detect its other lysine deacetylation sites in histones H3 and H4. Further, our studies shed light on its dual enzyme capability, with a preference for mono ADP ribosylation over deacetylation. OsSRT1 can specifically transfer a single ADP ribose group onto its substrates in an enzymatic manner. This mono ADPr effect is not well known in plants, more so for deacetylases. The products of this reaction (NAM and ADP ribose) have a negative effect on this enzyme's action, suggesting a tighter regulation. Resveratrol, a natural plant polyphenol, proves to be a good activator of this enzyme at 150 ± 40 µM concentration. Under different abiotic stress conditions, we could link this ADP ribosylase activity to the DNA damage repair (DDR) pathway by activating the enzyme PARP1. There is also evidence of OsSRT1's interaction with the components of the DDR machinery. Changes in the extent of different histone deacetylations by OsSRT1 are also related to these stress conditions. Metal stress in plants also influences these enzyme activities. Structurally, OsSRT1 has a long C-terminal domain in comparison to other classes of plant sirtuins, which is required for its catalysis.
Affiliation(s)
- Nilabhra Mitra
- Department of Life Sciences, Presidency University, 86/1 College Street, Kolkata, West Bengal 700073, India
- Sanghamitra Dey
- Department of Life Sciences, Presidency University, 86/1 College Street, Kolkata, West Bengal 700073, India.
|
8
|
Kha QH, Ho QT, Le NQK. Identifying SNARE Proteins Using an Alignment-Free Method Based on Multiscan Convolutional Neural Network and PSSM Profiles. J Chem Inf Model 2022; 62:4820-4826. [PMID: 36166351 PMCID: PMC9554904 DOI: 10.1021/acs.jcim.2c01034] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Indexed: 12/03/2022]
Abstract
Background: SNARE proteins play a vital role in membrane fusion and cellular physiology and pathological processes. Many potential therapeutics for mental diseases or even cancer based on SNAREs have also been developed. Therefore, there is a dire need to predict the SNAREs for further manipulation of these essential proteins, which demands new and efficient approaches. Methods: Some computational frameworks were proposed to tackle the hurdles of biological methods, which take plenty of time and budget to conduct the identification of SNAREs. However, the performance of existing frameworks was unsatisfactory, as they failed to retain the SNARE sequence order and capture the mass hidden features from SNAREs. This paper proposed a novel model constructed on the multiscan convolutional neural network (CNN) and position-specific scoring matrix (PSSM) profiles to address these limitations. We employed and trained our model on the benchmark dataset with fivefold cross-validation and two different independent datasets. Results: Overall, the multiscan CNN was cross-validated on the training set and excelled in the SNARE classification, reaching 0.963 in AUC and 0.955 in AUPRC. On top of that, with the sensitivity, specificity, accuracy, and MCC of 0.842, 0.968, 0.955, and 0.767, respectively, our proposed framework outperformed previous models in the SNARE recognition task. Conclusions: We believe that our model can contribute to the discrimination of SNARE proteins and general proteins.
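The sensitivity, specificity, accuracy, and Matthews correlation coefficient (MCC) reported above are all standard functions of a binary confusion matrix. A minimal sketch of how they are computed follows; the helper name `binary_metrics` and the counts in the example are hypothetical and are not the paper's data.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute common binary classification metrics from confusion counts."""

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical counts for an imbalanced test set (100 positives, 900 negatives).
for name, value in binary_metrics(tp=80, fp=20, tn=880, fn=20).items():
    print(f"{name}: {value:.3f}")
```

On imbalanced data such as this toy example, accuracy stays high even when a fifth of the positives are missed, which is why MCC is often reported alongside it.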
Affiliation(s)
- Quang-Hien Kha
- International Master/Ph.D. Program in Medicine, College of Medicine, Taipei Medical University, Taipei 110, Taiwan
- Quang-Thai Ho
- College of Information & Communication Technology, Can Tho University, Can Tho 90000, Viet Nam
- Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan
- Nguyen Quoc Khanh Le
- Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei 106, Taiwan
- Research Center for Artificial Intelligence in Medicine, Taipei Medical University, Taipei 106, Taiwan
- Translational Imaging Research Center, Taipei Medical University Hospital, Taipei 110, Taiwan
|
9
|
Tu Y, Lei H, Shen HB, Yang Y. SIFLoc: a self-supervised pre-training method for enhancing the recognition of protein subcellular localization in immunofluorescence microscopic images. Brief Bioinform 2022; 23:6527276. [DOI: 10.1093/bib/bbab605] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/19/2021] [Revised: 12/15/2021] [Accepted: 12/27/2021] [Indexed: 12/19/2022] Open
Abstract
With the rapid growth of high-resolution microscopy imaging data, revealing the subcellular map of human proteins has become a central task in spatial proteomics. The cell atlas of the Human Protein Atlas (HPA) provides precious resources for recognizing subcellular localization patterns at the cell level, and the large-scale annotated data enable learning via advanced deep neural networks. However, the existing predictors still suffer from the imbalanced class distribution and the lack of labeled data for minor classes. Thus, it is necessary to develop new methods for coping with these issues. We leverage the self-supervised learning protocol to address these problems. In particular, we propose a pre-training scheme, called SIFLoc, to enhance the conventional supervised learning framework. The pre-training features a hybrid data augmentation method and a modified contrastive loss function, aiming to learn good feature representations from microscopic images. The experiments are performed on a large-scale immunofluorescence microscopic image dataset collected from the HPA database. Using the same deep neural networks as the classifier, the model pre-trained via SIFLoc not only outperforms the model without pre-training by a large margin but also shows advantages over the state-of-the-art self-supervised learning methods. Notably, SIFLoc significantly improves the prediction accuracy for minor organelles.
Affiliation(s)
- Yanlun Tu
- Department of Computer Science and Engineering, Shanghai Jiao Tong University, 200240 Shanghai, China
- Houchao Lei
- Department of Computer Science and Engineering, Shanghai Jiao Tong University, 200240 Shanghai, China
- Hong-Bin Shen
- Department of Computer Science and Engineering, Shanghai Jiao Tong University, 200240 Shanghai, China
- Institute of Image Processing and Pattern Recognition and Key Laboratory of System Control and Information Processing, Shanghai Jiao Tong University, 200240 Shanghai, China
- Yang Yang
- Department of Computer Science and Engineering, Shanghai Jiao Tong University, 200240 Shanghai, China
|
10
|
Yadav NS, Kumar P, Singh I. Structural and functional analysis of protein. Bioinformatics 2022. [DOI: 10.1016/b978-0-323-89775-4.00026-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022] Open
|
11
|
Ofer D, Brandes N, Linial M. The language of proteins: NLP, machine learning & protein sequences. Comput Struct Biotechnol J 2021; 19:1750-1758. [PMID: 33897979 PMCID: PMC8050421 DOI: 10.1016/j.csbj.2021.03.022] [Citation(s) in RCA: 97] [Impact Index Per Article: 32.3] [Received: 01/28/2021] [Revised: 03/19/2021] [Accepted: 03/19/2021] [Indexed: 12/12/2022] Open
Abstract
Natural language processing (NLP) is a field of computer science concerned with automated text and language analysis. In recent years, following a series of breakthroughs in deep and machine learning, NLP methods have shown overwhelming progress. Here, we review the success, promise and pitfalls of applying NLP algorithms to the study of proteins. Proteins, which can be represented as strings of amino-acid letters, are a natural fit to many NLP methods. We explore the conceptual similarities and differences between proteins and language, and review a range of protein-related tasks amenable to machine learning. We present methods for encoding the information of proteins as text and analyzing it with NLP methods, reviewing classic concepts such as bag-of-words, k-mers/n-grams and text search, as well as modern techniques such as word embedding, contextualized embedding, deep learning and neural language models. In particular, we focus on recent innovations such as masked language modeling, self-supervised learning and attention-based models. Finally, we discuss trends and challenges in the intersection of NLP and protein research.
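To make the "proteins as text" idea concrete, the k-mer (n-gram) bag-of-words encoding mentioned in the abstract can be sketched in a few lines. This is an illustrative sketch only: the function name `kmer_counts` and the example peptide are hypothetical, and the review itself does not prescribe this implementation.

```python
from collections import Counter


def kmer_counts(sequence: str, k: int = 3) -> Counter:
    """Count overlapping k-mers in an amino-acid sequence.

    The result is a bag-of-words style representation: k-mer order is
    discarded and only the frequency of each k-mer is kept.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    seq = sequence.upper()
    # A sequence shorter than k yields an empty range and an empty Counter.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Made-up peptide used purely for illustration.
print(kmer_counts("MKVMKV", k=3))
```

Count vectors of this kind can be fed to classical classifiers, whereas the embedding and language-model approaches the review covers learn representations that also retain sequence context.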
Affiliation(s)
- Nadav Brandes
- The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
- Michal Linial
- Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel
|
12
|
Pyrih J, Žárský V, Fellows JD, Grosche C, Wloga D, Striepen B, Maier UG, Tachezy J. The iron-sulfur scaffold protein HCF101 unveils the complexity of organellar evolution in SAR, Haptista and Cryptista. BMC Ecol Evol 2021; 21:46. [PMID: 33740894 PMCID: PMC7980591 DOI: 10.1186/s12862-021-01777-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/01/2020] [Accepted: 03/08/2021] [Indexed: 11/22/2022] Open
Abstract
Background: Nbp35-like proteins (Nbp35, Cfd1, HCF101, Ind1, and AbpC) are P-loop NTPases that serve as components of iron-sulfur cluster (FeS) assembly machineries. In eukaryotes, Ind1 is present in mitochondria, and its function is associated with the assembly of FeS clusters in subunits of respiratory Complex I; Nbp35 and Cfd1 are the components of the cytosolic FeS assembly (CIA) pathway; and HCF101 is involved in FeS assembly of photosystem I in plastids of plants (chHCF101). The AbpC protein operates in Bacteria and Archaea. To date, the cellular distribution of these proteins is considered to be highly conserved with only a few exceptions. Results: We searched for the genes of all members of the Nbp35-like protein family and analyzed their targeting sequences. Nbp35 and Cfd1 were predicted to reside in the cytoplasm with some exceptions of Nbp35 localization to the mitochondria; Ind1 was found in the mitochondria, and HCF101 was predicted to reside in plastids (chHCF101) of all photosynthetically active eukaryotes. Surprisingly, we found a second HCF101 paralog in all members of Cryptista, Haptista, and SAR that was predicted to predominantly target mitochondria (mHCF101), whereas Ind1 appeared to be absent in these organisms. We also identified a few exceptions, as apicomplexans possess mHCF101 predicted to localize in the cytosol and Nbp35 in the mitochondria. Our predictions were experimentally confirmed in selected representatives of Apicomplexa (Toxoplasma gondii), Stramenopila (Phaeodactylum tricornutum, Thalassiosira pseudonana), and Ciliophora (Tetrahymena thermophila) by tagging proteins with a transgenic reporter. Phylogenetic analysis suggested that chHCF101 and mHCF101 evolved from a common ancestral HCF101 independently of the Nbp35/Cfd1 and Ind1 proteins. Interestingly, phylogenetic analysis supports a lateral gene transfer of ancestral HCF101 from bacteria rather than an acquisition associated with either α-proteobacterial or cyanobacterial endosymbionts. Conclusion: Our searches for Nbp35-like proteins across eukaryotic lineages revealed that SAR, Haptista, and Cryptista possess mitochondrial HCF101. Because only plastid localization of HCF101 was known thus far, the discovery of its mitochondrial paralog explains confusion regarding the presence of HCF101 in organisms that possibly lost secondary plastids (e.g., ciliates, Cryptosporidium) or possess reduced nonphotosynthetic plastids (apicomplexans). Supplementary Information: The online version contains supplementary material available at 10.1186/s12862-021-01777-x.
Affiliation(s)
- Jan Pyrih
- Department of Parasitology, Faculty of Science, Charles University, BIOCEV, Průmyslová 595, 25250, Vestec, Czech Republic
- Vojtěch Žárský
- Department of Parasitology, Faculty of Science, Charles University, BIOCEV, Průmyslová 595, 25250, Vestec, Czech Republic
- Justin D Fellows
- Department of Cellular Biology, University of Georgia, Athens, GA, USA
- Christopher Grosche
- Laboratory for Cell Biology, Philipps University Marburg, Karl-von-Frisch-Str. 8, 35032, Marburg, Germany
- LOEWE Center for Synthetic Microbiology (Synmikro), Hans-Meerwein-Str. 6, 35032, Marburg, Germany
- Dorota Wloga
- Laboratory of Cytoskeleton and Cilia Biology, Nencki Institute of Experimental Biology of Polish Academy of Sciences, 3 Pasteur Street, 02-093, Warsaw, Poland
- Boris Striepen
- Department of Cellular Biology, University of Georgia, Athens, GA, USA
- Department of Pathobiology, School of Veterinary Medicine, University of Pennsylvania, 380 South University Avenue, Philadelphia, PA, 19104, USA
- Uwe G Maier
- Laboratory for Cell Biology, Philipps University Marburg, Karl-von-Frisch-Str. 8, 35032, Marburg, Germany
- LOEWE Center for Synthetic Microbiology (Synmikro), Hans-Meerwein-Str. 6, 35032, Marburg, Germany
- Jan Tachezy
- Department of Parasitology, Faculty of Science, Charles University, BIOCEV, Průmyslová 595, 25250, Vestec, Czech Republic.
|
13
|
Imai K, Nakai K. Tools for the Recognition of Sorting Signals and the Prediction of Subcellular Localization of Proteins From Their Amino Acid Sequences. Front Genet 2020; 11:607812. [PMID: 33324450 PMCID: PMC7723863 DOI: 10.3389/fgene.2020.607812] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 09/18/2020] [Accepted: 11/03/2020] [Indexed: 12/13/2022] Open
Abstract
At the time of translation, nascent proteins are thought to be sorted into their final subcellular localization sites, based on the part of their amino acid sequences (i.e., sorting or targeting signals). Thus, it is interesting to computationally recognize these signals from the amino acid sequences of any given proteins and to predict their final subcellular localization with such information, supplemented with additional information (e.g., k-mer frequency). This field has a long history and many prediction tools have been released. Even in this era of proteomic atlas at the single-cell level, researchers continue to develop new algorithms, aiming at accessing the impact of disease-causing mutations/cell type-specific alternative splicing, for example. In this article, we overview the entire field and discuss its future direction.
Affiliation(s)
- Kenichiro Imai
- Cellular and Molecular Biotechnology Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
| | - Kenta Nakai
- The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
|
14
|
Semwal R, Varadwaj PK. HumDLoc: Human Protein Subcellular Localization Prediction Using Deep Neural Network. Curr Genomics 2020; 21:546-557. [PMID: 33214771 PMCID: PMC7604748 DOI: 10.2174/1389202921999200528160534] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/10/2020] [Revised: 03/27/2020] [Accepted: 03/30/2020] [Indexed: 11/24/2022] Open
Abstract
Aims To develop a tool that can annotate subcellular localization of human proteins. Background With the progression of high throughput human proteomics projects, an enormous amount of protein sequence data has been discovered in the recent past. All these raw sequence data require precise mapping and annotation for their respective biological role and functional attributes. The functional characteristics of protein molecules are highly dependent on the subcellular localization/compartment. Therefore, a fully automated and reliable protein subcellular localization prediction system would be very useful for current proteomic research. Objective To develop a machine learning-based predictive model that can annotate the subcellular localization of human proteins with high accuracy and precision. Methods In this study, we used the PSI-CD-HIT homology criterion and utilized the sequence-based features of protein sequences to develop a powerful subcellular localization predictive model. The dataset used to train the HumDLoc model was extracted from a reliable data source, Uniprot knowledge base, which helps the model to generalize on the unseen dataset. Results The proposed model, HumDLoc, was compared with two of the most widely used techniques: CELLO and DeepLoc, and other machine learning-based tools. The result demonstrated promising predictive performance of HumDLoc model based on various machine learning parameters such as accuracy (≥97.00%), precision (≥0.86), recall (≥0.89), MCC score (≥0.86), ROC curve (0.98 square unit), and precision-recall curve (0.93 square unit). Conclusion In conclusion, HumDLoc was able to outperform several alternative tools for correctly predicting subcellular localization of human proteins. The HumDLoc has been hosted as a web-based tool at https://bioserver.iiita.ac.in/HumDLoc/.
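As a rough illustration of the evaluation measures named in this abstract (accuracy, precision, recall, MCC), the Python sketch below computes them from binary confusion-matrix counts. The function name and the example counts are hypothetical and are not taken from HumDLoc; the paper's multi-class evaluation may be set up differently.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and MCC from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "mcc": mcc}

# Hypothetical counts for one compartment treated as the positive class.
metrics = binary_metrics(tp=90, fp=10, tn=85, fn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For a multi-compartment predictor, one common choice (assumed here, not stated in the abstract) is to compute these values per compartment in a one-vs-rest fashion and then average them.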
Affiliation(s)
- Rahul Semwal
- 1Department of Information Technology (Bioinformatics), Indian Institute of Information Technology-Allahabad, Jhalwa, Prayagraj, India; 2Department of Bioinformatics and Applied Science, Indian Institute of Information Technology-Allahabad, Jhalwa, Prayagraj, India
| | - Pritish Kumar Varadwaj
- 1Department of Information Technology (Bioinformatics), Indian Institute of Information Technology-Allahabad, Jhalwa, Prayagraj, India; 2Department of Bioinformatics and Applied Science, Indian Institute of Information Technology-Allahabad, Jhalwa, Prayagraj, India
|
15
|
Plasma Proteome Profiling of Coronary Artery Disease Patients: Downregulation of Transthyretin-An Important Event. Mediators Inflamm 2020; 2020:3429541. [PMID: 33299376 PMCID: PMC7707994 DOI: 10.1155/2020/3429541] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/02/2020] [Accepted: 10/24/2020] [Indexed: 02/07/2023] Open
Abstract
Coronary artery disease (CAD) is a prevalent chronic inflammatory cardiac disorder. An early diagnosis is likely to help in the prevention and proper management of this disease. As the study of proteomics provides the potential markers for detection of a disease, in the present investigation, attempt has been made to identify disease-associated differential proteins involved in CAD pathogenesis. For this study, a total of 200 selected CAD patients were considered, who were recruited for percutaneous coronary intervention (PCI) treatment. The proteomic analysis was performed using two-dimensional gel electrophoresis (2-DE) and MALDI-TOF MS/MS. Samples were also subjected to Western blot analysis, enzyme-linked immunosorbent assay (ELISA), peripheral blood mononuclear cells isolation immunofluorescence (IF) analysis, analytical screening by fluorescence-activated cell sorting (FACS), and in silico analysis. The representative data were shown as mean ± SD of at least three experiments. A total of 19 proteins were identified. Among them, the most abundant five proteins (serotransferrin, talin-1, alpha-2HS glycoprotein, transthyretin (TTR), fibrinogen-α chain) were found to have altered level in CAD. Serotransferrin, talin-1, alpha-2HS glycoprotein, and transthyretin (TTR) were found to have lower level, whereas fibrinogen-α chain was found to have higher level in CAD plasma compared to healthy, confirmed by Western blot analysis. TTR, an important acute phase transport protein, was validated low level in 200 CAD patients who confirmed to undergo PCI treatment. Further, in silico and in vitro studies of TTR indicated a downexpression of CAD in plasma as compared to the plasma of healthy individuals. Lower level of plasma TTR was determined to be an important risk marker in the atherosclerotic-approved CAD patients. We suggest that the TTR lower level predicts disease severity and hence may serve as an important marker tool for CAD screening. However, further large-scale studies are required to determine the clinical significance of TTR.
|
16
|
Rotenberg D, Baumann AA, Ben-Mahmoud S, Christiaens O, Dermauw W, Ioannidis P, Jacobs CGC, Vargas Jentzsch IM, Oliver JE, Poelchau MF, Rajarapu SP, Schneweis DJ, Snoeck S, Taning CNT, Wei D, Widana Gamage SMK, Hughes DST, Murali SC, Bailey ST, Bejerman NE, Holmes CJ, Jennings EC, Rosendale AJ, Rosselot A, Hervey K, Schneweis BA, Cheng S, Childers C, Simão FA, Dietzgen RG, Chao H, Dinh H, Doddapaneni HV, Dugan S, Han Y, Lee SL, Muzny DM, Qu J, Worley KC, Benoit JB, Friedrich M, Jones JW, Panfilio KA, Park Y, Robertson HM, Smagghe G, Ullman DE, van der Zee M, Van Leeuwen T, Veenstra JA, Waterhouse RM, Weirauch MT, Werren JH, Whitfield AE, Zdobnov EM, Gibbs RA, Richards S. Genome-enabled insights into the biology of thrips as crop pests. BMC Biol 2020; 18:142. [PMID: 33070780 PMCID: PMC7570057 DOI: 10.1186/s12915-020-00862-9] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 02/18/2020] [Accepted: 09/02/2020] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND The western flower thrips, Frankliniella occidentalis (Pergande), is a globally invasive pest and plant virus vector on a wide array of food, fiber, and ornamental crops. The underlying genetic mechanisms of the processes governing thrips pest and vector biology, feeding behaviors, ecology, and insecticide resistance are largely unknown. To address this gap, we present the F. occidentalis draft genome assembly and official gene set. RESULTS We report on the first genome sequence for any member of the insect order Thysanoptera. Benchmarking Universal Single-Copy Ortholog (BUSCO) assessments of the genome assembly (size = 415.8 Mb, scaffold N50 = 948.9 kb) revealed a relatively complete and well-annotated assembly in comparison to other insect genomes. The genome is unusually GC-rich (50%) compared to other insect genomes to date. The official gene set (OGS v1.0) contains 16,859 genes, of which ~ 10% were manually verified and corrected by our consortium. We focused on manual annotation, phylogenetic, and expression evidence analyses for gene sets centered on primary themes in the life histories and activities of plant-colonizing insects. Highlights include the following: (1) divergent clades and large expansions in genes associated with environmental sensing (chemosensory receptors) and detoxification (CYP4, CYP6, and CCE enzymes) of substances encountered in agricultural environments; (2) a comprehensive set of salivary gland genes supported by enriched expression; (3) apparent absence of members of the IMD innate immune defense pathway; and (4) developmental- and sex-specific expression analyses of genes associated with progression from larvae to adulthood through neometaboly, a distinct form of maturation differing from either incomplete or complete metamorphosis in the Insecta. CONCLUSIONS Analysis of the F. occidentalis genome offers insights into the polyphagous behavior of this insect pest that finds, colonizes, and survives on a widely diverse array of plants. The genomic resources presented here enable a more complete analysis of insect evolution and biology, providing a missing taxon for contemporary insect genomics-based analyses. Our study also offers a genomic benchmark for molecular and evolutionary investigations of other Thysanoptera species.
Affiliation(s)
- Dorith Rotenberg
- Department of Entomology and Plant Pathology, North Carolina State University, Raleigh, NC, 27695, USA.
| | - Aaron A Baumann
- Virology Section, College of Veterinary Medicine, University of Tennessee, A239 VTH, 2407 River Drive, Knoxville, TN, 37996, USA
| | - Sulley Ben-Mahmoud
- Department of Entomology and Nematology, University of California Davis, Davis, CA, 95616, USA
| | - Olivier Christiaens
- Laboratory of Agrozoology, Department of Plants and Crops, Ghent University, Coupure Links 653, 9000, Ghent, Belgium
| | - Wannes Dermauw
- Laboratory of Agrozoology, Department of Plants and Crops, Ghent University, Coupure Links 653, 9000, Ghent, Belgium
| | - Panagiotis Ioannidis
- Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology-Hellas, Vassilika Vouton, 70013, Heraklion, Greece
- Department of Genetic Medicine and Development, University of Geneva Medical School, and Swiss Institute of Bioinformatics, Geneva, Switzerland
| | - Chris G C Jacobs
- Institute of Biology, Leiden University, 2333 BE, Leiden, The Netherlands
| | - Iris M Vargas Jentzsch
- Institute for Zoology: Developmental Biology, University of Cologne, 50674, Cologne, Germany
| | - Jonathan E Oliver
- Department of Plant Pathology, University of Georgia - Tifton Campus, Tifton, GA, 31793-5737, USA
| | | | - Swapna Priya Rajarapu
- Department of Entomology and Plant Pathology, North Carolina State University, Raleigh, NC, 27695, USA
| | - Derek J Schneweis
- Department of Plant Pathology, Kansas State University, Manhattan, KS, 66506, USA
| | - Simon Snoeck
- Laboratory of Agrozoology, Department of Plants and Crops, Ghent University, Coupure Links 653, 9000, Ghent, Belgium
- Department of Biology, University of Washington, Seattle, WA, 98105, USA
| | - Clauvis N T Taning
- Laboratory of Agrozoology, Department of Plants and Crops, Ghent University, Coupure Links 653, 9000, Ghent, Belgium
| | - Dong Wei
- Laboratory of Agrozoology, Department of Plants and Crops, Ghent University, Coupure Links 653, 9000, Ghent, Belgium
- Chongqing Key Laboratory of Entomology and Pest Control Engineering, College of Plant Protection, Southwest University, Chongqing, China
- International Joint Laboratory of China-Belgium on Sustainable Crop Pest Control, Academy of Agricultural Sciences, Southwest University, Chongqing, China and Ghent University, Ghent, Belgium
| | | | - Daniel S T Hughes
- Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
| | - Shwetha C Murali
- Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
| | - Samuel T Bailey
- Department of Biological Sciences, University of Cincinnati, Cincinnati, OH, 45221, USA
| | | | - Christopher J Holmes
- Department of Biological Sciences, University of Cincinnati, Cincinnati, OH, 45221, USA
| | - Emily C Jennings
- Department of Biological Sciences, University of Cincinnati, Cincinnati, OH, 45221, USA
| | - Andrew J Rosendale
- Department of Biological Sciences, University of Cincinnati, Cincinnati, OH, 45221, USA
- Department of Biology, Mount St. Joseph University, Cincinnati, OH, 45233, USA
| | - Andrew Rosselot
- Department of Biological Sciences, University of Cincinnati, Cincinnati, OH, 45221, USA
| | - Kaylee Hervey
- Department of Plant Pathology, Kansas State University, Manhattan, KS, 66506, USA
| | - Brandi A Schneweis
- Department of Plant Pathology, Kansas State University, Manhattan, KS, 66506, USA
| | - Sammy Cheng
- Department of Biology, University of Rochester, Rochester, NY, 14627, USA
| | | | - Felipe A Simão
- Department of Genetic Medicine and Development, University of Geneva Medical School, and Swiss Institute of Bioinformatics, Geneva, Switzerland
| | - Ralf G Dietzgen
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St. Lucia, QLD, 4072, Australia
| | - Hsu Chao
- Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
| | - Huyen Dinh
- Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
| | - Harsha Vardhan Doddapaneni
- Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
| | - Shannon Dugan
- Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
| | - Yi Han
- Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
| | - Sandra L Lee
- Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
| | - Donna M Muzny
- Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
| | - Jiaxin Qu
- Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
| | - Kim C Worley
- Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
| | - Joshua B Benoit
- Department of Biological Sciences, University of Cincinnati, Cincinnati, OH, 45221, USA
| | - Markus Friedrich
- Department of Biological Sciences, Wayne State University, Detroit, MI, 48202, USA
| | - Jeffery W Jones
- Department of Biological Sciences, Wayne State University, Detroit, MI, 48202, USA
| | - Kristen A Panfilio
- Institute for Zoology: Developmental Biology, University of Cologne, 50674, Cologne, Germany
- School of Life Sciences, University of Warwick, Gibbet Hill Campus, Coventry, CV4 7AL, UK
| | - Yoonseong Park
- Department of Entomology, Kansas State University, Manhattan, KS, 66506, USA
| | - Hugh M Robertson
- Department of Entomology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
| | - Guy Smagghe
- Laboratory of Agrozoology, Department of Plants and Crops, Ghent University, Coupure Links 653, 9000, Ghent, Belgium
- Chongqing Key Laboratory of Entomology and Pest Control Engineering, College of Plant Protection, Southwest University, Chongqing, China
- International Joint Laboratory of China-Belgium on Sustainable Crop Pest Control, Academy of Agricultural Sciences, Southwest University, Chongqing, China and Ghent University, Ghent, Belgium
| | - Diane E Ullman
- Department of Entomology and Nematology, University of California Davis, Davis, CA, 95616, USA
| | | | - Thomas Van Leeuwen
- Laboratory of Agrozoology, Department of Plants and Crops, Ghent University, Coupure Links 653, 9000, Ghent, Belgium
| | - Jan A Veenstra
- INCIA UMR 5287 CNRS, University of Bordeaux, Pessac, France
| | - Robert M Waterhouse
- Department of Ecology and Evolution, Swiss Institute of Bioinformatics, University of Lausanne, 1015, Lausanne, Switzerland
| | - Matthew T Weirauch
- Center for Autoimmune Genomics and Etiology, Divisions of Biomedical Informatics and Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, 45229, USA
- Department of Pediatrics, University of Cincinnati, College of Medicine, Cincinnati, OH, 45229, USA
| | - John H Werren
- Department of Biology, University of Rochester, Rochester, NY, 14627, USA
| | - Anna E Whitfield
- Department of Entomology and Plant Pathology, North Carolina State University, Raleigh, NC, 27695, USA
| | - Evgeny M Zdobnov
- Department of Genetic Medicine and Development, University of Geneva Medical School, and Swiss Institute of Bioinformatics, Geneva, Switzerland
| | - Richard A Gibbs
- Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
| | - Stephen Richards
- Human Genome Sequencing Center, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
|
17
|
Xu YY, Zhou H, Murphy RF, Shen HB. Consistency and variation of protein subcellular location annotations. Proteins 2020; 89:242-250. [PMID: 32935893 DOI: 10.1002/prot.26010] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/16/2020] [Revised: 07/09/2020] [Accepted: 09/13/2020] [Indexed: 11/09/2022]
Abstract
A major challenge for protein databases is reconciling information from diverse sources. This is especially difficult when some information consists of secondary, human-interpreted rather than primary data. For example, the Swiss-Prot database contains curated annotations of subcellular location that are based on predictions from protein sequence, statements in scientific articles, and published experimental evidence. The Human Protein Atlas (HPA) consists of millions of high-resolution microscopic images that show protein spatial distribution on a cellular and subcellular level. These images are manually annotated with protein subcellular locations by trained experts. The image annotations in HPA can capture the variation of subcellular location across different cell lines, tissues, or tissue states. Systematic investigation of the consistency between HPA and Swiss-Prot assignments of subcellular location, which is important for understanding and utilizing protein location data from the two databases, has not been described previously. In this paper, we quantitatively evaluate the consistency of subcellular location annotations between HPA and Swiss-Prot at multiple levels, as well as variation of protein locations across cell lines and tissues. Our results show that annotations of these two databases differ significantly in many cases, leading to proposed procedures for deriving and integrating the protein subcellular location data. We also find that proteins having highly variable locations are more likely to be biomarkers of diseases, providing support for incorporating analysis of subcellular location in protein biomarker identification and screening.
Affiliation(s)
- Ying-Ying Xu
- School of Biomedical Engineering, Southern Medical University, Guangzhou, China.,Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China.,Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
| | - Hang Zhou
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China
| | - Robert F Murphy
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
| | - Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China
|
18
|
Deveshwar P, Sharma S, Prusty A, Sinha N, Zargar SM, Karwal D, Parashar V, Singh S, Tyagi AK. Analysis of rice nuclear-localized seed-expressed proteins and their database (RSNP-DB). Sci Rep 2020; 10:15116. [PMID: 32934280 PMCID: PMC7492263 DOI: 10.1038/s41598-020-70713-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/27/2020] [Accepted: 08/03/2020] [Indexed: 01/16/2023] Open
Abstract
Nuclear proteins are primarily regulatory factors governing gene expression. Multiple factors determine the localization of a protein in the nucleus. An upright identification of nuclear proteins is way far from accuracy. We have attempted to combine information from subcellular prediction tools, experimental evidence, and nuclear proteome data to identify a reliable list of seed-expressed nuclear proteins in rice. Depending upon the number of prediction tools calling a protein nuclear, we could sort 19,441 seed expressed proteins into five categories. Of which, half of the seed-expressed proteins were called nuclear by at least one out of four prediction tools. Further, gene ontology (GO) enrichment and transcription factor composition analysis showed that 6116 seed-expressed proteins could be called nuclear with a greater assertion. Localization evidence from experimental data was available for 1360 proteins. Their analysis showed that a 92.04% accuracy of a nuclear call is valid for proteins predicted nuclear by at least three tools. Distribution of nuclear localization signals and nuclear export signals showed that the majority of category four members were nuclear resident proteins, whereas other categories have a low fraction of nuclear resident proteins and significantly higher constitution of shuttling proteins. We compiled all the above information for the seed-expressed genes in the form of a searchable database named Rice Seed Nuclear Protein DataBase (RSNP-DB) https://pmb.du.ac.in/rsnpdb. This information will be useful for comprehending the role of seed nuclear proteome in rice.
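The sorting step described in this abstract, grouping proteins by how many of four prediction tools call them nuclear, can be sketched in Python as follows. This is only an illustration: the tool names, protein IDs, and calls are hypothetical, and numbering the five categories 0 to 4 by vote count is an assumption suggested by the abstract's mention of "category four", not something it confirms.

```python
from collections import Counter

def nuclear_category(tool_calls):
    """Number of tools that call the protein nuclear (0 to len(tool_calls))."""
    return sum(1 for is_nuclear in tool_calls.values() if is_nuclear)

# Hypothetical predictions: protein ID -> {tool name: called nuclear?}
predictions = {
    "protein_A": {"tool_1": True, "tool_2": True, "tool_3": True, "tool_4": True},
    "protein_B": {"tool_1": True, "tool_2": False, "tool_3": True, "tool_4": True},
    "protein_C": {"tool_1": False, "tool_2": False, "tool_3": False, "tool_4": False},
}

# Assign each protein to a category, then count proteins per category.
categories = {pid: nuclear_category(calls) for pid, calls in predictions.items()}
category_sizes = Counter(categories.values())
print(categories)
print(dict(category_sizes))
```

Under this reading, a protein in category 3 or 4 corresponds to the "predicted nuclear by at least three tools" group for which the abstract reports 92.04% accuracy.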
Affiliation(s)
- Priyanka Deveshwar
- Interdisciplinary Centre for Plant Genomics and Department of Plant Molecular Biology, University of Delhi, South Campus, New Delhi, India
| | - Shivam Sharma
- Interdisciplinary Centre for Plant Genomics and Department of Plant Molecular Biology, University of Delhi, South Campus, New Delhi, India
| | - Ankita Prusty
- Interdisciplinary Centre for Plant Genomics and Department of Plant Molecular Biology, University of Delhi, South Campus, New Delhi, India
| | - Neha Sinha
- Interdisciplinary Centre for Plant Genomics and Department of Plant Molecular Biology, University of Delhi, South Campus, New Delhi, India
| | - Sajad Majeed Zargar
- Interdisciplinary Centre for Plant Genomics and Department of Plant Molecular Biology, University of Delhi, South Campus, New Delhi, India.,Proteomics Laboratory, Division of Plant Biotechnology, Sher-e-Kashmir University of Agricultural Sciences & Technology of Kashmir, Shalimar, Srinagar, Jammu & Kashmir, India
| | - Divya Karwal
- Institute of Informatics and Communications, University of Delhi, South Campus, New Delhi, India
| | - Vishal Parashar
- Institute of Informatics and Communications, University of Delhi, South Campus, New Delhi, India
| | - Sanjeev Singh
- Institute of Informatics and Communications, University of Delhi, South Campus, New Delhi, India
| | - Akhilesh Kumar Tyagi
- Interdisciplinary Centre for Plant Genomics and Department of Plant Molecular Biology, University of Delhi, South Campus, New Delhi, India.
|
19
|
Genome-Wide Identification and Expression Profiling of Monosaccharide Transporter Genes Associated with High Harvest Index Values in Rapeseed (Brassica napus L.). Genes (Basel) 2020; 11:genes11060653. [PMID: 32549312 PMCID: PMC7349323 DOI: 10.3390/genes11060653] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/15/2020] [Revised: 06/10/2020] [Accepted: 06/12/2020] [Indexed: 01/15/2023] Open
Abstract
Sugars are important throughout a plant’s lifecycle. Monosaccharide transporters (MST) are essential sugar transporters that have been identified in many plants, but little is known about the evolution or functions of MST genes in rapeseed (Brassica napus). In this study, we identified 175 MST genes in B. napus, 87 in Brassica oleracea, and 83 in Brassica rapa. These genes were separated into the sugar transport protein (STP), polyol transporter (PLT), vacuolar glucose transporter (VGT), tonoplast monosaccharide transporter (TMT), inositol transporter (INT), plastidic glucose transporter (pGlcT), and ERD6-like subfamilies, respectively. Phylogenetic and syntenic analysis indicated that gene redundancy and gene elimination have commonly occurred in Brassica species during polyploidization. Changes in exon-intron structures during evolution likely resulted in the differences in coding regions, expression patterns, and functions seen among BnMST genes. In total, 31 differentially expressed genes (DEGs) were identified through RNA-seq among materials with high and low harvest index (HI) values, which were divided into two categories based on the qRT-PCR results, expressed more highly in source or sink organs. We finally identified four genes, including BnSTP5, BnSTP13, BnPLT5, and BnERD6-like14, which might be involved in monosaccharide uptake or unloading and further affect the HI of rapeseed. These findings provide fundamental information about MST genes in Brassica and reveal the importance of BnMST genes to high HI in B. napus.
|
20
|
Sahu SS, Loaiza CD, Kaundal R. Plant-mSubP: a computational framework for the prediction of single- and multi-target protein subcellular localization using integrated machine-learning approaches. AOB PLANTS 2020; 12:plz068. [PMID: 32528639 PMCID: PMC7274489 DOI: 10.1093/aobpla/plz068] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 04/28/2019] [Accepted: 10/11/2019] [Indexed: 05/18/2023]
Abstract
The subcellular localization of proteins is very important for characterizing their function in a cell. Accurate prediction of the subcellular locations in computational paradigm has been an active area of interest. Most of the work has been focused on single localization prediction. Only a few studies have discussed the multi-target localization, but have not achieved good accuracy so far; in plant sciences, very limited work has been done. Here we report the development of a novel tool Plant-mSubP, which is based on integrated machine learning approaches to efficiently predict the subcellular localizations in plant proteomes. The proposed approach predicts with high accuracy 11 single localizations and three dual locations of plant cell. Several hybrid features based on composition and physicochemical properties of a protein such as amino acid composition, pseudo amino acid composition, auto-correlation descriptors, quasi-sequence-order descriptors and hybrid features are used to represent the protein. The performance of the proposed method has been assessed through a training set as well as an independent test set. Using the hybrid feature of the pseudo amino acid composition, N-Center-C terminal amino acid composition and the dipeptide composition (PseAAC-NCC-DIPEP), an overall accuracy of 81.97 %, 84.75 % and 87.88 % is achieved on the training data set of proteins containing the single-label, single- and dual-label combined, and dual-label proteins, respectively. When tested on the independent data, an accuracy of 64.36 %, 64.84 % and 81.08 % is achieved on the single-label, single- and dual-label, and dual-label proteins, respectively. The prediction models have been implemented on a web server available at http://bioinfo.usu.edu/Plant-mSubP/. The results indicate that the proposed approach is comparable to the existing methods in single localization prediction and outperforms all other existing tools when compared for dual-label proteins. The prediction tool will be a useful resource for better annotation of various plant proteomes.
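Two of the sequence features named in this abstract, amino acid composition and dipeptide composition, are simple enough to sketch in Python. The code below is a simplified illustration and not the Plant-mSubP implementation: the function names and the example sequence are hypothetical, non-standard residues are simply ignored, and the pseudo amino acid and N-Center-C components of the hybrid feature are not reproduced.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def amino_acid_composition(seq):
    """20-dimensional vector of residue frequencies (standard residues only)."""
    counts = Counter(r for r in seq.upper() if r in AMINO_ACIDS)
    total = sum(counts.values())
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]

def dipeptide_composition(seq):
    """400-dimensional vector of frequencies of adjacent residue pairs."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    # Keep only pairs in which both residues are standard amino acids.
    valid = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    counts = Counter(valid)
    total = len(valid)
    return [counts[a + b] / total if total else 0.0
            for a, b in product(AMINO_ACIDS, repeat=2)]

# Hypothetical short sequence; concatenation gives a 420-dimensional feature.
sequence = "MKTAYIAKQR"
features = amino_acid_composition(sequence) + dipeptide_composition(sequence)
print(len(features))
```

Concatenating fixed-length composition vectors like these gives every protein the same feature dimension regardless of sequence length, which is what lets them be fed to standard classifiers.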
Affiliation(s)
- Sitanshu S Sahu
- Department of Electronics and Communication Engineering, Birla Institute of Technology, Mesra, Ranchi, India
| | - Cristian D Loaiza
- Department of Plants, Soils, and Climate/Center for Integrated BioSystems, College of Agriculture and Applied Sciences, Utah State University, Logan, UT, USA
| | - Rakesh Kaundal
- Department of Plants, Soils, and Climate/Center for Integrated BioSystems, College of Agriculture and Applied Sciences, Utah State University, Logan, UT, USA
- Bioinformatics Facility, Center for Integrated BioSystems, Utah State University, Logan, UT, USA
|
21
|
Barman RK, Mukhopadhyay A, Maulik U, Das S. Identification of infectious disease-associated host genes using machine learning techniques. BMC Bioinformatics 2019; 20:736. [PMID: 31881961 PMCID: PMC6935192 DOI: 10.1186/s12859-019-3317-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 10/31/2019] [Accepted: 12/16/2019] [Indexed: 02/06/2023] Open
Abstract
Background With the global spread of multidrug resistance in pathogenic microbes, infectious diseases have emerged as a key public health concern in recent times. Identification of host genes associated with infectious diseases will improve our understanding of the mechanisms behind their development and help to identify novel therapeutic targets. Results We developed a machine learning-based classification approach to identify infectious disease-associated host genes by integrating sequence and protein interaction network features. Among the different methods, a Deep Neural Network (DNN) model with 16 selected features for pseudo-amino acid composition (PAAC) and network properties achieved the highest accuracy of 86.33%, with a sensitivity of 85.61% and a specificity of 86.57%. The DNN classifier also attained an accuracy of 83.33% on a blind dataset and a sensitivity of 83.1% on an independent dataset. Furthermore, to predict unknown infectious disease-associated host genes, we applied the proposed DNN model to all reviewed proteins from the database. Seventy-six out of 100 highly-predicted infectious disease-associated genes from our study were also found in experimentally-verified human-pathogen protein-protein interactions (PPIs). Finally, we validated the highly-predicted infectious disease-associated genes by disease and gene ontology enrichment analysis and found that many of them are shared by one or more other diseases, such as cancer, metabolic and immune-related diseases. Conclusions To the best of our knowledge, this is the first computational method to identify infectious disease-associated host genes. The proposed method will help large-scale prediction of host genes associated with infectious diseases.
However, our results indicated that for small datasets, the advanced DNN-based method does not offer a significant advantage over simpler supervised machine learning techniques, such as Support Vector Machine (SVM) or Random Forest (RF), for the prediction of infectious disease-associated host genes. The significant overlap of infectious disease with cancer and metabolic disease in the disease and gene ontology enrichment analysis suggests that these diseases perturb the functions of the same cellular signaling pathways and may be treated by drugs that tend to reverse these perturbations. Moreover, identification of novel candidate genes associated with infectious diseases would help us to explain disease pathogenesis further and develop novel therapeutics.
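The accuracy, sensitivity and specificity figures quoted in this abstract are standard confusion-matrix ratios. The sketch below only illustrates how the three relate; the function name `binary_metrics` and the counts are hypothetical and are not taken from the paper.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, sensitivity and specificity from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of actual positives predicted positive
    specificity = tn / (tn + fp)  # share of actual negatives predicted negative
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # share of all predictions that are correct
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


# Hypothetical counts for illustration only (100 positives, 200 negatives).
metrics = binary_metrics(tp=80, fp=30, tn=170, fn=20)
print(metrics)
```

Because accuracy is a class-size-weighted mix of sensitivity and specificity, it always lies between the two, which is consistent with the values reported above.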
Affiliation(s)
- Ranjan Kumar Barman
  - Biomedical Informatics Centre, ICMR-National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India; Department of Computer Science and Engineering, Jadavpur University, Kolkata, West Bengal, India
- Anirban Mukhopadhyay
  - Department of Computer Science and Engineering, University of Kalyani, Kalyani, West Bengal, India
- Ujjwal Maulik
  - Department of Computer Science and Engineering, Jadavpur University, Kolkata, West Bengal, India
- Santasabuj Das
  - Biomedical Informatics Centre, ICMR-National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India; Division of Clinical Medicine, ICMR-National Institute of Cholera and Enteric Diseases, P-33, C.I.T. Road Scheme XM, Beliaghata-700010, Kolkata, West Bengal, India.
22
Bonetta R, Valentino G. Machine learning techniques for protein function prediction. Proteins 2019; 88:397-413. [PMID: 31603244 DOI: 10.1002/prot.25832] [Citation(s) in RCA: 63] [Impact Index Per Article: 12.6] [Received: 03/19/2019] [Revised: 07/05/2019] [Accepted: 09/17/2019] [Indexed: 12/17/2022]
Abstract
Proteins play important roles in living organisms, and their function is directly linked with their structure. Due to the growing gap between the number of proteins being discovered and their functional characterization (in particular as a result of experimental limitations), reliable prediction of protein function through computational means has become crucial. This paper reviews the machine learning techniques used in the literature, following their evolution from simple algorithms such as logistic regression to more advanced methods like support vector machines and modern deep neural networks. Hyperparameter optimization methods adopted to boost prediction performance are presented. In parallel, the metamorphosis in the features used by these algorithms from classical physicochemical properties and amino acid composition, up to text-derived features from biomedical literature and learned feature representations using autoencoders, together with feature selection and dimensionality reduction techniques, are also reviewed. The success stories in the application of these techniques to both general and specific protein function prediction are discussed.
Affiliation(s)
- Rosalin Bonetta
  - Centre for Molecular Medicine and Biobanking, University of Malta, Msida, Malta
- Gianluca Valentino
  - Department of Communications and Computer Engineering, University of Malta, Msida, Malta
23
Han GS, Yu ZG. ML-rRBF-ECOC: A Multi-Label Learning Classifier for Predicting Protein Subcellular Localization with Both Single and Multiple Sites. Curr Proteomics 2019. [DOI: 10.2174/1570164616666190103143945] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/22/2022]
Abstract
Background:
The subcellular localization of a protein is closely related to its functions and interactions. A growing body of evidence shows that proteins may simultaneously exist at, or move between, two or more different subcellular localizations. Therefore, predicting protein subcellular localization is an important but challenging problem.
Observation:
Most of the existing methods for predicting protein subcellular localization assume that a protein is located at a single site. Although a few methods have been proposed to deal with proteins with multiple sites, correlations between subcellular localizations are not efficiently taken into account. In this paper, we propose an integrated method for predicting protein subcellular localizations with both single and multiple sites.
Methods:
Firstly, we extend the Multi-Label Radial Basis Function (ML-RBF) method to a regularized version, and augment the first layer of ML-RBF to take local correlations between subcellular localizations into account. Secondly, we embed the modified ML-RBF into a multi-label Error-Correcting Output Codes (ECOC) method in order to further consider the subcellular localization dependency. We name our method ML-rRBF-ECOC. Finally, the performance of ML-rRBF-ECOC is evaluated on three benchmark datasets.
Results:
The results demonstrate that ML-rRBF-ECOC is highly competitive with the related multi-label learning method and some state-of-the-art methods for predicting protein subcellular localizations with multiple sites. Considering the dependency between subcellular localizations can contribute to the improvement of prediction performance.
Conclusion:
This also indicates that correlations between different subcellular localizations really exist. Our method at least plays a complementary role to existing methods for predicting protein subcellular localizations with multiple sites.
Affiliation(s)
- Guo-Sheng Han
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Hunan 411105, China
- Zu-Guo Yu
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Hunan 411105, China
24
Chou KC, Cheng X, Xiao X. pLoc_bal-mEuk: Predict Subcellular Localization of Eukaryotic Proteins by General PseAAC and Quasi-balancing Training Dataset. Med Chem 2019; 15:472-485. [DOI: 10.2174/1573406415666181218102517] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Received: 08/28/2018] [Revised: 10/23/2018] [Accepted: 12/12/2018] [Indexed: 12/24/2022]
Abstract
Background/Objective: Information on protein subcellular localization is crucially important for both basic research and drug development. With the explosive growth of protein sequences discovered in the post-genomic age, there is high demand for powerful bioinformatics tools that identify subcellular localization in a timely and effective manner from sequence information alone. Recently, a predictor called “pLoc-mEuk” was developed for identifying the subcellular localization of eukaryotic proteins. Its performance is overwhelmingly better than that of the other predictors for the same purpose, particularly in dealing with multi-label systems where many proteins, called “multiplex proteins”, may simultaneously occur in two or more subcellular locations. Although it is indeed a very powerful predictor, more efforts are definitely needed to further improve it. This is because pLoc-mEuk was trained on an extremely skewed dataset in which some subsets were about 200 times the size of others. Accordingly, it cannot avoid the bias caused by such an uneven training dataset.
Methods: To alleviate such bias, we have developed a new predictor called pLoc_bal-mEuk by quasi-balancing the training dataset. Cross-validation tests on exactly the same experiment-confirmed dataset have indicated that the proposed new predictor is remarkably superior to pLoc-mEuk, the existing state-of-the-art predictor for identifying the subcellular localization of eukaryotic proteins. It has not escaped our notice that the quasi-balancing treatment can also be used to deal with many other biological systems.
Results: To maximize the convenience for most experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_bal-mEuk/.
Conclusion: It is anticipated that the pLoc_bal-mEuk predictor holds very high potential to become a useful high-throughput tool for identifying the subcellular localization of eukaryotic proteins, particularly for finding multi-target drugs, which is currently a very hot trend in drug development.
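The abstract does not spell out how the training set is quasi-balanced. The sketch below is therefore not the authors' procedure: plain random oversampling is swapped in as a stand-in, only to illustrate reducing a roughly 200:1 class skew to a bounded ratio. The function name `oversample_to_ratio`, the `max_ratio` parameter and the toy labels are all hypothetical.

```python
import math
import random
from collections import Counter


def oversample_to_ratio(samples, labels, max_ratio=2.0, seed=0):
    """Duplicate minority-class samples until no class is more than
    `max_ratio` times smaller than the largest class."""
    rng = random.Random(seed)
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)

    largest = max(len(group) for group in by_label.values())
    target = math.ceil(largest / max_ratio)  # minimum size every class must reach

    out_samples, out_labels = [], []
    for label, group in by_label.items():
        extra = max(0, target - len(group))
        picked = group + [rng.choice(group) for _ in range(extra)]
        out_samples.extend(picked)
        out_labels.extend([label] * len(picked))
    return out_samples, out_labels


# Toy data with a 200:1 skew (illustrative labels only).
samples = [f"protein_{i}" for i in range(201)]
labels = ["cytoplasm"] * 200 + ["rare_site"]
new_samples, new_labels = oversample_to_ratio(samples, labels, max_ratio=2.0)
print(Counter(new_labels))
```

Oversampling by duplication adds no new information, so a real pipeline would need to keep duplicated samples out of the cross-validation test folds.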
Affiliation(s)
- Kuo-Chen Chou
  - Gordon Life Science Institute, Boston, MA 02478, United States
- Xiang Cheng
  - Gordon Life Science Institute, Boston, MA 02478, United States
- Xuan Xiao
  - Gordon Life Science Institute, Boston, MA 02478, United States
25
Gao F, Peng C, Li J, Zhuang R, Guo Z, Xu D, Su X, Zhang X. Radioiodinated progesterone derivative for progesterone receptor targeting with enhanced nucleus uptake via phenylboronic acid conjugation. J Labelled Comp Radiopharm 2019; 62:301-309. [PMID: 31032992 DOI: 10.1002/jlcr.3741] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 01/27/2019] [Revised: 03/13/2019] [Accepted: 03/26/2019] [Indexed: 11/06/2022]
Abstract
A novel ¹³¹I-radiolabeled probe with aromatic boronate motif (¹³¹I-EIPBA) was designed to target progesterone receptor (PR)-positive breast cancer with enhanced nucleus uptake. Acetylene progesterone was conjugated with pegylated phenylboronic acid via click reaction and radiolabeled with ¹³¹I to afford ¹³¹I-EIPBA. Meanwhile, ¹³¹I-EIPB without boronate was prepared as control agent. After determination of the lipophilicity and stability of these tracers, in vitro cell uptake studies and in vivo biodistribution in rats were performed to verify the enhanced nucleus uptake and PR targeting ability of ¹³¹I-EIPBA. ¹³¹I-EIPBA was obtained with moderate radiochemical yield (40.35 ± 3.52%) and high radiochemical purity (>98%). As expected, the high binding affinity (39.58 nM) of ¹³¹I-EIPBA for PR was determined by cell binding assay. The internalization ratio of ¹³¹I-EIPBA was remarkably higher than that of ¹³¹I-EIPB in PR-positive MCF-7 cells. Furthermore, the enhanced nucleus uptake of ¹³¹I-EIPBA (0.59 ± 0.02%) was found to be significantly higher than that of ¹³¹I-EIPB (0.13 ± 0.01%) in MCF-7 cells. A novel ¹³¹I-EIPBA compound was developed for PR targeting with improved cellular nucleus uptake. Furthermore, the introduction of aromatic boronate motif provides a worthwhile strategy for enhancing the nuclear receptor targeting of tracers.
Affiliation(s)
- Fei Gao
  - State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics & Center for Molecular Imaging and Translational Medicine, School of Public Health, Xiamen University, Xiamen, China
- Chenyu Peng
  - State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics & Center for Molecular Imaging and Translational Medicine, School of Public Health, Xiamen University, Xiamen, China
- Jindian Li
  - State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics & Center for Molecular Imaging and Translational Medicine, School of Public Health, Xiamen University, Xiamen, China
- Rongqiang Zhuang
  - State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics & Center for Molecular Imaging and Translational Medicine, School of Public Health, Xiamen University, Xiamen, China
- Zhide Guo
  - State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics & Center for Molecular Imaging and Translational Medicine, School of Public Health, Xiamen University, Xiamen, China
- Duo Xu
  - State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics & Center for Molecular Imaging and Translational Medicine, School of Public Health, Xiamen University, Xiamen, China
- Xinhui Su
  - Department of Nuclear Medicine, Zhongshan Hospital affiliated to Xiamen University, Xiamen, China
- Xianzhong Zhang
  - State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics & Center for Molecular Imaging and Translational Medicine, School of Public Health, Xiamen University, Xiamen, China
26
Abstract
Ever since the signal hypothesis was proposed in 1971, the exact nature of signal peptides has been a focus point of research. The prediction of signal peptides and protein subcellular location from amino acid sequences has been an important problem in bioinformatics since the dawn of this research field, involving many statistical and machine learning technologies. In this review, we provide a historical account of how position-weight matrices, artificial neural networks, hidden Markov models, support vector machines and, lately, deep learning techniques have been used in the attempts to predict where proteins go. Because the secretory pathway was the first one to be studied both experimentally and through bioinformatics, our main focus is on the historical development of prediction methods for signal peptides that target proteins for secretion; prediction methods to identify targeting signals for other cellular compartments are treated in less detail.
Affiliation(s)
- Henrik Nielsen
  - Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, Kgs. Lyngby, Denmark.
- Konstantinos D Tsirigos
  - Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, Kgs. Lyngby, Denmark
- Søren Brunak
  - Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, Kgs. Lyngby, Denmark
  - Faculty of Health and Medical Sciences, Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
- Gunnar von Heijne
  - Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
  - Science for Life Laboratory, Stockholm University, Solna, Sweden
27
Bentley SJ, Jamabo M, Boshoff A. The Hsp70/J-protein machinery of the African trypanosome, Trypanosoma brucei. Cell Stress Chaperones 2019; 24:125-148. [PMID: 30506377 PMCID: PMC6363631 DOI: 10.1007/s12192-018-0950-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 05/14/2018] [Revised: 11/06/2018] [Accepted: 11/12/2018] [Indexed: 12/28/2022] Open
Abstract
The etiological agent of the neglected tropical disease African trypanosomiasis, Trypanosoma brucei, possesses an expanded and diverse repertoire of heat shock proteins, which have been implicated in cytoprotection, differentiation, as well as progression and transmission of the disease. Hsp70 plays a crucial role in proteostasis, and inhibition of its interactions with co-chaperones is emerging as a potential therapeutic target for numerous diseases. In light of genome annotations and the release of the genome sequence of the human infective subspecies, an updated and current in silico overview of the Hsp70/J-protein machinery in both T. brucei brucei and T. brucei gambiense was conducted. Functional, structural, and evolutionary analyses of the T. brucei Hsp70 and J-protein families were performed. The Hsp70 and J-proteins from humans and selected kinetoplastid parasites were used to assist in identifying proteins from T. brucei, as well as the prediction of potential Hsp70-J-protein partnerships. The Hsp70 and J-proteins were mined from numerous genome-wide proteomics studies, which included different lifecycle stages and subcellular localisations. In this study, 12 putative Hsp70 proteins and 67 putative J-proteins were identified to be encoded on the genomes of both T. brucei subspecies. Interestingly there are 6 type III J-proteins that possess tetratricopeptide repeat-containing (TPR) motifs. Overall, it is envisioned that the results of this study will provide a future context for studying the biology of the African trypanosome and evaluating Hsp70 and J-protein interactions as potential drug targets.
Affiliation(s)
- Miebaka Jamabo
  - Biotechnology Innovation Centre, Rhodes University, Grahamstown, South Africa
- Aileen Boshoff
  - Biotechnology Innovation Centre, Rhodes University, Grahamstown, South Africa.
28
Overexpression of ScMYBAS1 alternative splicing transcripts differentially impacts biomass accumulation and drought tolerance in rice transgenic plants. PLoS One 2018; 13:e0207534. [PMID: 30517137 PMCID: PMC6281192 DOI: 10.1371/journal.pone.0207534] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 06/26/2018] [Accepted: 11/01/2018] [Indexed: 02/05/2023] Open
Abstract
Drought is the most significant environmental stress for agricultural production worldwide, and tremendous efforts have been made to improve crop yield under increasing water scarcity. Transcription factors are major players in the regulation of water stress-related genes in plants. Recently, different MYB transcription factors were characterized for their involvement in drought response. A sugarcane R2R3-MYB gene (ScMYBAS1) and its four alternative forms of transcript (ScMYBAS1-2, ScMYBAS1-3, ScMYBAS1-4 and ScMYBAS1-5) were identified in this study. Subcellular localization in Nicotiana benthamiana of the TFs fused in frame with GFP revealed that ScMYBAS1-2-GFP and ScMYBAS1-3-GFP were located in the nucleus. The overexpression of ScMYBAS1-2 and ScMYBAS1-3 spliced transcripts in rice altered plant growth under both well-watered and drought conditions. The ScMYBAS1-2 and ScMYBAS1-3 transgenic lines showed a higher relative water content (RWC) than the wild type before maximum stress under drought conditions. The ScMYBAS1-2 transgenic lines showed a reduction in biomass (total dry weight). Conversely, ScMYBAS1-3 showed an increased biomass (total dry weight) relative to the wild type. The overexpression of ScMYBAS1-3 in rice transgenic lines was associated with drought tolerance and biomass and, for this reason, was considered a good target for plant transformation, particularly for use in developing genotypes with drought tolerance and biomass accumulation.
29
Cheng X, Xiao X, Chou KC. pLoc_bal-mGneg: Predict subcellular localization of Gram-negative bacterial proteins by quasi-balancing training dataset and general PseAAC. J Theor Biol 2018; 458:92-102. [DOI: 10.1016/j.jtbi.2018.09.005] [Citation(s) in RCA: 65] [Impact Index Per Article: 10.8] [Received: 08/16/2018] [Revised: 09/05/2018] [Accepted: 09/07/2018] [Indexed: 01/03/2023]
30
Sharma M, Bennewitz B, Klösgen RB. Rather rule than exception? How to evaluate the relevance of dual protein targeting to mitochondria and chloroplasts. Photosynthesis Research 2018; 138:335-343. [PMID: 29946965 DOI: 10.1007/s11120-018-0543-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 10/27/2017] [Accepted: 06/20/2018] [Indexed: 05/11/2023]
Abstract
Dual targeting of a nuclearly encoded protein into two different cell organelles is an exceptional event in eukaryotic cells. Yet, the frequency of such dual targeting is remarkably high in case of mitochondria and chloroplasts, the two endosymbiotic organelles of plant cells. In most instances, it is mediated by "ambiguous" transit peptides, which recognize both organelles as the target. A number of different approaches including in silico, in organello as well as both transient and stable in vivo assays are established to determine the targeting specificity of such transit peptides. In this review, we will describe and compare these approaches and discuss the potential role of this unusual targeting process. Furthermore, we will present a hypothetical scenario how dual targeting might have arisen during evolution.
Affiliation(s)
- Mayank Sharma
  - Institute of Biology - Plant Physiology, Martin Luther University Halle-Wittenberg, Weinbergweg 10, 06120, Halle/Saale, Germany
- Bationa Bennewitz
  - Institute of Biology - Plant Physiology, Martin Luther University Halle-Wittenberg, Weinbergweg 10, 06120, Halle/Saale, Germany
- Ralf Bernd Klösgen
  - Institute of Biology - Plant Physiology, Martin Luther University Halle-Wittenberg, Weinbergweg 10, 06120, Halle/Saale, Germany.
31
Gudenas BL, Wang L. Prediction of LncRNA Subcellular Localization with Deep Learning from Sequence Features. Sci Rep 2018; 8:16385. [PMID: 30401954 PMCID: PMC6219567 DOI: 10.1038/s41598-018-34708-w] [Citation(s) in RCA: 76] [Impact Index Per Article: 12.7] [Received: 03/26/2018] [Accepted: 10/19/2018] [Indexed: 12/20/2022] Open
Abstract
Long non-coding RNAs are involved in biological processes throughout the cell including the nucleus, chromatin and cytosol. However, most lncRNAs remain unannotated and functional annotation of lncRNAs is difficult due to their low conservation and their tissue and developmentally specific expression. LncRNA subcellular localization is highly informative regarding its biological function, although it is difficult to discover because few prediction methods currently exist. While protein subcellular localization prediction is a well-established research field, lncRNA localization prediction is a novel research problem. We developed DeepLncRNA, a deep learning algorithm which predicts lncRNA subcellular localization directly from lncRNA transcript sequences. We analyzed 93 strand-specific RNA-seq samples of nuclear and cytosolic fractions from multiple cell types to identify differentially localized lncRNAs. We then extracted sequence-based features from the lncRNAs to construct our DeepLncRNA model, which achieved an accuracy of 72.4%, sensitivity of 83%, specificity of 62.4% and area under the receiver operating characteristic curve of 0.787. Our results suggest that primary sequence motifs are a major driving force in the subcellular localization of lncRNAs.
Affiliation(s)
- Brian L Gudenas
  - Department of Genetics and Biochemistry, Clemson University, Clemson, SC, USA
- Liangjiang Wang
  - Department of Genetics and Biochemistry, Clemson University, Clemson, SC, USA.
32
Dayan FE, Barker A, Tranel PJ. Origins and structure of chloroplastic and mitochondrial plant protoporphyrinogen oxidases: implications for the evolution of herbicide resistance. Pest Management Science 2018; 74:2226-2234. [PMID: 28967179 DOI: 10.1002/ps.4744] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 07/21/2017] [Revised: 09/05/2017] [Accepted: 09/23/2017] [Indexed: 05/25/2023]
Abstract
Protoporphyrinogen IX oxidase (PPO)-inhibiting herbicides are effective tools to control a broad spectrum of weeds, including those that have evolved resistance to glyphosate. Their utility is being threatened by the appearance of biotypes that are resistant to PPO inhibitors. While the chloroplastic PPO1 isoform is thought to be the primary target of PPO herbicides, evolved resistance mechanisms elucidated to date are associated with changes to the mitochondrial PPO2 isoform, suggesting that the importance of PPO2 has been underestimated. Our investigation of the evolutionary and structural biology of plant PPOs provides some insight into the potential reasons why PPO2 is the preferred target for evolution of resistance. The most common target-site mutation imparting resistance involved the deletion of a key glycine codon. The genetic environment that facilitates this deletion is apparently only present in the gene encoding PPO2 in a few species. Additionally, both species with this mutation (Amaranthus tuberculatus and Amaranthus palmeri) have dual targeting of PPO2 to both the chloroplast and the mitochondria, which might be a prerequisite to impart herbicide resistance. The most recent target-site mutations have substituted a key arginine residue involved in stabilizing the substrate in the catalytic domain of PPO2. This arginine is highly conserved across all plant PPOs, suggesting that its substitution could be equally likely on PPO1 and PPO2, yet it has only occurred on PPO2, underscoring the importance of this isoform for the evolution of herbicide resistance. © 2017 Society of Chemical Industry.
Affiliation(s)
- Franck E Dayan
  - Department of Bioagricultural Sciences and Pest Management, Colorado State University, Fort Collins, CO, USA
- Abigail Barker
  - Department of Bioagricultural Sciences and Pest Management, Colorado State University, Fort Collins, CO, USA
- Patrick J Tranel
  - Department of Crop Sciences, University of Illinois, Urbana, IL, USA
33
Kang MK, Tullman-Ercek D. Engineering expression and function of membrane proteins. Methods 2018; 147:66-72. [DOI: 10.1016/j.ymeth.2018.04.014] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 01/01/2018] [Revised: 04/03/2018] [Accepted: 04/16/2018] [Indexed: 01/18/2023] Open
34
Mirzaei Mehrabad E, Hassanzadeh R, Eslahchi C. PMLPR: A novel method for predicting subcellular localization based on recommender systems. Sci Rep 2018; 8:12006. [PMID: 30104743 PMCID: PMC6089892 DOI: 10.1038/s41598-018-30394-w] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 11/25/2016] [Accepted: 07/30/2018] [Indexed: 12/16/2022] Open
Abstract
The protein subcellular localization problem matters because a protein's function depends on the part of the cell in which it resides. Moreover, prediction of subcellular locations helps to identify potential molecular targets for drugs and has an important role in genome annotation. Most of the existing prediction methods assign only one location to each protein. However, since some proteins move between different subcellular locations, they can have multiple locations. In recent years, some multiple-location predictors have been introduced. However, they are not accurate enough and there is much room for improvement. In this paper, we introduce a method, PMLPR, to predict locations for a protein. PMLPR predicts a list of locations for each protein based on recommender systems, and it can properly address the multiple-location prediction problem. To evaluate the performance of PMLPR, we considered six datasets: RAT, FLY, HUMAN, Du et al., DBMLoc and Höglund. The performance of this algorithm is compared with six state-of-the-art algorithms: YLoc, WOLF-PSORT, prediction channel, MDLoc, Du et al. and MultiLoc2-HighRes. The results indicate that our proposed method is significantly superior on RAT and FLY proteins, and decent on HUMAN proteins. Moreover, on the datasets introduced by Du et al., DBMLoc and Höglund, PMLPR has comparable results. For the case study, we applied the algorithms to 8 proteins which are important in cancer research. The results of comparison with other methods indicate the efficiency of PMLPR.
Affiliation(s)
- Elnaz Mirzaei Mehrabad
  - Department of Computer Science, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran
- Reza Hassanzadeh
  - Department of Engineering Sciences, Faculty of Advanced Technologies, University of Mohaghegh Ardabili, Namin, Iran
  - Department of Bioinformatics, Faculty of Computer Engineering and Information Technology, Sabalan University of Advanced Technologies (SUAT), Namin, Iran
- Changiz Eslahchi
  - Department of Computer Science, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran.
  - School of Biological Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran.
35
Jurtz VI, Johansen AR, Nielsen M, Almagro Armenteros JJ, Nielsen H, Sønderby CK, Winther O, Sønderby SK. An introduction to deep learning on biological sequence data: examples and solutions. Bioinformatics 2018; 33:3685-3690. [PMID: 28961695 DOI: 10.1093/bioinformatics/btx531] [Citation(s) in RCA: 74] [Impact Index Per Article: 12.3] [Received: 03/13/2017] [Accepted: 08/22/2017] [Indexed: 11/14/2022] Open
Abstract
Motivation Deep neural network architectures such as convolutional and long short-term memory networks have become increasingly popular as machine learning tools during the recent years. The availability of greater computational resources, more data, new algorithms for training deep models and easy to use libraries for implementation and training of neural networks are the drivers of this development. The use of deep learning has been especially successful in image recognition; and the development of tools, applications and code examples are in most cases centered within this field rather than within biology. Results Here, we aim to further the development of deep learning methods within biology by providing application examples and ready to apply and adapt code templates. Given such examples, we illustrate how architectures consisting of convolutional and long short-term memory neural networks can relatively easily be designed and trained to state-of-the-art performance on three biological sequence problems: prediction of subcellular localization, protein secondary structure and the binding of peptides to MHC Class II molecules. Availability and implementation All implementations and datasets are available online to the scientific community at https://github.com/vanessajurtz/lasagne4bio. Contact skaaesonderby@gmail.com. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Morten Nielsen
  - Department of Bio and Health Informatics; Instituto de Investigaciones Biotecnológicas, Universidad Nacional de San Martín, Buenos Aires, Argentina
- Ole Winther
  - Department of Applied Mathematics and Computer Science, Technical University of Denmark, Lyngby, Denmark; Department of Biology, University of Copenhagen, Copenhagen, Denmark
36
Almagro Armenteros JJ, Sønderby CK, Sønderby SK, Nielsen H, Winther O. DeepLoc: prediction of protein subcellular localization using deep learning. Bioinformatics 2018; 33:3387-3395. [PMID: 29036616 DOI: 10.1093/bioinformatics/btx431] [Citation(s) in RCA: 607] [Impact Index Per Article: 101.2] [Received: 03/16/2017] [Accepted: 07/03/2017] [Indexed: 01/12/2023] Open
Abstract
Motivation The prediction of eukaryotic protein subcellular localization is a well-studied topic in bioinformatics due to its relevance in proteomics research. Many machine learning methods have been successfully applied in this task, but in most of them, predictions rely on annotation of homologues from knowledge databases. For novel proteins where no annotated homologues exist, and for predicting the effects of sequence variants, it is desirable to have methods for predicting protein properties from sequence information only. Results Here, we present a prediction algorithm using deep neural networks to predict protein subcellular localization relying only on sequence information. At its core, the prediction model uses a recurrent neural network that processes the entire protein sequence and an attention mechanism identifying protein regions important for the subcellular localization. The model was trained and tested on a protein dataset extracted from one of the latest UniProt releases, in which experimentally annotated proteins follow more stringent criteria than previously. We demonstrate that our model achieves a good accuracy (78% for 10 categories; 92% for membrane-bound or soluble), outperforming current state-of-the-art algorithms, including those relying on homology information. Availability and implementation The method is available as a web server at http://www.cbs.dtu.dk/services/DeepLoc. Example code is available at https://github.com/JJAlmagro/subcellular_localization. The dataset is available at http://www.cbs.dtu.dk/services/DeepLoc/data.php. Contact jjalma@dtu.dk.
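The abstract describes an attention mechanism that highlights sequence regions important for localization. As a rough illustration of the general idea only (this is not DeepLoc's code or its exact architecture), the sketch below applies dot-product attention pooling to per-position hidden vectors. The names `attention_pool`, `hidden_states` and `query` are hypothetical, and in a trained model the query vector would be learned.

```python
import math


def attention_pool(hidden_states, query):
    """Score each position against `query`, softmax the scores into weights,
    and return the weights plus the weighted sum of the hidden vectors."""
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]

    # Numerically stable softmax over sequence positions.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector: attention-weighted average of the hidden states.
    dim = len(hidden_states[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return weights, context


# Toy "sequence" of three positions with 2-dimensional hidden vectors.
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
query = [2.0, 0.0]  # made-up query that favours the first feature
weights, context = attention_pool(hidden_states, query)
print(weights, context)
```

The weights form a distribution over sequence positions, so they can be inspected to see which positions contributed most to the pooled representation.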
Collapse
Affiliation(s)
- José Juan Almagro Armenteros
- Department of Bio and Health Informatics, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
- The Bioinformatics Centre, Department of Biology, University of Copenhagen, 2200 Copenhagen N, Denmark
- Casper Kaae Sønderby
- The Bioinformatics Centre, Department of Biology, University of Copenhagen, 2200 Copenhagen N, Denmark
- Søren Kaae Sønderby
- The Bioinformatics Centre, Department of Biology, University of Copenhagen, 2200 Copenhagen N, Denmark
- Henrik Nielsen
- Department of Bio and Health Informatics, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
- Ole Winther
- The Bioinformatics Centre, Department of Biology, University of Copenhagen, 2200 Copenhagen N, Denmark
- DTU Compute, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
|
37
|
Predicting protein submitochondrial locations by incorporating the pseudo-position specific scoring matrix into the general Chou's pseudo-amino acid composition. J Theor Biol 2018; 450:86-103. [DOI: 10.1016/j.jtbi.2018.04.026] [Citation(s) in RCA: 61] [Impact Index Per Article: 10.2] [Received: 02/08/2018] [Revised: 04/10/2018] [Accepted: 04/16/2018] [Indexed: 01/16/2023]
|
38
|
Cheng X, Lin WZ, Xiao X, Chou KC. pLoc_bal-mAnimal: predict subcellular localization of animal proteins by balancing training dataset and PseAAC. Bioinformatics 2018; 35:398-406. [DOI: 10.1093/bioinformatics/bty628] [Citation(s) in RCA: 79] [Impact Index Per Article: 13.2] [Received: 04/21/2018] [Accepted: 07/11/2018] [Indexed: 12/25/2022] Open
Affiliation(s)
- Xiang Cheng
- Computer Science, Jingdezhen Ceramic Institute, Jingdezhen, China
- Computational Biology, Gordon Life Science Institute, Boston, MA, USA
- Wei-Zhong Lin
- Computer Science, Jingdezhen Ceramic Institute, Jingdezhen, China
- Xuan Xiao
- Computer Science, Jingdezhen Ceramic Institute, Jingdezhen, China
- Computational Biology, Gordon Life Science Institute, Boston, MA, USA
- Kuo-Chen Chou
- Computational Biology, Gordon Life Science Institute, Boston, MA, USA
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
|
39
|
Kikegawa T, Yamaguchi T, Nambu R, Etchuya K, Ikeda M, Mukai Y. Signal-anchor sequences are an essential factor for the Golgi-plasma membrane localization of type II membrane proteins. Biosci Biotechnol Biochem 2018; 82:1708-1714. [PMID: 29912671 DOI: 10.1080/09168451.2018.1484272] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]
Abstract
Despite studies of the mechanism underlying the intracellular localization of membrane proteins, the specific mechanisms by which each membrane protein localizes to the endoplasmic reticulum, Golgi apparatus, and plasma membrane in the secretory pathway are unclear. In this study, a discriminant analysis of endoplasmic reticulum, Golgi apparatus and plasma membrane-localized type II membrane proteins was performed using a position-specific scoring matrix derived from the amino acid propensity of the sequences around signal-anchors. The possibility that the sequence around the signal-anchor is a factor for identifying each localization group was evaluated. The discrimination accuracy between the Golgi apparatus and plasma membrane-localized type II membrane proteins was as high as 90%, indicating that, in addition to other factors, the sequence around the signal-anchor is an essential component of the selection mechanism for the Golgi and plasma membrane localization. These results may improve the use of membrane proteins for drug delivery and therapeutic applications.
Affiliation(s)
- Tatsuki Kikegawa
- Department of Electronics, Graduate School of Science and Technology, Meiji University, Kanagawa, Japan
- Takuya Yamaguchi
- Department of Electronics, Graduate School of Science and Technology, Meiji University, Kanagawa, Japan
- Ryohei Nambu
- Department of Electronics, Graduate School of Science and Technology, Meiji University, Kanagawa, Japan
- Kenji Etchuya
- Molecular Neurobiology Research Group, Biomedical Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Ibaraki, Japan
- Department of Electronics and Bioinformatics, School of Science and Technology, Meiji University, Kanagawa, Japan
- Masami Ikeda
- Artificial Intelligence Research Center (AIRC), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
- Yuri Mukai
- Department of Electronics, Graduate School of Science and Technology, Meiji University, Kanagawa, Japan
- Department of Electronics and Bioinformatics, School of Science and Technology, Meiji University, Kanagawa, Japan
|
40
|
Bakhtiarizadeh MR, Rahimi M, Mohammadi-Sangcheshmeh A, Shariati J V, Salami SA. PrESOgenesis: A two-layer multi-label predictor for identifying fertility-related proteins using support vector machine and pseudo amino acid composition approach. Sci Rep 2018; 8:9025. [PMID: 29899414 PMCID: PMC5998058 DOI: 10.1038/s41598-018-27338-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 03/02/2018] [Accepted: 05/25/2018] [Indexed: 11/08/2022] Open
Abstract
Successful spermatogenesis and oogenesis are the two genetically independent processes preceding embryo development. To date, several fertility-related proteins have been described in mammalian species. Nevertheless, further studies are required to discover more proteins associated with the development of germ cells and embryogenesis in order to shed more light on the processes. This work builds on our previous software (OOgenesis_Pred), mainly focusing on algorithms beyond what was previously done, in particular new fertility-related proteins and their classes (embryogenesis, spermatogenesis and oogenesis) based on the support vector machine according to the concept of Chou's pseudo-amino acid composition features. The results of five-fold cross validation, as well as the independent test, demonstrated that this method is capable of predicting the fertility-related proteins and their classes with an accuracy of more than 80%. Moreover, by using feature selection methods, important properties of fertility-related proteins were identified that allowed for their accurate classification. Based on the proposed method, a two-layer classifier software named "PrESOgenesis" (https://github.com/mrb20045/PrESOgenesis) was developed. The tool identified a query sequence (protein or transcript) as a fertility-related or non-fertility-related protein at the first layer and then classified the predicted fertility-related protein into different classes of embryogenesis, spermatogenesis or oogenesis at the second layer.
Affiliation(s)
- Maryam Rahimi
- Department of Animal and Poultry Science, College of Aburaihan, University of Tehran, Tehran, Iran
- Vahid Shariati J
- Genome Center, National Institute of Genetic Engineering and Biotechnology, Tehran, Iran
|
41
|
Abstract
Many computational methods are available for predicting protein sorting in bacteria. When comparing them, it is important to know that they can be grouped into three fundamentally different approaches: signal-based, global-property-based and homology-based prediction. In this chapter, the strengths and drawbacks of each of these approaches are described through many examples of methods that predict secretion, integration into membranes, or subcellular locations in general. The aim of this chapter is to provide a user-level introduction to the field with a minimum of computational theory.
Affiliation(s)
- Henrik Nielsen
- Technical University of Denmark, Kemitorvet, Building 208, DK-2800, Kgs. Lyngby, Denmark.
|
42
|
A Novel Modeling in Mathematical Biology for Classification of Signal Peptides. Sci Rep 2018; 8:1039. [PMID: 29348418 PMCID: PMC5773712 DOI: 10.1038/s41598-018-19491-y] [Citation(s) in RCA: 66] [Impact Index Per Article: 11.0] [Received: 07/21/2017] [Accepted: 01/02/2018] [Indexed: 11/17/2022] Open
Abstract
The molecular structure of macromolecules in living cells is ambiguous unless we classify them in a scientific manner. Signal peptides are of vital importance in determining the behavior of newly formed proteins towards their destined path in cellular and extracellular locations in both eukaryotes and prokaryotes. In the present research work, a novel method is offered to predict the behavior of signal peptides and determine their cleavage site. The proposed model employs neural networks using isolated sets of prokaryote and eukaryote primary sequences. Protein sequences are classified as secretory or non-secretory in order to investigate secretory proteins and their signal peptides. In comparison with the previous prediction tools, the proposed algorithm is more rigorous, well-organized, significantly appropriate and highly accurate for the examination of signal peptides even in extensive collections of protein sequences.
|
43
|
Olmedo P, Moreno AA, Sanhueza D, Balic I, Silva-Sanzana C, Zepeda B, Verdonk JC, Arriagada C, Meneses C, Campos-Vargas R. A catechol oxidase AcPPO from cherimoya (Annona cherimola Mill.) is localized to the Golgi apparatus. Plant Sci 2018; 266:46-54. [PMID: 29241566 DOI: 10.1016/j.plantsci.2017.10.012] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 09/15/2017] [Revised: 10/18/2017] [Accepted: 10/20/2017] [Indexed: 06/07/2023]
Abstract
Cherimoya (Annona cherimola) is an exotic fruit with attractive organoleptic characteristics. However, it is highly perishable and susceptible to postharvest browning. In fresh fruit, browning is primarily caused by the polyphenol oxidase (PPO) enzyme catalyzing the oxidation of o-diphenols to quinones, which polymerize to form brown melanin pigment. There is no consensus in the literature regarding a specific role of PPO, and its subcellular localization in different plant species is mainly described within plastids. The present work determined the subcellular localization of a PPO protein from cherimoya (AcPPO). The obtained results revealed that the AcPPO-green fluorescent protein co-localized with a Golgi apparatus marker, and AcPPO activity was present in Golgi apparatus-enriched fractions. Likewise, transient expression assays revealed that AcPPO remained active in Golgi apparatus-enriched fractions obtained from tobacco leaves. These results suggest a putative function of AcPPO in the Golgi apparatus of cherimoya, providing new perspectives on PPO functionality in the secretory pathway, its effects on cherimoya physiology, and the evolution of this enzyme.
Affiliation(s)
- Patricio Olmedo
- Centro de Biotecnología Vegetal, Facultad de Ciencias Biológicas, Universidad Andres Bello, República 217, Santiago, Chile.
- Adrián A Moreno
- Centro de Biotecnología Vegetal, Facultad de Ciencias Biológicas, Universidad Andres Bello, República 217, Santiago, Chile.
- Dayan Sanhueza
- Centro de Biotecnología Vegetal, Facultad de Ciencias Biológicas, Universidad Andres Bello, República 217, Santiago, Chile.
- Iván Balic
- Departamento de Acuicultura y Recursos Agroalimentarios, Universidad de Los Lagos, Fuchslocher 1305, Osorno, Chile.
- Christian Silva-Sanzana
- Centro de Biotecnología Vegetal, Facultad de Ciencias Biológicas, Universidad Andres Bello, República 217, Santiago, Chile.
- Baltasar Zepeda
- Centro de Biotecnología Vegetal, Facultad de Ciencias Biológicas, Universidad Andres Bello, República 217, Santiago, Chile.
- Julian C Verdonk
- Horticulture and Product Physiology, Wageningen University, Droevendaalsesteeg 1, 6708 PD Wageningen, The Netherlands.
- César Arriagada
- Laboratorio Biorremediación, Departamento de Ciencias Forestales, Facultad de Ciencias Agropecuarias y Forestales, Universidad de La Frontera, Francisco Salazar 1145, Temuco, Chile.
- Claudio Meneses
- Centro de Biotecnología Vegetal, Facultad de Ciencias Biológicas, Universidad Andres Bello, República 217, Santiago, Chile.
- Reinaldo Campos-Vargas
- Centro de Biotecnología Vegetal, Facultad de Ciencias Biológicas, Universidad Andres Bello, República 217, Santiago, Chile.
|
44
|
Kunze M. Predicting Peroxisomal Targeting Signals to Elucidate the Peroxisomal Proteome of Mammals. Subcell Biochem 2018; 89:157-199. [PMID: 30378023 DOI: 10.1007/978-981-13-2233-4_7] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 12/15/2022]
Abstract
Peroxisomes harbor a plethora of proteins, but the peroxisomal proteome as the entirety of all peroxisomal proteins is still unknown for mammalian species. Computational algorithms can be used to predict the subcellular localization of proteins based on their amino acid sequence, and this method has been amply used to forecast the intracellular fate of individual proteins. However, when applying such algorithms systematically to all proteins of an organism, the prediction of its peroxisomal proteome in silico should be possible. Therefore, a reliable detection of peroxisomal targeting signals (PTS) acting as postal codes for the intracellular distribution of the encoding protein is crucial. Peroxisomal proteins can utilize different routes to reach their destination depending on the type of PTS. Accordingly, independent prediction algorithms have been developed for each type of PTS, but only those for type-1 motifs (PTS1) have so far reached a satisfying predictive performance. This is partially due to the low number of peroxisomal proteins limiting the power of statistical analyses and partially due to specific properties of peroxisomal protein import, which render functional PTS motifs inactive in specific contexts. Moreover, the prediction of the peroxisomal proteome is limited by the high number of proteins encoded in mammalian genomes, which causes numerous false positive predictions even when using reliable algorithms and buries the few yet unidentified peroxisomal proteins. Thus, the application of prediction algorithms to identify all peroxisomal proteins is currently ineffective as a stand-alone method, but can display its full potential when combined with other methods.
Affiliation(s)
- Markus Kunze
- Department of Pathobiology of the Nervous System, Center for Brain Research, Medical University of Vienna, Vienna, Austria.
|
45
|
pLoc-mEuk: Predict subcellular localization of multi-label eukaryotic proteins by extracting the key GO information into general PseAAC. Genomics 2018; 110:50-58. [DOI: 10.1016/j.ygeno.2017.08.005] [Citation(s) in RCA: 180] [Impact Index Per Article: 30.0] [Received: 07/06/2017] [Revised: 08/10/2017] [Accepted: 08/11/2017] [Indexed: 11/22/2022]
|
46
|
Zhou H, Yang Y, Shen HB. Hum-mPLoc 3.0: prediction enhancement of human protein subcellular localization through modeling the hidden correlations of gene ontology and functional domain features. Bioinformatics 2017; 33:843-853. [PMID: 27993784 DOI: 10.1093/bioinformatics/btw723] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Received: 07/28/2016] [Accepted: 11/17/2016] [Indexed: 11/13/2022] Open
Abstract
Motivation: Protein subcellular localization prediction has been an important research topic in computational biology over the last decade. Various automatic methods have been proposed to predict locations for large scale protein datasets, where statistical machine learning algorithms are widely used for model construction. A key step in these predictors is encoding the amino acid sequences into feature vectors. Many studies have shown that features extracted from biological domains, such as gene ontology and functional domains, can be very useful for improving the prediction accuracy. However, domain knowledge usually results in redundant features and high-dimensional feature spaces, which may degrade the performance of machine learning models. Results: In this paper, we propose a new amino acid sequence-based human protein subcellular location prediction approach, Hum-mPLoc 3.0, which covers 12 human subcellular localizations. The sequences are represented by multi-view complementary features, i.e. context vocabulary annotation-based gene ontology (GO) terms, peptide-based functional domains, and residue-based statistical features. To systematically reflect the structural hierarchy of the domain knowledge bases, we propose a novel feature representation protocol denoted as HCM (Hidden Correlation Modeling), which will create more compact and discriminative feature vectors by modeling the hidden correlations between annotation terms. Experimental results on four benchmark datasets show that HCM improves prediction accuracy by 5-11% and F1 by 8-19% compared with conventional GO-based methods. A large-scale application of Hum-mPLoc 3.0 on the whole human proteome reveals protein co-localization preferences in the cell. Availability and Implementation: www.csbio.sjtu.edu.cn/bioinf/Hum-mPLoc3/. Contact: hbshen@sjtu.edu.cn. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hang Zhou
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Ministry of Education of China, Shanghai, China
- Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
- Yang Yang
- Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
- Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai, China
- Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Ministry of Education of China, Shanghai, China
- Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
|
47
|
Bentley SJ, Boshoff A. Hsp70/J-protein machinery from Glossina morsitans morsitans, vector of African trypanosomiasis. PLoS One 2017; 12:e0183858. [PMID: 28902917 PMCID: PMC5597180 DOI: 10.1371/journal.pone.0183858] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/23/2017] [Accepted: 08/11/2017] [Indexed: 11/18/2022] Open
Abstract
Tsetse flies (Glossina spp.) are the sole vectors of the protozoan parasites of the genus Trypanosoma, the causative agents of African Trypanosomiasis. Species of Glossina differ in vector competence and Glossina morsitans morsitans is associated with transmission of Trypanosoma brucei rhodesiense, which causes an acute and often fatal form of African Trypanosomiasis. Heat shock proteins are evolutionarily conserved proteins that play critical roles in proteostasis. The activity of heat shock protein 70 (Hsp70) is regulated by interactions with its J-protein (Hsp40) co-chaperones. Inhibition of these interactions is emerging as a potential therapeutic strategy. The assembly and annotation of the G. m. morsitans genome provided a platform to identify and characterize the Hsp70s and J-proteins, and carry out an evolutionary comparison to its well-studied eukaryotic counterparts, Drosophila melanogaster and Homo sapiens, as well as Stomoxys calcitrans, a comparator species. In our study, we identified 9 putative Hsp70 proteins and 37 putative J-proteins in G. m. morsitans. Phylogenetic analyses revealed three evolutionarily distinct groups of Hsp70s, with a closer relationship to orthologues from its blood-feeding dipteran relative Stomoxys calcitrans. G. m. morsitans also lacked the high number of heat inducible Hsp70s found in D. melanogaster. The potential localisations, functions, domain organisations and Hsp70/J-protein partnerships were also identified. A greater understanding of the heat shock 70 (Hsp70) and J-protein (Hsp40) families in G. m. morsitans could enhance our understanding of the cell biology of the tsetse fly.
Affiliation(s)
- Stephen J. Bentley
- Biotechnology Innovation Centre, Rhodes University, Grahamstown, South Africa
- Aileen Boshoff
- Biotechnology Innovation Centre, Rhodes University, Grahamstown, South Africa
|
48
|
pLoc-mVirus: Predict subcellular localization of multi-location virus proteins via incorporating the optimal GO information into general PseAAC. Gene 2017; 628:315-321. [DOI: 10.1016/j.gene.2017.07.036] [Citation(s) in RCA: 135] [Impact Index Per Article: 19.3] [Received: 04/21/2017] [Revised: 07/08/2017] [Accepted: 07/11/2017] [Indexed: 12/25/2022]
|
49
|
GLUT10-Lacking in Arterial Tortuosity Syndrome-Is Localized to the Endoplasmic Reticulum of Human Fibroblasts. Int J Mol Sci 2017; 18:1820. [PMID: 28829359 PMCID: PMC5578206 DOI: 10.3390/ijms18081820] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 07/28/2017] [Revised: 08/13/2017] [Accepted: 08/13/2017] [Indexed: 01/02/2023] Open
Abstract
GLUT10 belongs to a family of transporters that catalyze the uptake of sugars/polyols by facilitated diffusion. Loss-of-function mutations in the SLC2A10 gene encoding GLUT10 are responsible for arterial tortuosity syndrome (ATS). Since the subcellular distribution of the transporter is uncertain, we aimed to clarify the localization of GLUT10. In silico GLUT10 localization prediction suggested its presence in the endoplasmic reticulum (ER). Immunoblotting showed the presence of GLUT10 protein in the microsomal, but not in the mitochondrial, fractions of human fibroblasts and liver tissue. An even cytosolic distribution with an intense perinuclear decoration of GLUT10 was demonstrated by immunofluorescence in human fibroblasts, whilst mitochondrial markers revealed a fully different decoration pattern. GLUT10 decoration was fully absent in fibroblasts from three ATS patients. Expression of exogenous, tagged GLUT10 in fibroblasts from an ATS patient revealed a strict co-localization with the ER marker protein disulfide isomerase (PDI). The results demonstrate that GLUT10 is present in the ER.
|
50
|
Nielsen H. Predicting Subcellular Localization of Proteins by Bioinformatic Algorithms. Curr Top Microbiol Immunol 2017; 404:129-158. [PMID: 26728066 DOI: 10.1007/82_2015_5006] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 02/08/2023]
Abstract
When predicting the subcellular localization of proteins from their amino acid sequences, there are basically three approaches: signal-based, global property-based, and homology-based. Each of these has its advantages and drawbacks, and it is important when comparing methods to know which approach was used. Various statistical and machine learning algorithms are used with all three approaches, and various measures and standards are employed when reporting the performances of the developed methods. This chapter presents a number of available methods for prediction of sorting signals and subcellular localization, but rather than providing a checklist of which predictors to use, it aims to function as a guide for critical assessment of prediction methods.
Affiliation(s)
- Henrik Nielsen
- Department of Systems Biology, Center for Biological Sequence Analysis, Technical University of Denmark, Kemitorvet building 208, 2800, Lyngby, Denmark.
|