1
Gillani M, Pollastri G. Protein subcellular localization prediction tools. Comput Struct Biotechnol J 2024; 23:1796-1807. [PMID: 38707539 PMCID: PMC11066471 DOI: 10.1016/j.csbj.2024.04.032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Revised: 04/11/2024] [Accepted: 04/11/2024] [Indexed: 05/07/2024] Open
Abstract
Protein subcellular localization prediction is of great significance in bioinformatics and biological research. Because most proteins do not have experimentally determined localization information, computational prediction methods and tools have been an active research area for more than two decades. Knowledge of the subcellular location of a protein provides valuable information about its function, the functioning of the cell, and its possible interactions with other proteins. Fast, reliable, and accurate predictors provide platforms to harness the abundance of sequence data to predict subcellular locations. During the last decade, there has been a considerable amount of research effort aimed at developing subcellular localization predictors. This paper reviews recent subcellular localization prediction tools in the Eukaryotic, Prokaryotic, and Virus-based categories, followed by a detailed analysis. Each predictor is discussed based on its main features, strengths, weaknesses, algorithms used, prediction techniques, and analysis. This review is supported by taxonomies of prediction tools that highlight their relevant areas and examples for straightforward categorization and ease of understanding. These taxonomies help users find suitable tools according to their needs. Furthermore, recent research gaps and challenges are discussed to cover areas that need the utmost attention. This survey provides an in-depth analysis of the most recent prediction tools and can serve as a quick guide for researchers to identify and explore recent advances in the literature.
Affiliation(s)
- Maryam Gillani
  - School of Computer Science, University College Dublin (UCD), Dublin, D04 V1W8, Ireland
- Gianluca Pollastri
  - School of Computer Science, University College Dublin (UCD), Dublin, D04 V1W8, Ireland
2
Gillani M, Pollastri G. SCLpred-ECL: Subcellular Localization Prediction by Deep N-to-1 Convolutional Neural Networks. Int J Mol Sci 2024; 25:5440. [PMID: 38791479 PMCID: PMC11121631 DOI: 10.3390/ijms25105440] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Revised: 05/09/2024] [Accepted: 05/11/2024] [Indexed: 05/26/2024] Open
Abstract
The subcellular location of a protein provides valuable insights to bioinformaticians in terms of drug designs and discovery, genomics, and various other aspects of medical research. Experimental methods for protein subcellular localization determination are time-consuming and expensive, whereas computational methods, if accurate, would represent a much more efficient alternative. This article introduces an ab initio protein subcellular localization predictor based on an ensemble of Deep N-to-1 Convolutional Neural Networks. Our predictor is trained and tested on strict redundancy-reduced datasets and achieves 63% accuracy for the diverse number of classes. This predictor is a step towards bridging the gap between a protein sequence and the protein's function. It can potentially provide information about protein-protein interaction to facilitate drug design and processes like vaccine production that are essential to disease prevention.
Affiliation(s)
- Maryam Gillani
  - School of Computer Science, University College Dublin (UCD), D04 V1W8 Dublin, Ireland
3
Yue T, Wang Y, Zhang L, Gu C, Xue H, Wang W, Lyu Q, Dun Y. Deep Learning for Genomics: From Early Neural Nets to Modern Large Language Models. Int J Mol Sci 2023; 24:15858. [PMID: 37958843 PMCID: PMC10649223 DOI: 10.3390/ijms242115858] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2023] [Revised: 10/24/2023] [Accepted: 10/30/2023] [Indexed: 11/15/2023] Open
Abstract
The data explosion driven by advancements in genomic research, such as high-throughput sequencing techniques, is constantly challenging conventional methods used in genomics. In parallel with the urgent demand for robust algorithms, deep learning has succeeded in various fields such as vision, speech, and text processing. Yet genomics entails unique challenges to deep learning, since we expect a superhuman intelligence that explores beyond our knowledge to interpret the genome from deep learning. A powerful deep learning model should rely on the insightful utilization of task-specific knowledge. In this paper, we briefly discuss the strengths of different deep learning models from a genomic perspective so as to fit each particular task with proper deep learning-based architecture, and we remark on practical considerations of developing deep learning architectures for genomics. We also provide a concise review of deep learning applications in various aspects of genomic research and point out current challenges and potential research directions for future genomics applications. We believe the collaborative use of ever-growing diverse data and the fast iteration of deep learning models will continue to contribute to the future of genomics.
Affiliation(s)
- Tianwei Yue
  - School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Yuanxin Wang
  - School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Longxiang Zhang
  - School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Chunming Gu
  - Department of Biomedical Engineering, School of Medicine, Johns Hopkins University, Baltimore, MD 21218, USA
- Haoru Xue
  - The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Wenping Wang
  - School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Qi Lyu
  - Department of Computational Mathematics, Science, and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Yujie Dun
  - School of Information and Communications Engineering, Xi’an Jiaotong University, Xi’an 710049, China
4
Agoni C, Stavropoulos I, Kirwan A, Mysior MM, Holton T, Kranjc T, Simpson JC, Roche HM, Shields DC. Cell-Penetrating Milk-Derived Peptides with a Non-Inflammatory Profile. Molecules 2023; 28:6999. [PMID: 37836842 PMCID: PMC10574647 DOI: 10.3390/molecules28196999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2023] [Revised: 09/24/2023] [Accepted: 09/25/2023] [Indexed: 10/15/2023] Open
Abstract
Milk-derived peptides are known to confer anti-inflammatory effects. We hypothesised that milk-derived cell-penetrating peptides might modulate inflammation in useful ways. Using computational techniques, we identified and synthesised peptides from the milk protein Alpha-S1-casein that were predicted to be cell-penetrating using a machine learning predictor. We modified the interpretation of the prediction results to consider the effects of histidine. Peptides were then selected for testing to determine their cell penetrability and anti-inflammatory effects using HeLa cells and J774.2 mouse macrophage cell lines. The selected peptides all showed cell penetrating behaviour, as judged using confocal microscopy of fluorescently labelled peptides. None of the peptides had an effect on either the NF-κB transcription factor or TNFα and IL-1β secretion. Thus, the identified milk-derived sequences have the ability to be internalised into the cell without affecting cell homeostatic mechanisms such as NF-κB activation. These peptides are worthy of further investigation for other potential bioactivities or as a naturally derived carrier to promote the cellular internalisation of other active peptides.
Affiliation(s)
- Clement Agoni
  - Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, D04 V1W8 Dublin 4, Ireland
  - School of Medicine, University College Dublin, Belfield, D04 W6F6 Dublin 4, Ireland
  - Discipline of Pharmaceutical Sciences, University of KwaZulu Natal, Durban 4041, South Africa
- Ilias Stavropoulos
  - Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, D04 V1W8 Dublin 4, Ireland
  - School of Medicine, University College Dublin, Belfield, D04 W6F6 Dublin 4, Ireland
- Anna Kirwan
  - Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, D04 V1W8 Dublin 4, Ireland
  - School of Biology and Environmental Science, University College Dublin, Belfield, D04 N2E5 Dublin 4, Ireland
- Margharitha M. Mysior
  - Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, D04 V1W8 Dublin 4, Ireland
  - Institute of Food and Health, University College Dublin, Belfield, D04 V1W8 Dublin 4, Ireland
- Therese Holton
  - Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, D04 V1W8 Dublin 4, Ireland
  - Institute of Food and Health, University College Dublin, Belfield, D04 V1W8 Dublin 4, Ireland
- Tilen Kranjc
  - Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, D04 V1W8 Dublin 4, Ireland
  - Institute of Food and Health, University College Dublin, Belfield, D04 V1W8 Dublin 4, Ireland
- Jeremy C. Simpson
  - Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, D04 V1W8 Dublin 4, Ireland
  - School of Biology and Environmental Science, University College Dublin, Belfield, D04 N2E5 Dublin 4, Ireland
- Helen M. Roche
  - Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, D04 V1W8 Dublin 4, Ireland
  - Institute for Global Food Security, Queen's University Belfast, Belfast BT9 5DL, UK
- Denis C. Shields
  - Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, D04 V1W8 Dublin 4, Ireland
  - School of Medicine, University College Dublin, Belfield, D04 W6F6 Dublin 4, Ireland
5
Abstract
Immune principles formulated by Jenner, Pasteur, and early immunologists served as fundamental propositions for vaccine discovery against many dreadful pathogens. However, decisive success in the form of an efficacious vaccine remains elusive for diseases such as tuberculosis, leishmaniasis, and trypanosomiasis. Several antileishmanial vaccine trials have been undertaken in past decades incorporating live, attenuated, killed, or subunit vaccination, but the goal remains unmet. In light of these facts, we have to reassess the principles of vaccination by dissecting factors associated with the host's immune response. This chapter discusses the pathogen-associated perturbations at various junctures during the generation of the immune response that inhibit antigenic processing or presentation, or remodel the memory T cell repertoire. These perturbations can lead to ineffective priming or inappropriate activation of memory T cells during challenge infection. Thus, despite a protective primary response, vaccine failure can occur due to altered immune environments in the presence of pathogens.
Affiliation(s)
- Sunil Kumar
  - National Centre for Cell Science, Pune, Maharashtra, India
- Bhaskar Saha
  - National Centre for Cell Science, Pune, Maharashtra, India
  - Trident Academy of Creative Technology, Bhubaneswar, Odisha, India
6
Passi A, Tibocha-Bonilla JD, Kumar M, Tec-Campos D, Zengler K, Zuniga C. Genome-Scale Metabolic Modeling Enables In-Depth Understanding of Big Data. Metabolites 2021; 12:14. [PMID: 35050136 PMCID: PMC8778254 DOI: 10.3390/metabo12010014] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 11/15/2021] [Revised: 12/18/2021] [Accepted: 12/20/2021] [Indexed: 11/16/2022] Open
Abstract
Genome-scale metabolic models (GEMs) enable the mathematical simulation of the metabolism of archaea, bacteria, and eukaryotic organisms. GEMs quantitatively define a relationship between genotype and phenotype by contextualizing different types of Big Data (e.g., genomics, metabolomics, and transcriptomics). In this review, we analyze the available Big Data useful for metabolic modeling and compile the available GEM reconstruction tools that integrate Big Data. We also discuss recent applications in industry and research that include predicting phenotypes, elucidating metabolic pathways, producing industry-relevant chemicals, identifying drug targets, and generating knowledge to better understand host-associated diseases. In addition to the up-to-date review of GEMs currently available, we assessed a plethora of tools for developing new GEMs that include macromolecular expression and dynamic resolution. Finally, we provide a perspective in emerging areas, such as annotation, data managing, and machine learning, in which GEMs will play a key role in the further utilization of Big Data.
Affiliation(s)
- Anurag Passi
  - Department of Pediatrics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0760, USA
- Juan D. Tibocha-Bonilla
  - Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0760, USA
- Manish Kumar
  - Department of Pediatrics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0760, USA
- Diego Tec-Campos
  - Department of Pediatrics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0760, USA
  - Facultad de Ingeniería Química, Campus de Ciencias Exactas e Ingenierías, Universidad Autónoma de Yucatán, Merida 97203, Yucatan, Mexico
- Karsten Zengler
  - Department of Pediatrics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0760, USA
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093-0412, USA
  - Center for Microbiome Innovation, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0403, USA
- Cristal Zuniga
  - Department of Pediatrics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0760, USA
7
Timmons PB, Hewage CM. APPTEST is a novel protocol for the automatic prediction of peptide tertiary structures. Brief Bioinform 2021; 22:bbab308. [PMID: 34396417 PMCID: PMC8575040 DOI: 10.1093/bib/bbab308] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 06/01/2021] [Revised: 07/05/2021] [Accepted: 07/16/2021] [Indexed: 01/29/2023] Open
Abstract
Good knowledge of a peptide's tertiary structure is important for understanding its function and its interactions with its biological targets. APPTEST is a novel computational protocol that employs a neural network architecture and simulated annealing methods for the prediction of peptide tertiary structure from the primary sequence. APPTEST works for both linear and cyclic peptides of 5-40 natural amino acids. APPTEST is computationally efficient, returning predicted structures within a number of minutes. APPTEST performance was evaluated on a set of 356 test peptides; the best structure predicted for each peptide deviated by an average of 1.9Å from its experimentally determined backbone conformation, and a native or near-native structure was predicted for 97% of the target sequences. A comparison of APPTEST performance with PEP-FOLD, PEPstrMOD and PepLook across benchmark datasets of short, long and cyclic peptides shows that on average APPTEST produces structures more native than the existing methods in all three categories. This innovative, cutting-edge peptide structure prediction method is available as an online web server at https://research.timmons.eu/apptest, facilitating in silico study and design of peptides by the wider research community.
Affiliation(s)
- Patrick Brendan Timmons
  - UCD School of Biomolecular and Biomedical Science, UCD Centre for Synthesis and Chemical Biology, UCD Conway Institute, University College Dublin, Dublin 4, Ireland
- Chandralal M Hewage
  - UCD School of Biomolecular and Biomedical Science, UCD Centre for Synthesis and Chemical Biology, UCD Conway Institute, University College Dublin, Dublin 4, Ireland
8
Timmons PB, Hewage CM. ENNAVIA is a novel method which employs neural networks for antiviral and anti-coronavirus activity prediction for therapeutic peptides. Brief Bioinform 2021; 22:bbab258. [PMID: 34297817 PMCID: PMC8575049 DOI: 10.1093/bib/bbab258] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 04/02/2021] [Revised: 06/09/2021] [Accepted: 06/18/2021] [Indexed: 11/14/2022] Open
Abstract
Viruses represent one of the greatest threats to human health, necessitating the development of new antiviral drug candidates. Antiviral peptides often possess excellent biological activity and a favourable toxicity profile, and therefore represent a promising field of novel antiviral drugs. As the quantity of sequencing data grows annually, the development of an accurate in silico method for the prediction of peptide antiviral activities is important. This study leverages advances in deep learning and cheminformatics to produce a novel sequence-based deep neural network classifier for the prediction of antiviral peptide activity. The method outperforms the existent best-in-class, with an external test accuracy of 93.9%, Matthews correlation coefficient of 0.87 and an Area Under the Curve of 0.93 on the dataset of experimentally validated peptide activities. This cutting-edge classifier is available as an online web server at https://research.timmons.eu/ennavia, facilitating in silico screening and design of peptide antiviral drugs by the wider research community.
Affiliation(s)
- Patrick Brendan Timmons
  - UCD School of Biomolecular and Biomedical Science, UCD Centre for Synthesis and Chemical Biology, UCD Conway Institute, University College Dublin, Dublin 4, Ireland
- Chandralal M Hewage
  - UCD School of Biomolecular and Biomedical Science, UCD Centre for Synthesis and Chemical Biology, UCD Conway Institute, University College Dublin, Dublin 4, Ireland
9
Jiang Y, Wang D, Wang W, Xu D. Computational methods for protein localization prediction. Comput Struct Biotechnol J 2021; 19:5834-5844. [PMID: 34765098 PMCID: PMC8564054 DOI: 10.1016/j.csbj.2021.10.023] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/31/2021] [Revised: 10/12/2021] [Accepted: 10/13/2021] [Indexed: 12/16/2022] Open
Abstract
The accurate annotation of protein localization is crucial in understanding protein function in tandem with a broad range of applications such as pathological analysis and drug design. Since most proteins do not have experimentally-determined localization information, the computational prediction of protein localization has been an active research area for more than two decades. In particular, recent machine-learning advancements have fueled the development of new methods in protein localization prediction. In this review paper, we first categorize the main features and algorithms used for protein localization prediction. Then, we summarize a list of protein localization prediction tools in terms of their coverage, characteristics, and accessibility to help users find suitable tools based on their needs. Next, we evaluate some of these tools on a benchmark dataset. Finally, we provide an outlook on the future exploration of protein localization methods.
Affiliation(s)
- Yuexu Jiang
  - Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
- Duolin Wang
  - Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
- Weiwei Wang
  - Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
- Dong Xu
  - Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
10
Ahmad HM, Rahman MU, Ahmar S, Fiaz S, Azeem F, Shaheen T, Ijaz M, Anwer Bukhari S, Khan SA, Mora-Poblete F. Comparative genomic analysis of MYB transcription factors for cuticular wax biosynthesis and drought stress tolerance in Helianthus annuus L. Saudi J Biol Sci 2021; 28:5693-5703. [PMID: 34588881 PMCID: PMC8459054 DOI: 10.1016/j.sjbs.2021.06.009] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 05/03/2021] [Revised: 05/19/2021] [Accepted: 06/02/2021] [Indexed: 11/26/2022] Open
Abstract
Sunflower is an important oil-seed crop in Pakistan, where it is mainly cultivated in the spring season. It is severely affected by drought stress, resulting in lower yield. Cuticular wax acts as the first line of defense, protecting plants from drought stress conditions. It seals the aerial parts of plants and reduces water loss from leaf surfaces. Various myeloblastosis (MYB) transcription factors (TFs) are involved in the biosynthesis of epicuticular waxes under drought stress. However, little information is available on the role of MYB TFs in drought stress and wax biosynthesis in sunflower. We used different computational tools to compare the Arabidopsis MYB TFs involved in cuticular wax biosynthesis and drought stress tolerance with the sunflower genome. We identified three putative MYB genes (MYB16, MYB94 and MYB96) in sunflower along with their seven homologs in Arabidopsis. Phylogenetic association of MYB TFs in Arabidopsis and sunflower indicated strong conservation of TFs in plant species. Gene structure analysis showed that intron and exon organization was family-specific. MYB TFs were unevenly distributed on sunflower chromosomes. Evolutionary analysis indicated segmental duplication of the MYB gene family in sunflower. Quantitative Real-Time PCR revealed the up-regulation of the three MYB genes under drought stress. The expression of MYB16, MYB94 and MYB96 was found to be many fold higher in experimental plants than in controls. The present study provides the first insight into the characterization of the MYB TF family in sunflower in relation to drought stress and wax biosynthesis.
Affiliation(s)
- Hafiz Muhammad Ahmad
  - Department of Bioinformatics and Biotechnology, GC University, Faisalabad, Pakistan
- Mahmood-ur Rahman
  - Department of Bioinformatics and Biotechnology, GC University, Faisalabad, Pakistan
  - Corresponding authors.
- Sunny Ahmar
  - Institute of Biological Sciences, Campus Talca, Universidad de Talca, Talca 3465548, Chile
- Sajid Fiaz
  - Department of Plant Breeding and Genetics, The University of Haripur, 22620 Khyber Pakhtunkhwa, Pakistan
- Farrukh Azeem
  - Department of Bioinformatics and Biotechnology, GC University, Faisalabad, Pakistan
- Tayyaba Shaheen
  - Department of Bioinformatics and Biotechnology, GC University, Faisalabad, Pakistan
- Munazza Ijaz
  - Department of Bioinformatics and Biotechnology, GC University, Faisalabad, Pakistan
- Sher Aslam Khan
  - Department of Plant Breeding and Genetics, The University of Haripur, 22620 Khyber Pakhtunkhwa, Pakistan
- Freddy Mora-Poblete
  - Institute of Biological Sciences, Campus Talca, Universidad de Talca, Talca 3465548, Chile
  - Corresponding authors.
11
Kaleel M, Ellinger L, Lalor C, Pollastri G, Mooney C. SCLpred-MEM: Subcellular localization prediction of membrane proteins by deep N-to-1 convolutional neural networks. Proteins 2021; 89:1233-1239. [PMID: 33983651 DOI: 10.1002/prot.26144] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/15/2020] [Revised: 02/22/2021] [Accepted: 05/06/2021] [Indexed: 11/11/2022]
Abstract
The knowledge of the subcellular location of a protein is a valuable source of information in genomics, drug design, and various other theoretical and analytical perspectives of bioinformatics. Due to the expensive and time-consuming nature of experimental methods of protein subcellular location determination, various computational methods have been developed for subcellular localization prediction. We introduce "SCLpred-MEM," an ab initio protein subcellular localization predictor, powered by an ensemble of Deep N-to-1 Convolutional Neural Networks (N1-NN) trained and tested on strict redundancy reduced datasets. SCLpred-MEM is available as a web-server predicting query proteins into two classes, membrane and non-membrane proteins. SCLpred-MEM achieves a Matthews correlation coefficient of 0.52 on a strictly homology-reduced independent test set and 0.62 on a less strict homology reduced independent test set, surpassing or matching other state-of-the-art subcellular localization predictors.
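For readers unfamiliar with the metric reported in this abstract, the Matthews correlation coefficient (MCC) of a two-class predictor, such as membrane versus non-membrane, can be computed from the four confusion-matrix counts. The following is a minimal Python sketch of the standard formula; the function name and the example counts are illustrative assumptions, not the SCLpred-MEM evaluation code or data.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Return the MCC for binary confusion-matrix counts.

    By convention this sketch returns 0.0 when the denominator is zero
    (for example, when every example falls into a single class).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for illustration only (not taken from the paper).
print(round(matthews_corrcoef(tp=40, tn=40, fp=10, fn=10), 2))  # prints 0.6
```

MCC ranges from -1 to 1, with 0 corresponding to chance-level prediction, which makes it more informative than raw accuracy when the two classes are imbalanced.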
Affiliation(s)
- Manaz Kaleel
  - School of Computer Science, University College Dublin, Dublin, Ireland
  - UCD Institute for Discovery, University College Dublin, Dublin, Ireland
- Liam Ellinger
  - Whitacre College of Engineering, Texas Tech University, Lubbock, Texas, USA
- Clodagh Lalor
  - School of Computer Science, University College Dublin, Dublin, Ireland
- Gianluca Pollastri
  - School of Computer Science, University College Dublin, Dublin, Ireland
  - UCD Institute for Discovery, University College Dublin, Dublin, Ireland
- Catherine Mooney
  - School of Computer Science, University College Dublin, Dublin, Ireland
12
Van Oort CM, Ferrell JB, Remington JM, Wshah S, Li J. AMPGAN v2: Machine Learning-Guided Design of Antimicrobial Peptides. J Chem Inf Model 2021; 61:2198-2207. [PMID: 33787250 DOI: 10.1021/acs.jcim.0c01441] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Indexed: 02/07/2023]
Abstract
Antibiotic resistance is a critical public health problem. Each year ∼2.8 million resistant infections lead to more than 35 000 deaths in the U.S. alone. Antimicrobial peptides (AMPs) show promise in treating resistant infections. However, applications of known AMPs have encountered issues in development, production, and shelf-life. To drive the development of AMP-based treatments, it is necessary to create design approaches with higher precision and selectivity toward resistant targets. Previously, we developed AMPGAN and obtained proof-of-concept evidence for the generative approach to design AMPs with experimental validation. Building on the success of AMPGAN, we present AMPGAN v2, a bidirectional conditional generative adversarial network (BiCGAN)-based approach for rational AMP design. AMPGAN v2 uses generator-discriminator dynamics to learn data-driven priors and controls generation using conditioning variables. The bidirectional component, implemented using a learned encoder to map data samples into the latent space of the generator, aids iterative manipulation of candidate peptides. These elements allow AMPGAN v2 to generate candidates that are novel, diverse, and tailored for specific applications, making it an efficient AMP design tool.
Affiliation(s)
- Colin M Van Oort
  - Department of Computer Science, University of Vermont, Burlington, Vermont 05405, United States
- Jonathon B Ferrell
  - Department of Chemistry, University of Vermont, Burlington, Vermont 05405, United States
- Jacob M Remington
  - Department of Chemistry, University of Vermont, Burlington, Vermont 05405, United States
- Safwan Wshah
  - Department of Computer Science, University of Vermont, Burlington, Vermont 05405, United States
- Jianing Li
  - Department of Chemistry, University of Vermont, Burlington, Vermont 05405, United States
13
Computational prediction of secreted proteins in gram-negative bacteria. Comput Struct Biotechnol J 2021; 19:1806-1828. [PMID: 33897982 PMCID: PMC8047123 DOI: 10.1016/j.csbj.2021.03.019] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 11/19/2020] [Revised: 03/18/2021] [Accepted: 03/18/2021] [Indexed: 12/29/2022] Open
Abstract
Gram-negative bacteria harness multiple protein secretion systems and secrete a large proportion of the proteome. Proteins can be exported to the periplasmic space, integrated into the membrane, transported into the extracellular milieu, or translocated into the cytoplasm of contacting cells. Accurate, genome-wide annotation of the secreted proteins and their secretion pathways is therefore important. In this review, we systematically classified the secreted proteins according to the types of secretion systems in Gram-negative bacteria, summarized the known features of these proteins, and reviewed the algorithms and tools for their prediction.
14
Muggia L, Ametrano CG, Sterflinger K, Tesei D. An Overview of Genomics, Phylogenomics and Proteomics Approaches in Ascomycota. Life (Basel) 2020; 10:E356. [PMID: 33348904 PMCID: PMC7765829 DOI: 10.3390/life10120356] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 11/12/2020] [Revised: 12/10/2020] [Accepted: 12/12/2020] [Indexed: 12/26/2022] Open
Abstract
Fungi are among the most successful eukaryotes on Earth: they have evolved strategies to survive in the most diverse environments and stressful conditions and have been selected and exploited for multiple aims by humans. The characteristic features intrinsic to Fungi have required evolutionary changes and adaptations at deep molecular levels. Omics approaches, nowadays including genomics, metagenomics, phylogenomics, transcriptomics, metabolomics, and proteomics, have enormously advanced our understanding of fungal diversity at diverse taxonomic levels, under changeable conditions and in still under-investigated environments. These approaches can be applied both to environmental communities and to individual organisms, either in nature or in axenic culture, and have led traditional morphology-based fungal systematics to increasingly implement molecular-based approaches. The advent of next-generation sequencing technologies was key to boosting advances in fungal genomics and proteomics research. Much effort has also been directed towards the development of methodologies for optimal genomic DNA and protein extraction and separation. To date, the number of proteomics investigations in Ascomycetes exceeds that carried out in any other fungal group. This is primarily due to the preponderance of their involvement in plant and animal diseases and multiple industrial applications, and therefore the need to understand the biological basis of the infectious process to develop mechanisms for biological control, as well as to detect key proteins with roles in stress survival. Here we present an overview, as comprehensive as possible, of the major advances, mainly of the past decade, in the fields of genomics (including phylogenomics) and proteomics of Ascomycota, focusing particularly on those reporting on opportunistic pathogenic, extremophilic, polyextremotolerant and lichenized fungi. We also present a review of the most commonly used genome sequencing technologies and methods for DNA sequence and protein analyses applied so far to fungi.
Affiliation(s)
- Lucia Muggia
- Department of Life Sciences, University of Trieste, 34127 Trieste, Italy
| | - Claudio G. Ametrano
- Grainger Bioinformatics Center, Department of Science and Education, The Field Museum, Chicago, IL 60605, USA;
| | - Katja Sterflinger
- Academy of Fine Arts Vienna, Institute of Natural Sciences and Technology in the Arts, 1090 Vienna, Austria;
| | - Donatella Tesei
- Department of Biotechnology, University of Natural Resources and Life Sciences, 1190 Vienna, Austria;
|
15
|
Kumar R, Dhanda SK. Bird Eye View of Protein Subcellular Localization Prediction. Life (Basel) 2020; 10:E347. [PMID: 33327400 PMCID: PMC7764902 DOI: 10.3390/life10120347] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 11/24/2020] [Revised: 12/11/2020] [Accepted: 12/11/2020] [Indexed: 12/12/2022] Open
Abstract
Proteins are made up of long chains of amino acids and perform a variety of functions in different organisms. The activity of a protein is determined by the nucleotide sequence of its gene and by its 3D structure. In addition, proteins must be targeted to their specific locations or compartments to perform their functions. The challenge of computationally predicting the subcellular localization of proteins has been addressed by various in silico methods. In this review, we survey the progress in this field and offer a bird's-eye view consisting of a comprehensive listing of tools, types of input features explored, machine learning approaches employed, and evaluation metrics applied. We hope the review will be useful for researchers working in the field of protein localization prediction.
Affiliation(s)
- Ravindra Kumar
- Biometric Research Program, Division of Cancer Treatment and Diagnosis, National Cancer Institute, NIH, 9609 Medical Center Drive, Rockville, MD 20850, USA
| | - Sandeep Kumar Dhanda
- Department of Oncology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
|
16
|
Li GP, Du PF, Shen ZA, Liu HY, Luo T. DPPN-SVM: Computational Identification of Mis-Localized Proteins in Cancers by Integrating Differential Gene Expressions With Dynamic Protein-Protein Interaction Networks. Front Genet 2020; 11:600454. [PMID: 33193746 PMCID: PMC7644922 DOI: 10.3389/fgene.2020.600454] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/30/2020] [Accepted: 10/07/2020] [Indexed: 12/29/2022] Open
Abstract
Eukaryotic cells contain numerous components, which are known as subcellular compartments or subcellular organelles. Proteins must be sorted to proper subcellular compartments to carry out their molecular functions. Mis-localized proteins are related to various cancers. Identifying mis-localized proteins is important in understanding the pathology of cancers and in developing therapies. However, experimental methods, which are used to determine protein subcellular locations, are always costly and time-consuming. We tried to identify cancer-related mis-localized proteins in three different cancers using computational approaches. By integrating gene expression profiles and dynamic protein-protein interaction networks, we established DPPN-SVM (Dynamic Protein-Protein Network with Support Vector Machine), a predictive model using the SVM classifier with diffusion kernels. With this predictive model, we identified a number of mis-localized proteins. Since we introduced the dynamic protein-protein network, which has never been considered in existing works, our model is capable of identifying more mis-localized proteins than existing studies. As far as we know, this is the first study to incorporate dynamic protein-protein interaction network in identifying mis-localized proteins in cancers.
Affiliation(s)
- Guang-Ping Li
- College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Pu-Feng Du
- College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Zi-Ang Shen
- College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Hang-Yu Liu
- College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Tao Luo
- College of Intelligence and Computing, Tianjin University, Tianjin, China
|
17
|
Cong H, Liu H, Chen Y, Cao Y. Self-evoluting framework of deep convolutional neural network for multilocus protein subcellular localization. Med Biol Eng Comput 2020; 58:3017-3038. [PMID: 33078303 DOI: 10.1007/s11517-020-02275-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/02/2019] [Accepted: 10/14/2020] [Indexed: 12/12/2022]
Abstract
In the present paper, a deep convolutional neural network (DCNN) is applied to multilocus protein subcellular localization as it is more suitable for multi-class classification. There are two main problems with this application. First, the appropriate features for correlation between multiple sites are hard to find. Second, the classifier structure is difficult to determine as it is greatly affected by the distribution of classified data. To solve these problems, a self-evoluting framework using DCNNs for multilocus protein subcellular localization is proposed. It has three characteristics that the previous algorithms do not. The first is that it combines the ant colony algorithm with the DCNN to form a self-evoluting algorithm for multilocus protein subcellular localization. The second is that it randomly groups subcellular sites using a limited random k-labelsets multi-label classification method. It also solves complex problems in a divide-and-conquer approach and proposes a flexible expansion model. The third is that it realizes the random selection feature extraction method in the positioning process and avoids the defects in individual feature extraction methods. The algorithm in the present paper is tested on the human database, and the overall correct rate is 67.17%, which is higher than that for the stacked self-encoder (SAE), support vector machine (SVM), random forest classifier (RF), or single deep convolutional neural network.
Graphical abstract: The algorithm in the present paper comprises four main parts: protein sequence data preprocessing, integrated DCNN model construction, finding the optimal DCNN combination by ant colony optimization, and protein subcellular localization for sequences. These parts run sequentially, and the data obtained in each part is the basis for the next. In the data preprocessing part, the limited RAkEL multi-label classification method is used to randomly group subcellular sites. At the same time, feature fusion of protein sequences is carried out using multiple feature extraction methods. Each combination of features and site information corresponds to a DCNN model. In finding the optimal DCNN combination by ant colony optimization, the main purpose is to find the best combination of DCNN models through the global optimization ability of the ant colony algorithm. The positioning of sequences then obtains the multilocus subcellular localization from the optimal model combination.
Affiliation(s)
- Hanhan Cong
- School of Information Science and Engineering, Shandong Normal University, No. 88, Wenhua East Road, Jinan City, China; Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Shandong Normal University, Jinan, China
| | - Hong Liu
- School of Information Science and Engineering, Shandong Normal University, No. 88, Wenhua East Road, Jinan City, China; Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Shandong Normal University, Jinan, China
| | - Yuehui Chen
- School of Information Science and Engineering, University of Jinan, Jinan, China; Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, China
| | - Yi Cao
- School of Information Science and Engineering, University of Jinan, Jinan, China; Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, China
|
18
|
Abstract
Background: Revealing the subcellular location of a newly discovered protein can bring insight into its function and guide research at the cellular level. The experimental methods currently used to identify protein subcellular locations are both time-consuming and expensive. Thus, it is highly desirable to develop computational methods for efficiently and effectively identifying protein subcellular locations. In particular, the rapidly increasing number of protein sequences entering the genome databases has called for the development of automated analysis methods.
Methods: In this review, we describe the recent advances in predicting protein subcellular locations with machine learning from the following aspects: i) protein subcellular location benchmark dataset construction, ii) protein feature representation and feature descriptors, iii) common machine learning algorithms, iv) cross-validation test methods and assessment metrics, v) web servers.
Result & Conclusion: With the large number of protein sequences generated by high-throughput technologies, four future directions for predicting protein subcellular locations with machine learning deserve attention. One direction is the selection of novel and effective features (e.g., statistical, physicochemical, evolutionary) from the sequences and structures of proteins. Another is the feature fusion strategy. The third is the design of a powerful predictor, and the fourth is the prediction of multiple protein location sites.
Affiliation(s)
- Ting-He Zhang
- School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
| | - Shao-Wu Zhang
- School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
|
19
|
Choudhary P, Chakdar H, Singh A, Kumar S, Singh SK, Aarthy M, Goswami SK, Srivastava AK, Saxena AK. Computational identification and antifungal bioassay reveals phytosterols as potential inhibitor of Alternaria arborescens. J Biomol Struct Dyn 2019; 38:1143-1157. [PMID: 30898083 DOI: 10.1080/07391102.2019.1597767] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 10/27/2022]
Abstract
Alternaria arborescens is a major pathogen of crops such as tomato and tangerine, and its control is mostly dependent on the application of chemical agents. Plants, as sources of natural products, are a very attractive option for developing eco-friendly and natural antifungal agents. In this study, we modeled the three-dimensional structure of the chorismate synthase (CS) enzyme from A. arborescens. Docking studies of phytosterols, namely, γ-sitosterol and β-sitosterol, with CS showed them to be potential inhibitors of CS. To explore the stability and conformational flexibility of all the AaCS complex systems, molecular dynamics simulations were performed. None of the putative inhibitors as well as β- and γ-sitosterol showed interaction with the FMNH2 binding pocket of the tomato CS (major host of A. arborescens), indicating their suitability as antifungal compounds inhibiting the shikimate pathway without causing any harm to the host. An in vivo antifungal bioassay showed a significant reduction in fungal growth in the presence of β-sitosterol (500 ppm), which resulted in ∼23% and ∼17% reduction in fungal fresh and dry weight, respectively, at 8 days after inoculation. This study provides experimental evidence establishing that natural sterols like β-sitosterol can be useful in curbing A. arborescens damage in an eco-friendly manner. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Prassan Choudhary
- Microbial Technology Unit, ICAR-National Bureau of Agriculturally Important Microorganisms, Mau, Uttar Pradesh, India
| | - Hillol Chakdar
- Microbial Technology Unit, ICAR-National Bureau of Agriculturally Important Microorganisms, Mau, Uttar Pradesh, India
| | - Arjun Singh
- Microbial Technology Unit, ICAR-National Bureau of Agriculturally Important Microorganisms, Mau, Uttar Pradesh, India
| | - Sunil Kumar
- Microbial Technology Unit, ICAR-National Bureau of Agriculturally Important Microorganisms, Mau, Uttar Pradesh, India
| | - Sanjeev Kumar Singh
- Department of Bioinformatics, Alagappa University, Karaikudi, Tamil Nadu, India
| | - Murali Aarthy
- Department of Bioinformatics, Alagappa University, Karaikudi, Tamil Nadu, India
| | - Sanjay Kumar Goswami
- Microbial Technology Unit, ICAR-National Bureau of Agriculturally Important Microorganisms, Mau, Uttar Pradesh, India
| | - Alok Kumar Srivastava
- Microbial Technology Unit, ICAR-National Bureau of Agriculturally Important Microorganisms, Mau, Uttar Pradesh, India
| | - Anil Kumar Saxena
- Microbial Technology Unit, ICAR-National Bureau of Agriculturally Important Microorganisms, Mau, Uttar Pradesh, India
|
20
|
Sharma V, Goel P, Kumar S, Singh AK. An apple transcription factor, MdDREB76, confers salt and drought tolerance in transgenic tobacco by activating the expression of stress-responsive genes. Plant Cell Rep 2019; 38:221-241. [PMID: 30511183 DOI: 10.1007/s00299-018-2364-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 08/06/2018] [Accepted: 11/27/2018] [Indexed: 06/09/2023]
Abstract
Key message: An apple gene, MdDREB76, encodes a functional transcription factor and imparts salinity and drought stress endurance to transgenic tobacco by activating expression of stress-responsive genes. The dehydration-responsive element (DRE)-binding protein (DREB) transcription factors are well known to be involved in regulating abiotic stress-mediated gene expression in plants. In this study, the MdDREB76 gene, which encodes a functional transcription factor protein, was isolated from apple (Malus x domestica). Overexpression of MdDREB76 in tobacco conferred salt and drought stress tolerance to transgenic lines by inducing antioxidant enzymes, such as superoxide dismutase, ascorbate peroxidase and catalase. The higher membrane stability index, relative water content, proline and total soluble sugar content, and the lower H2O2 content, electrolyte leakage and lipid peroxidation in transgenics support the improved physiological status of transgenic plants as compared to WT plants under salinity and drought stresses. MdDREB76 overexpression upregulated the expression of stress-responsive genes that provide salinity and drought stress endurance to the plants. Compared to WT plants, transgenic lines exhibited healthy growth and higher yield under stress conditions. The present study reports MdDREB76 as a key regulator that switches on the battery of downstream genes which impart salt and osmotic stress endurance to the transgenic plants and can be used for genetic engineering of crop plants to combat salinity and drought stresses.
Affiliation(s)
- Vishal Sharma
- Department of Biotechnology, CSIR-Institute of Himalayan Bioresource Technology, Palampur, 176 061, India
- Academy of Scientific and Innovative Research, New Delhi, India
| | - Parul Goel
- Department of Biotechnology, CSIR-Institute of Himalayan Bioresource Technology, Palampur, 176 061, India
- Academy of Scientific and Innovative Research, New Delhi, India
| | - Sanjay Kumar
- Department of Biotechnology, CSIR-Institute of Himalayan Bioresource Technology, Palampur, 176 061, India
- Academy of Scientific and Innovative Research, New Delhi, India
| | - Anil Kumar Singh
- Department of Biotechnology, CSIR-Institute of Himalayan Bioresource Technology, Palampur, 176 061, India.
- Academy of Scientific and Innovative Research, New Delhi, India.
- ICAR-Indian Institute of Agricultural Biotechnology, Ranchi, 834 010, India.
|
21
|
Iyama T, Okur MN, Golato T, McNeill DR, Lu H, Hamilton R, Raja A, Bohr VA, Wilson DM. Regulation of the Intranuclear Distribution of the Cockayne Syndrome Proteins. Sci Rep 2018; 8:17490. [PMID: 30504782 PMCID: PMC6269539 DOI: 10.1038/s41598-018-36027-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 07/26/2017] [Accepted: 11/01/2018] [Indexed: 12/04/2022] Open
Abstract
Cockayne syndrome (CS) is an inherited disorder that involves photosensitivity, developmental defects, progressive degeneration and characteristics of premature aging. Evidence indicates primarily nuclear roles for the major CS proteins, CSA and CSB, specifically in DNA repair and RNA transcription. We reveal herein a complex regulation of CSB targeting that involves three major consensus signals: NLS1 (aa467-481), which directs nuclear and nucleolar localization in cooperation with NoLS1 (aa302-341), and NLS2 (aa1038-1055), which seemingly optimizes nuclear enrichment. CSB localization to the nucleolus was also found to be important for full UVC resistance. CSA, which does not contain any obvious targeting sequences, was adversely affected (i.e. presumably destabilized) by any form of truncation. No inter-coordination between the subnuclear localization of CSA and CSB was observed, implying that this aspect does not underlie the clinical features of CS. The E3 ubiquitin ligase binding partner of CSA, DDB1, played an important role in CSA stability (as well as DDB2), and facilitated CSA association with chromatin following UV irradiation; yet did not affect CSB chromatin binding. We also observed that initial recruitment of CSB to DNA interstrand crosslinks is similar in the nucleoplasm and nucleolus, although final accumulation is greater in the former. Whereas assembly of CSB at sites of DNA damage in the nucleolus was not affected by RNA polymerase I inhibition, stable retention at these sites of presumed repair was abrogated. Our studies reveal a multi-faceted regulation of the intranuclear dynamics of CSA and CSB that plays a role in mediating their cellular functions.
Affiliation(s)
- Teruaki Iyama
- Laboratory of Molecular Gerontology, National Institute on Aging, Intramural Research Program, National Institutes of Health, 251 Bayview Blvd., Ste. 100, Baltimore, MD, 21224, USA
| | - Mustafa N Okur
- Laboratory of Molecular Gerontology, National Institute on Aging, Intramural Research Program, National Institutes of Health, 251 Bayview Blvd., Ste. 100, Baltimore, MD, 21224, USA
| | - Tyler Golato
- Laboratory of Molecular Gerontology, National Institute on Aging, Intramural Research Program, National Institutes of Health, 251 Bayview Blvd., Ste. 100, Baltimore, MD, 21224, USA
| | - Daniel R McNeill
- Laboratory of Molecular Gerontology, National Institute on Aging, Intramural Research Program, National Institutes of Health, 251 Bayview Blvd., Ste. 100, Baltimore, MD, 21224, USA
| | - Huiming Lu
- Laboratory of Molecular Gerontology, National Institute on Aging, Intramural Research Program, National Institutes of Health, 251 Bayview Blvd., Ste. 100, Baltimore, MD, 21224, USA
| | - Royce Hamilton
- Laboratory of Molecular Gerontology, National Institute on Aging, Intramural Research Program, National Institutes of Health, 251 Bayview Blvd., Ste. 100, Baltimore, MD, 21224, USA
| | - Aishwarya Raja
- Laboratory of Molecular Gerontology, National Institute on Aging, Intramural Research Program, National Institutes of Health, 251 Bayview Blvd., Ste. 100, Baltimore, MD, 21224, USA
| | - Vilhelm A Bohr
- Laboratory of Molecular Gerontology, National Institute on Aging, Intramural Research Program, National Institutes of Health, 251 Bayview Blvd., Ste. 100, Baltimore, MD, 21224, USA
| | - David M Wilson
- Laboratory of Molecular Gerontology, National Institute on Aging, Intramural Research Program, National Institutes of Health, 251 Bayview Blvd., Ste. 100, Baltimore, MD, 21224, USA.
|
22
|
Characterization of an Insecticidal Protein from Withania somnifera Against Lepidopteran and Hemipteran Pest. Mol Biotechnol 2018; 60:290-301. [PMID: 29492788 DOI: 10.1007/s12033-018-0070-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 01/03/2023]
Abstract
Lectins are carbohydrate-binding proteins with a wide array of functions, including plant defense against pathogens and insect pests. In the present study, a putative mannose-binding lectin (WsMBP1) of 1124 bp was isolated from leaves of Withania somnifera. The gene was expressed in E. coli, and the recombinant WsMBP1, with a predicted molecular weight of 31 kDa, was tested for its insecticidal properties against Hyblaea puera (Lepidoptera: Hyblaeidae) and Probergrothius sanguinolens (Hemiptera: Pyrrhocoridae). Delayed growth and metamorphosis, decreased larval body mass and increased mortality were recorded in recombinant WsMBP1-fed larvae. Histological studies on the midgut of lectin-treated insects showed disrupted and diffused secretory cells surrounding the gut lumen in larvae of H. puera and P. sanguinolens, implicating its role in disruption of the digestive process and nutrient assimilation in the studied insect pests. The present study indicates that WsMBP1 can act as a potential gene resource in future transformation programs for incorporating insect pest tolerance in susceptible plant genotypes.
|
23
|
Abstract
Since the 1980s, deep learning and biomedical data have been coevolving and feeding each other. The breadth, complexity, and rapidly expanding size of biomedical data have stimulated the development of novel deep learning methods, and the application of these methods to biomedical data has led to scientific discoveries and practical solutions. This overview provides technical and historical pointers to the field, and surveys current applications of deep learning to biomedical data organized around five subareas, roughly of increasing spatial scale: chemoinformatics, proteomics, genomics and transcriptomics, biomedical imaging, and health care. The black box problem of deep learning methods is also briefly discussed.
Affiliation(s)
- Pierre Baldi
- Department of Computer Science, Institute for Genomics and Bioinformatics, and Center for Machine Learning and Intelligent Systems, University of California, Irvine, California 92697, USA
|
24
|
Khowal S, Naqvi SH, Monga S, Jain SK, Wajid S. Assessment of cellular and serum proteome from tongue squamous cell carcinoma patient lacking addictive proclivities for tobacco, betel nut, and alcohol: Case study. J Cell Biochem 2018; 119:5186-5221. [PMID: 29236289 DOI: 10.1002/jcb.26554] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 05/12/2017] [Accepted: 11/30/2017] [Indexed: 02/06/2023]
Abstract
The intriguing molecular pathways involved in oral carcinogenesis are still ambiguous. Oral squamous cell carcinoma (OSCC) ranks as the most common type, constituting more than 90% of globally diagnosed oral cancer cases. The elevation in the OSCC incidence rate during the past 10 years has an alarming impact on human healthcare. The major challenges associated with OSCC include delayed diagnosis, high metastatic rates, and low 5-year survival rates. The present work is founded on a reverse genetic strategy and involves the identification of genes showing expressional variability in an OSCC case lacking addictive proclivities for tobacco, betel nut, and/or alcohol, the major etiologies. The expression modulations in the identified genes were analyzed in 16 patients comprising oral pre-cancer and cancer histo-pathologies. The genes SCCA1 and KRT1 were found to be downregulated, while DNAJC13, GIPC2, MRPL17, IG-Vreg, SSFA2, and UPF0415 were upregulated in the oral pre-cancer and cancer pathologies, implicating the genes as crucial players in oral carcinogenesis.
Affiliation(s)
- Sapna Khowal
- Department of Biotechnology, School of Chemical and Life Sciences, Jamia Hamdard, New Delhi, India
| | - Samar H Naqvi
- Molecular Diagnostics, Genetix Biotech Asia (P) Ltd., New Delhi, India
| | - Seema Monga
- Department of ENT, Hamdard Institute of Medical Sciences and Research, Jamia Hamdard, New Delhi, India
| | - Swatantra K Jain
- Department of Biotechnology, School of Chemical and Life Sciences, Jamia Hamdard, New Delhi, India
- Department of Biochemistry, Hamdard Institute of Medical Sciences and Research, Jamia Hamdard, New Delhi, India
| | - Saima Wajid
- Department of Biotechnology, School of Chemical and Life Sciences, Jamia Hamdard, New Delhi, India
|
25
|
Champagne A, Boutry M. A comprehensive proteome map of glandular trichomes of hop (Humulus lupulus L.) female cones: Identification of biosynthetic pathways of the major terpenoid-related compounds and possible transport proteins. Proteomics 2017; 17. [DOI: 10.1002/pmic.201600411] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 10/19/2016] [Revised: 01/23/2017] [Accepted: 02/09/2017] [Indexed: 11/06/2022]
Affiliation(s)
- Antoine Champagne
- Institut des Sciences de la Vie; Université catholique de Louvain; Louvain-la-Neuve Belgium
| | - Marc Boutry
- Institut des Sciences de la Vie; Université catholique de Louvain; Louvain-la-Neuve Belgium
|
26
|
Abstract
In sessile plants, the dynamic protein secretion pathways orchestrate the cellular responses to internal signals and external environmental changes in almost every aspect of plant developmental events. The cohort of plant proteins secreted from plant cells into the extracellular matrix has been annotated as the plant secretome. Therefore, the identification and characterization of secreted proteins will uncover novel secretory potentials and establish the functional connection between cellular protein secretion and plant physiological phenomena. Notably, an increasing number of bioinformatics databases and tools have been developed for computational predictions of either secreted proteins or secretory pathways. This chapter summarizes currently accessible databases and tools for protein secretion analysis in Arabidopsis thaliana and higher plants, and provides feasible methodologies for bioinformatics analysis of secretome studies for the plant research community.
Affiliation(s)
- Liyuan Chen
- RGC-AoE Centre for Organelle Biogenesis and Function, School of Life Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong, China.
|
27
|
Katt WP, Lukey MJ, Cerione RA. A tale of two glutaminases: homologous enzymes with distinct roles in tumorigenesis. Future Med Chem 2017; 9:223-243. [PMID: 28111979 PMCID: PMC5558546 DOI: 10.4155/fmc-2016-0190] [Citation(s) in RCA: 102] [Impact Index Per Article: 14.6] [Received: 09/26/2016] [Accepted: 12/01/2016] [Indexed: 01/17/2023] Open
Abstract
Many cancer cells exhibit an altered metabolic phenotype, in which glutamine consumption is upregulated relative to healthy cells. This metabolic reprogramming often depends upon mitochondrial glutaminase activity, which converts glutamine to glutamate, a key precursor for biosynthetic and bioenergetic processes. Two isozymes of glutaminase exist, a kidney-type (GLS) and a liver-type enzyme (GLS2 or LGA). While a majority of studies have focused on GLS, here we summarize key findings on both glutaminases, describing their structure and function, their roles in cancer and pharmacological approaches to inhibiting their activities.
Affiliation(s)
- William P Katt
- Department of Molecular Medicine, Cornell University, Ithaca, NY 14853, USA
| | - Michael J Lukey
- Department of Molecular Medicine, Cornell University, Ithaca, NY 14853, USA
| | - Richard A Cerione
- Department of Molecular Medicine, Cornell University, Ithaca, NY 14853, USA
- Department of Chemistry & Chemical Biology, Cornell University, Ithaca, NY 14853, USA
| |
Collapse
|
28
|
Thakur A, Rajput A, Kumar M. MSLVP: prediction of multiple subcellular localization of viral proteins using a support vector machine. Mol Biosyst 2016; 12:2572-86. [DOI: 10.1039/c6mb00241b] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Indexed: 01/16/2023]
Abstract
Knowledge of the subcellular location (SCL) of viral proteins in the host cell is important for understanding their function in depth.
Affiliation(s)
- Anamika Thakur
- Bioinformatics Centre, Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh-160036, India
| | - Akanksha Rajput
- Bioinformatics Centre, Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh-160036, India
| | - Manoj Kumar
- Bioinformatics Centre, Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh-160036, India
|
29
|
Zhu X, Yang K, Wei X, Zhang Q, Rong W, Du L, Ye X, Qi L, Zhang Z. The wheat AGC kinase TaAGC1 is a positive contributor to host resistance to the necrotrophic pathogen Rhizoctonia cerealis. J Exp Bot 2015; 66:6591-603. [PMID: 26220083 PMCID: PMC4623678 DOI: 10.1093/jxb/erv367] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.8] [Indexed: 05/03/2023]
Abstract
Considerable progress has been made in understanding the roles of AGC kinases in mammalian systems. However, very little is known about the roles of AGC kinases in wheat (Triticum aestivum). The necrotrophic fungus Rhizoctonia cerealis is the major pathogen of the destructive disease sharp eyespot of wheat. In this study, the wheat AGC kinase gene TaAGC1, responding to R. cerealis infection, was isolated, and its properties and role in wheat defence were characterized. R. cerealis-resistant wheat lines expressed TaAGC1 at higher levels than susceptible wheat lines. Sequence and phylogenetic analyses showed that the TaAGC1 protein is a serine/threonine kinase belonging to the NDR (nuclear Dbf2-related) subgroup of AGC kinases. Kinase activity assays proved that TaAGC1 is a functional kinase and the Asp-239 residue located in the conserved serine/threonine kinase domain of TaAGC1 is required for the kinase activity. Subcellular localization assays indicated that TaAGC1 localized in the cytoplasm and nucleus. Virus-induced TaAGC1 silencing revealed that the down-regulation of TaAGC1 transcripts significantly impaired wheat resistance to R. cerealis. The molecular characterization and responses of TaAGC1 overexpressing transgenic wheat plants indicated that TaAGC1 overexpression significantly enhanced resistance to sharp eyespot and reduced the accumulation of reactive oxygen species (ROS) in wheat plants challenged with R. cerealis. Furthermore, ROS-scavenging and certain defence-associated genes were up-regulated in resistant plants overexpressing TaAGC1 but down-regulated in susceptible knock-down plants. These results suggested that the kinase TaAGC1 positively contributes to wheat immunity to R. cerealis through regulating expression of ROS-related and defence-associated genes.
Affiliation(s)
- Xiuliang Zhu
- The National Key Facility for Crop Gene Resources and Genetic Improvement, Key Laboratory of Biology and Genetic Improvement of Triticeae Crops, Ministry of Agriculture, Institute of Crop Science, Chinese Academy of Agricultural Sciences, Beijing 100081, China
| | - Kun Yang
- The National Key Facility for Crop Gene Resources and Genetic Improvement, Key Laboratory of Biology and Genetic Improvement of Triticeae Crops, Ministry of Agriculture, Institute of Crop Science, Chinese Academy of Agricultural Sciences, Beijing 100081, China
| | - Xuening Wei
- The National Key Facility for Crop Gene Resources and Genetic Improvement, Key Laboratory of Biology and Genetic Improvement of Triticeae Crops, Ministry of Agriculture, Institute of Crop Science, Chinese Academy of Agricultural Sciences, Beijing 100081, China
| | - Qiaofeng Zhang
- Jiangsu Academy of Agricultural Sciences, Nanjing 210014, China
| | - Wei Rong
- The National Key Facility for Crop Gene Resources and Genetic Improvement, Key Laboratory of Biology and Genetic Improvement of Triticeae Crops, Ministry of Agriculture, Institute of Crop Science, Chinese Academy of Agricultural Sciences, Beijing 100081, China
| | - Lipu Du
- The National Key Facility for Crop Gene Resources and Genetic Improvement, Key Laboratory of Biology and Genetic Improvement of Triticeae Crops, Ministry of Agriculture, Institute of Crop Science, Chinese Academy of Agricultural Sciences, Beijing 100081, China
| | - Xingguo Ye
- The National Key Facility for Crop Gene Resources and Genetic Improvement, Key Laboratory of Biology and Genetic Improvement of Triticeae Crops, Ministry of Agriculture, Institute of Crop Science, Chinese Academy of Agricultural Sciences, Beijing 100081, China
| | - Lin Qi
- The National Key Facility for Crop Gene Resources and Genetic Improvement, Key Laboratory of Biology and Genetic Improvement of Triticeae Crops, Ministry of Agriculture, Institute of Crop Science, Chinese Academy of Agricultural Sciences, Beijing 100081, China
| | - Zengyan Zhang
- The National Key Facility for Crop Gene Resources and Genetic Improvement, Key Laboratory of Biology and Genetic Improvement of Triticeae Crops, Ministry of Agriculture, Institute of Crop Science, Chinese Academy of Agricultural Sciences, Beijing 100081, China
|
30
|
Liu Z, Hu J. Mislocalization-related disease gene discovery using gene expression based computational protein localization prediction. Methods 2015; 93:119-27. [PMID: 26416496 DOI: 10.1016/j.ymeth.2015.09.022] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 06/02/2015] [Revised: 09/17/2015] [Accepted: 09/21/2015] [Indexed: 01/09/2023] Open
Abstract
Protein sorting is an important mechanism for transporting proteins to their target subcellular locations after their synthesis. Mutations in genes may disrupt the well-regulated protein sorting process, leading to a variety of mislocation-related diseases. This paper proposes a methodology to discover such disease genes based on gene expression data and computational protein localization prediction. A kernel logistic regression based algorithm is used to successfully identify several candidate cancer genes which may cause cancers due to their mislocation within the cell. Our results also showed that, compared to the gene co-expression network defined on Pearson correlation coefficients, the nonlinear Maximum Correlation Coefficients (MIC) based co-expression network gives better results for subcellular localization prediction.
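As an illustration of the Pearson-based baseline mentioned in this abstract, the following is a minimal sketch of building a co-expression network by thresholding pairwise Pearson correlation. It is not the authors' implementation: the function names, the 0.8 threshold, and the toy expression profiles are assumptions made for the example, and the MIC-based variant is not shown.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def coexpression_edges(profiles, threshold=0.8):
    """Return (gene1, gene2, r) for pairs with |r| at or above the threshold.

    The threshold value is an assumption for illustration only.
    """
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


# Hypothetical expression profiles (one value per sample), not real data.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}
edges = coexpression_edges(profiles)
```

Pearson correlation captures only linear dependence between profiles, which is the limitation that motivates the nonlinear alternative compared in the abstract.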
Affiliation(s)
- Zhonghao Liu
- Department of Computer Science & Engineering, University of South Carolina, 301 Main Street, Columbia, SC 29208, United States
| | - Jianjun Hu
- Department of Computer Science & Engineering, University of South Carolina, 301 Main Street, Columbia, SC 29208, United States.
|
31
|
Volpato V, Alshomrani B, Pollastri G. Accurate Ab Initio and Template-Based Prediction of Short Intrinsically-Disordered Regions by Bidirectional Recurrent Neural Networks Trained on Large-Scale Datasets. Int J Mol Sci 2015; 16:19868-85. [PMID: 26307973 PMCID: PMC4581330 DOI: 10.3390/ijms160819868] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/01/2015] [Revised: 07/28/2015] [Accepted: 07/29/2015] [Indexed: 12/02/2022] Open
Abstract
Intrinsically-disordered regions lack a well-defined 3D structure, but play key roles in determining the function of many proteins. Although predictors of disorder have been shown to achieve relatively high rates of correct classification of these segments, improvements over the years have been slow, and accurate methods are needed that are capable of accommodating the ever-increasing amount of structurally-determined protein sequences to try to boost predictive performances. In this paper, we propose a predictor for short disordered regions based on bidirectional recurrent neural networks and tested by rigorous five-fold cross-validation on a large, non-redundant dataset collected from MobiDB, a new comprehensive source of protein disorder annotations. The system exploits sequence and structural information in the forms of frequency profiles, predicted secondary structure and solvent accessibility and direct disorder annotations from homologous protein structures (templates) deposited in the Protein Data Bank. The contributions of sequence, structure and homology information result in large improvements in predictive accuracy. Additionally, the large scale of the training set leads to low false positive rates, making our systems a robust and efficient way to address high-throughput disorder prediction.
Affiliation(s)
- Viola Volpato
- School of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland.
- Adaptive and Complex Systems Laboratory, University College Dublin, Belfield, Dublin 4, Ireland.
| | - Badr Alshomrani
- School of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland.
- Adaptive and Complex Systems Laboratory, University College Dublin, Belfield, Dublin 4, Ireland.
| | - Gianluca Pollastri
- School of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland.
- Adaptive and Complex Systems Laboratory, University College Dublin, Belfield, Dublin 4, Ireland.
|
32
|
Wu Q, Wang Z, Li C, Ye Y, Li Y, Sun N. Protein functional properties prediction in sparsely-label PPI networks through regularized non-negative matrix factorization. BMC Systems Biology 2015; 9 Suppl 1:S9. [PMID: 25708164 PMCID: PMC4331684 DOI: 10.1186/1752-0509-9-s1-s9] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 12/23/2022]
Abstract
Background Predicting functional properties of proteins in protein-protein interaction (PPI) networks presents a challenging problem and has important implications in computational biology. Collective classification (CC), which utilizes both attribute features and relational information to jointly classify related proteins in PPI networks, has been shown to be a powerful computational method for this problem setting. Enabling CC usually increases accuracy when given a fully-labeled PPI network with a large amount of labeled data. However, such labels can be difficult to obtain in many real-world PPI networks, in which there are usually only a limited number of labeled proteins and a large number of unlabeled proteins. In this case, most of the unlabeled proteins may not be connected to the labeled ones, so the supervision knowledge cannot be obtained effectively from local network connections. As a consequence, learning a CC model in sparsely-labeled PPI networks can lead to poor performance. Results We investigate a latent graph approach for finding an integration latent graph by exploiting various latent linkages and judiciously integrating the investigated linkages to link (separate) the proteins with similar (different) functions. We develop a regularized non-negative matrix factorization (RNMF) algorithm for CC to make protein functional properties prediction by utilizing various data sources that are available in this problem setting, including attribute features, latent graph, and unlabeled data information. In RNMF, a label matrix factorization term and a network regularization term are incorporated into the non-negative matrix factorization (NMF) objective function to seek a matrix factorization that respects the network structure and label information for classification prediction.
Conclusion Experimental results on KDD Cup tasks predicting the localization and functions of proteins to yeast genes demonstrate the effectiveness of the proposed RNMF method for predicting the protein properties. In the comparison, we find that the performance of the new method is better than that of the other compared CC algorithms, especially when labeled proteins are scarce.
|
33
|
Sormanni P, Camilloni C, Fariselli P, Vendruscolo M. The s2D method: simultaneous sequence-based prediction of the statistical populations of ordered and disordered regions in proteins. J Mol Biol 2014; 427:982-996. [PMID: 25534081 DOI: 10.1016/j.jmb.2014.12.007] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.6] [Received: 09/23/2014] [Revised: 12/10/2014] [Accepted: 12/12/2014] [Indexed: 11/18/2022]
Abstract
Extensive amounts of information about protein sequences are becoming available, as demonstrated by the over 79 million entries in the UniProt database. Yet, it is still challenging to obtain proteome-wide experimental information on the structural properties associated with these sequences. Fast computational predictors of secondary structure and of intrinsic disorder of proteins have been developed in order to bridge this gap. These two types of predictions, however, have remained largely separated, often preventing a clear characterization of the structure and dynamics of proteins. Here, we introduce a computational method to predict secondary-structure populations from amino acid sequences, which simultaneously characterizes structure and disorder in a unified statistical mechanics framework. To develop this method, called s2D, we exploited recent advances made in the analysis of NMR chemical shifts that provide quantitative information about the probability distributions of secondary-structure elements in disordered states. The results that we discuss show that the s2D method predicts secondary-structure populations with an average error of about 14%. A validation on three datasets of mostly disordered, mostly structured and partly structured proteins, respectively, shows that its performance is comparable to or better than that of existing predictors of intrinsic disorder and of secondary structure. These results indicate that it is possible to perform rapid and quantitative sequence-based characterizations of the structure and dynamics of proteins through the predictions of the statistical distributions of their ordered and disordered regions.
Affiliation(s)
- Pietro Sormanni
- Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, UK
| | - Carlo Camilloni
- Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, UK
| | - Piero Fariselli
- Department of Computer Science, University of Bologna, 40127 Bologna, Italy
|
34
|
Wu Q, Ye Y, Ho SS, Zhou S. Semi-supervised multi-label collective classification ensemble for functional genomics. BMC Genomics 2014; 15 Suppl 9:S17. [PMID: 25521242 PMCID: PMC4290603 DOI: 10.1186/1471-2164-15-s9-s17] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND With the rapid accumulation of proteomic and genomic datasets in terms of genome-scale features and interaction networks through high-throughput experimental techniques, the process of manually predicting functional properties of proteins has become increasingly cumbersome, and computational methods to automate this annotation task are urgently needed. Most approaches to predicting functional properties of proteins require either identifying a reliable set of labeled proteins with similar attribute features to unannotated proteins, or learning from a fully-labeled protein interaction network with a large amount of labeled data. However, acquiring such labels can be very difficult in practice, especially for multi-label protein function prediction problems. Learning with only a few labeled data can lead to poor performance as limited supervision knowledge can be obtained from similar proteins or from connections between them. To effectively annotate proteins even in the paucity of labeled data, it is important to take advantage of all data sources that are available in this problem setting, including interaction networks, attribute feature information, correlations of functional labels, and unlabeled data. RESULTS In this paper, we show that the underlying nature of predicting functional properties of proteins using various data sources of relational data is a typical collective classification (CC) problem in machine learning. The protein functional prediction task with limited annotation is then cast into a semi-supervised multi-label collective classification (SMCC) framework. As such, we propose a novel generative model based SMCC algorithm, called GM-SMCC, to effectively compute the label probability distributions of unannotated protein instances and predict their functional properties.
To further boost the predictive performance, we extend the method in an ensemble manner, called EGM-SMCC, by utilizing multiple heterogeneous networks with various latent linkages constructed to explicitly model the relationships among the nodes and to effectively propagate the supervision knowledge from labeled to unlabeled nodes. CONCLUSION Experimental results on a yeast gene dataset predicting the functions and localization of proteins demonstrate the effectiveness of the proposed method. In the comparison, we find that the performances of the proposed algorithms are better than those of the other compared algorithms.
|
35
|
Yu CS, Cheng CW, Su WC, Chang KC, Huang SW, Hwang JK, Lu CH. CELLO2GO: a web server for protein subCELlular LOcalization prediction with functional gene ontology annotation. PLoS One 2014; 9:e99368. [PMID: 24911789 PMCID: PMC4049835 DOI: 10.1371/journal.pone.0099368] [Citation(s) in RCA: 276] [Impact Index Per Article: 27.6] [Received: 12/12/2013] [Accepted: 05/14/2014] [Indexed: 01/15/2023] Open
Abstract
CELLO2GO (http://cello.life.nctu.edu.tw/cello2go/) is a publicly available, web-based system for screening various properties of a targeted protein and its subcellular localization. Herein, we describe how this platform is used to obtain brief or detailed gene ontology (GO)-type categories, including subcellular localization(s), for the queried proteins by combining the CELLO localization-predicting and BLAST homology-searching approaches. Given a query protein sequence, CELLO2GO uses BLAST to search for homologous sequences that are GO annotated in an in-house database derived from the UniProt KnowledgeBase database. At the same time, CELLO attempts to predict at least one subcellular localization on the basis of the species in which the protein is found. When homologs for the query sequence have been identified, the numbers of terms found for each of their GO categories, i.e., cellular compartment, molecular function, and biological process, are summed and presented as pie charts representing possible functional annotations for the queried protein. Although the experimental subcellular localization of a protein may not be known, and thus not annotated, CELLO can confidently suggest a subcellular localization. CELLO2GO should be a useful tool for research involving complex subcellular systems because it combines CELLO and BLAST into one platform and its output is easily manipulated such that user-specific questions may be readily addressed.
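The term-summing step described in this abstract can be sketched as follows. This is a hedged illustration rather than the server's code: the input layout (one dictionary per homolog, keyed by GO category), the function names, and the example annotations are all assumptions. The resulting fractions are the kind of values a pie chart would display.

```python
from collections import Counter, defaultdict


def tally_go_terms(homolog_annotations):
    """Sum GO term occurrences per category across all homologs.

    homolog_annotations is a list with one dict per homolog, mapping a GO
    category name to the list of terms annotated for that homolog.
    This layout is hypothetical and chosen only for the example.
    """
    totals = defaultdict(Counter)
    for annotations in homolog_annotations:
        for category, terms in annotations.items():
            totals[category].update(terms)
    return totals


def to_fractions(counter):
    """Convert raw counts into fractions that sum to 1 (pie-chart slices)."""
    total = sum(counter.values())
    return {term: count / total for term, count in counter.items()}


# Hypothetical annotations for two homologs of a query protein.
homologs = [
    {"cellular_compartment": ["nucleus", "cytoplasm"],
     "molecular_function": ["kinase activity"]},
    {"cellular_compartment": ["nucleus"],
     "molecular_function": ["kinase activity", "ATP binding"]},
]
totals = tally_go_terms(homologs)
slices = to_fractions(totals["cellular_compartment"])
```

Keeping one counter per category mirrors the abstract's description of separate summaries for cellular compartment, molecular function, and biological process.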
Affiliation(s)
- Chin-Sheng Yu
- Department of Information Engineering and Computer Science, Feng Chia University, Taichung, Taiwan
- Master's Program in Biomedical Informatics and Biomedical Engineering, Feng Chia University, Taichung, Taiwan
| | - Chih-Wen Cheng
- Department of Information Engineering and Computer Science, Feng Chia University, Taichung, Taiwan
| | - Wen-Chi Su
- Department of Information Engineering and Computer Science, Feng Chia University, Taichung, Taiwan
| | - Kuei-Chung Chang
- Department of Information Engineering and Computer Science, Feng Chia University, Taichung, Taiwan
| | - Shao-Wei Huang
- Department of Medical Informatics, Tzu Chi University, Hualien, Taiwan
| | - Jenn-Kang Hwang
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
- Center of Bioinformatics Research, National Chiao Tung University, Hsinchu, Taiwan
| | - Chih-Hao Lu
- Graduate Institute of Basic Medical Science, China Medical University, Taichung, Taiwan
|
36
|
Talukdar S, Zutshi S, Prashanth KS, Saikia KK, Kumar P. Identification of potential vaccine candidates against Streptococcus pneumoniae by reverse vaccinology approach. Appl Biochem Biotechnol 2014; 172:3026-41. [PMID: 24482282 PMCID: PMC7090528 DOI: 10.1007/s12010-014-0749-x] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 10/15/2013] [Accepted: 01/20/2014] [Indexed: 11/06/2022]
Abstract
In the past few decades, genome-based approaches have contributed significantly to vaccine development. Our aim was to identify the most conserved and immunogenic antigens of Streptococcus pneumoniae, which can be potential vaccine candidates in the future. BLASTn was done to identify the most conserved antigens. PSORTb 3.0.2 was run to predict the subcellular localization of the proteins. B cell epitope prediction was done for the immunogenicity testing. Finally, BLASTp was done for verifying the extent of similarity to human proteome to exclude the possibility of autoimmunity. Proteins failing to comply with the set parameters were filtered at each step. Based on the above criteria, out of the initial 22 pneumococcal proteins selected for screening, pavB and pullulanase were the most promising candidate proteins.
Affiliation(s)
- Sandipan Talukdar
- Department of Biotechnology & Bioengineering, IST, Gauhati University, Jalukbari, Guwahati, Assam, India, 781014
|
37
|
Adelfio A, Volpato V, Pollastri G. SCLpredT: Ab initio and homology-based prediction of subcellular localization by N-to-1 neural networks. SpringerPlus 2013; 2:502. [PMID: 24133649 PMCID: PMC3795874 DOI: 10.1186/2193-1801-2-502] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 06/13/2013] [Accepted: 09/25/2013] [Indexed: 01/20/2023]
Abstract
The prediction of protein subcellular localization is an important step towards the prediction of protein function, and considerable effort has gone over the last decade into the development of computational predictors of protein localization. In this article we design a new predictor of protein subcellular localization, based on a Machine Learning model (N-to-1 Neural Networks) which we have recently developed. This system, in three versions specialised, respectively, on Plants, Fungi and Animals, has a rich output which incorporates the class “organelle” alongside cytoplasm, nucleus, mitochondria and extracellular, and, additionally, chloroplast in the case of Plants. We investigate the information gain of introducing additional inputs, including predicted secondary structure, and localization information from homologous sequences. To accommodate the latter we design a new algorithm which we present here for the first time. While we do not observe any improvement when including predicted secondary structure, we measure significant overall gains when adding homology information. The final predictor including homology information correctly predicts 74%, 79% and 60% of all proteins in the case of Fungi, Animals and Plants, respectively, and outperforms our previous, state-of-the-art predictor SCLpred, and the popular predictor BaCelLo. We also observe that the contribution of homology information becomes dominant over sequence information for sequence identity values exceeding 50% for Animals and Fungi, and 60% for Plants, confirming that subcellular localization is less conserved than structure. SCLpredT is publicly available at http://distillf.ucd.ie/sclpredt/. Sequence- or template-based predictions can be obtained, and up to 32 kbytes of input can be processed in a single submission.
Affiliation(s)
- Alessandro Adelfio
- School of Computer Science and Informatics, University College Dublin, Belfield, Dublin 4 Ireland ; Complex and Adaptive Systems Laboratory, University College Dublin, Belfield, Dublin 4 Ireland
|
38
|
Holton TA, Pollastri G, Shields DC, Mooney C. CPPpred: prediction of cell penetrating peptides. Bioinformatics 2013; 29:3094-6. [PMID: 24064418 DOI: 10.1093/bioinformatics/btt518] [Citation(s) in RCA: 103] [Impact Index Per Article: 9.4] [Indexed: 11/12/2022]
Abstract
Cell penetrating peptides (CPPs) are attracting much attention as a means of overcoming the inherently poor cellular uptake of various bioactive molecules. Here, we introduce CPPpred, a web server for the prediction of CPPs using an N-to-1 neural network. The server takes one or more peptide sequences, between 5 and 30 amino acids in length, as input and returns a prediction of how likely each peptide is to be cell penetrating. CPPpred was developed with redundancy reduced training and test sets, offering an advantage over the only other currently available CPP prediction method.
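Before submitting sequences to a server with the stated input limits, a local pre-check can save a round trip. The sketch below assumes only the 5 to 30 residue length range given in the abstract; the restriction to the 20 standard amino acid letters and the function name are assumptions, since the abstract does not say which characters the server accepts.

```python
# Length limits taken from the abstract; the alphabet is an assumption.
MIN_LEN, MAX_LEN = 5, 30
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def is_valid_peptide(sequence):
    """Check a peptide against the assumed input constraints.

    Returns True when the sequence is 5 to 30 residues long and uses only
    the 20 standard one-letter amino acid codes.
    """
    seq = sequence.strip().upper()
    return MIN_LEN <= len(seq) <= MAX_LEN and set(seq) <= STANDARD_RESIDUES


# Example inputs: a 16-residue peptide, one that is too short,
# and one containing a non-standard letter.
candidates = ["RQIKIWFQNRRMKWKK", "ACD", "ACDEX"]
accepted = [p for p in candidates if is_valid_peptide(p)]
```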
Affiliation(s)
- Thérèse A Holton
- Complex and Adaptive Systems Laboratory, Conway Institute of Biomolecular and Biomedical Science, School of Medicine and Medical Science, Food For Health Ireland and School of Computer Science and Informatics, University College Dublin, Belfield, Dublin 4, Ireland
|
39
|
Mooney C, Cessieux A, Shields DC, Pollastri G. SCL-Epred: a generalised de novo eukaryotic protein subcellular localisation predictor. Amino Acids 2013; 45:291-9. [PMID: 23568340 DOI: 10.1007/s00726-013-1491-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 11/23/2012] [Accepted: 03/26/2013] [Indexed: 11/26/2022]
Abstract
Knowledge of the subcellular location of a protein provides valuable information about its function, possible interaction with other proteins and drug targetability, among other things. The experimental determination of a protein's location in the cell is expensive, time consuming and open to human error. Fast and accurate predictors of subcellular location have an important role to play if the abundance of sequence data which is now available is to be fully exploited. In the post-genomic era, genomes of many diverse organisms are available. Many of these organisms are important in human and veterinary disease and fall outside of the well-studied plant, animal and fungi groups. We have developed a general eukaryotic subcellular localisation predictor (SCL-Epred) which predicts the location of eukaryotic proteins into three classes which are important, in particular, for determining the drug targetability of a protein: secreted proteins, membrane proteins and proteins that are neither secreted nor membrane. The algorithm powering SCL-Epred is an N-to-1 neural network and is trained on very large non-redundant sets of protein sequences. SCL-Epred performs well on training data, achieving a Q of 86% and a generalised correlation of 0.75 when tested in tenfold cross-validation on a set of 15,202 redundancy reduced protein sequences. The three class accuracy of SCL-Epred and LocTree2, and in particular a consensus predictor comprising both methods, surpasses that of other widely used predictors when benchmarked using a large redundancy reduced independent test set of 562 proteins. SCL-Epred is publicly available at http://distillf.ucd.ie/distill/.
Affiliation(s)
- Catherine Mooney
- Complex and Adaptive Systems Laboratory, Conway Institute of Biomolecular and Biomedical Science, School of Medicine and Medical Science, University College Dublin, Ireland.
|
40
|
Volpato V, Adelfio A, Pollastri G. Accurate prediction of protein enzymatic class by N-to-1 Neural Networks. BMC Bioinformatics 2013; 14 Suppl 1:S11. [PMID: 23368876 PMCID: PMC3548677 DOI: 10.1186/1471-2105-14-s1-s11] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Indexed: 11/10/2022] Open
Abstract
We present a novel ab initio predictor of protein enzymatic class. The predictor can classify proteins, solely based on their sequences, into one of six classes extracted from the enzyme commission (EC) classification scheme and is trained on a large, curated database of over 6,000 non-redundant proteins which we have assembled in this work. The predictor is powered by an ensemble of N-to-1 Neural Networks, a novel architecture which we have recently developed. N-to-1 Neural Networks operate on the full sequence and not on predefined features. All motifs of a predefined length (31 residues in this work) are considered and are compressed by an N-to-1 Neural Network into a feature vector which is automatically determined during training. We test our predictor in 10-fold cross-validation and obtain state-of-the-art results, with a 96% correct classification and 86% generalized correlation. All six classes are predicted with a specificity of at least 80% and false positive rates never exceeding 7%. We are currently investigating enhanced input encoding schemes which include structural information, and are analyzing trained networks to mine motifs that are most informative for the prediction, hence, likely, functionally relevant.
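The idea of considering every motif of a fixed length can be illustrated with a plain sliding window. This sketch shows only the enumeration of 31-residue windows; the neural network that compresses them into a feature vector is not reproduced, and the handling of sequence ends (for example by padding) is not described in the abstract, so this version simply skips positions where a full window does not fit. The function name and the toy sequence are assumptions.

```python
def sequence_windows(sequence, length=31):
    """Return every contiguous window of the given length, in order.

    A sequence shorter than the window yields an empty list; how the
    published method treats such cases is not stated in the abstract.
    """
    if length <= 0:
        raise ValueError("window length must be positive")
    return [sequence[i:i + length]
            for i in range(len(sequence) - length + 1)]


# Toy 35-residue sequence made of standard amino acid letters.
toy_sequence = "ACDEFGHIKLMNPQRSTVWYACDEFGHIKLMNPQR"
windows = sequence_windows(toy_sequence)
```

A sequence of length L yields L - 30 windows of 31 residues, so the number of motifs grows linearly with protein length.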
Affiliation(s)
- Viola Volpato
- School of Computer Science and Informatics, University College Dublin, Ireland
|
41
|
BETAWARE: a machine-learning tool to detect and predict transmembrane beta-barrel proteins in prokaryotes. Bioinformatics 2013; 29:504-5. [DOI: 10.1093/bioinformatics/bts728] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Indexed: 11/14/2022] Open
|
42
|
Carrera M, Cañas B, Gallardo JM. The sarcoplasmic fish proteome: pathways, metabolic networks and potential bioactive peptides for nutritional inferences. J Proteomics 2012. [PMID: 23201118 DOI: 10.1016/j.jprot.2012.11.016] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Indexed: 01/19/2023]
Abstract
This paper presents the first proteome network map for the sarcoplasmic fish proteome. A total of 183 non-redundant annotated proteins were identified in a shotgun proteome-wide analysis from 15 different fish species. The final protein compilation was investigated by integrated in-silico studies, including functional GO term enrichment, pathway studies and network analysis. An in-silico interactomics map was built by merging all the identified proteins. The whole confidence network contains 84 nodes and 279 interactions. Most of the sarcoplasmic fish proteins were grouped under pathways and networks referring to energy, catabolism and lipid metabolism. As a new potential nutritional ingredient, valuable bioactive peptides were also predicted after an in-silico human gastrointestinal digestion. As presented in this study, the integrated global proteomics results and the bioinformatics analysis of the sarcoplasmic fish proteome show the feasibility of this approach to provide comprehensive knowledge of this fraction from a functional and nutritional point of view.
Affiliation(s)
- Mónica Carrera
- Institute of Molecular Systems Biology, ETH Zürich, Zürich, Switzerland.
|
43
|
Towards the improved discovery and design of functional peptides: common features of diverse classes permit generalized prediction of bioactivity. PLoS One 2012; 7:e45012. [PMID: 23056189 PMCID: PMC3466233 DOI: 10.1371/journal.pone.0045012] [Citation(s) in RCA: 299] [Impact Index Per Article: 24.9] [Received: 03/06/2012] [Accepted: 08/15/2012] [Indexed: 11/19/2022] Open
Abstract
The conventional wisdom is that certain classes of bioactive peptides have specific structural features that endow their particular functions. Accordingly, predictions of bioactivity have focused on particular subgroups, such as antimicrobial peptides. We hypothesized that bioactive peptides may share more general features, and assessed this by contrasting the predictive power of existing antimicrobial predictors as well as a novel general predictor, PeptideRanker, across different classes of peptides. We observed that existing antimicrobial predictors had reasonable predictive power to identify peptides of certain other classes, i.e. toxin and venom peptides. We trained two general predictors of peptide bioactivity, one focused on short peptides (4-20 amino acids) and one focused on long peptides (> 20 amino acids). These general predictors had performance that was typically as good as, or better than, that of specific predictors. We noted some striking differences in the features of short peptide and long peptide predictions; in particular, high scoring short peptides favour phenylalanine. This is consistent with the hypothesis that short and long peptides have different functional constraints, perhaps reflecting the difficulty for typical short peptides in supporting independent tertiary structure. We conclude that there are general shared features of bioactive peptides across different functional classes, indicating that computational prediction may accelerate the discovery of novel bioactive peptides and aid in the improved design of existing peptides, across many functional classes. An implementation of the predictive method, PeptideRanker, may be used to identify among a set of peptides those that may be more likely to be bioactive.
|