1
Sousa TF, Vieira Reça BNP, Castro GS, da Silva IJS, Caniato FF, de Araújo Júnior MB, Yamagishi MEB, Koolen HHF, Bataglion GA, Hanada RE, da Silva GF. Trichoderma agriamazonicum sp. nov. (Hypocreaceae), a new ally in the control of phytopathogens. Microbiol Res 2023; 275:127469. [PMID: 37543005 DOI: 10.1016/j.micres.2023.127469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 07/23/2023] [Accepted: 08/01/2023] [Indexed: 08/07/2023]
Abstract
The genus Trichoderma comprises more than 500 valid species and is commonly used in agriculture for the control of plant diseases. In the present study, a Trichoderma species isolated from Scleronema micranthum (Malvaceae) has been extensively characterized, and the morphological and phylogenetic data support the proposition of a new fungal species, herein named Trichoderma agriamazonicum. This species inhibited the mycelial growth of all nine phytopathogens tested, both by mycoparasitism and by the production of VOCs, most notably Corynespora cassiicola and Colletotrichum spp. The VOCs produced by T. agriamazonicum were able to control Capsicum chinense fruit rot caused by Colletotrichum scovillei, and no symptoms were observed seven days after phytopathogen inoculation. GC-MS revealed the production of mainly 6-amyl-α-pyrone, 1-octen-3-ol and 3-octanone during interaction with C. scovillei in C. chinense fruit. The HPLC-MS/MS analysis allowed us to annotate trikoningin KBII, hypocrenone C, 5-hydroxy-de-O-methyllasiodiplodin and unprecedented 7-mer peptaibols and lipopeptaibols. Comparative genomic analysis of five related Trichoderma species reveals a high number of proteins shared only with T. koningiopsis, mainly the enzymes related to oxidative stress. Regarding the CAZyme composition, T. agriamazonicum is most closely related to T. atroviride. A high protein copy number related to lignin and chitin degradation is observed for all Trichoderma spp. analyzed, while the presence of licheninase GH12 was observed only in T. agriamazonicum. Genome mining analysis identified 33 biosynthetic gene clusters (BGCs), of which 27 are new or uncharacterized, and the main BGCs are related to the production of polyketides. These results demonstrate the potential of this newly described species for agriculture and biotechnology.
Affiliation(s)
- Thiago Fernandes Sousa
- Programa de Pós-graduação em Biotecnologia, Universidade Federal do Amazonas (UFAM), 69080-900 Manaus, Brazil; Embrapa Amazônia Ocidental, 69010-970 Manaus, Brazil
- Bruna Nayara Pantoja Vieira Reça
- Programa de Pós-graduação em Agricultura no Trópico Úmido (ATU), Instituto Nacional de Pesquisas da Amazônia (INPA), 69067-375 Manaus, Brazil
- Gleucinei Santos Castro
- Grupo de Pesquisas em Metabolômica e Espectrometria de Massas, Universidade do Estado do Amazonas (UEA), 690065-130 Manaus, Brazil
- Ingride Jarline Santos da Silva
- Programa de Pós-graduação em Biotecnologia, Universidade Federal do Amazonas (UFAM), 69080-900 Manaus, Brazil; Embrapa Amazônia Ocidental, 69010-970 Manaus, Brazil
- Fernanda Fátima Caniato
- Departamento de Ciências Fundamentais e Desenvolvimento Agrícola, Faculdade de Ciências Agrárias, Universidade Federal do Amazonas (UFAM), 69080-900 Manaus, Brazil
- Hector Henrique Ferreira Koolen
- Grupo de Pesquisas em Metabolômica e Espectrometria de Massas, Universidade do Estado do Amazonas (UEA), 690065-130 Manaus, Brazil
- Giovana Anceski Bataglion
- Departamento de Química do Instituto de Ciências Exatas, Universidade Federal do Amazonas (UFAM), 69080-900 Manaus, Brazil
- Rogério Eiji Hanada
- Instituto Nacional de Pesquisas da Amazônia (INPA), 69067-375 Manaus, Brazil.
2
Sequeira AM, Lousa D, Rocha M. ProPythia: A Python package for protein classification based on machine and deep learning. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2021.07.102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
3
Brandes N, Ofer D, Peleg Y, Rappoport N, Linial M. ProteinBERT: a universal deep-learning model of protein sequence and function. Bioinformatics 2022; 38:2102-2110. [PMID: 35020807 PMCID: PMC9386727 DOI: 10.1093/bioinformatics/btac020] [Citation(s) in RCA: 136] [Impact Index Per Article: 68.0] [Received: 06/30/2021] [Revised: 12/27/2021] [Accepted: 01/07/2022] [Indexed: 02/03/2023]
Abstract
Summary: Self-supervised deep language modeling has shown unprecedented success across natural language tasks, and has recently been repurposed to biological sequences. However, existing models and pretraining methods are designed and optimized for text analysis. We introduce ProteinBERT, a deep language model specifically designed for proteins. Our pretraining scheme combines language modeling with a novel task of Gene Ontology (GO) annotation prediction. We introduce novel architectural elements that make the model highly efficient and flexible to long sequences. The architecture of ProteinBERT consists of both local and global representations, allowing end-to-end processing of these types of inputs and outputs. ProteinBERT obtains near state-of-the-art performance, and sometimes exceeds it, on multiple benchmarks covering diverse protein properties (including protein structure, post-translational modifications and biophysical attributes), despite using a far smaller and faster model than competing deep-learning methods. Overall, ProteinBERT provides an efficient framework for rapidly training protein predictors, even with limited labeled data.
Availability and implementation: Code and pretrained model weights are available at https://github.com/nadavbra/protein_bert.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yam Peleg
- Deep Trading Ltd., Haifa 3508401, Israel
- Nadav Rappoport
- Department of Software and Information Systems Engineering, Faculty of Engineering Sciences, Ben-Gurion University of the Negev, Beer Sheva 8410501, Israel
- Michal Linial
- Department of Biological Chemistry, The Alexander Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
4
Ma EJ, Siirola E, Moore C, Kummer A, Stoeckli M, Faller M, Bouquet C, Eggimann F, Ligibel M, Huynh D, Cutler G, Siegrist L, Lewis RA, Acker AC, Freund E, Koch E, Vogel M, Schlingensiepen H, Oakeley EJ, Snajdrova R. Machine-Directed Evolution of an Imine Reductase for Activity and Stereoselectivity. ACS Catal 2021. [DOI: 10.1021/acscatal.1c02786] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/19/2022]
Affiliation(s)
- Eric J. Ma
- NIBR Informatics, Novartis Institutes for BioMedical Research (NIBR), 181 Massachusetts Ave, Cambridge, Massachusetts 02139, United States
- Elina Siirola
- Global Discovery Chemistry, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
- Charles Moore
- Global Discovery Chemistry, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
- Arkadij Kummer
- Global Discovery Chemistry, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
- Markus Stoeckli
- Analytical Sciences and Imaging, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
- Michael Faller
- Chemical Biology and Therapeutics, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
- Caroline Bouquet
- Global Discovery Chemistry, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
- Fabian Eggimann
- Global Discovery Chemistry, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
- Mathieu Ligibel
- Global Discovery Chemistry, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
- Dan Huynh
- Global Discovery Chemistry, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
- Geoffrey Cutler
- Global Discovery Chemistry, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
- Luca Siegrist
- NIBR Biologics Center, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
- Richard A. Lewis
- Global Discovery Chemistry, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
- Anne-Christine Acker
- Global Discovery Chemistry, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
- Ernst Freund
- Global Discovery Chemistry, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
- Elke Koch
- Chemical Biology and Therapeutics, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
- Markus Vogel
- NIBR Biologics Center, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
- Holger Schlingensiepen
- Chemical Biology and Therapeutics, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
- Edward J. Oakeley
- Chemical Biology and Therapeutics, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
- Radka Snajdrova
- Global Discovery Chemistry, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
5
Griffith D, Holehouse AS. PARROT is a flexible recurrent neural network framework for analysis of large protein datasets. eLife 2021; 10:e70576. [PMID: 34533455 PMCID: PMC8448528 DOI: 10.7554/elife.70576] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/21/2021] [Accepted: 09/06/2021] [Indexed: 11/29/2022]
Abstract
The rise of high-throughput experiments has transformed how scientists approach biological questions. The ubiquity of large-scale assays that can test thousands of samples in a day has necessitated the development of new computational approaches to interpret this data. Among these tools, machine learning approaches are increasingly being utilized due to their ability to infer complex nonlinear patterns from high-dimensional data. Despite their effectiveness, machine learning (and in particular deep learning) approaches are not always accessible or easy to implement for those with limited computational expertise. Here we present PARROT, a general framework for training and applying deep learning-based predictors on large protein datasets. Using an internal recurrent neural network architecture, PARROT is capable of tackling both classification and regression tasks while only requiring raw protein sequences as input. We showcase the potential uses of PARROT on three diverse machine learning tasks: predicting phosphorylation sites, predicting transcriptional activation function of peptides generated by high-throughput reporter assays, and predicting the fibrillization propensity of amyloid beta with data generated by deep mutational scanning. Through these examples, we demonstrate that PARROT is easy to use, performs comparably to state-of-the-art computational tools, and is applicable for a wide array of biological problems.
Affiliation(s)
- Daniel Griffith
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St Louis, United States
- Center for Science and Engineering Living Systems, Washington University, St Louis, United States
- Alex S Holehouse
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St Louis, United States
- Center for Science and Engineering Living Systems, Washington University, St Louis, United States
6
Ofer D, Brandes N, Linial M. The language of proteins: NLP, machine learning & protein sequences. Comput Struct Biotechnol J 2021; 19:1750-1758. [PMID: 33897979 PMCID: PMC8050421 DOI: 10.1016/j.csbj.2021.03.022] [Citation(s) in RCA: 97] [Impact Index Per Article: 32.3] [Received: 01/28/2021] [Revised: 03/19/2021] [Accepted: 03/19/2021] [Indexed: 12/12/2022]
Abstract
Natural language processing (NLP) is a field of computer science concerned with automated text and language analysis. In recent years, following a series of breakthroughs in deep and machine learning, NLP methods have shown overwhelming progress. Here, we review the success, promise and pitfalls of applying NLP algorithms to the study of proteins. Proteins, which can be represented as strings of amino-acid letters, are a natural fit to many NLP methods. We explore the conceptual similarities and differences between proteins and language, and review a range of protein-related tasks amenable to machine learning. We present methods for encoding the information of proteins as text and analyzing it with NLP methods, reviewing classic concepts such as bag-of-words, k-mers/n-grams and text search, as well as modern techniques such as word embedding, contextualized embedding, deep learning and neural language models. In particular, we focus on recent innovations such as masked language modeling, self-supervised learning and attention-based models. Finally, we discuss trends and challenges in the intersection of NLP and protein research.
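As a rough illustration of the "bag-of-words" and k-mer encodings named in this abstract, the following minimal sketch counts overlapping k-mers in an amino-acid string. The function name `kmer_counts` and the example peptide are hypothetical and are not taken from the reviewed paper.

```python
from collections import Counter


def kmer_counts(sequence: str, k: int = 3) -> Counter:
    """Count overlapping k-mers in an amino-acid sequence.

    The resulting Counter is a bag-of-words style representation in which
    each k-mer plays the role of a word.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    sequence = sequence.upper()
    # A sequence shorter than k yields an empty range, hence an empty Counter.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


# Hypothetical 10-residue peptide, used only for illustration.
counts = kmer_counts("MKTAYIAKQR", k=3)
repeat_counts = kmer_counts("AAAA", k=2)
```

A sequence of length n yields n - k + 1 overlapping k-mers, so the 10-residue example produces eight 3-mers. Such counts could then be fed to a conventional classifier as a fixed-length feature vector over a chosen k-mer vocabulary.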
Affiliation(s)
- Nadav Brandes
- The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
- Michal Linial
- Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel
7
Brandes N, Linial N, Linial M. Quantifying gene selection in cancer through protein functional alteration bias. Nucleic Acids Res 2020; 47:6642-6655. [PMID: 31334812 PMCID: PMC6649814 DOI: 10.1093/nar/gkz546] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 04/10/2019] [Revised: 06/03/2019] [Accepted: 06/16/2019] [Indexed: 11/14/2022]
Abstract
Compiling the catalogue of genes actively involved in cancer is an ongoing endeavor, with profound implications for the understanding and treatment of the disease. An abundance of computational methods have been developed to screen the genome for candidate driver genes based on genomic data of somatic mutations in tumors. Existing methods make many implicit and explicit assumptions about the distribution of random mutations. We present FABRIC, a new framework for quantifying the selection of genes in cancer by assessing the effects of de-novo somatic mutations on protein-coding genes. Using a machine-learning model, we quantified the functional effects of ∼3M somatic mutations extracted from over 10 000 human cancerous samples, and compared them against the effects of all possible single-nucleotide mutations in the coding human genome. We detected 593 protein-coding genes showing statistically significant bias towards harmful mutations. These genes, discovered without any prior knowledge, show an overwhelming overlap with known cancer genes, but also include many overlooked genes. FABRIC is designed to avoid false discoveries by comparing each gene to its own background model using rigorous statistics, making minimal assumptions about the distribution of random somatic mutations. The framework is an open-source project with a simple command-line interface.
Affiliation(s)
- Nadav Brandes
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel
- Nathan Linial
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel
- Michal Linial
- Department of Biological Chemistry, The Alexander Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Israel
8
Lo Monte M, Manelfi C, Gemei M, Corda D, Beccari AR. ADPredict: ADP-ribosylation site prediction based on physicochemical and structural descriptors. Bioinformatics 2019; 34:2566-2574. [PMID: 29554239 PMCID: PMC6061869 DOI: 10.1093/bioinformatics/bty159] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 09/05/2017] [Accepted: 03/14/2018] [Indexed: 01/27/2023]
Abstract
Motivation: ADP-ribosylation is a post-translational modification (PTM) implicated in several crucial cellular processes, ranging from regulation of DNA repair and chromatin structure to cell metabolism and stress responses. To date, a complete understanding of ADP-ribosylation targets and their modification sites in different tissues and disease states is still lacking. Identification of ADP-ribosylation sites is required to discern the molecular mechanisms regulated by this modification. This motivated us to develop a computational tool for the prediction of ADP-ribosylated sites.
Results: Here, we present ADPredict, the first dedicated computational tool for the prediction of ADP-ribosylated aspartic and glutamic acids. This predictive algorithm is based on (i) physicochemical properties, (ii) in-house designed secondary structure-related descriptors and (iii) three-dimensional features of a set of human ADP-ribosylated proteins that have been reported in the literature. ADPredict was developed using principal component analysis and machine learning techniques; its performance was evaluated both internally via intensive bootstrapping and in predicting two external experimental datasets. It outperformed the only other available ADP-ribosylation prediction tool, ModPred. Moreover, a novel secondary structure descriptor, HM-ratio, was introduced and successfully contributed to the model development, thus representing a promising tool for bioinformatics studies, such as PTM prediction.
Availability and implementation: ADPredict is freely available at www.ADPredict.net.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Matteo Lo Monte
- Institute of Protein Biochemistry, National Research Council, Naples, Italy
- Marica Gemei
- Dompé Farmaceutici SpA, L'Aquila; Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, Milano, Italy
- Daniela Corda
- Institute of Protein Biochemistry, National Research Council, Naples, Italy
- Andrea Rosario Beccari
- Institute of Protein Biochemistry, National Research Council, Naples, Italy; Dompé Farmaceutici SpA, L'Aquila
9
Hong J, Luo Y, Zhang Y, Ying J, Xue W, Xie T, Tao L, Zhu F. Protein functional annotation of simultaneously improved stability, accuracy and false discovery rate achieved by a sequence-based deep learning. Brief Bioinform 2019; 21:1437-1447. [PMID: 31504150 PMCID: PMC7412958 DOI: 10.1093/bib/bbz081] [Citation(s) in RCA: 90] [Impact Index Per Article: 18.0] [Received: 05/06/2019] [Revised: 05/27/2019] [Accepted: 06/10/2019] [Indexed: 11/12/2022]
Abstract
Functional annotation of protein sequence with high accuracy has become one of the most important issues in modern biomedical studies, and computational approaches of significantly accelerated analysis process and enhanced accuracy are greatly desired. Although a variety of methods have been developed to elevate protein annotation accuracy, their ability in controlling false annotation rates remains either limited or not systematically evaluated. In this study, a protein encoding strategy, together with a deep learning algorithm, was proposed to control the false discovery rate in protein function annotation, and its performances were systematically compared with that of the traditional similarity-based and de novo approaches. Based on a comprehensive assessment from multiple perspectives, the proposed strategy and algorithm were found to perform better in both prediction stability and annotation accuracy compared with other de novo methods. Moreover, an in-depth assessment revealed that it possessed an improved capacity of controlling the false discovery rate compared with traditional methods. All in all, this study not only provided a comprehensive analysis on the performances of the newly proposed strategy but also provided a tool for the researcher in the fields of protein function annotation.
Affiliation(s)
- Jiajun Hong
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicine of Zhejiang Province, School of Medicine, Hangzhou Normal University, Hangzhou, China; College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Yongchao Luo
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Yang Zhang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China; School of Pharmaceutical Sciences, Chongqing University, Chongqing, China
- Junbiao Ying
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Weiwei Xue
- School of Pharmaceutical Sciences, Chongqing University, Chongqing, China
- Tian Xie
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicine of Zhejiang Province, School of Medicine, Hangzhou Normal University, Hangzhou, China
- Lin Tao
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicine of Zhejiang Province, School of Medicine, Hangzhou Normal University, Hangzhou, China
- Feng Zhu
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicine of Zhejiang Province, School of Medicine, Hangzhou Normal University, Hangzhou, China; College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
10
Chandra A, Sharma A, Dehzangi A, Ranganathan S, Jokhan A, Chou KC, Tsunoda T. PhoglyStruct: Prediction of phosphoglycerylated lysine residues using structural properties of amino acids. Sci Rep 2018; 8:17923. [PMID: 30560923 PMCID: PMC6299098 DOI: 10.1038/s41598-018-36203-8] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 04/04/2018] [Accepted: 11/16/2018] [Indexed: 12/22/2022]
Abstract
The biological process known as post-translational modification (PTM) contributes to diversifying the proteome, hence affecting many aspects of normal cell biology and pathogenesis. There have been many recently reported PTMs, but lysine phosphoglycerylation has emerged as the most recent subject of interest. Despite a large number of proteins being sequenced, the experimental method for detection of phosphoglycerylated residues remains an expensive, time-consuming and inefficient endeavor in the post-genomic era. Instead, computational methods are being proposed for accurately predicting phosphoglycerylated lysines. Though a number of predictors are available, performance in detecting phosphoglycerylated lysine residues is still limited. In this paper, we propose a new predictor called PhoglyStruct that utilizes structural information of amino acids alongside a multilayer perceptron classifier for predicting phosphoglycerylated and non-phosphoglycerylated lysine residues. For the experiment, we located phosphoglycerylated and non-phosphoglycerylated lysines in our employed benchmark. We then derived and integrated properties such as accessible surface area, backbone torsion angles, and local structure conformations. PhoglyStruct showed significant improvement in the ability to detect phosphoglycerylated residues from non-phosphoglycerylated ones when compared to previous predictors. The sensitivity, specificity, accuracy, Matthews correlation coefficient and AUC were 0.8542, 0.7597, 0.7834, 0.5468 and 0.8077, respectively. The data and Matlab/Octave software packages are available at https://github.com/abelavit/PhoglyStruct.
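For readers unfamiliar with the evaluation measures quoted in this abstract, the sketch below shows how sensitivity, specificity, accuracy and the Matthews correlation coefficient are conventionally derived from a binary confusion matrix. The function name `binary_metrics` and the counts used are hypothetical; they are not the PhoglyStruct benchmark data, and AUC is omitted because it requires ranked prediction scores rather than counts.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    # Sensitivity (recall): fraction of true positives that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of true negatives that were correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    # Matthews correlation coefficient; set to 0 when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts chosen only to make the arithmetic easy to follow.
metrics = binary_metrics(tp=40, fp=10, tn=90, fn=10)
```

With these illustrative counts the sensitivity is 40/50, the specificity is 90/100, and the accuracy is 130/150. The Matthews coefficient is useful on imbalanced benchmarks because it uses all four cells of the confusion matrix.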
Affiliation(s)
- Abel Chandra
- School of Engineering and Physics, Faculty of Science Technology and Environment, University of the South Pacific, Suva, Fiji.
- Alok Sharma
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, QLD-4111, Australia.
- Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, 113-8510, Japan.
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Kanagawa, Japan.
- School of Engineering and Physics, Faculty of Science Technology and Environment, University of the South Pacific, Suva, Fiji.
- CREST, JST, Tokyo, 113-8510, Japan.
- Abdollah Dehzangi
- Department of Computer Science, Morgan State University, Baltimore, Maryland, USA
- Shoba Ranganathan
- Department of Molecular Sciences, Macquarie University, Sydney, NSW, 2109, Australia
- Anjeela Jokhan
- Faculty of Science Technology and Environment, University of the South Pacific, Suva, Fiji
- Kuo-Chen Chou
- The Gordon Life Science Institute, Boston, MA, 02478, USA
- Tatsuhiko Tsunoda
- Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, 113-8510, Japan
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Kanagawa, Japan
- CREST, JST, Tokyo, 113-8510, Japan
11
Jinnelov A, Ali L, Tinti M, Güther MLS, Ferguson MAJ. Single-subunit oligosaccharyltransferases of Trypanosoma brucei display different and predictable peptide acceptor specificities. J Biol Chem 2017; 292:20328-20341. [PMID: 28928222 PMCID: PMC5724017 DOI: 10.1074/jbc.m117.810945] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 08/09/2017] [Revised: 09/13/2017] [Indexed: 11/10/2022]
Abstract
Trypanosoma brucei causes African trypanosomiasis and contains three full-length oligosaccharyltransferase (OST) genes; two of which, TbSTT3A and TbSTT3B, are expressed in the bloodstream form of the parasite. These OSTs have different peptide acceptor and lipid-linked oligosaccharide donor specificities, and trypanosomes do not follow many of the canonical rules developed for other eukaryotic N-glycosylation pathways, raising questions as to the basic architecture and detailed function of trypanosome OSTs. Here, we show by blue-native gel electrophoresis and stable isotope labeling in cell culture proteomics that the TbSTT3A and TbSTT3B proteins associate with each other in large complexes that contain no other detectable protein subunits. We probed the peptide acceptor specificities of the OSTs in vivo using a transgenic glycoprotein reporter system and performed glycoproteomics on endogenous parasite glycoproteins using sequential endoglycosidase H and peptide:N-glycosidase-F digestions. This allowed us to assess the relative occupancies of numerous N-glycosylation sites by endoglycosidase H-resistant N-glycans originating from Man5GlcNAc2-PP-dolichol transferred by TbSTT3A, and endoglycosidase H-sensitive N-glycans originating from Man9GlcNAc2-PP-dolichol transferred by TbSTT3B. Using machine learning, we assessed the features that best define TbSTT3A and TbSTT3B substrates in vivo and built an algorithm to predict the types of N-glycan most likely to predominate at all the putative N-glycosylation sites in the parasite proteome. Finally, molecular modeling was used to suggest why TbSTT3A has a distinct preference for sequons containing and/or flanked by acidic amino acid residues. Together, these studies provide insights into how a highly divergent eukaryote has re-wired protein N-glycosylation to provide protein sequence-specific N-glycan modifications. Data are available via ProteomeXchange with identifiers PXD007236, PXD007267, and PXD007268.
Affiliation(s)
- Anders Jinnelov
- Wellcome Centre for Anti-Infectives Research, School of Life Sciences, University of Dundee, Dundee DD1 5EH, Scotland, United Kingdom
- Liaqat Ali
- Wellcome Centre for Anti-Infectives Research, School of Life Sciences, University of Dundee, Dundee DD1 5EH, Scotland, United Kingdom
- Michele Tinti
- Wellcome Centre for Anti-Infectives Research, School of Life Sciences, University of Dundee, Dundee DD1 5EH, Scotland, United Kingdom
- Maria Lucia S Güther
- Wellcome Centre for Anti-Infectives Research, School of Life Sciences, University of Dundee, Dundee DD1 5EH, Scotland, United Kingdom
- Michael A J Ferguson
- Wellcome Centre for Anti-Infectives Research, School of Life Sciences, University of Dundee, Dundee DD1 5EH, Scotland, United Kingdom.
12
Carvalho TFM, Silva JCF, Calil IP, Fontes EPB, Cerqueira FR. Rama: a machine learning approach for ribosomal protein prediction in plants. Sci Rep 2017; 7:16273. [PMID: 29176736 PMCID: PMC5701237 DOI: 10.1038/s41598-017-16322-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 07/05/2017] [Accepted: 10/30/2017] [Indexed: 12/04/2022]
Abstract
Ribosomal proteins (RPs) play a fundamental role within all types of cells, as they are major components of ribosomes, which are essential for translation of mRNAs. Furthermore, these proteins are involved in various physiological and pathological processes. The intrinsic biological relevance of RPs motivated advanced studies for the identification of unrevealed RPs. In this work, we propose a new computational method, termed Rama, for the prediction of RPs, based on machine learning techniques, with a particular interest in plants. To perform an effective classification, Rama uses a set of fundamental attributes of the amino acid side chains and applies a two-step procedure to classify proteins with unknown function as RPs. The evaluation of the resultant predictive models showed that Rama could achieve mean sensitivity, precision, and specificity of 0.91, 0.91, and 0.82, respectively. Furthermore, a list of proteins that have no annotation in Phytozome v.10, and are annotated as RPs in Phytozome v.12, were correctly classified by our models. Additional computational experiments have also shown that Rama achieves high accuracy in differentiating ribosomal proteins from RNA-binding proteins. Finally, two novel proteins of Arabidopsis thaliana were validated in biological experiments. Rama is freely available at http://inctipp.bioagro.ufv.br:8080/Rama.
Affiliation(s)
- José Cleydson F Silva
- Computer Science Department, Universidade Federal de Viçosa, 36570-900, Minas Gerais, Brazil
- National Institute of Science and Technology in Plant-Pest Interactions/BIOAGRO, Universidade Federal de Viçosa, 36570-900, Minas Gerais, Brazil
- Iara Pinheiro Calil
- National Institute of Science and Technology in Plant-Pest Interactions/BIOAGRO, Universidade Federal de Viçosa, 36570-900, Minas Gerais, Brazil
- Elizabeth Pacheco Batista Fontes
- National Institute of Science and Technology in Plant-Pest Interactions/BIOAGRO, Universidade Federal de Viçosa, 36570-900, Minas Gerais, Brazil.
- Fabio Ribeiro Cerqueira
- Computer Science Department, Universidade Federal de Viçosa, 36570-900, Minas Gerais, Brazil.
- Department of Production Engineering, Universidade Federal Fluminense, Petrópolis, 25650-050, Rio de Janeiro, Brazil.