1
Zandawala M, Bilal Amir M, Shin J, Yim WC, Alfonso Yañez Guerra L. Proteome-wide neuropeptide identification using NeuroPeptide-HMMer (NP-HMMer). Gen Comp Endocrinol 2024; 357:114597. [PMID: 39084320 DOI: 10.1016/j.ygcen.2024.114597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2024] [Revised: 07/20/2024] [Accepted: 07/27/2024] [Indexed: 08/02/2024]
Abstract
Neuropeptides are essential neuronal signaling molecules that orchestrate animal behavior and physiology via actions within the nervous system and on peripheral tissues. Due to the small size of biologically active mature peptides, their identification on a proteome-wide scale poses a significant challenge using existing bioinformatics tools like BLAST. To address this, we have developed NeuroPeptide-HMMer (NP-HMMer), a hidden Markov model (HMM)-based tool to facilitate neuropeptide discovery, especially in underexplored invertebrates. NP-HMMer utilizes manually curated HMMs for 46 neuropeptide families, enabling rapid and accurate identification of neuropeptides. Validation of NP-HMMer on Drosophila melanogaster, Daphnia pulex, Tribolium castaneum and Tenebrio molitor demonstrated its effectiveness in identifying known neuropeptides across diverse arthropods. Additionally, we showcase the utility of NP-HMMer by discovering novel neuropeptides in Priapulida and Rotifera, identifying 22 and 19 new peptides, respectively. This tool represents a significant advancement in neuropeptide research, offering a robust method for annotating neuropeptides across diverse proteomes and providing insights into the evolutionary conservation of neuropeptide signaling pathways.
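NP-HMMer's curated HMMs are not reproduced here. As a rough illustration of the underlying idea (scoring every window of a protein against a family model built from aligned peptides), the sketch below uses a position-specific log-odds profile, which is what a profile HMM reduces to when its insert and delete states are dropped. The toy FMRFamide-like alignment, the uniform background, and the function names `build_profile` and `best_window` are assumptions for illustration only; this is not NP-HMMer's or HMMER's actual method.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned_peptides, pseudocount=1.0):
    """Per-column log2-odds scores from an ungapped alignment (uniform background)."""
    background = 1.0 / len(AMINO_ACIDS)
    total = len(aligned_peptides) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(len(aligned_peptides[0])):
        counts = Counter(seq[col] for seq in aligned_peptides)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def best_window(sequence, profile):
    """Slide the profile along the sequence; return (best_score, start_index)."""
    width = len(profile)
    best_score, best_start = float("-inf"), -1
    for start in range(len(sequence) - width + 1):
        score = 0.0
        for column, residue in zip(profile, sequence[start:start + width]):
            # Unknown residues (e.g. X) get the column's worst score.
            score += column.get(residue, min(column.values()))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start


# Hypothetical toy alignment of FMRFamide-like motifs.
profile = build_profile(["FMRF", "FLRF", "FMRF", "YMRF"])
print(best_window("AAAAFMRFGKKK", profile))
```

A full profile HMM adds insert and delete states and scores alignments with the forward or Viterbi algorithm, which is what lets it tolerate length variation between related peptides; this ungapped version cannot.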
Affiliation(s)
- Meet Zandawala
- Department of Biochemistry and Molecular Biology, University of Nevada, Reno, NV 89557, USA; Integrative Neuroscience Program, University of Nevada, Reno, NV 89557, USA; Neurobiology and Genetics, Theodor-Boveri-Institute, Biocenter, Julius-Maximilians-University of Würzburg, Am Hubland, 97074 Würzburg, Germany
- Muhammad Bilal Amir
- Department of Biochemistry and Molecular Biology, University of Nevada, Reno, NV 89557, USA
- Joel Shin
- Department of Biochemistry and Molecular Biology, University of Nevada, Reno, NV 89557, USA
- Won C Yim
- Department of Biochemistry and Molecular Biology, University of Nevada, Reno, NV 89557, USA
- Luis Alfonso Yañez Guerra
- School of Biological Sciences, University of Southampton, University Road, SO17 1BJ Southampton, UK; Institute for Life Sciences, University of Southampton, University Road, SO17 1BJ Southampton, UK
2
Thiel D, Yañez Guerra LA, Kieswetter A, Cole AG, Temmerman L, Technau U, Jékely G. Large-scale deorphanization of Nematostella vectensis neuropeptide G protein-coupled receptors supports the independent expansion of bilaterian and cnidarian peptidergic systems. eLife 2024; 12:RP90674. [PMID: 38727714 PMCID: PMC11087051 DOI: 10.7554/elife.90674] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/12/2024]
Abstract
Neuropeptides are ancient signaling molecules in animals, but only a few peptide receptors are known outside bilaterians. Cnidarians possess a large number of G protein-coupled receptors (GPCRs), the most common receptors of bilaterian neuropeptides, but most of these remain orphan with no known ligands. We searched for neuropeptides in the sea anemone Nematostella vectensis and created a library of 64 peptides derived from 33 precursors. In a large-scale pharmacological screen with these peptides and 161 N. vectensis GPCRs, we identified 31 receptors specifically activated by 1 to 3 of 14 peptides. Mapping GPCR and neuropeptide expression to single-cell sequencing data revealed how cnidarian tissues are extensively connected by multilayer peptidergic networks. Phylogenetic analysis identified no direct orthology to bilaterian peptidergic systems and supports the independent expansion of neuropeptide signaling in cnidarians from a few ancestral peptide-receptor pairs.
Affiliation(s)
- Daniel Thiel
- Living Systems Institute, University of Exeter, Exeter, United Kingdom
- Amanda Kieswetter
- Animal Physiology & Neurobiology, Department of Biology, University of Leuven, Leuven, Belgium
- Alison G Cole
- Department of Neurosciences and Developmental Biology, Faculty of Life Sciences, University of Vienna, Vienna, Austria
- Liesbet Temmerman
- Animal Physiology & Neurobiology, Department of Biology, University of Leuven, Leuven, Belgium
- Ulrich Technau
- Department of Neurosciences and Developmental Biology, Faculty of Life Sciences, University of Vienna, Vienna, Austria
- Gáspár Jékely
- Living Systems Institute, University of Exeter, Exeter, United Kingdom
- Centre for Organismal Studies (COS), Heidelberg University, Heidelberg, Germany
3
Dotan E, Jaschek G, Pupko T, Belinkov Y. Effect of tokenization on transformers for biological sequences. Bioinformatics 2024; 40:btae196. [PMID: 38608190 PMCID: PMC11055402 DOI: 10.1093/bioinformatics/btae196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Revised: 02/20/2024] [Accepted: 04/11/2024] [Indexed: 04/14/2024]
Abstract
MOTIVATION Deep-learning models are transforming biological research, including many bioinformatics and comparative genomics algorithms, such as sequence alignment, phylogenetic tree inference, and automatic classification of protein functions. Among these deep-learning algorithms, models for processing natural languages, developed in the natural language processing (NLP) community, were recently applied to biological sequences. However, biological sequences differ from natural languages such as English and French, in which segmenting text into separate words is relatively straightforward. Moreover, biological sequences are characterized by extremely long "sentences", which hamper their processing by current machine-learning models, notably the transformer architecture. In NLP, one of the first processing steps is to transform the raw text into a list of tokens. Deep-learning applications to biological sequence data mostly segment proteins and DNA into single characters. In this work, we study the effect of alternative tokenization algorithms on eight different tasks in biology, from predicting the function of proteins and their stability, through nucleotide sequence alignment, to classifying proteins into specific families. RESULTS We demonstrate that applying alternative tokenization algorithms can increase accuracy and, at the same time, substantially reduce the input length compared to the trivial tokenizer in which each character is a token. Furthermore, applying these tokenization algorithms allows interpreting trained models, taking into account dependencies among positions. Finally, we trained these tokenizers on a large dataset of protein sequences containing more than 400 billion amino acids, which resulted in more than a 3-fold decrease in the number of tokens. We then tested these tokenizers trained on large-scale data on the above specific tasks and showed that for some tasks it is highly beneficial to train database-specific tokenizers.
Our study suggests that tokenizers are likely to be a critical component in future deep-network analysis of biological sequence data. AVAILABILITY AND IMPLEMENTATION Code, data, and trained tokenizers are available on https://github.com/technion-cs-nlp/BiologicalTokenizers.
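The abstract contrasts single-character tokenization with learned subword tokenizers. One common such algorithm is byte-pair encoding (BPE), which starts from single residues and repeatedly merges the most frequent adjacent token pair. The minimal trainer below is a sketch on toy protein strings; the function names `merge_pair`, `train_bpe` and `tokenize` and the sequences are hypothetical, and the authors' released tokenizers are not implemented this way.

```python
from collections import Counter


def merge_pair(tokens, pair):
    """Replace every non-overlapping occurrence of `pair` with one merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def train_bpe(sequences, num_merges):
    """Learn up to `num_merges` merge rules, starting from single residues."""
    corpus = [list(seq) for seq in sequences]
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for tokens in corpus:
            pair_counts.update(zip(tokens, tokens[1:]))
        if not pair_counts:
            break
        best = max(pair_counts, key=pair_counts.get)  # ties: first pair seen
        merges.append(best)
        corpus = [merge_pair(tokens, best) for tokens in corpus]
    return merges


def tokenize(sequence, merges):
    """Apply the learned merges in training order."""
    tokens = list(sequence)
    for pair in merges:
        tokens = merge_pair(tokens, pair)
    return tokens


merges = train_bpe(["MKKLLMKKLL", "MKKAA"], num_merges=3)
print(tokenize("MKKLLMKKLL", merges))  # → ['MKKL', 'L', 'MKKL', 'L']
```

Here a 10-residue sequence becomes 4 tokens, the kind of input-length reduction the abstract describes; the merged tokens always concatenate back to the original sequence.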
Affiliation(s)
- Edo Dotan
- The Henry and Marilyn Taub Faculty of Computer Science, Technion – Israel Institute of Technology, Haifa 3200003, Israel
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Gal Jaschek
- Department of Genetics, Yale University School of Medicine, New Haven, CT 06510, United States
- Tal Pupko
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Yonatan Belinkov
- The Henry and Marilyn Taub Faculty of Computer Science, Technion – Israel Institute of Technology, Haifa 3200003, Israel
4
Liu D, Lin Z, Jia C. NeuroCNN_GNB: an ensemble model to predict neuropeptides based on a convolution neural network and Gaussian naive Bayes. Front Genet 2023; 14:1226905. [PMID: 37576553 PMCID: PMC10414792 DOI: 10.3389/fgene.2023.1226905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Accepted: 06/30/2023] [Indexed: 08/15/2023]
Abstract
Neuropeptides contain more chemical information than classical neurotransmitters and have multiple receptor recognition sites. These characteristics give neuropeptides correspondingly higher selectivity for nerve receptors and fewer side effects. Traditional experimental methods, such as mass spectrometry and liquid chromatography, still depend on a complete neuropeptide precursor database and on knowledge of the basic characteristics of neuropeptides; incomplete precursor and information databases can lead to false positives or reduce recognition sensitivity. In recent years, studies have shown that machine learning methods can predict neuropeptides rapidly and effectively. In this work, we have made a systematic attempt to create an ensemble tool based on four convolutional neural network models. These baseline models were trained separately on one-hot encoding, AAIndex, G-gap dipeptide encoding and word2vec features, and were integrated using Gaussian naive Bayes (NB) to construct our predictor, designated NeuroCNN_GNB. Both 5-fold cross-validation on benchmark datasets and independent tests showed that NeuroCNN_GNB outperformed other state-of-the-art methods. Furthermore, by leveraging the SHapley Additive exPlanations (SHAP) algorithm, this framework provides interpretations that help explain the model's success and highlights the features most relevant for predicting neuropeptides.
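G-gap dipeptide encoding is one of the four input representations named above. Under the definition commonly used in peptide-prediction work (assumed here; the paper's exact normalization and choice of g may differ), it counts ordered residue pairs separated by g intervening positions and divides by the number of such pairs, giving a 400-dimensional vector. The function name `g_gap_dipeptide` is hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order, mapped to vector positions.
DIPEPTIDE_INDEX = {
    a + b: i for i, (a, b) in enumerate(product(AMINO_ACIDS, repeat=2))
}


def g_gap_dipeptide(sequence, g):
    """Frequency of pairs (s[i], s[i + g + 1]); g = 0 gives plain dipeptide composition."""
    vector = [0.0] * len(DIPEPTIDE_INDEX)
    n_pairs = len(sequence) - g - 1
    if n_pairs <= 0:
        return vector  # sequence too short for this gap
    for i in range(n_pairs):
        pair = sequence[i] + sequence[i + g + 1]
        if pair in DIPEPTIDE_INDEX:  # skip pairs containing non-standard residues
            vector[DIPEPTIDE_INDEX[pair]] += 1.0 / n_pairs
    return vector


features = g_gap_dipeptide("ACAC", g=1)
print(features[DIPEPTIDE_INDEX["AA"]], features[DIPEPTIDE_INDEX["CC"]])  # → 0.5 0.5
```

Because every sequence maps to the same fixed-length vector regardless of peptide length, encodings like this can be fed directly to a classifier; in the paper each encoding feeds its own CNN before the naive Bayes combination.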
Affiliation(s)
- Di Liu
- Information Science and Technology College, Dalian Maritime University, Dalian, China
- Zhengkui Lin
- Information Science and Technology College, Dalian Maritime University, Dalian, China
- Cangzhi Jia
- School of Science, Dalian Maritime University, Dalian, China
5
Thoma V, Sakai S, Nagata K, Ishii Y, Maruyama S, Abe A, Kondo S, Kawata M, Hamada S, Deguchi R, Tanimoto H. On the origin of appetite: GLWamide in jellyfish represents an ancestral satiety neuropeptide. Proc Natl Acad Sci U S A 2023; 120:e2221493120. [PMID: 37011192 PMCID: PMC10104569 DOI: 10.1073/pnas.2221493120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2023] [Accepted: 01/20/2023] [Indexed: 04/05/2023]
Abstract
Food intake is regulated by internal state. This function is mediated by hormones and neuropeptides, which are best characterized in popular model species. However, the evolutionary origins of such feeding-regulating neuropeptides are poorly understood. We used the jellyfish Cladonema to address this question. Our combined transcriptomic, behavioral, and anatomical approaches identified GLWamide as a feeding-suppressing peptide that selectively inhibits tentacle contraction in this jellyfish. In the fruit fly Drosophila, myoinhibitory peptide (MIP) is a related satiety peptide. Surprisingly, we found that GLWamide and MIP were fully interchangeable in these evolutionarily distant species for feeding suppression. Our results suggest that the satiety signaling systems of diverse animals share an ancient origin.
Affiliation(s)
- Vladimiros Thoma
- Graduate School of Life Sciences, Tohoku University, Sendai 980-8577, Japan
- Department of Biology, Miyagi University of Education, Aoba-ku, Sendai 980-0845, Japan
- Shuhei Sakai
- Graduate School of Life Sciences, Tohoku University, Sendai 980-8577, Japan
- Koki Nagata
- Graduate School of Life Sciences, Tohoku University, Sendai 980-8577, Japan
- Yuu Ishii
- Department of Biology, Miyagi University of Education, Aoba-ku, Sendai 980-0845, Japan
- Department of Ecological Developmental Adaptability Life Sciences, Graduate School of Life Sciences, Tohoku University, Aobaku, Sendai 980-8578, Japan
- Shinichiro Maruyama
- Department of Ecological Developmental Adaptability Life Sciences, Graduate School of Life Sciences, Tohoku University, Aobaku, Sendai 980-8578, Japan
- Department of Life Science, Graduate School of Humanities and Sciences, Ochanomizu University, Bunkyo-ku, Tokyo 112-8610, Japan
- Ayako Abe
- Graduate School of Life Sciences, Tohoku University, Sendai 980-8577, Japan
- Shu Kondo
- Department of Biological Science and Technology, Faculty of Advanced Engineering, Tokyo University of Science, Katsushika-ku, Tokyo 125-8585, Japan
- Invertebrate Genetics Laboratory, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Masakado Kawata
- Department of Ecological Developmental Adaptability Life Sciences, Graduate School of Life Sciences, Tohoku University, Aobaku, Sendai 980-8578, Japan
- Shun Hamada
- Department of Food and Health Sciences, International College of Arts and Sciences, Fukuoka Women’s University, Fukuoka 813-8529, Japan
- Ryusaku Deguchi
- Department of Biology, Miyagi University of Education, Aoba-ku, Sendai 980-0845, Japan
- Hiromu Tanimoto
- Graduate School of Life Sciences, Tohoku University, Sendai 980-8577, Japan
6
Liu Y, Wang S, Li X, Liu Y, Zhu X. NeuroPpred-SVM: A New Model for Predicting Neuropeptides Based on Embeddings of BERT. J Proteome Res 2023; 22:718-728. [PMID: 36749151 DOI: 10.1021/acs.jproteome.2c00363] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 02/08/2023]
Abstract
Neuropeptides play pivotal roles in different physiological processes and are related to various diseases. Identifying neuropeptides greatly benefits the study of these physiological processes and the treatment of neurological disorders. Several state-of-the-art neuropeptide predictors have been developed using a two-layer stacking ensemble algorithm. Although this algorithm can improve feature representability, the resulting models are complex and not as efficient as models based on a single classifier. In this study, we propose a new model, NeuroPpred-SVM, which predicts neuropeptides from embeddings of Bidirectional Encoder Representations from Transformers (BERT) and other sequential features using a support vector machine (SVM). The experimental results indicate that our model achieved a cross-validation area under the receiver operating characteristic curve (AUROC) of 0.969 on the training data set and an AUROC of 0.966 on the independent test set. Compared with four other state-of-the-art models (NeuroPIpred, PredNeuroP, NeuroPpred-Fuse, and NeuroPpred-FRL) on the independent test set, our model achieved the highest AUROC, Matthews correlation coefficient, accuracy, and specificity, indicating that it outperforms the existing models. We believe that NeuroPpred-SVM could be a useful tool for identifying neuropeptides with high accuracy and low cost. The data sets and Python code are available at https://github.com/liuyf-a/NeuroPpred-SVM.
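The abstract compares models by AUROC and the Matthews correlation coefficient (MCC). A minimal sketch of both metrics from their standard definitions follows: AUROC as the probability that a randomly chosen positive is scored above a randomly chosen negative (ties count one half), and MCC from the confusion matrix. The toy labels, scores and the 0.5 threshold are invented for illustration; in practice a library implementation would normally be used, and this is not the authors' evaluation code.

```python
import math


def auroc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(positives) * len(negatives))


def mcc(labels, predictions):
    """Matthews correlation coefficient for binary (0/1) labels and predictions."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denominator == 0 else (tp * tn - fp * fn) / denominator


labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]  # hypothetical classifier outputs
predictions = [1 if s >= 0.5 else 0 for s in scores]
print(auroc(labels, scores), mcc(labels, predictions))  # → 0.75 0.0
```

The example shows why both metrics are reported: the ranking is mostly correct (AUROC 0.75), yet at a fixed 0.5 threshold the predictions are no better than chance (MCC 0).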
Affiliation(s)
- Yufeng Liu
- School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China
- Shuyu Wang
- School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China
- Xiang Li
- School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China
- Yinbo Liu
- School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China
- Xiaolei Zhu
- School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China
7
Phetsanthad A, Vu NQ, Yu Q, Buchberger AR, Chen Z, Keller C, Li L. Recent advances in mass spectrometry analysis of neuropeptides. MASS SPECTROMETRY REVIEWS 2023; 42:706-750. [PMID: 34558119 PMCID: PMC9067165 DOI: 10.1002/mas.21734] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 03/09/2021] [Revised: 08/22/2021] [Accepted: 08/28/2021] [Indexed: 05/08/2023]
Abstract
Due to their involvement in numerous biochemical pathways, neuropeptides have been the focus of many recent research studies. Unfortunately, classic analytical methods, such as western blots and enzyme-linked immunosorbent assays, are extremely limited in terms of global investigations, leading researchers to search for more advanced techniques capable of probing the entire neuropeptidome of an organism. With recent technological advances, mass spectrometry (MS) has provided methodology to gain global knowledge of a neuropeptidome on a spatial, temporal, and quantitative level. This review will cover key considerations for the analysis of neuropeptides by MS, including sample preparation strategies, instrumental advances for identification, structural characterization, and imaging; insightful functional studies; and newly developed absolute and relative quantitation strategies. While many discoveries have been made with MS, the methodology is still in its infancy. Many of the current challenges and areas that need development will also be highlighted in this review.
Affiliation(s)
- Ashley Phetsanthad
- Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, WI 53706, USA
- Nhu Q. Vu
- Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, WI 53706, USA
- Qing Yu
- School of Pharmacy, University of Wisconsin-Madison, 777 Highland Avenue, Madison, WI 53705, USA
- Amanda R. Buchberger
- Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, WI 53706, USA
- Zhengwei Chen
- Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, WI 53706, USA
- Caitlin Keller
- Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, WI 53706, USA
- Lingjun Li
- Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, WI 53706, USA
- School of Pharmacy, University of Wisconsin-Madison, 777 Highland Avenue, Madison, WI 53705, USA
8
Aleotti A, Wilkie IC, Yañez-Guerra LA, Gattoni G, Rahman TA, Wademan RF, Ahmad Z, Ivanova DA, Semmens DC, Delroisse J, Cai W, Odekunle E, Egertová M, Ferrario C, Sugni M, Bonasoro F, Elphick MR. Discovery and functional characterization of neuropeptides in crinoid echinoderms. Front Neurosci 2022; 16:1006594. [PMID: 36583101 PMCID: PMC9793003 DOI: 10.3389/fnins.2022.1006594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2022] [Accepted: 11/09/2022] [Indexed: 12/14/2022]
Abstract
Neuropeptides are one of the largest and most diverse families of signaling molecules in animals and, accordingly, they regulate many physiological processes and behaviors. Genome and transcriptome sequencing has enabled the identification of genes encoding neuropeptide precursor proteins in species from a growing variety of taxa, including bilaterian and non-bilaterian animals. Of particular interest are deuterostome invertebrates such as the phylum Echinodermata, which occupies a phylogenetic position that has facilitated reconstruction of the evolution of neuropeptide signaling systems in Bilateria. However, our knowledge of neuropeptide signaling in echinoderms is largely based on bioinformatic and experimental analysis of eleutherozoans: Asterozoa (starfish and brittle stars) and Echinozoa (sea urchins and sea cucumbers). Little is known about neuropeptide signaling in crinoids (feather stars and sea lilies), which are a sister clade to the Eleutherozoa. Therefore, we have analyzed transcriptome/genome sequence data from three feather star species, Anneissia japonica, Antedon mediterranea, and Florometra serratissima, to produce the first comprehensive identification of neuropeptide precursors in crinoids. These include representatives of bilaterian neuropeptide precursor families and several predicted crinoid neuropeptide precursors. Using A. mediterranea as an experimental model, we have investigated the expression of selected neuropeptides in larvae (doliolaria), post-metamorphic pentacrinoids and adults, providing new insights into the cellular architecture of crinoid nervous systems. Thus, using mRNA in situ hybridization, F-type SALMFamide precursor transcripts were revealed in a previously undescribed population of peptidergic cells located dorso-laterally in doliolaria.
Furthermore, using immunohistochemistry, a calcitonin-type neuropeptide was revealed in the aboral nerve center, circumoral nerve ring and oral tube feet in pentacrinoids and in the ectoneural and entoneural compartments of the nervous system in adults. Moreover, functional analysis of a vasopressin/oxytocin-type neuropeptide (crinotocin), which is expressed in the brachial nerve of the arms in A. mediterranea, revealed that this peptide causes a dose-dependent change in the mechanical behavior of arm preparations in vitro, the first reported biological action of a neuropeptide in a crinoid. In conclusion, our findings provide new perspectives on neuropeptide signaling in echinoderms and the foundations for further exploration of neuropeptide expression/function in crinoids as a sister clade to eleutherozoan echinoderms.
Affiliation(s)
- Alessandra Aleotti
- Department of Environmental Science and Policy, University of Milan, Milan, Italy; School of Biological & Behavioural Sciences, Queen Mary University of London, London, United Kingdom
- Iain C. Wilkie
- Institute of Biodiversity, Animal Health and Comparative Medicine, University of Glasgow, Glasgow, United Kingdom
- Luis A. Yañez-Guerra
- School of Biological & Behavioural Sciences, Queen Mary University of London, London, United Kingdom
- Giacomo Gattoni
- Department of Environmental Science and Policy, University of Milan, Milan, Italy; School of Biological & Behavioural Sciences, Queen Mary University of London, London, United Kingdom
- Tahshin A. Rahman
- School of Biological & Behavioural Sciences, Queen Mary University of London, London, United Kingdom
- Richard F. Wademan
- School of Biological & Behavioural Sciences, Queen Mary University of London, London, United Kingdom
- Zakaryya Ahmad
- School of Biological & Behavioural Sciences, Queen Mary University of London, London, United Kingdom
- Deyana A. Ivanova
- School of Biological & Behavioural Sciences, Queen Mary University of London, London, United Kingdom
- Dean C. Semmens
- School of Biological & Behavioural Sciences, Queen Mary University of London, London, United Kingdom
- Jérôme Delroisse
- School of Biological & Behavioural Sciences, Queen Mary University of London, London, United Kingdom
- Weigang Cai
- School of Biological & Behavioural Sciences, Queen Mary University of London, London, United Kingdom
- Esther Odekunle
- School of Biological & Behavioural Sciences, Queen Mary University of London, London, United Kingdom
- Michaela Egertová
- School of Biological & Behavioural Sciences, Queen Mary University of London, London, United Kingdom
- Cinzia Ferrario
- Department of Environmental Science and Policy, University of Milan, Milan, Italy
- Michela Sugni
- Department of Environmental Science and Policy, University of Milan, Milan, Italy
- Francesco Bonasoro
- Department of Environmental Science and Policy, University of Milan, Milan, Italy
- Maurice R. Elphick
- School of Biological & Behavioural Sciences, Queen Mary University of London, London, United Kingdom
9
Anapindi KDB, Romanova EV, Checco JW, Sweedler JV. Mass Spectrometry Approaches Empowering Neuropeptide Discovery and Therapeutics. Pharmacol Rev 2022; 74:662-679. [PMID: 35710134 DOI: 10.1124/pharmrev.121.000423] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 11/22/2022]
Abstract
The discovery of insulin in the early 1900s ushered in the era of research related to peptides acting as hormones and neuromodulators, among other regulatory roles. These essential gene products are found in all organisms, from the most primitive to the most evolved, and carry important biologic information that coordinates complex physiology and behavior; their misregulation has been implicated in a variety of diseases. The evolutionary origins of at least 30 neuropeptide signaling systems have been traced to the common ancestor of protostomes and deuterostomes. With the use of relevant animal models and modern technologies, we can gain mechanistic insight into orthologous and paralogous endogenous peptides and translate that knowledge into medically relevant insights and new treatments. Groundbreaking advances in medicine and basic science influence how signaling peptides are defined today. The precise mechanistic pathways for over 100 endogenous peptides in mammals are now known and have laid the foundation for multiple drug development pipelines. Peptide biologics have become valuable drugs due to their unique specificity and biologic activity, lack of toxic metabolites, and minimal undesirable interactions. This review outlines modern technologies that enable neuropeptide discovery and characterization, and highlights lessons from nature made possible by neuropeptide research in relevant animal models that is being adopted by the pharmaceutical industry. We conclude with a brief overview of approaches/strategies for effective development of peptides as drugs. SIGNIFICANCE STATEMENT: Neuropeptides, an important class of cell-cell signaling molecules, are involved in maintaining a range of physiological functions. Since the discovery of insulin's activity, over 100 bioactive peptides and peptide analogs have been used as therapeutics. 
Because these are complex molecules not easily predicted from a genome and their activity can change with subtle chemical modifications, mass spectrometry (MS) has significantly empowered peptide discovery and characterization. This review highlights contributions of MS-based research towards the development of therapeutic peptides.
Affiliation(s)
- Krishna D B Anapindi
- Department of Chemistry and the Beckman Institute for Advanced Science and Technology, University of Illinois Urbana-Champaign, Urbana, Illinois (K.D.B.A., E.V.R., J.V.S.) and Department of Chemistry, University of Nebraska-Lincoln, Lincoln, Nebraska (J.W.C.)
- Elena V Romanova
- Department of Chemistry and the Beckman Institute for Advanced Science and Technology, University of Illinois Urbana-Champaign, Urbana, Illinois (K.D.B.A., E.V.R., J.V.S.) and Department of Chemistry, University of Nebraska-Lincoln, Lincoln, Nebraska (J.W.C.)
- James W Checco
- Department of Chemistry and the Beckman Institute for Advanced Science and Technology, University of Illinois Urbana-Champaign, Urbana, Illinois (K.D.B.A., E.V.R., J.V.S.) and Department of Chemistry, University of Nebraska-Lincoln, Lincoln, Nebraska (J.W.C.)
- Jonathan V Sweedler
- Department of Chemistry and the Beckman Institute for Advanced Science and Technology, University of Illinois Urbana-Champaign, Urbana, Illinois (K.D.B.A., E.V.R., J.V.S.) and Department of Chemistry, University of Nebraska-Lincoln, Lincoln, Nebraska (J.W.C.)
10
Brandes N, Ofer D, Peleg Y, Rappoport N, Linial M. ProteinBERT: a universal deep-learning model of protein sequence and function. Bioinformatics 2022; 38:2102-2110. [PMID: 35020807 PMCID: PMC9386727 DOI: 10.1093/bioinformatics/btac020] [Citation(s) in RCA: 202] [Impact Index Per Article: 67.3] [Received: 06/30/2021] [Revised: 12/27/2021] [Accepted: 01/07/2022] [Indexed: 02/03/2023]
Abstract
SUMMARY Self-supervised deep language modeling has shown unprecedented success across natural language tasks, and has recently been repurposed to biological sequences. However, existing models and pretraining methods are designed and optimized for text analysis. We introduce ProteinBERT, a deep language model specifically designed for proteins. Our pretraining scheme combines language modeling with a novel task of Gene Ontology (GO) annotation prediction. We introduce novel architectural elements that make the model highly efficient and flexible to long sequences. The architecture of ProteinBERT consists of both local and global representations, allowing end-to-end processing of these types of inputs and outputs. ProteinBERT obtains near state-of-the-art performance, and sometimes exceeds it, on multiple benchmarks covering diverse protein properties (including protein structure, post-translational modifications and biophysical attributes), despite using a far smaller and faster model than competing deep-learning methods. Overall, ProteinBERT provides an efficient framework for rapidly training protein predictors, even with limited labeled data. AVAILABILITY AND IMPLEMENTATION Code and pretrained model weights are available at https://github.com/nadavbra/protein_bert. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Nadav Brandes
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
- Dan Ofer
- Department of Biological Chemistry, The Alexander Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
- Yam Peleg
- Deep Trading Ltd., Haifa 3508401, Israel
- Nadav Rappoport
- Department of Software and Information Systems Engineering, Faculty of Engineering Sciences, Ben-Gurion University of the Negev, Beer Sheva 8410501, Israel
- Michal Linial
- Department of Biological Chemistry, The Alexander Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
11
Neuropeptide repertoire and 3D anatomy of the ctenophore nervous system. Curr Biol 2021; 31:5274-5285.e6. [PMID: 34587474 DOI: 10.1016/j.cub.2021.09.005] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 05/09/2021] [Revised: 07/20/2021] [Accepted: 09/02/2021] [Indexed: 11/24/2022]
Abstract
Ctenophores are gelatinous marine animals famous for locomotion by ciliary combs. Due to the uncertainties of the phylogenetic placement of ctenophores and the absence of some key bilaterian neuronal genes, it has been hypothesized that their neurons evolved independently. Additionally, recent whole-body, single-cell RNA sequencing (scRNA-seq) analysis failed to identify ctenophore neurons using any of the known neuronal molecular markers. To reveal the molecular machinery of ctenophore neurons, we have characterized the neuropeptide repertoire of the ctenophore Mnemiopsis leidyi. Using the machine learning NeuroPID tool, we predicted 129 new putative neuropeptide precursors. Sixteen of them were localized to the subepithelial nerve net (SNN), sensory aboral organ (AO), and epithelial sensory cells (ESCs), providing evidence that they are neuropeptide precursors. Four of these putative neuropeptides had a behavioral effect and increased the animals' swimming speed. Intriguingly, these putative neuropeptides finally allowed us to identify neuronal cell types in single-cell transcriptomic data and reveal the molecular identity of ctenophore neurons. High-resolution electron microscopy and 3D reconstructions of the nerve net underlying the comb plates confirmed a more than 100-year-old hypothesis of anastomoses between neurites of the same cell in ctenophores and revealed that they occur through a continuous membrane. Our work demonstrates the unique ultrastructure of the peptidergic nerve net and a rich neuropeptide repertoire of ctenophores, supporting the hypothesis that the first nervous system(s) evolved as nets of peptidergic cells.
12
Jiang M, Zhao B, Luo S, Wang Q, Chu Y, Chen T, Mao X, Liu Y, Wang Y, Jiang X, Wei DQ, Xiong Y. NeuroPpred-Fuse: an interpretable stacking model for prediction of neuropeptides by fusing sequence information and feature selection methods. Brief Bioinform 2021; 22:6350884. [PMID: 34396388 DOI: 10.1093/bib/bbab310] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 04/16/2021] [Revised: 07/01/2021] [Accepted: 07/18/2021] [Indexed: 12/13/2022]
Abstract
Neuropeptides, acting as signaling molecules in the nervous systems of various animals, play crucial roles in a wide range of physiological functions, including hormone regulation and behavior. Neuropeptides offer many opportunities for the discovery of new drugs and drug targets for the treatment of neurological diseases. In recent years, several data-driven computational predictors have been developed for various types of bioactive peptides, but little such work has addressed neuropeptides. In this work, we developed an interpretable stacking model, named NeuroPpred-Fuse, for the prediction of neuropeptides by fusing a variety of sequence-derived features and feature selection methods. Specifically, we used six types of sequence-derived features to encode the peptide sequences and then combined them. In the first layer, we ensembled three base classifiers and four feature selection algorithms, which complementarily select non-redundant important features. In the second layer, the outputs of the first layer were merged and fed into a logistic regression (LR) classifier to train the model. Moreover, we analyzed the selected features and explained why they are informative. Experimental results show that our model achieved 90.6% accuracy and 95.8% AUC on the independent test set, outperforming the state-of-the-art models. In addition, we examined the distribution of the features selected by the tree models and compared the results on the training set with those on the test set. These results indicate that our model has some generalization ability. Therefore, we expect that our model will advance the discovery of neuropeptides as new drugs for the treatment of neurological diseases.
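Illustrative sketch (not the authors' code): the second layer described above can be pictured as a logistic-regression meta-learner trained on the probabilities produced by the first-layer base classifiers. The function names, learning rate and epoch count are hypothetical, and plain batch gradient descent stands in for whatever LR solver the paper used.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_meta_lr(meta_X, y, lr=0.5, epochs=2000):
    """Fit a logistic-regression meta-learner by batch gradient descent.

    meta_X: rows of first-layer outputs (one probability per base classifier).
    y: 0/1 labels (1 = neuropeptide). Hyperparameters are illustrative.
    """
    n_features = len(meta_X[0])
    n = len(meta_X)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, target in zip(meta_X, y):
            # Gradient of the log-loss with respect to the linear score.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - target
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Meta-learner probability that a sequence is a neuropeptide."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
```

In a real stacking setup the meta-features should be out-of-fold predictions of the base classifiers; otherwise the meta-learner is trained on overfit training-set scores.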
Affiliation(s)
- Mingming Jiang
- State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Bowen Zhao
- State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Shenggan Luo
- State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Qiankun Wang
- State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Yanyi Chu
- State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Tianhang Chen
- State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Xueying Mao
- State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Yatong Liu
- State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Yanjing Wang
- State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Xue Jiang
- State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Yi Xiong
- State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
13
Ofer D, Brandes N, Linial M. The language of proteins: NLP, machine learning & protein sequences. Comput Struct Biotechnol J 2021; 19:1750-1758. [PMID: 33897979 PMCID: PMC8050421 DOI: 10.1016/j.csbj.2021.03.022] [Citation(s) in RCA: 123] [Impact Index Per Article: 30.8] [Received: 01/28/2021] [Revised: 03/19/2021] [Accepted: 03/19/2021] [Indexed: 12/12/2022]
Abstract
Natural language processing (NLP) is a field of computer science concerned with automated text and language analysis. In recent years, following a series of breakthroughs in deep and machine learning, NLP methods have shown overwhelming progress. Here, we review the success, promise and pitfalls of applying NLP algorithms to the study of proteins. Proteins, which can be represented as strings of amino-acid letters, are a natural fit to many NLP methods. We explore the conceptual similarities and differences between proteins and language, and review a range of protein-related tasks amenable to machine learning. We present methods for encoding the information of proteins as text and analyzing it with NLP methods, reviewing classic concepts such as bag-of-words, k-mers/n-grams and text search, as well as modern techniques such as word embedding, contextualized embedding, deep learning and neural language models. In particular, we focus on recent innovations such as masked language modeling, self-supervised learning and attention-based models. Finally, we discuss trends and challenges in the intersection of NLP and protein research.
Affiliation(s)
- Nadav Brandes
- The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
- Michal Linial
- Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel
14
Alvarado-Delgado A, Martínez-Barnetche J, Téllez-Sosa J, Rodríguez MH, Gutiérrez-Millán E, Zumaya-Estrada FA, Saldaña-Navor V, Rodríguez MC, Tello-López Á, Lanz-Mendoza H. Prediction of neuropeptide precursors and differential expression of adipokinetic hormone/corazonin-related peptide, hugin and corazonin in the brain of malaria vector Nyssorhynchus albimanus during a Plasmodium berghei infection. Curr Res Insect Sci 2021; 1:100014. [PMID: 36003598 PMCID: PMC9387463 DOI: 10.1016/j.cris.2021.100014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2020] [Revised: 03/18/2021] [Accepted: 03/22/2021] [Indexed: 12/02/2022]
Abstract
We describe precursors from which at least sixty neuropeptides were predicted in Ny. albimanus. At least 16 precursors are encoded in the Ny. albimanus brain. The myosuppressin neuropeptide precursor was identified in Ny. albimanus. Transcripts of acp and hugin increased in Ny. albimanus brains infected with P. berghei.
Insect neuropeptides play a central role in the control of many physiological processes. Based on an analysis of the Nyssorhynchus albimanus brain transcriptome, a neuropeptide precursor database of the mosquito was described. We also observed that the genes encoding adipokinetic hormone/corazonin-related peptide (ACP), hugin and corazonin were differentially expressed during Plasmodium infection. Transcriptomic data from the Ny. albimanus brain identified 29 pre-propeptides, whose deduced sequences allowed the prediction of at least 60 neuropeptides. The predicted peptides include isoforms of allatostatin C, orcokinin, corazonin, adipokinetic hormone (AKH), SIFamide, capa, hugin, pigment-dispersing factor, adipokinetic hormone/corazonin-related peptide (ACP), tachykinin-related peptide, trissin, neuropeptide F, diuretic hormone 31, bursicon, crustacean cardioactive peptide (CCAP), allatotropin, allatostatin A, ecdysis triggering hormone (ETH), diuretic hormone 44 (Dh44), insulin-like peptides (ILPs) and eclosion hormone (EH). Analysis of the An. albimanus genome and the generated transcriptome provided evidence for the identification of the myosuppressin neuropeptide precursor. A quantitative analysis documented increased expression of precursors encoding ACP peptide, hugin and corazonin in the mosquito brain after Plasmodium berghei infection. This work represents an initial effort to characterize the neuropeptide precursor repertoire of Ny. albimanus and provides information for understanding neuroregulation of the mosquito response during Plasmodium infection.
15
Bin Y, Zhang W, Tang W, Dai R, Li M, Zhu Q, Xia J. Prediction of Neuropeptides from Sequence Information Using Ensemble Classifier and Hybrid Features. J Proteome Res 2020; 19:3732-3740. [DOI: 10.1021/acs.jproteome.0c00276] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 12/11/2022]
Affiliation(s)
- Yannan Bin
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui 230601, China
- School of Computer Science and Technology, Anhui University, Hefei, Anhui 230601, China
- Wei Zhang
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui 230601, China
- Wending Tang
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui 230601, China
- Ruyu Dai
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui 230601, China
- Menglu Li
- School of Computer Science and Technology, Anhui University, Hefei, Anhui 230601, China
- Qizhi Zhu
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui 230601, China
- Junfeng Xia
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui 230601, China
- School of Computer Science and Technology, Anhui University, Hefei, Anhui 230601, China
16
NeuroPIpred: a tool to predict, design and scan insect neuropeptides. Sci Rep 2019; 9:5129. [PMID: 30914676 PMCID: PMC6435694 DOI: 10.1038/s41598-019-41538-x] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 10/12/2018] [Accepted: 03/05/2019] [Indexed: 12/15/2022]
Abstract
Insect neuropeptides and their associated receptors have been among the potential targets for pest control. The present study describes in silico models developed using natural and modified insect neuropeptides for predicting and designing new neuropeptides. Amino acid composition analysis revealed a preference for residues C, D, E, F, G, N, S, and Y in insect neuropeptides. Positional residue preference analysis shows that in natural neuropeptides, residues such as A, N, F, D, P, S, and I are preferred at the N terminus and residues such as L, R, P, F, N, and G are preferred at the C terminus. Prediction models were developed using input features such as amino acid composition, dipeptide composition and binary profiles, with different machine learning techniques. The dipeptide composition-based SVM model performed best among all the models. On NeuroPIpred_DS1, the model achieved 86.50% accuracy and 0.73 MCC on the training dataset and 83.71% accuracy and 0.67 MCC on the validation dataset, whereas on NeuroPIpred_DS2 it achieved 97.47% accuracy and 0.95 MCC on the training dataset and 97.93% accuracy and 0.96 MCC on the validation dataset. To assist researchers, we created a standalone tool and a user-friendly web server, NeuroPIpred, available at https://webs.iiitd.edu.in/raghava/neuropipred.
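Amino acid and dipeptide composition features are commonly defined as normalized counts over the 20 standard residues (20 values) and over all ordered residue pairs (400 values). The sketch below assumes that definition and skips non-standard residues; it is not taken from the NeuroPIpred code, and the example peptide is Leu-enkephalin (YGGFL).

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order so vectors are comparable.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
DIPEPTIDE_INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def amino_acid_composition(seq: str) -> list[float]:
    """Fraction of each standard residue (20 values); other letters are ignored."""
    seq = seq.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def dipeptide_composition(seq: str) -> list[float]:
    """Fraction of each overlapping residue pair (400 values)."""
    seq = seq.upper()
    vec = [0.0] * len(DIPEPTIDES)
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    valid = [p for p in pairs if p in DIPEPTIDE_INDEX]  # drop pairs with X, B, etc.
    for p in valid:
        vec[DIPEPTIDE_INDEX[p]] += 1.0
    return [v / len(valid) for v in vec] if valid else vec


aac = amino_acid_composition("YGGFL")
dpc = dipeptide_composition("YGGFL")
```

A fixed-length vector like `dpc` is what a classifier such as an SVM would consume, regardless of peptide length.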
17
Kang J, Fang Y, Yao P, Li N, Tang Q, Huang J. NeuroPP: A Tool for the Prediction of Neuropeptide Precursors Based on Optimal Sequence Composition. Interdiscip Sci 2018. [DOI: 10.1007/s12539-018-0287-2] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Indexed: 11/24/2022]
18
Linial M, Rappoport N, Ofer D. Overlooked Short Toxin-Like Proteins: A Shortcut to Drug Design. Toxins (Basel) 2017; 9:E350. [PMID: 29109389 PMCID: PMC5705965 DOI: 10.3390/toxins9110350] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 09/19/2017] [Revised: 10/22/2017] [Accepted: 10/25/2017] [Indexed: 12/22/2022]
Abstract
Short stable peptides have huge potential for novel therapies and biosimilars. Cysteine-rich short proteins are characterized by multiple disulfide bridges in a compact structure. Many of these metazoan proteins are processed, folded, and secreted as soluble stable folds. These properties are shared by both marine and terrestrial animal toxins. These stable short proteins are promising sources for new drug development. We developed ClanTox (classifier of animal toxins) to identify toxin-like proteins (TOLIPs) using machine learning models trained on a large-scale proteomic database. Insect proteomes provide a rich source of protein innovations. We therefore sought overlooked toxin-like proteins from insects (coined iTOLIPs). Out of 4180 short (<75 amino acids) secreted proteins, 379 were predicted as iTOLIPs with high confidence, with as many as 30% of the genes marked as uncharacterized. Based on bioinformatics, structure modeling, and data-mining methods, we found that the most significant group of predicted iTOLIPs carry antimicrobial activity. Among the top predicted sequences were 120 termicin genes from termites with antifungal properties. Structural variations of insect antimicrobial peptides illustrate the similarity to a short version of the defensin fold with antifungal specificity. We also identified 9 proteins that strongly resemble ion channel inhibitors from scorpion and conus toxins. Furthermore, we assigned a functional fold to numerous uncharacterized iTOLIPs. We conclude that a systematic approach for finding iTOLIPs provides a rich source of peptides for drug design and innovative therapeutic discoveries.
Affiliation(s)
- Michal Linial
- Department of Biological Chemistry, Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem 91904, Israel.
- Nadav Rappoport
- Institute for Computational Health Sciences, UCSF, San Francisco, CA 94158, USA.
- Dan Ofer
- Department of Biological Chemistry, Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem 91904, Israel.
19
Abruzzi KC, Zadina A, Luo W, Wiyanto E, Rahman R, Guo F, Shafer O, Rosbash M. RNA-seq analysis of Drosophila clock and non-clock neurons reveals neuron-specific cycling and novel candidate neuropeptides. PLoS Genet 2017; 13:e1006613. [PMID: 28182648 PMCID: PMC5325595 DOI: 10.1371/journal.pgen.1006613] [Citation(s) in RCA: 96] [Impact Index Per Article: 12.0] [Received: 12/06/2016] [Revised: 02/24/2017] [Accepted: 02/01/2017] [Indexed: 12/21/2022]
Abstract
Locomotor activity rhythms are controlled by a network of ~150 circadian neurons within the adult Drosophila brain. They are subdivided based on their anatomical locations and properties. We profiled transcripts “around the clock” from three key groups of circadian neurons with different functions. We also profiled a non-circadian outgroup, dopaminergic (TH) neurons. These have cycling transcripts, though fewer than clock neurons, as well as low expression and poor cycling of clock gene transcripts. This suggests that TH neurons do not have a canonical circadian clock and that their gene expression cycling is driven by brain systemic cues. The three circadian groups are surprisingly diverse in their cycling transcripts and overall gene expression patterns, which include known and putative novel neuropeptides. Even the overall phase distributions of cycling transcripts are distinct, indicating that different regulatory principles govern transcript oscillations. This surprising cell-type diversity parallels the functional heterogeneity of the different neurons.
Organisms ranging from bacteria to humans contain circadian clocks. They keep internal time and also integrate environmental cues such as light to provide external time information for entrainment. In the fruit fly Drosophila melanogaster, ~150 brain neurons contain the circadian machinery and are critical for controlling behavior. Several subgroups of these clock neurons have been identified by their anatomical locations and specific functions. Our work aims to profile these neurons and to characterize their molecular contents: what do they contain and how do they differ? To this end, we have purified 3 important subgroups of clock neurons and identified their expressed genes at different times of day. Some are expressed at all times, whereas others are “cycling,” i.e., expressed more strongly at a particular time of day such as the morning. Interestingly, each circadian subgroup is quite different.
The data provide hints about what functions each group of neurons carries out and how they may work together to keep time. In addition, even a non-circadian group of neurons has cycling genes and has implications for the extent to which all cells have or do not have a functional circadian clock.
Affiliation(s)
- Katharine C. Abruzzi
- Howard Hughes Medical Institute and National Center for Behavioral Genomics, Department of Biology, Brandeis University, Waltham, United States of America
- Abigail Zadina
- Howard Hughes Medical Institute and National Center for Behavioral Genomics, Department of Biology, Brandeis University, Waltham, United States of America
- Weifei Luo
- Howard Hughes Medical Institute and National Center for Behavioral Genomics, Department of Biology, Brandeis University, Waltham, United States of America
- Evelyn Wiyanto
- Howard Hughes Medical Institute and National Center for Behavioral Genomics, Department of Biology, Brandeis University, Waltham, United States of America
- Reazur Rahman
- Howard Hughes Medical Institute and National Center for Behavioral Genomics, Department of Biology, Brandeis University, Waltham, United States of America
- Fang Guo
- Howard Hughes Medical Institute and National Center for Behavioral Genomics, Department of Biology, Brandeis University, Waltham, United States of America
- Orie Shafer
- Howard Hughes Medical Institute and National Center for Behavioral Genomics, Department of Biology, Brandeis University, Waltham, United States of America
- Michael Rosbash
- Howard Hughes Medical Institute and National Center for Behavioral Genomics, Department of Biology, Brandeis University, Waltham, United States of America
20
Brandes N, Ofer D, Linial M. ASAP: a machine learning framework for local protein properties. Database (Oxford) 2016; 2016:baw133. [PMID: 27694209 PMCID: PMC5045867 DOI: 10.1093/database/baw133] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 06/06/2016] [Revised: 08/08/2016] [Accepted: 08/28/2016] [Indexed: 11/14/2022]
Abstract
Determining residue-level protein properties, such as sites of post-translational modifications (PTMs), is vital to understanding protein function. Experimental methods are costly and time-consuming, while traditional rule-based computational methods fail to annotate sites lacking substantial similarity. Machine learning (ML) methods are becoming fundamental in annotating unknown proteins and their heterogeneous properties. We present ASAP (Amino-acid Sequence Annotation Prediction), a universal ML framework for predicting residue-level properties. ASAP extracts numerous features from raw sequences and supports easy integration of external features such as secondary structure, solvent accessibility, intrinsic disorder or PSSM profiles. Features are then used to train ML classifiers. ASAP can create new classifiers within minutes for a variety of tasks, including PTM prediction (e.g., convertase cleavage sites, phosphoserine modification). We present a detailed case study for ASAP: CleavePred, an ASAP-based model to predict protein precursor cleavage sites, with state-of-the-art results. Protein cleavage is a PTM shared by a wide variety of proteins with minimal sequence similarity. Current rule-based methods suffer from high false positive rates, making them suboptimal. The high performance of CleavePred makes it suitable for analyzing new proteomes at a genomic scale. The tool is attractive for protein design, mass spectrometry search engines and the discovery of new bioactive peptides from precursors. ASAP functions as a baseline approach for residue-level protein sequence prediction. CleavePred is freely accessible as a web-based application. Both ASAP and CleavePred are open-source with a flexible Python API. Database URL: ASAP and CleavePred source code, webtool and tutorials are available at: https://github.com/ddofer/asap; http://protonet.cs.huji.ac.il/cleavepred.
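Residue-level predictors of this kind typically describe each position by a window of neighbouring residues, from which features are computed. The sketch below shows only that windowing step with a few toy features (the centre residue, the number of basic K/R residues, and the fraction of padding); the window size, padding symbol and feature names are assumptions, and this is not the ASAP API.

```python
def residue_windows(seq: str, half_width: int = 3, pad: str = "-") -> list[str]:
    """One window per residue, centred on it and padded at the sequence ends."""
    padded = pad * half_width + seq + pad * half_width
    size = 2 * half_width + 1
    return [padded[i:i + size] for i in range(len(seq))]


def window_features(window: str, pad: str = "-") -> dict:
    """Toy per-residue features; a real model would use many more."""
    centre = len(window) // 2
    return {
        "centre": window[centre],
        "basic_in_window": sum(ch in "KR" for ch in window),
        "padding_fraction": window.count(pad) / len(window),
    }


# One feature dict per residue of a short hypothetical sequence.
features = [window_features(w) for w in residue_windows("GKRAG", half_width=1)]
```

Each residue then becomes one labelled training example (for instance, cleaved or not cleaved after this position), which is what makes the task residue-level rather than sequence-level.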
Affiliation(s)
- Nadav Brandes
- Department of Biological Chemistry, The Alexander Silberman Institute of Life Sciences, The Hebrew University, Jerusalem 91904, Israel
- Dan Ofer
- Department of Biological Chemistry, The Alexander Silberman Institute of Life Sciences, The Hebrew University, Jerusalem 91904, Israel
- Michal Linial
- Department of Biological Chemistry, The Alexander Silberman Institute of Life Sciences, The Hebrew University, Jerusalem 91904, Israel
21
Li L, Li J, Xiao W, Li Y, Qin Y, Zhou S, Yang H. Prediction the Substrate Specificities of Membrane Transport Proteins Based on Support Vector Machine and Hybrid Features. IEEE/ACM Trans Comput Biol Bioinform 2016; 13:947-953. [PMID: 26571537 DOI: 10.1109/tcbb.2015.2495140] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 02/05/2023]
22
Koziol U, Koziol M, Preza M, Costábile A, Brehm K, Castillo E. De novo discovery of neuropeptides in the genomes of parasitic flatworms using a novel comparative approach. Int J Parasitol 2016; 46:709-21. [PMID: 27388856 DOI: 10.1016/j.ijpara.2016.05.007] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.1] [Received: 04/05/2016] [Revised: 05/18/2016] [Accepted: 05/20/2016] [Indexed: 12/11/2022]
Abstract
Neuropeptide-mediated signalling is an ancient mechanism found in almost all animals and has been proposed as a promising target for the development of novel drugs against helminths. However, identification of neuropeptides from genomic data is challenging, and knowledge of the neuropeptide complement of parasitic flatworms is still fragmentary. In this work, we have developed an evolution-based strategy for the de novo discovery of neuropeptide precursors, based on the detection of localised sequence conservation between possible prohormone convertase cleavage sites. The method detected known neuropeptide precursors with good precision and specificity in the models Drosophila melanogaster and Caenorhabditis elegans. Furthermore, it identified novel putative neuropeptide precursors in nematodes, including the first description of allatotropin homologues in this phylum. Our search for neuropeptide precursors in the genomes of parasitic flatworms resulted in the description of 34 conserved neuropeptide precursor families, including 13 new ones, and of hundreds of new homologues of known neuropeptide precursor families. Most neuropeptide precursor families show a wide phylogenetic distribution among parasitic flatworms and show little similarity to neuropeptide precursors of other bilaterian animals. However, we could also find orthologs of some conserved bilaterian neuropeptides including pyrokinin, crustacean cardioactive peptide, myomodulin, neuropeptide-Y, neuropeptide KY and SIF-amide. Finally, we determined the expression patterns of seven putative neuropeptide precursor genes in the protoscolex of Echinococcus multilocularis. All genes were expressed in the nervous system with different patterns, indicating a hidden complexity of peptidergic signalling in cestodes.
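The approach above starts from possible prohormone convertase cleavage sites. As a simplified illustration of that first step only, the sketch below treats every pair of basic residues (K or R followed by K or R) as a cleavage site and returns the fragments between sites. The regex, the minimum length and the toy precursor are assumptions, and the cross-species conservation analysis that is the core of the published method is not shown.

```python
import re

# Assumed cleavage motif: two consecutive basic residues (KK, KR, RK, RR).
DIBASIC = re.compile(r"[KR][KR]")


def candidate_peptides(precursor: str, min_len: int = 3) -> list[str]:
    """Split a precursor at dibasic sites and keep fragments of at least min_len.

    re.finditer returns non-overlapping matches, so a run such as "KRR" is
    treated as one site ("KR") followed by an R that stays in the next fragment.
    """
    fragments = []
    start = 0
    for match in DIBASIC.finditer(precursor):
        fragment = precursor[start:match.start()]
        if len(fragment) >= min_len:
            fragments.append(fragment)
        start = match.end()
    tail = precursor[start:]
    if len(tail) >= min_len:
        fragments.append(tail)
    return fragments


peptides = candidate_peptides("AAAKRYGGFMKRYGGFLGKR")  # toy precursor
```

Because many dibasic pairs are never cleaved, such a rule over-predicts; that is the false-positive problem that conservation-based and learned approaches try to reduce.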
Affiliation(s)
- Uriel Koziol
- Sección Bioquímica, Facultad de Ciencias, Universidad de la República, Iguá 4225, CP11400 Montevideo, Uruguay
- Miguel Koziol
- Sección Bioquímica, Facultad de Ciencias, Universidad de la República, Iguá 4225, CP11400 Montevideo, Uruguay
- Matías Preza
- Sección Bioquímica, Facultad de Ciencias, Universidad de la República, Iguá 4225, CP11400 Montevideo, Uruguay
- Alicia Costábile
- Sección Bioquímica, Facultad de Ciencias, Universidad de la República, Iguá 4225, CP11400 Montevideo, Uruguay
- Klaus Brehm
- University of Würzburg, Institute for Hygiene and Microbiology, Josef-Schneider-Straße 2 / Bau E1, 97080 Würzburg, Germany
- Estela Castillo
- Sección Bioquímica, Facultad de Ciencias, Universidad de la República, Iguá 4225, CP11400 Montevideo, Uruguay
23
Li L, Luo Q, Xiao W, Li J, Zhou S, Li Y, Zheng X, Yang H. A machine-learning approach for predicting palmitoylation sites from integrated sequence-based features. J Bioinform Comput Biol 2016; 15:1650025. [PMID: 27411307 DOI: 10.1142/s0219720016500256] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 02/05/2023]
Abstract
Palmitoylation is the covalent attachment of lipids to amino acid residues in proteins. As an important form of protein post-translational modification, it increases the hydrophobicity of proteins, which contributes to protein transport, organelle localization and function; it therefore plays an important role in a variety of cellular processes. Identification of palmitoylation sites is necessary for understanding protein-protein interactions, protein stability and activity. Since conventional experimental techniques for determining palmitoylation sites in proteins are both labor-intensive and costly, a fast and accurate computational approach to predict palmitoylation sites from protein sequences is urgently needed. In this study, a support vector machine (SVM)-based method was proposed that integrates PSI-BLAST profiles, physicochemical properties, k-mer amino acid compositions (AACs) and k-mer pseudo AACs into the principal feature vector. A recursive feature selection scheme was subsequently implemented to single out the most discriminative features. Finally, an SVM was used to predict palmitoylation sites in proteins based on the optimal features. The proposed method achieved an accuracy of 99.41% and a Matthews correlation coefficient of 0.9773 on a benchmark dataset. The results indicate the efficiency and accuracy of the method in predicting palmitoylation sites from protein sequences.
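Several entries here report accuracy together with the Matthews correlation coefficient (MCC). For reference, a minimal sketch of the standard definitions from confusion-matrix counts, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), returning 0 when the denominator is zero (a common convention, assumed here):

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Unlike accuracy, MCC stays near 0 for a classifier that simply predicts the majority class on an imbalanced set, which is why it is often reported alongside it.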
Affiliation(s)
- Liqi Li
- Department of General Surgery, Xinqiao Hospital, Third Military Medical University, Chongqing 400037, China
- Qifa Luo
- Department of General Surgery, Xinqiao Hospital, Third Military Medical University, Chongqing 400037, China
- Weidong Xiao
- Department of General Surgery, Xinqiao Hospital, Third Military Medical University, Chongqing 400037, China
- Jinhui Li
- Department of General Surgery, Xinqiao Hospital, Third Military Medical University, Chongqing 400037, China
- Shiwen Zhou
- National Drug Clinical Trial Institution, Xinqiao Hospital, Third Military Medical University, Chongqing 400037, China
- Yongsheng Li
- Institute of Cancer, Xinqiao Hospital, Third Military Medical University, Chongqing 400037, China
- Xiaoqi Zheng
- Department of Mathematics, Shanghai Normal University, Shanghai 200234, China
- Hua Yang
- Department of General Surgery, Xinqiao Hospital, Third Military Medical University, Chongqing 400037, China
24
Ofer D, Linial M. ProFET: Feature engineering captures high-level protein functions. Bioinformatics 2015; 31:3429-36. [DOI: 10.1093/bioinformatics/btv345] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.5] [Received: 03/04/2015] [Accepted: 05/29/2015] [Indexed: 11/13/2022]
25
Buchberger A, Yu Q, Li L. Advances in Mass Spectrometric Tools for Probing Neuropeptides. Annu Rev Anal Chem (Palo Alto Calif) 2015; 8:485-509. [PMID: 26070718 PMCID: PMC6314846 DOI: 10.1146/annurev-anchem-071114-040210] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Indexed: 05/12/2023]
Abstract
Neuropeptides are important mediators in the functionality of the brain and other neurological organs. Because neuropeptides exist in a wide range of concentrations, appropriate characterization methods are needed to provide dynamic, chemical, and spatial information. Mass spectrometry and compatible tools have been a popular choice for analyzing neuropeptides. There have been several advances and challenges, both of which are the focus of this review. Discussions range from sample collection to bioinformatic tools and also cover avenues such as quantitation and imaging. Further development of the presented methods for neuropeptidomic mass spectrometric analysis is inevitable and will lead to a further understanding of the complex interplay of neuropeptides and other signaling molecules in the nervous system.
Affiliation(s)
- Amanda Buchberger
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706-1322
- Qing Yu
- School of Pharmacy, University of Wisconsin-Madison, Madison, Wisconsin 53705-2222
- Lingjun Li
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706-1322
- School of Pharmacy, University of Wisconsin-Madison, Madison, Wisconsin 53705-2222
26
Li L, Yu S, Xiao W, Li Y, Huang L, Zheng X, Zhou S, Yang H. Sequence-based identification of recombination spots using pseudo nucleic acid representation and recursive feature extraction by linear kernel SVM. BMC Bioinformatics 2014; 15:340. [PMID: 25409550 PMCID: PMC4289199 DOI: 10.1186/1471-2105-15-340] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 07/09/2014] [Accepted: 09/29/2014] [Indexed: 02/08/2023]
Abstract
Background: Identification of the recombination hot/cold spots is critical for understanding the mechanism of recombination as well as the genome evolution process. However, experimental identification of recombination spots is both time-consuming and costly. Developing an accurate and automated method for reliably and quickly identifying recombination spots is thus urgently needed.
Results: Here we proposed a novel approach by fusing features from pseudo nucleic acid composition (PseNAC), including NAC, n-tier NAC and pseudo dinucleotide composition (PseDNC). A recursive feature extraction by linear kernel support vector machine (SVM) was then used to rank the integrated feature vectors and extract optimal features. SVM was adopted for identifying recombination spots based on these optimal features. To evaluate the performance of the proposed method, jackknife cross-validation test was employed on a benchmark dataset. The overall accuracy of this approach was 84.09%, which was higher (from 0.37% to 3.79%) than those of state-of-the-art tools.
Conclusions: Comparison results suggested that linear kernel SVM is a useful vehicle for identifying recombination hot/cold spots.
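The n-tier nucleic acid composition (NAC) features named in this abstract amount to normalized k-mer frequencies for k = 1 to n. The Python sketch below computes them for a DNA string. It is a simplified illustration under stated assumptions, not the authors' implementation: it assumes an ACGT alphabet and n = 3, and it skips windows containing any other symbol. The PseDNC correlation terms (which need physicochemical property tables), the SVM-based feature ranking, and the jackknife test are not shown, and the function names are hypothetical.

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_composition(seq: str, k: int) -> list[float]:
    """Normalized frequencies of all 4**k k-mers, in lexicographic ACGT order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of width k; windows containing non-ACGT symbols are skipped.
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[w] / total for w in kmers]


def n_tier_nac(seq: str, n: int = 3) -> list[float]:
    """Hypothetical helper: concatenate k-mer compositions for k = 1..n (84 values when n = 3)."""
    features: list[float] = []
    for k in range(1, n + 1):
        features.extend(kmer_composition(seq, k))
    return features


if __name__ == "__main__":
    vec = n_tier_nac("ACGTACGGTA")
    print(len(vec), vec[:4])  # prints: 84 [0.3, 0.2, 0.3, 0.2]
```

For n = 3 the vector has 4 + 16 + 64 = 84 entries, and each k-block sums to 1 whenever the sequence has at least one valid window, which keeps sequences of different lengths on a comparable scale before any feature ranking.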
Affiliation(s)
- Xiaoqi Zheng
- Department of General Surgery, Xinqiao Hospital, Third Military Medical University, Chongqing 400037, China.
27
Karsenty S, Rappoport N, Ofer D, Zair A, Linial M. NeuroPID: a classifier of neuropeptide precursors. Nucleic Acids Res 2014; 42:W182-6. [PMID: 24792159 PMCID: PMC4086121 DOI: 10.1093/nar/gku363] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 01/26/2023]
Abstract
Neuropeptides (NPs) are short secreted peptides produced in neurons. NPs act by activating signaling cascades governing broad functions such as metabolism, sensation and behavior throughout the animal kingdom. NPs are the products of multistep processing of longer proteins, the NP precursors (NPPs). We present NeuroPID (Neuropeptide Precursor Identifier), an online machine-learning tool that identifies metazoan NPPs. NeuroPID was trained on 1418 NPPs annotated as such by UniProtKB. A large number of sequence-based features were extracted for each sequence with the goal of capturing the biophysical and informational-statistical properties that distinguish NPPs from other proteins. Training several machine-learning models, including support vector machines and ensemble decision trees, led to high accuracy (89–94%) and precision (90–93%) in cross-validation tests. For inputs of thousands of unseen sequences, the tool provides a ranked list of high quality predictions based on the results of four machine-learning classifiers. The output reveals many uncharacterized NPPs and secreted cell modulators that are rich in potential cleavage sites. NeuroPID is a discovery and a prediction tool that can be used to identify NPPs from unannotated transcriptomes and mass spectrometry experiments. NeuroPID predicted sequences are attractive targets for investigating behavior, physiology and cell modulation. The NeuroPID web tool is available at http://neuropid.cs.huji.ac.il.
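This abstract does not specify NeuroPID's feature set or how the four classifiers are combined, so the following Python sketch is only a hypothetical illustration of two ideas it mentions: a cleavage-site feature and a ranked list built from several classifiers. It assumes that potential cleavage sites are the dibasic pairs KR, RR, RK and KK (a common simplification of prohormone processing), that each classifier outputs one probability per sequence, and that sequences are ranked by the number of positive votes at a 0.5 threshold with mean probability as the tie-breaker. The function names, the pattern, the ranking rule, and the toy scores are assumptions, not NeuroPID's documented method or output.

```python
import re
from statistics import mean

# Assumed cleavage-site pattern: overlapping dibasic pairs (simplified).
DIBASIC = re.compile(r"(?=(KR|RR|RK|KK))")


def dibasic_site_density(seq: str) -> float:
    """Hypothetical feature: potential dibasic cleavage sites per 100 residues."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100 * len(DIBASIC.findall(seq)) / len(seq)


def rank_by_consensus(scores: dict[str, list[float]], threshold: float = 0.5) -> list[str]:
    """Rank sequence IDs by number of positive votes, then by mean probability.

    `scores` maps each sequence ID to one probability per classifier.
    """
    def key(item: tuple[str, list[float]]) -> tuple[int, float]:
        _, probs = item
        votes = sum(p >= threshold for p in probs)
        return votes, mean(probs)

    return [seq_id for seq_id, _ in sorted(scores.items(), key=key, reverse=True)]


if __name__ == "__main__":
    print(dibasic_site_density("MKRSLLRRGK"))  # prints: 20.0
    toy = {  # toy probabilities from four hypothetical classifiers
        "a": [0.9, 0.8, 0.7, 0.6],
        "b": [0.9, 0.2, 0.1, 0.4],
        "c": [0.6, 0.6, 0.55, 0.51],
    }
    print(rank_by_consensus(toy))  # prints: ['a', 'c', 'b']
```

Counting votes before averaging keeps a sequence that one classifier scores very highly ("b") below sequences that all four classifiers accept, which is one plausible way to favor agreement across models.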
Affiliation(s)
- Solange Karsenty
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
- School of Computer Science, Hadassah Academic College, Jerusalem, Israel
- Nadav Rappoport
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
- Dan Ofer
- Department of Biological Chemistry, The Alexander Silberman Institute of Life Sciences, The Sudarsky Center for Computational Biology, The Hebrew University of Jerusalem, Jerusalem, Israel
- Adva Zair
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
- Michal Linial
- Department of Biological Chemistry, The Alexander Silberman Institute of Life Sciences, The Sudarsky Center for Computational Biology, The Hebrew University of Jerusalem, Jerusalem, Israel