1
van Wijk KJ, Leppert T, Sun Z, Guzchenko I, Debley E, Sauermann G, Routray P, Mendoza L, Sun Q, Deutsch EW. The Zea mays PeptideAtlas: A New Maize Community Resource. J Proteome Res 2024; 23:3984-4004. [PMID: 39101213 DOI: 10.1021/acs.jproteome.4c00320] [Indexed: 08/06/2024]
Abstract
This study presents the Maize PeptideAtlas resource (www.peptideatlas.org/builds/maize) to help solve questions about the maize proteome. Publicly available raw tandem mass spectrometry (MS/MS) data for maize collected from ProteomeXchange were reanalyzed through a uniform processing and metadata annotation pipeline. These data are from a wide range of genetic backgrounds and many sample types and experimental conditions. The protein search space included different maize genome annotations for the B73 inbred line from MaizeGDB, UniProtKB, NCBI RefSeq, and for the W22 inbred line. 445 million MS/MS spectra were searched, of which 120 million were matched to 0.37 million distinct peptides. Peptides were matched to 66.2% of proteins in the most recent B73 nuclear genome annotation. Furthermore, most conserved plastid- and mitochondrial-encoded proteins (NCBI RefSeq annotations) were identified. Peptides and proteins identified in the other B73 genome annotations will improve maize genome annotation. We also illustrate the high-confidence detection of unique W22 proteins. N-terminal acetylation, phosphorylation, ubiquitination, and three lysine acylations (K-acetyl, K-malonyl, and K-hydroxyisobutyryl) were identified and can be inspected through a PTM viewer in PeptideAtlas. All matched MS/MS-derived peptide data are linked to spectral, technical, and biological metadata. This new PeptideAtlas is integrated in MaizeGDB with a peptide track in JBrowse.
Affiliation(s)
- Klaas J van Wijk
  - Section of Plant Biology, School of Integrative Plant Sciences (SIPS), Cornell University, Ithaca, New York 14853, United States
- Tami Leppert
  - Institute for Systems Biology (ISB), Seattle, Washington 98109, United States
- Zhi Sun
  - Institute for Systems Biology (ISB), Seattle, Washington 98109, United States
- Isabell Guzchenko
  - Section of Plant Biology, School of Integrative Plant Sciences (SIPS), Cornell University, Ithaca, New York 14853, United States
- Erica Debley
  - Section of Plant Biology, School of Integrative Plant Sciences (SIPS), Cornell University, Ithaca, New York 14853, United States
- Georgia Sauermann
  - Section of Plant Biology, School of Integrative Plant Sciences (SIPS), Cornell University, Ithaca, New York 14853, United States
- Pratyush Routray
  - Section of Plant Biology, School of Integrative Plant Sciences (SIPS), Cornell University, Ithaca, New York 14853, United States
- Luis Mendoza
  - Institute for Systems Biology (ISB), Seattle, Washington 98109, United States
- Qi Sun
  - Computational Biology Service Unit, Cornell University, Ithaca, New York 14853, United States
- Eric W Deutsch
  - Institute for Systems Biology (ISB), Seattle, Washington 98109, United States
2
Coelho LP, Santos-Júnior CD, de la Fuente-Nunez C. Challenges in computational discovery of bioactive peptides in 'omics data. Proteomics 2024; 24:e2300105. [PMID: 38458994 DOI: 10.1002/pmic.202300105] [Received: 10/13/2023] [Revised: 02/06/2024] [Accepted: 02/06/2024] [Indexed: 03/10/2024]
Abstract
Peptides have a plethora of activities in biological systems that can potentially be exploited biotechnologically. Several peptides are used clinically, as well as in industry and agriculture. The increase in available 'omics data has recently provided a large opportunity for mining novel enzymes, biosynthetic gene clusters, and molecules. While these data primarily consist of DNA sequences, other types of data provide important complementary information. Due to their size, the approaches proven successful at discovering novel proteins of canonical size cannot be naïvely applied to the discovery of peptides. Peptides can be encoded directly in the genome as short open reading frames (smORFs), or they can be derived from larger proteins by proteolysis. Both of these peptide classes pose challenges as simple methods for their prediction result in large numbers of false positives. Similarly, functional annotation of larger proteins, traditionally based on sequence similarity to infer orthology and then transferring functions between characterized proteins and uncharacterized ones, cannot be applied for short sequences. The use of these techniques is much more limited and alternative approaches based on machine learning are used instead. Here, we review the limitations of traditional methods as well as the alternative methods that have recently been developed for discovering novel bioactive peptides with a focus on prokaryotic genomes and metagenomes.
Affiliation(s)
- Luis Pedro Coelho
  - Centre for Microbiome Research, School of Biomedical Sciences, Queensland University of Technology, Woolloongabba, Queensland, Australia
  - Institute of Science and Technology for Brain-Inspired Intelligence - ISTBI, Fudan University, Shanghai, China
- Célio Dias Santos-Júnior
  - Institute of Science and Technology for Brain-Inspired Intelligence - ISTBI, Fudan University, Shanghai, China
  - Laboratory of Microbial Processes & Biodiversity - LMPB, Hydrobiology Department, Federal University of São Carlos - UFSCar, São Paulo, Brazil
- Cesar de la Fuente-Nunez
  - Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Department of Chemistry, School of Arts and Sciences, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA
3
van Wijk KJ, Leppert T, Sun Z, Kearly A, Li M, Mendoza L, Guzchenko I, Debley E, Sauermann G, Routray P, Malhotra S, Nelson A, Sun Q, Deutsch EW. Detection of the Arabidopsis Proteome and Its Post-translational Modifications and the Nature of the Unobserved (Dark) Proteome in PeptideAtlas. J Proteome Res 2024; 23:185-214. [PMID: 38104260 DOI: 10.1021/acs.jproteome.3c00536] [Indexed: 12/19/2023]
Abstract
This study describes a new release of the Arabidopsis thaliana PeptideAtlas proteomics resource (build 2023-10) providing protein sequence coverage, matched mass spectrometry (MS) spectra, selected post-translational modifications (PTMs), and metadata. 70 million MS/MS spectra were matched to the Araport11 annotation, identifying ∼0.6 million unique peptides and 18,267 proteins at the highest confidence level and 3396 lower confidence proteins, together representing 78.6% of the predicted proteome. Additional identified proteins not predicted in Araport11 should be considered for the next Arabidopsis genome annotation. This release identified 5198 phosphorylated proteins, 668 ubiquitinated proteins, 3050 N-terminally acetylated proteins, and 864 lysine-acetylated proteins and mapped their PTM sites. MS support was lacking for 21.4% (5896 proteins) of the predicted Araport11 proteome: the "dark" proteome. This dark proteome is highly enriched for E3 ligases, transcription factors, and certain (e.g., CLE, IDA, PSY) but not other (e.g., THIONIN, CAP) signaling peptide families. A machine learning model trained on RNA expression data and protein properties predicts the probability that proteins will be detected. The model aids in the discovery of proteins with short half-lives (e.g., SIG1,3 and ERF-VII TFs) and in developing strategies to identify the missing proteins. PeptideAtlas is linked to TAIR, tracks in JBrowse, and several other community proteomics resources.
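The coverage figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `coverage_summary` is hypothetical, and it assumes that the predicted Araport11 proteome equals the sum of the detected proteins (highest plus lower confidence) and the unobserved proteins, which the abstract implies but does not state outright.

```python
def coverage_summary(highest_confidence, lower_confidence, unobserved):
    """Summarize proteome coverage from the three protein counts in the abstract.

    Assumption (not stated explicitly in the abstract): the predicted proteome
    is the sum of detected and unobserved proteins.
    """
    detected = highest_confidence + lower_confidence
    predicted_total = detected + unobserved
    return {
        "detected": detected,
        "predicted_total": predicted_total,
        "detected_pct": round(100 * detected / predicted_total, 1),
        "dark_pct": round(100 * unobserved / predicted_total, 1),
    }


# Counts taken from the abstract: 18,267 + 3396 detected, 5896 unobserved.
summary = coverage_summary(18267, 3396, 5896)
print(summary)
```

Under that assumption the counts reproduce the reported 78.6% detected and 21.4% dark fractions.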
Affiliation(s)
- Klaas J van Wijk
  - Section of Plant Biology, School of Integrative Plant Sciences (SIPS), Cornell University, Ithaca, New York 14853, United States
- Tami Leppert
  - Institute for Systems Biology (ISB), Seattle, Washington 98109, United States
- Zhi Sun
  - Institute for Systems Biology (ISB), Seattle, Washington 98109, United States
- Alyssa Kearly
  - Boyce Thompson Institute, Ithaca, New York 14853, United States
- Margaret Li
  - Institute for Systems Biology (ISB), Seattle, Washington 98109, United States
- Luis Mendoza
  - Institute for Systems Biology (ISB), Seattle, Washington 98109, United States
- Isabell Guzchenko
  - Section of Plant Biology, School of Integrative Plant Sciences (SIPS), Cornell University, Ithaca, New York 14853, United States
- Erica Debley
  - Section of Plant Biology, School of Integrative Plant Sciences (SIPS), Cornell University, Ithaca, New York 14853, United States
- Georgia Sauermann
  - Section of Plant Biology, School of Integrative Plant Sciences (SIPS), Cornell University, Ithaca, New York 14853, United States
- Pratyush Routray
  - Section of Plant Biology, School of Integrative Plant Sciences (SIPS), Cornell University, Ithaca, New York 14853, United States
- Sagunya Malhotra
  - Institute for Systems Biology (ISB), Seattle, Washington 98109, United States
- Andrew Nelson
  - Boyce Thompson Institute, Ithaca, New York 14853, United States
- Qi Sun
  - Computational Biology Service Unit, Cornell University, Ithaca, New York 14853, United States
- Eric W Deutsch
  - Institute for Systems Biology (ISB), Seattle, Washington 98109, United States
4
Zhao B, Zhao J, Wang M, Guo Y, Mehmood A, Wang W, Xiong Y, Luo S, Wei DQ, Zhao XQ, Wang Y. Exploring microproteins from various model organisms using the mip-mining database. BMC Genomics 2023; 24:661. [PMID: 37919660 PMCID: PMC10623795 DOI: 10.1186/s12864-023-09735-1] [Received: 02/11/2023] [Accepted: 10/12/2023] [Indexed: 11/04/2023]
Abstract
Microproteins, prevalent across all kingdoms of life, play a crucial role in cell physiology and human health. Although global gene transcription data are widely explored and abundantly available, our understanding of microprotein functions from transcriptome data is still limited. To mitigate this problem, we present a database, Mip-mining (https://weilab.sjtu.edu.cn/mipmining/), underpinned by high-quality RNA-sequencing data and aimed exclusively at analyzing microprotein functions. Mip-mining hosts 336 sets of high-quality transcriptome data from 8626 samples and nine representative organisms, including microorganisms, plants, animals, and humans. The database focuses on a range of diseases and environmental stress conditions, covering chemical, physical, biological, and disease-related stresses. The platform also enables customized analysis: users can input their own data sets and choose their own cutoff values. The practicality of Mip-mining is demonstrated by identifying essential microproteins in different species and revealing the importance of ATP15 in the acetic acid stress tolerance of budding yeast. We believe that Mip-mining will facilitate a greater understanding and application of microproteins in biotechnology. Moreover, it will be beneficial for designing therapeutic strategies under various biological conditions.
Affiliation(s)
- Bowen Zhao
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Jing Zhao
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Muyao Wang
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Yangfan Guo
  - Central Laboratory of Yan'an Hospital Affiliated to Kunming Medical University, Kunming, 650051, China
- Aamir Mehmood
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Weibin Wang
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Yi Xiong
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
  - Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
- Shenggan Luo
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Dong-Qing Wei
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
  - Zhongjing Research and Industrialization Institute of Chinese Medicine, Zhongguancun Scientific Park, Meixi, Nanyang, Henan, 473006, China
  - Peng Cheng Laboratory, Vanke Cloud City Phase I Building 8, Xili Street, Nanshan District, Shenzhen, 518055, Guangdong, China
- Xin-Qing Zhao
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Yanjing Wang
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
  - Engineering Research Center of Cell & Therapeutic Antibody, School of Pharmacy, Shanghai Jiao Tong University, Shanghai, 200240, China
5
Dong X, Zhang K, Xun C, Chu T, Liang S, Zeng Y, Liu Z. Small Open Reading Frame-Encoded Micro-Peptides: An Emerging Protein World. Int J Mol Sci 2023; 24:10562. [PMID: 37445739 DOI: 10.3390/ijms241310562] [Received: 05/13/2023] [Revised: 06/20/2023] [Accepted: 06/21/2023] [Indexed: 07/15/2023]
Abstract
Small open reading frames (sORFs) are often overlooked features in genomes. In the past, they were labeled as noncoding or "transcriptional noise". However, accumulating evidence from recent years suggests that sORFs may be transcribed and translated to produce sORF-encoded polypeptides (SEPs) with less than 100 amino acids. The vigorous development of computational algorithms, ribosome profiling, and peptidome has facilitated the prediction and identification of many new SEPs. These SEPs were revealed to be involved in a wide range of basic biological processes, such as gene expression regulation, embryonic development, cellular metabolism, inflammation, and even carcinogenesis. To effectively understand the potential biological functions of SEPs, we discuss the history and development of the newly emerging research on sORFs and SEPs. In particular, we review a range of recently discovered bioinformatics tools for identifying, predicting, and validating SEPs as well as a variety of biochemical experiments for characterizing SEP functions. Lastly, this review underlines the challenges and future directions in identifying and validating sORFs and their encoded micropeptides, providing a significant reference for upcoming research on sORF-encoded peptides.
Affiliation(s)
- Xiaoping Dong
  - National & Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, China
  - Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha 410081, China
- Kun Zhang
  - The State Key Laboratory of Developmental Biology of Freshwater Fish, College of Life Science, Hunan Normal University, Changsha 410081, China
- Chengfeng Xun
  - National & Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, China
  - Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha 410081, China
- Tianqi Chu
  - National & Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, China
  - Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha 410081, China
- Songping Liang
  - National & Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, China
  - Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha 410081, China
- Yong Zeng
  - National & Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, China
  - Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha 410081, China
  - The State Key Laboratory of Developmental Biology of Freshwater Fish, College of Life Science, Hunan Normal University, Changsha 410081, China
- Zhonghua Liu
  - National & Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, China
  - Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha 410081, China
6
van Wijk KJ, Leppert T, Sun Z, Kearly A, Li M, Mendoza L, Guzchenko I, Debley E, Sauermann G, Routray P, Malhotra S, Nelson A, Sun Q, Deutsch EW. Mapping the Arabidopsis thaliana proteome in PeptideAtlas and the nature of the unobserved (dark) proteome; strategies towards a complete proteome. bioRxiv 2023:2023.06.01.543322. [PMID: 37333403 PMCID: PMC10274743 DOI: 10.1101/2023.06.01.543322] [Indexed: 06/20/2023]
Abstract
This study describes a new release of the Arabidopsis thaliana PeptideAtlas proteomics resource providing protein sequence coverage, matched mass spectrometry (MS) spectra, selected PTMs, and metadata. 70 million MS/MS spectra were matched to the Araport11 annotation, identifying ∼0.6 million unique peptides and 18267 proteins at the highest confidence level and 3396 lower confidence proteins, together representing 78.6% of the predicted proteome. Additional identified proteins not predicted in Araport11 should be considered for building the next Arabidopsis genome annotation. This release identified 5198 phosphorylated proteins, 668 ubiquitinated proteins, 3050 N-terminally acetylated proteins and 864 lysine-acetylated proteins and mapped their PTM sites. MS support was lacking for 21.4% (5896 proteins) of the predicted Araport11 proteome - the 'dark' proteome. This dark proteome is highly enriched for certain (e.g. CLE, CEP, IDA, PSY) but not other (e.g. THIONIN, CAP) signaling peptide families, E3 ligases, TFs, and other proteins with unfavorable physicochemical properties. A machine learning model trained on RNA expression data and protein properties predicts the probability for proteins to be detected. The model aids in the discovery of proteins with short half-lives (e.g. SIG1,3 and ERF-VII TFs) and in completing the proteome. PeptideAtlas is linked to TAIR, JBrowse, PPDB, SUBA, UniProtKB and Plant PTM Viewer.
7
Hellinger R, Sigurdsson A, Wu W, Romanova EV, Li L, Sweedler JV, Süssmuth RD, Gruber CW. Peptidomics. Nat Rev Methods Primers 2023; 3:25. [PMID: 37250919 PMCID: PMC7614574 DOI: 10.1038/s43586-023-00205-2] [Accepted: 02/09/2023] [Indexed: 05/31/2023]
Abstract
Peptides are biopolymers, typically consisting of 2-50 amino acids. They are biologically produced by the cellular ribosomal machinery or by non-ribosomal enzymes and, sometimes, other dedicated ligases. Peptides are arranged as linear chains or cycles, and include post-translational modifications, unusual amino acids and stabilizing motifs. Their structure and molecular size render them a unique chemical space, between small molecules and larger proteins. Peptides have important physiological functions as intrinsic signalling molecules, such as neuropeptides and peptide hormones, for cellular or interspecies communication, as toxins to catch prey or as defence molecules to fend off enemies and microorganisms. Clinically, they are gaining popularity as biomarkers or innovative therapeutics; to date there are more than 60 peptide drugs approved and more than 150 in clinical development. The emerging field of peptidomics comprises the comprehensive qualitative and quantitative analysis of the suite of peptides in a biological sample (endogenously produced, or exogenously administered as drugs). Peptidomics employs techniques of genomics, modern proteomics, state-of-the-art analytical chemistry and innovative computational biology, with a specialized set of tools. The complex biological matrices and often low abundance of analytes typically examined in peptidomics experiments require optimized sample preparation and isolation, including in silico analysis. This Primer covers the combination of techniques and workflows needed for peptide discovery and characterization and provides an overview of various biological and clinical applications of peptidomics.
Affiliation(s)
- Roland Hellinger
  - Center for Physiology and Pharmacology, Medical University of Vienna, Vienna, Austria
- Arnar Sigurdsson
  - Institut für Chemie, Technische Universität Berlin, Berlin, Germany
- Wenxin Wu
  - School of Pharmacy and Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA
- Elena V Romanova
  - Department of Chemistry, University of Illinois, Urbana, IL, USA
- Lingjun Li
  - School of Pharmacy and Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA
- Christian W Gruber
  - Center for Physiology and Pharmacology, Medical University of Vienna, Vienna, Austria
8
Kobayashi H, Murakami K, Sugano SS, Tamura K, Oka Y, Matsushita T, Shimada T. Comprehensive analysis of peptide-coding genes and initial characterization of an LRR-only microprotein in Marchantia polymorpha. Front Plant Sci 2023; 13:1051017. [PMID: 36756228 PMCID: PMC9901580 DOI: 10.3389/fpls.2022.1051017] [Received: 09/22/2022] [Accepted: 12/28/2022] [Indexed: 06/18/2023]
Abstract
In the past two decades, many plant peptides have been found to play crucial roles in various biological events by mediating cell-to-cell communications. However, a large number of small open reading frames (sORFs) or short genes capable of encoding peptides remain uncharacterized. In this study, we examined several candidate genes for peptides conserved between two model plants: Arabidopsis thaliana and Marchantia polymorpha. We examined their expression pattern in M. polymorpha and subcellular localization using a transient assay with Nicotiana benthamiana. We found that one candidate, MpSGF10B, was expressed in meristems, gemma cups, and male reproductive organs called antheridiophores. MpSGF10B has an N-terminal signal peptide followed by two leucine-rich repeat (LRR) domains and was secreted to the extracellular region in N. benthamiana and M. polymorpha. Compared with the wild type, two independent Mpsgf10b mutants had a slightly increased number of antheridiophores. It was revealed in gene ontology enrichment analysis that MpSGF10B was significantly co-expressed with genes related to cell cycle and development. These results suggest that MpSGF10B may be involved in the reproductive development of M. polymorpha. Our research should shed light on the unknown role of LRR-only proteins in land plants.
Affiliation(s)
- Shigeo S. Sugano
  - Bioproduction Research Institute, The National Institute of Advanced Industrial Science and Technology, Tsukuba, Ibaraki, Japan
- Kentaro Tamura
  - Department of Environmental and Life Sciences, University of Shizuoka, Shizuoka-shi, Shizuoka, Japan
- Yoshito Oka
  - Graduate School of Science, Kyoto University, Kyoto, Japan
- Tomoo Shimada
  - Graduate School of Science, Kyoto University, Kyoto, Japan
9
A Proteomics Data Mining Strategy for the Identification of Quinoa Grain Proteins with Potential Immunonutritional Bioactivities. Foods 2023; 12:foods12020390. [PMID: 36673481 PMCID: PMC9858122 DOI: 10.3390/foods12020390] [Received: 12/21/2022] [Revised: 01/10/2023] [Accepted: 01/11/2023] [Indexed: 01/18/2023]
Abstract
Quinoa proteins are attracting global interest for their wide amino acid profile and as a promising source for the development of biomedical treatments, including those against immune-mediated diseases. However, information about the bioactivity of quinoa proteins is scarce. In this study, a quinoa grain proteome map obtained by label-free mass spectrometry-based shotgun proteomics was investigated for the identification of quinoa grain proteins with potential immunonutritional bioactivities, including those related to cancer. After carefully examining the sequence similarities of the 1211 identified quinoa grain proteins against already described bioactive proteins from other plant organisms, 71, 48, and 3 of them were classified as antimicrobial peptides (AMPs), oxidative stress induced peptides (OSIPs), and serine-type protease inhibitors (STPIs), respectively, suggesting their potential as immunomodulatory, anti-inflammatory, and anticancer agents. In addition, data interpretation using Venn diagrams, heat maps, and scatterplots revealed proteome similarities and differences with respect to the AMPs, OSIPs, and STPIs, and the most relevant bioactive proteins in the predominant commercial quinoa grains (i.e., black, red, white (from Peru), and royal (white from Bolivia)). The presented proteomics data mining strategy allows easy screening for potentially relevant quinoa grain proteins and commercial classes for immunonutrition, as a basis for future bioactivity testing.
10
Álvarez-Urdiola R, Borràs E, Valverde F, Matus JT, Sabidó E, Riechmann JL. Peptidomics Methods Applied to the Study of Flower Development. Methods Mol Biol 2023; 2686:509-536. [PMID: 37540375 DOI: 10.1007/978-1-0716-3299-4_24] [Indexed: 08/05/2023]
Abstract
Understanding the global and dynamic nature of plant developmental processes requires not only the study of the transcriptome, but also of the proteome, including its largely uncharacterized peptidome fraction. Recent advances in proteomics and high-throughput analyses of translating RNAs (ribosome profiling) have begun to address this issue, evidencing the existence of novel, uncharacterized, and possibly functional peptides. To validate the accumulation in tissues of sORF-encoded polypeptides (SEPs), the basic setup of proteomic analyses (i.e., LC-MS/MS) can be followed. However, the detection of peptides that are small (up to ~100 aa, 6-7 kDa) and novel (i.e., not annotated in reference databases) presents specific challenges that need to be addressed both experimentally and with computational biology resources. Several methods have been developed in recent years to isolate and identify peptides from plant tissues. In this chapter, we outline two different peptide extraction protocols and the subsequent peptide identification by mass spectrometry using the database search or the de novo identification methods.
Affiliation(s)
- Raquel Álvarez-Urdiola
  - Centre for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Edifici CRAG, Campus UAB, Cerdanyola del Vallès, Barcelona, Spain
- Eva Borràs
  - Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology, Barcelona, Spain
  - Universitat Pompeu Fabra, Barcelona, Spain
- Federico Valverde
  - Institute for Plant Biochemistry and Photosynthesis CSIC - University of Seville, Seville, Spain
- José Tomás Matus
  - Centre for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Edifici CRAG, Campus UAB, Cerdanyola del Vallès, Barcelona, Spain
  - Institute for Integrative Systems Biology (I2SysBio), Universitat de València-CSIC, Paterna, Valencia, Spain
- Eduard Sabidó
  - Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology, Barcelona, Spain
  - Universitat Pompeu Fabra, Barcelona, Spain
- José Luis Riechmann
  - Centre for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Edifici CRAG, Campus UAB, Cerdanyola del Vallès, Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
11
Sruthi KB, Menon A, P A, Vasudevan Soniya E. Pervasive translation of small open reading frames in plant long non-coding RNAs. Front Plant Sci 2022; 13:975938. [PMID: 36352887 PMCID: PMC9638090 DOI: 10.3389/fpls.2022.975938] [Received: 06/22/2022] [Accepted: 09/29/2022] [Indexed: 06/16/2023]
Abstract
Long non-coding RNAs (lncRNAs) are primarily recognized as non-coding transcripts longer than 200 nucleotides with low coding potential and are present in both eukaryotes and prokaryotes. Recent findings reveal that lncRNAs can code for micropeptides in various species. Micropeptides are generated from small open reading frames (smORFs) and have been discovered frequently in short mRNAs and non-coding RNAs, such as lncRNAs, circular RNAs, and pri-miRNAs. The most accepted definition of a smORF is an ORF containing fewer than 100 codons, and ribosome profiling and mass spectrometry are the most prevalent experimental techniques used to identify them. Although the majority of micropeptides perform critical roles throughout plant developmental processes and stress conditions, only a handful of their functions have been verified to date. Even though more research is being directed toward identifying micropeptides, there is still a dearth of information regarding these peptides in plants. This review outlines the lncRNA-encoded peptides, the evolutionary roles of such peptides in plants, and the techniques used to identify them. It also describes the functions of the pri-miRNA and circRNA-encoded peptides that have been identified in plants.
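The smORF definition given in this abstract (an ORF containing fewer than 100 codons) can be illustrated with a naive forward-strand scan. This is a minimal sketch, not a method from the review: the function name `find_smorfs`, the `min_codons` floor, and the choice to scan only ATG-initiated ORFs on one strand are all assumptions made for illustration. As several abstracts in this list point out, simple scans of this kind produce many false positives, so candidates still need support from ribosome profiling or mass spectrometry.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_smorfs(seq, max_codons=100, min_codons=2):
    """Return (start, end, n_codons) for forward-strand ATG-to-stop ORFs.

    n_codons counts the start codon and all sense codons but not the stop.
    Only ORFs with min_codons <= n_codons < max_codons are reported.
    min_codons is a hypothetical floor added to skip trivial hits.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            # Walk codon by codon from the start codon to the first in-frame stop.
            j = i
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):
                break  # no in-frame stop remains, so no later ORF in this frame
            n_codons = (j - i) // 3
            if min_codons <= n_codons < max_codons:
                hits.append((i, j + 3, n_codons))
            i = j + 3  # resume scanning after the stop codon
    return hits


# Toy sequence: ATG + 5 alanine codons + TAA, offset by two bases.
example = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
print(find_smorfs(example))
```

The scan reports the first ATG upstream of each stop, so nested in-frame start codons are not listed separately; a real pipeline would also scan the reverse complement and consider non-ATG starts.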
|
12
|
Zhao S, Meng J, Kang Q, Luan Y. Identifying LncRNA-Encoded Short Peptides Using Optimized Hybrid Features and Ensemble Learning. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2873-2881. [PMID: 34383651 DOI: 10.1109/tcbb.2021.3104288] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/13/2023]
Abstract
Long non-coding RNA (lncRNA) contains short open reading frames (sORFs), and sORF-encoded short peptides (SEPs) have become the focus of scientific studies due to their crucial role in life activities. The identification of SEPs is vital to further understanding their regulatory function. Bioinformatics methods can quickly identify SEPs to provide credible candidate sequences for verifying SEPs by biological experiments. However, there is a lack of methods for identifying SEPs directly. In this study, a machine learning method to identify SEPs of plant lncRNA (ISPL) is proposed. Hybrid features including sequence features and physicochemical features are extracted manually or adaptively to construct different modal features. In order to keep feature selection stable, the non-linear correction applied in Max-Relevance-Max-Distance (nocRD) feature selection method is proposed, which integrates multiple feature ranking results and uses the iterative random forest for dimensionality reduction of the different modal features. Classification models with different modal features are constructed, and their outputs are combined for ensemble classification. The experimental results show that the accuracy of ISPL is 89.86% on the independent test set, which will have important implications for further studies of functional genomics.
|
13
|
Fabre B, Choteau SA, Duboé C, Pichereaux C, Montigny A, Korona D, Deery MJ, Camus M, Brun C, Burlet-Schiltz O, Russell S, Combier JP, Lilley KS, Plaza S. In Depth Exploration of the Alternative Proteome of Drosophila melanogaster. Front Cell Dev Biol 2022; 10:901351. [PMID: 35721519 PMCID: PMC9204603 DOI: 10.3389/fcell.2022.901351] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/21/2022] [Accepted: 04/25/2022] [Indexed: 12/13/2022]
Abstract
Recent studies have shown that hundreds of small proteins were occulted when protein-coding genes were annotated. These proteins, called alternative proteins, have failed to be annotated notably due to the short length of their open reading frame (less than 100 codons) or the enforced rule establishing that messenger RNAs (mRNAs) are monocistronic. Several alternative proteins were shown to be biologically active molecules and seem to be involved in a wide range of biological functions. However, genome-wide exploration of the alternative proteome is still limited to a few species. In the present article, we describe a deep peptidomics workflow which enabled the identification of 401 alternative proteins in Drosophila melanogaster. Subcellular localization, protein domains, and short linear motifs were predicted for 235 of the alternative proteins identified and point toward specific functions of these small proteins. Several alternative proteins had approximated abundances higher than their canonical counterparts, suggesting that these alternative proteins are actually the main products of their corresponding genes. Finally, we observed 14 alternative proteins with developmentally regulated expression patterns and 10 induced upon the heat-shock treatment of embryos, demonstrating stage or stress-specific production of alternative proteins.
Affiliation(s)
- Bertrand Fabre
- Laboratoire de Recherche en Sciences Végétales, UMR5546, Université de Toulouse, UPS, INP, CNRS, Auzeville-Tolosane, France; Cambridge Centre for Proteomics, Cambridge Systems Biology Centre and Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom. Correspondence: Bertrand Fabre; Serge Plaza
| | - Sebastien A. Choteau
- Aix-Marseille Université, INSERM, TAGC, Turing Centre for Living Systems, Marseille, France
| | - Carine Duboé
- Laboratoire de Recherche en Sciences Végétales, UMR5546, Université de Toulouse, UPS, INP, CNRS, Auzeville-Tolosane, France
| | - Carole Pichereaux
- Fédération de Recherche (FR3450), Agrobiosciences, Interactions et Biodiversité (AIB), CNRS, Toulouse, France; Institut de Pharmacologie et de Biologie Structurale (IPBS), Université de Toulouse, CNRS, UPS, Toulouse, France; Infrastructure Nationale de Protéomique, ProFI, FR 2048, Toulouse, France
| | - Audrey Montigny
- Laboratoire de Recherche en Sciences Végétales, UMR5546, Université de Toulouse, UPS, INP, CNRS, Auzeville-Tolosane, France
| | - Dagmara Korona
- Cambridge Systems Biology Centre and Department of Genetics, University of Cambridge, Cambridge, United Kingdom
| | - Michael J. Deery
- Cambridge Centre for Proteomics, Cambridge Systems Biology Centre and Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
| | - Mylène Camus
- Institut de Pharmacologie et de Biologie Structurale (IPBS), Université de Toulouse, CNRS, UPS, Toulouse, France; Infrastructure Nationale de Protéomique, ProFI, FR 2048, Toulouse, France
| | - Christine Brun
- Aix-Marseille Université, INSERM, TAGC, Turing Centre for Living Systems, Marseille, France; CNRS, Marseille, France
| | - Odile Burlet-Schiltz
- Institut de Pharmacologie et de Biologie Structurale (IPBS), Université de Toulouse, CNRS, UPS, Toulouse, France; Infrastructure Nationale de Protéomique, ProFI, FR 2048, Toulouse, France
| | - Steven Russell
- Cambridge Systems Biology Centre and Department of Genetics, University of Cambridge, Cambridge, United Kingdom
| | - Jean-Philippe Combier
- Laboratoire de Recherche en Sciences Végétales, UMR5546, Université de Toulouse, UPS, INP, CNRS, Auzeville-Tolosane, France
| | - Kathryn S. Lilley
- Cambridge Centre for Proteomics, Cambridge Systems Biology Centre and Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
| | - Serge Plaza
- Laboratoire de Recherche en Sciences Végétales, UMR5546, Université de Toulouse, UPS, INP, CNRS, Auzeville-Tolosane, France. Correspondence: Bertrand Fabre; Serge Plaza
| |
|
14
|
Zhang Z, Li Y, Yuan W, Wang Z, Wan C. Proteomic-driven identification of short open reading frame-encoded peptides. Proteomics 2022; 22:e2100312. [PMID: 35384297 DOI: 10.1002/pmic.202100312] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/25/2022] [Revised: 03/29/2022] [Accepted: 03/30/2022] [Indexed: 11/10/2022]
Abstract
Accumulating evidence has shown that a large number of short open reading frames (sORFs) also have the ability to encode proteins. The discovery of sORFs opens up a new research area, leading to the identification and functional study of sORF encoded peptides (SEPs) at the omics level. Besides bioinformatics prediction and ribosomal profiling, mass spectrometry (MS) has become a significant tool as it directly detects the sequence of SEPs. Though MS-based proteomics methods have proved to be effective for qualitative and quantitative analysis of SEPs, the detection of SEPs is still a great challenge due to their low abundance and short sequence. To illustrate the progress in method development, we described and discussed the main steps of large-scale proteomics identification of SEPs, including SEP extraction and enrichment, MS detection, data processing and quality control, quantification, and function prediction and validation methods.
Affiliation(s)
- Zheng Zhang
- School of Life Sciences and Hubei Key Laboratory of Genetic Regulation and Integrative Biology, Central China Normal University, Wuhan, Hubei, 430079, People's Republic of China
| | - Yujie Li
- School of Life Sciences and Hubei Key Laboratory of Genetic Regulation and Integrative Biology, Central China Normal University, Wuhan, Hubei, 430079, People's Republic of China
| | - Wenqian Yuan
- School of Life Sciences and Hubei Key Laboratory of Genetic Regulation and Integrative Biology, Central China Normal University, Wuhan, Hubei, 430079, People's Republic of China
| | - Zhiwei Wang
- School of Life Sciences and Hubei Key Laboratory of Genetic Regulation and Integrative Biology, Central China Normal University, Wuhan, Hubei, 430079, People's Republic of China
| | - Cuihong Wan
- School of Life Sciences and Hubei Key Laboratory of Genetic Regulation and Integrative Biology, Central China Normal University, Wuhan, Hubei, 430079, People's Republic of China
| |
|
15
|
Leong AZX, Lee PY, Mohtar MA, Syafruddin SE, Pung YF, Low TY. Short open reading frames (sORFs) and microproteins: an update on their identification and validation measures. J Biomed Sci 2022; 29:19. [PMID: 35300685 PMCID: PMC8928697 DOI: 10.1186/s12929-022-00802-5] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 12/31/2021] [Accepted: 03/09/2022] [Indexed: 12/17/2022]
Abstract
A short open reading frame (sORF) constitutes ≤ 300 bases, encoding a microprotein or sORF-encoded protein (SEP) which comprises ≤ 100 amino acids. Traditionally dismissed by genome annotation pipelines as meaningless noise, sORFs were found to possess coding potential with ribosome profiling (RIBO-Seq), which unveiled sORF-based transcripts at various genome locations. Nonetheless, the existence of corresponding microproteins that are stable and functional was initially little substantiated by experimental evidence. With recent advancements in multi-omics, the identification, validation, and functional characterisation of sORFs and microproteins have become feasible. In this review, we discuss the history and development of an emerging research field of sORFs and microproteins. In particular, we focus on an array of bioinformatics and OMICS approaches used for predicting, sequencing, validating, and characterizing these recently discovered entities. These strategies include RIBO-Seq which detects sORF transcripts via ribosome footprints, and mass spectrometry (MS)-based proteomics for sequencing the resultant microproteins. Subsequently, our discussion extends to the functional characterisation of microproteins by incorporating CRISPR/Cas9 screen and protein–protein interaction (PPI) studies. Our review discusses not only detection methodologies, but also highlights the challenges and potential solutions in identifying and validating sORFs and their microproteins. The novelty of this review lies within its validation for the functional role of microproteins, which could contribute towards the future landscape of microproteomics.
Affiliation(s)
- Alyssa Zi-Xin Leong
- UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000, Kuala Lumpur, Malaysia
| | - Pey Yee Lee
- UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000, Kuala Lumpur, Malaysia
| | - M Aiman Mohtar
- UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000, Kuala Lumpur, Malaysia
| | - Saiful Effendi Syafruddin
- UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000, Kuala Lumpur, Malaysia
| | - Yuh-Fen Pung
- Division of Biomedical Science, School of Pharmacy, University of Nottingham Malaysia, Semenyih, 43500, Selangor, Malaysia
| | - Teck Yew Low
- UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000, Kuala Lumpur, Malaysia.
| |
|
16
|
Jourquin J, Fernandez AI, Parizot B, Xu K, Grunewald W, Mamiya A, Fukaki H, Beeckman T. Two phylogenetically unrelated peptide-receptor modules jointly regulate lateral root initiation via a partially shared signaling pathway in Arabidopsis thaliana. The New Phytologist 2022; 233:1780-1796. [PMID: 34913488 PMCID: PMC9302118 DOI: 10.1111/nph.17919] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/25/2021] [Accepted: 12/04/2021] [Indexed: 05/06/2023]
Abstract
Peptide-receptor signaling is an important system for intercellular communication, regulating many developmental processes. A single process can be controlled by several distinct signaling peptides. However, since peptide-receptor modules are usually studied separately, their mechanistic interactions remain largely unexplored. Two phylogenetically unrelated peptide-receptor modules, GLV6/GLV10-RGI and TOLS2/PIP2-RLK7, independently described as inhibitors of lateral root initiation, show striking similarities between their expression patterns and gain- and loss-of-function phenotypes, suggesting a common function during lateral root spacing and initiation. The GLV6/GLV10-RGI and TOLS2/PIP2-RLK7 modules trigger similar transcriptional changes, likely in part via WRKY transcription factors. Their overlapping set of response genes includes PUCHI and PLT5, both required for the effect of GLV6/10, as well as TOLS2, on lateral root initiation. Furthermore, both modules require the activity of MPK6 and can independently trigger MPK3/MPK6 phosphorylation. The GLV6/10 and TOLS2/PIP2 signaling pathways seem to converge in the activation of MPK3/MPK6, leading to the induction of a similar transcriptional response in the same target cells, thereby regulating lateral root initiation through a (partially) common mechanism. Convergence of signaling pathways downstream of phylogenetically unrelated peptide-receptor modules adds an additional, and hitherto unrecognized, level of complexity to intercellular communication networks in plants.
Affiliation(s)
- Joris Jourquin
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9052, Belgium
- Center for Plant Systems Biology, VIB-UGent, Ghent 9052, Belgium
| | - Ana Ibis Fernandez
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9052, Belgium
- Center for Plant Systems Biology, VIB-UGent, Ghent 9052, Belgium
| | - Boris Parizot
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9052, Belgium
- Center for Plant Systems Biology, VIB-UGent, Ghent 9052, Belgium
| | - Ke Xu
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9052, Belgium
- Center for Plant Systems Biology, VIB-UGent, Ghent 9052, Belgium
| | - Wim Grunewald
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9052, Belgium
- Center for Plant Systems Biology, VIB-UGent, Ghent 9052, Belgium
| | - Akihito Mamiya
- Department of Biology, Graduate School of Science, Kobe University, Kobe 657-8501, Japan
| | - Hidehiro Fukaki
- Department of Biology, Graduate School of Science, Kobe University, Kobe 657-8501, Japan
| | - Tom Beeckman
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9052, Belgium
- Center for Plant Systems Biology, VIB-UGent, Ghent 9052, Belgium
| |
|
17
|
Luo X, Huang Y, Li H, Luo Y, Zuo Z, Ren J, Xie Y. SPENCER: a comprehensive database for small peptides encoded by noncoding RNAs in cancer patients. Nucleic Acids Res 2022; 50:D1373-D1381. [PMID: 34570216 PMCID: PMC8728293 DOI: 10.1093/nar/gkab822] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 08/14/2021] [Revised: 09/03/2021] [Accepted: 09/08/2021] [Indexed: 01/07/2023]
Abstract
As an increasing number of noncoding RNAs (ncRNAs) have been suggested to encode short bioactive peptides in cancer, the exploration of ncRNA-encoded small peptides (ncPEPs) is emerging as a fascinating field in cancer research. To assist in studies on the regulatory mechanisms of ncPEPs, we describe here a database called SPENCER (http://spencer.renlab.org). Currently, SPENCER has collected a total of 2806 mass spectrometry (MS) data points from 55 studies, covering 1007 tumor samples and 719 normal samples. Using an MS-based proteomics analysis pipeline, SPENCER identified 29 526 ncPEPs across 15 different cancer types. Specifically, 22 060 of these ncPEPs were experimentally validated in other studies. By comparing tumor and normal samples, the identified ncPEPs were divided into four expression groups: tumor-specific, upregulated in cancer, downregulated in cancer, and others. Additionally, since ncPEPs are potential targets for neoantigen-based cancer immunotherapy, SPENCER also predicted the immunogenicity of all the identified ncPEPs by assessing their MHC-I binding affinity, stability, and TCR recognition probability. As a result, 4497 ncPEPs curated in SPENCER were predicted to be immunogenic. Overall, SPENCER will be a useful resource for investigating cancer-associated ncPEPs and may boost further research in cancer.
Affiliation(s)
- Xiaotong Luo
- School of Life Sciences, Precision Medicine Institute, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou 510080, China
- State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University, Guangzhou 510060, China
| | - Yuantai Huang
- School of Life Sciences, Precision Medicine Institute, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou 510080, China
- State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University, Guangzhou 510060, China
| | - Huiqin Li
- School of Life Sciences, Precision Medicine Institute, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou 510080, China
| | - Yihai Luo
- School of Life Sciences, Precision Medicine Institute, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou 510080, China
| | - Zhixiang Zuo
- State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University, Guangzhou 510060, China
| | - Jian Ren
- School of Life Sciences, Precision Medicine Institute, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou 510080, China
- State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University, Guangzhou 510060, China
| | - Yubin Xie
- School of Life Sciences, Precision Medicine Institute, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou 510080, China
| |
|
18
|
Chen L, Yang Y, Zhang Y, Li K, Cai H, Wang H, Zhao Q. The Small Open Reading Frame-Encoded Peptides: Advances in Methodologies and Functional Studies. Chembiochem 2021; 23:e202100534. [PMID: 34862721 DOI: 10.1002/cbic.202100534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2021] [Revised: 11/15/2021] [Indexed: 11/07/2022]
Abstract
Small open reading frames (sORFs) are an important class of genes with less than 100 codons. They were historically annotated as noncoding or even junk sequences. In recent years, accumulating evidence suggests that sORFs could encode a considerable number of polypeptides, many of which play important roles in both physiology and disease pathology. However, it has been technically challenging to directly detect sORF-encoded peptides (SEPs). Here, we discuss the latest advances in methodologies for identifying SEPs with mass spectrometry, as well as the progress on functional studies of SEPs.
Affiliation(s)
- Lei Chen
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China; Laboratory for Synthetic Chemistry and Chemical Biology Limited, Hong Kong Science and Technology Park, New Territories, Hong Kong SAR, 999077, P. R. China
| | - Ying Yang
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China
| | - Yuanliang Zhang
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China
| | - Kecheng Li
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China
| | - Hongmin Cai
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, 510623, P. R. China
| | - Hongwei Wang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangzhou, 510623, P. R. China
| | - Qian Zhao
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China
| |
|
19
|
van Wijk KJ, Leppert T, Sun Q, Boguraev SS, Sun Z, Mendoza L, Deutsch EW. The Arabidopsis PeptideAtlas: Harnessing worldwide proteomics data to create a comprehensive community proteomics resource. The Plant Cell 2021; 33:3421-3453. [PMID: 34411258 PMCID: PMC8566204 DOI: 10.1093/plcell/koab211] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 05/03/2021] [Accepted: 08/13/2021] [Indexed: 05/02/2023]
Abstract
We developed a resource, the Arabidopsis PeptideAtlas (www.peptideatlas.org/builds/arabidopsis/), to solve central questions about the Arabidopsis thaliana proteome, such as the significance of protein splice forms and post-translational modifications (PTMs), or simply to obtain reliable information about specific proteins. PeptideAtlas is based on published mass spectrometry (MS) data collected through ProteomeXchange and reanalyzed through a uniform processing and metadata annotation pipeline. All matched MS-derived peptide data are linked to spectral, technical, and biological metadata. Nearly 40 million out of ∼143 million MS/MS (tandem MS) spectra were matched to the reference genome Araport11, identifying ∼0.5 million unique peptides and 17,858 uniquely identified proteins (only isoform per gene) at the highest confidence level (false discovery rate 0.0004; 2 non-nested peptides ≥9 amino acid each), assigned canonical proteins, and 3,543 lower-confidence proteins. Physicochemical protein properties were evaluated for targeted identification of unobserved proteins. Additional proteins and isoforms currently not in Araport11 were identified that were generated from pseudogenes, alternative start, stops, and/or splice variants, and small Open Reading Frames; these features should be considered when updating the Arabidopsis genome. Phosphorylation can be inspected through a sophisticated PTM viewer. PeptideAtlas is integrated with community resources including TAIR, tracks in JBrowse, PPDB, and UniProtKB. Subsequent PeptideAtlas builds will incorporate millions more MS/MS data.
Affiliation(s)
- Klaas J van Wijk
- Section of Plant Biology, School of Integrative Plant Sciences (SIPS), Cornell University, Ithaca, New York 14853, USA
- Authors for correspondence: K.J.V.W., E.W.D.
| | - Tami Leppert
- Institute for Systems Biology (ISB), Seattle, Washington 98109, USA
| | - Qi Sun
- Computational Biology Service Unit, Cornell University, Ithaca, New York 14853, USA
| | - Sascha S Boguraev
- Section of Plant Biology, School of Integrative Plant Sciences (SIPS), Cornell University, Ithaca, New York 14853, USA
| | - Zhi Sun
- Institute for Systems Biology (ISB), Seattle, Washington 98109, USA
| | - Luis Mendoza
- Institute for Systems Biology (ISB), Seattle, Washington 98109, USA
| | - Eric W Deutsch
- Institute for Systems Biology (ISB), Seattle, Washington 98109, USA
- Authors for correspondence: K.J.V.W., E.W.D.
| |
|
20
|
Fesenko I, Shabalina SA, Mamaeva A, Knyazev A, Glushkevich A, Lyapina I, Ziganshin R, Kovalchuk S, Kharlampieva D, Lazarev V, Taliansky M, Koonin EV. A vast pool of lineage-specific microproteins encoded by long non-coding RNAs in plants. Nucleic Acids Res 2021; 49:10328-10346. [PMID: 34570232 DOI: 10.1093/nar/gkab816] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 06/03/2021] [Revised: 08/17/2021] [Accepted: 09/17/2021] [Indexed: 12/17/2022]
Abstract
Pervasive transcription of eukaryotic genomes results in expression of long non-coding RNAs (lncRNAs) most of which are poorly conserved in evolution and appear to be non-functional. However, some lncRNAs have been shown to perform specific functions, in particular, transcription regulation. Thousands of small open reading frames (smORFs, <100 codons) located on lncRNAs potentially might be translated into peptides or microproteins. We report a comprehensive analysis of the conservation and evolutionary trajectories of lncRNAs-smORFs from the moss Physcomitrium patens across transcriptomes of 479 plant species. Although thousands of smORFs are subject to substantial purifying selection, the majority of the smORFs appear to be evolutionary young and could represent a major pool for functional innovation. Using nanopore RNA sequencing, we show that, on average, the transcriptional level of conserved smORFs is higher than that of non-conserved smORFs. Proteomic analysis confirmed translation of 82 novel species-specific smORFs. Numerous conserved smORFs containing low complexity regions (LCRs) or transmembrane domains were identified, the biological functions of a selected LCR-smORF were demonstrated experimentally. Thus, microproteins encoded by smORFs are a major, functionally diverse component of the plant proteome.
Affiliation(s)
- Igor Fesenko
- Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Moscow 117997, Russian Federation
| | - Svetlana A Shabalina
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Anna Mamaeva
- Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Moscow 117997, Russian Federation
| | - Andrey Knyazev
- Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Moscow 117997, Russian Federation
| | - Anna Glushkevich
- Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Moscow 117997, Russian Federation
| | - Irina Lyapina
- Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Moscow 117997, Russian Federation
| | - Rustam Ziganshin
- Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Moscow 117997, Russian Federation
| | - Sergey Kovalchuk
- Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Moscow 117997, Russian Federation
| | - Daria Kharlampieva
- Department of Cell Biology, Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, Moscow 119435, Russian Federation
| | - Vassili Lazarev
- Department of Cell Biology, Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, Moscow 119435, Russian Federation; Moscow Institute of Physics and Technology (National Research University), Dolgoprudny, Moscow region, 141701, Russian Federation
| | - Michael Taliansky
- Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Moscow 117997, Russian Federation; The James Hutton Institute, Invergowrie, Dundee DD2 5DA, UK
| | - Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| |
|
21
|
Li Y, Zhou H, Chen X, Zheng Y, Kang Q, Hao D, Zhang L, Song T, Luo H, Hao Y, Chen R, Zhang P, He S. SmProt: A Reliable Repository with Comprehensive Annotation of Small Proteins Identified from Ribosome Profiling. Genomics Proteomics & Bioinformatics 2021; 19:602-610. [PMID: 34536568 PMCID: PMC9039559 DOI: 10.1016/j.gpb.2021.09.002] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 12/09/2020] [Revised: 09/07/2021] [Accepted: 09/08/2021] [Indexed: 12/30/2022]
Abstract
Small proteins specifically refer to proteins consisting of fewer than 100 amino acids translated from small open reading frames (sORFs), which were usually missed in previous genome annotation. The significance of small proteins has been revealed in recent years, along with the discovery of their diverse functions. However, systematic annotation of small proteins is still insufficient. SmProt was specially developed to provide valuable information on small proteins for the scientific community. Here we present the update of SmProt, which emphasizes reliability of translated sORFs, genetic variants in translated sORFs, disease-specific sORF translation events or sequences, and remarkably increased data volume. More components such as non-ATG translation initiation, function, and new sources are also included. SmProt incorporated 638,958 unique small proteins curated from 3,165,229 primary records, which were computationally predicted from 419 ribosome profiling (Ribo-seq) datasets or collected from literature and other sources from 370 cell lines or tissues in 8 species (Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster, Danio rerio, Saccharomyces cerevisiae, Caenorhabditis elegans, and Escherichia coli). In addition, small protein families identified from human microbiomes were also collected. All datasets in SmProt are free to access, and available for browse, search, and bulk downloads at http://bigdata.ibp.ac.cn/SmProt/.
Affiliation(s)
- Yanyan Li
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China; Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
| | - Honghong Zhou
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
| | - Xiaomin Chen
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Yu Zheng
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China; Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
| | - Quan Kang
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
| | - Di Hao
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
| | - Lili Zhang
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Tingrui Song
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
| | - Huaxia Luo
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
| | - Yajing Hao
- Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA 92093, USA
| | - Runsheng Chen
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China; Guangdong Geneway Decoding Bio-Tech Co. Ltd, Foshan 528316, China.
| | - Peng Zhang
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China.
| | - Shunmin He
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China; Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China.
| |
Collapse
|
22
Zhao S, Meng J, Luan Y. LncRNA-Encoded Short Peptides Identification Using Feature Subset Recombination and Ensemble Learning. Interdiscip Sci 2021; 14:101-112. [PMID: 34304369 DOI: 10.1007/s12539-021-00464-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2021] [Revised: 07/14/2021] [Accepted: 07/16/2021] [Indexed: 11/28/2022]
Abstract
Long non-coding RNA (lncRNA), which is a type of non-coding RNA, was reported to contain short open reading frames (sORFs). sORF-encoded short peptides (SEPs) have been demonstrated to play a crucial role in regulating biological processes such as growth, development, and resistance response. The identification of SEPs is vital to further understanding their function. However, there is still a lack of methods for identifying SEPs effectively and rapidly. In this study, a novel method for lncRNA-encoded short peptide identification based on feature subset recombination and ensemble learning, lncPepid, is developed. lncPepid transforms the data of Zea mays and Arabidopsis thaliana into hybrid features from two aspects, sequence composition and physicochemical properties, separately. It optimizes hybrid features by proposing a novel weighted iteration-based feature selection method to recombine a stable subset that characterizes SEPs effectively. Different classification models with different optimized features are constructed and tested separately. The outputs of the optimal models are integrated for ensemble classification to improve efficiency. Experimental results show that the geometric mean of sensitivity and specificity of lncPepid is about 70% on the identification of functional SEPs derived from multiple species. It is an effective and rapid method for the identification of lncRNA-encoded short peptides. This study can be extended to research on SEPs from other species and has crucial implications for further findings and studies of functional genomics.
Affiliation(s)
- Siyuan Zhao
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
- Jun Meng
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China.
- Yushi Luan
- School of Bioengineering, Dalian University of Technology, Dalian, 116024, Liaoning, China
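The lncPepid abstract above reports performance as the geometric mean of sensitivity and specificity. A minimal sketch of how that metric can be computed from confusion-matrix counts follows; the function name `g_mean` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def g_mean(tp: int, fn: int, tn: int, fp: int) -> float:
    """Geometric mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn)  # fraction of true SEPs recovered
    specificity = tn / (tn + fp)  # fraction of non-SEPs correctly rejected
    return math.sqrt(sensitivity * specificity)


# Hypothetical counts, chosen only for illustration.
score = g_mean(tp=70, fn=30, tn=70, fp=30)
print(round(score, 2))
```

Unlike plain accuracy, this metric drops toward zero when either class is poorly classified, which is useful when positive and negative sets differ in size.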
23
Vitorino R, Guedes S, Amado F, Santos M, Akimitsu N. The role of micropeptides in biology. Cell Mol Life Sci 2021; 78:3285-3298. [PMID: 33507325 PMCID: PMC11073438 DOI: 10.1007/s00018-020-03740-3] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 08/10/2020] [Revised: 12/01/2020] [Accepted: 12/11/2020] [Indexed: 12/11/2022]
Abstract
Micropeptides are small polypeptides coded by small open-reading frames. Progress in computational biology and the analyses of large-scale transcriptomes and proteomes have revealed that mammalian genomes produce a large number of transcripts encoding micropeptides. Many of these have been previously annotated as long noncoding RNAs. The role of micropeptides in cellular homeostasis maintenance has been demonstrated. This review discusses different types of micropeptides as well as methods to identify them, such as computational approaches, ribosome profiling, and mass spectrometry.
Affiliation(s)
- Rui Vitorino
- Departamento de Cirurgia E Fisiologia, Faculdade de Medicina da Universidade Do Porto, UnIC, Porto, Portugal.
- Department of Medical Sciences, iBiMED, University of Aveiro, Aveiro, Portugal.
- Sofia Guedes
- Departamento de Química, LAQV-REQUIMTE, Universidade de Aveiro, Aveiro, Portugal
- Department of Chemistry, University of Aveiro, Aveiro, Portugal
- Francisco Amado
- Departamento de Química, LAQV-REQUIMTE, Universidade de Aveiro, Aveiro, Portugal
- Department of Chemistry, University of Aveiro, Aveiro, Portugal
- Manuel Santos
- Department of Medical Sciences, iBiMED, University of Aveiro, Aveiro, Portugal
24
Schlesinger D, Elsässer SJ. Revisiting sORFs: overcoming challenges to identify and characterize functional microproteins. FEBS J 2021; 289:53-74. [PMID: 33595896 DOI: 10.1111/febs.15769] [Citation(s) in RCA: 57] [Impact Index Per Article: 19.0] [Received: 08/21/2020] [Revised: 01/17/2021] [Accepted: 02/15/2021] [Indexed: 02/07/2023]
Abstract
Short ORFs (sORFs), that is, occurrences of a start and stop codon within 100 codons or less, can be found in organisms of all domains of life, outnumbering annotated protein-coding ORFs by orders of magnitude. Even though functional proteins smaller than 100 amino acids are known, the coding potential of sORFs has often been overlooked, as it is not trivial to predict and test for functionality within the large number of sORFs. Recent advances in ribosome profiling and mass spectrometry approaches, together with refined bioinformatic predictions, have enabled a huge leap forward in this field and identified thousands of likely coding sORFs. A relatively low number of small proteins or microproteins produced from these sORFs have been characterized so far on the molecular, structural, and/or mechanistic level. These however display versatile and, in some cases, essential cellular functions, allowing for the exciting possibility that many more, previously unknown small proteins might be encoded in the genome, waiting to be discovered. This review will give an overview of the steadily growing microprotein field, focusing on eukaryotic small proteins. We will discuss emerging themes in the molecular action of microproteins, as well as advances and challenges in microprotein identification and characterization.
Affiliation(s)
- Dörte Schlesinger
- Science for Life Laboratory, Division of Genome Biology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden; Ming Wai Lau Centre for Reparative Medicine, Stockholm node, Karolinska Institutet, Stockholm, Sweden
- Simon J Elsässer
- Science for Life Laboratory, Division of Genome Biology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden; Ming Wai Lau Centre for Reparative Medicine, Stockholm node, Karolinska Institutet, Stockholm, Sweden
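The review above defines an sORF as a start and stop codon occurring within 100 codons or less. A minimal sketch of that definition follows. It scans only the forward strand and only ATG starts, and it counts codons from the start codon up to, but not including, the stop codon. All of these are simplifying assumptions made here, and the function name `find_sorfs` and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorfs(seq: str, max_codons: int = 100):
    """Return (start, end, codon_count) for forward-strand ATG..stop ORFs.

    codon_count includes the start codon but not the stop codon.
    ORFs longer than max_codons are skipped.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon in the reading frame set by this ATG.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                codons = (pos - start) // 3
                if codons <= max_codons:
                    hits.append((start, pos + 3, codons))
                break
    return hits


toy = "CCATGAAATTTTAGGG"  # toy sequence, illustration only
print(find_sorfs(toy))
```

Every in-frame ATG is reported separately, so nested starts that share a stop codon appear as distinct candidates. This mirrors why raw sORF counts outnumber annotated ORFs and why additional evidence such as ribosome profiling or mass spectrometry is needed to decide which candidates are translated.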
25
Nasir MA, Nawaz S, Huang J. A Mini-review of Computational Approaches to Predict Functions and Findings of Novel Micro Peptides. Curr Bioinform 2021. [DOI: 10.2174/1574893615999200811130522] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/04/2023]
Abstract
New techniques in bioinformatics and large-scale transcriptome studies have revealed that a larger part of the genome is translated than previously thought, giving rise to a variety of RNAs with protein-coding and noncoding potential. Growing evidence indicates that many RNA molecules have been classified as noncoding for a number of reasons; for example, many sORFs that encode functional micro peptides have been neglected because of their small size.
Recent studies reveal major biological functions of these sORFs and their encoded micro peptides in a wide range of species. Progress in identifying these sORFs and micro peptides is due to advances in bioinformatics and high-throughput sequencing methods. The field has attracted more attention as large numbers of additional sORFs and micro peptides have been detected. COVID-19 currently commands the attention of science because of its sudden outbreak, and the sORFs of COVID-19 should be revealed to open new ways of understanding this virus. This review discusses ongoing progress in methods for the identification of sORFs and micro peptides.
Affiliation(s)
- Mohsin Ali Nasir
- Center for Informational Biology, University of Electronic Science and Technology of China, No. 2006, Xiyuan Ave, West Hi-Tech Zone, Chengdu 611731, China
- Samia Nawaz
- Center for Informational Biology, University of Electronic Science and Technology of China, No. 2006, Xiyuan Ave, West Hi-Tech Zone, Chengdu 611731, China
- Jian Huang
- Center for Informational Biology, University of Electronic Science and Technology of China, No. 2006, Xiyuan Ave, West Hi-Tech Zone, Chengdu 611731, China
26
Fabre B, Combier JP, Plaza S. Recent advances in mass spectrometry-based peptidomics workflows to identify short-open-reading-frame-encoded peptides and explore their functions. Curr Opin Chem Biol 2021; 60:122-130. [PMID: 33401134 DOI: 10.1016/j.cbpa.2020.12.002] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 10/16/2020] [Revised: 11/26/2020] [Accepted: 12/03/2020] [Indexed: 12/12/2022]
Abstract
Short open reading frame (sORF)-encoded polypeptides (SEPs) have recently emerged as key regulators of major cellular processes. Computational methods for the annotation of sORFs combined with transcriptomics and ribosome profiling approaches predicted the existence of tens of thousands of SEPs across the kingdoms of life. However, we still lack unambiguous evidence for most of them. The method of choice to validate the expression of SEPs is mass spectrometry (MS)-based peptidomics. Peptides are less abundant than proteins, which tends to hinder their detection. Therefore, optimization and enrichment methods are necessary to validate the existence of SEPs. In this article, we discuss the challenges for the detection of SEPs by MS and recent developments of biochemical approaches applied to the study of these peptides. We detail the advances made in the different key steps of a typical peptidomics workflow and highlight possible alternatives that have not been explored yet.
Affiliation(s)
- Bertrand Fabre
- Laboratoire de Recherche en Sciences Végétales, UMR5546, Université de Toulouse, UPS, CNRS, 31320, Auzeville-Tolosane, France.
- Jean-Philippe Combier
- Laboratoire de Recherche en Sciences Végétales, UMR5546, Université de Toulouse, UPS, CNRS, 31320, Auzeville-Tolosane, France
- Serge Plaza
- Laboratoire de Recherche en Sciences Végétales, UMR5546, Université de Toulouse, UPS, CNRS, 31320, Auzeville-Tolosane, France
27
Chen Y, Li D, Fan W, Zheng X, Zhou Y, Ye H, Liang X, Du W, Zhou Y, Wang K. PsORF: a database of small ORFs in plants. Plant Biotechnol J 2020; 18:2158-2160. [PMID: 32333496 PMCID: PMC7589237 DOI: 10.1111/pbi.13389] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 12/13/2019] [Revised: 04/14/2020] [Accepted: 04/18/2020] [Indexed: 05/15/2023]
Affiliation(s)
- Yanjun Chen
- College of Life Sciences, Wuhan University, Wuhan, China
- Danyang Li
- College of Life Sciences, Wuhan University, Wuhan, China
- Weiliang Fan
- College of Life Sciences, Wuhan University, Wuhan, China
- State Key Laboratory of Virology, Wuhan University, Wuhan, China
- Yifan Zhou
- College of Life Sciences, Wuhan University, Wuhan, China
- Hanzhe Ye
- College of Life Sciences, Wuhan University, Wuhan, China
- Wei Du
- College of Life Sciences, Wuhan University, Wuhan, China
- Yu Zhou
- College of Life Sciences, Wuhan University, Wuhan, China
- State Key Laboratory of Virology, Wuhan University, Wuhan, China
- Kun Wang
- College of Life Sciences, Wuhan University, Wuhan, China
28
Mass-spectrometry-based draft of the Arabidopsis proteome. Nature 2020; 579:409-414. [PMID: 32188942 DOI: 10.1038/s41586-020-2094-2] [Citation(s) in RCA: 261] [Impact Index Per Article: 65.3] [Received: 06/04/2019] [Accepted: 01/17/2020] [Indexed: 01/05/2023]
Abstract
Plants are essential for life and are extremely diverse organisms with unique molecular capabilities. Here we present a quantitative atlas of the transcriptomes, proteomes and phosphoproteomes of 30 tissues of the model plant Arabidopsis thaliana. Our analysis provides initial answers to how many genes exist as proteins (more than 18,000), where they are expressed, in which approximate quantities (a dynamic range of more than six orders of magnitude) and to what extent they are phosphorylated (over 43,000 sites). We present examples of how the data may be used, such as to discover proteins that are translated from short open-reading frames, to uncover sequence motifs that are involved in the regulation of protein production, and to identify tissue-specific protein complexes or phosphorylation-mediated signalling events. Interactive access to this resource for the plant community is provided by the ProteomicsDB and ATHENA databases, which include powerful bioinformatics tools to explore and characterize Arabidopsis proteins, their modifications and interactions.
29
Abstract
Introduction: Small open reading frames (sORFs) with potential protein-coding capacity have been identified in various transcripts, including long noncoding RNAs (lncRNAs), mRNAs (5'-upstream, coding domain, and 3'-downstream), circular RNAs, pri-miRNAs, and ribosomal RNAs (rRNAs). Recent characterization of several sORF-encoded peptides (SEPs or micropeptides) revealed their important roles in many fundamental biological processes in a broad range of species from yeast to human. The success in the mining of micropeptides is attributed to advanced bioinformatics and high-throughput sequencing techniques. Areas covered: sORFs and SEPs were overlooked because of their tiny size and the difficulty of identifying them by bioinformatics analyses. As more and more sORFs and SEPs have been identified, this field has attracted more attention. This review covers recent advances in the strategies for the detection and identification of sORFs and SEPs. Expert commentary: The advantages and drawbacks of the strategies for detection and identification of sORFs and SEPs are discussed, and the techniques used to decipher the roles of micropeptides in organisms are described.
Affiliation(s)
- Xinqiang Yin
- The Engineering Research Center of Synthetic Polypeptide Drug Discovery and Evaluation of Jiangsu Province, China Pharmaceutical University, Nanjing, China; The Basic Medical School, North Sichuan Medical College, Nanchong, China
- Yuanyuan Jing
- Department of Preventive Medicine, North Sichuan Medical College, Nanchong, China
- Hanmei Xu
- The Engineering Research Center of Synthetic Polypeptide Drug Discovery and Evaluation of Jiangsu Province, China Pharmaceutical University, Nanjing, China; State Key Laboratory of Natural Medicines, Ministry of Education, China Pharmaceutical University, Nanjing, China
30
Hazarika RR, Sostaric N, Sun Y, van Noort V. Large-scale docking predicts that sORF-encoded peptides may function through protein-peptide interactions in Arabidopsis thaliana. PLoS One 2018; 13:e0205179. [PMID: 30321192 PMCID: PMC6188750 DOI: 10.1371/journal.pone.0205179] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 06/08/2018] [Accepted: 09/20/2018] [Indexed: 02/07/2023] Open
Abstract
Several recent studies indicate that small Open Reading Frames (sORFs) embedded within multiple eukaryotic non-coding RNAs can be translated into bioactive peptides of up to 100 amino acids in size. However, the functional roles of the 607 Stress Induced Peptides (SIPs) previously identified from 189 Transcriptionally Active Regions (TARs) in Arabidopsis thaliana remain unclear. To provide a starting point for functional annotation of these plant-derived peptides, we performed a large-scale prediction of peptide binding sites on protein surfaces using coarse-grained peptide docking. The docked models were subjected to further atomistic refinement and binding energy calculations. A total of 530 peptide-protein pairs were successfully docked. In cases where a peptide encoded by a TAR is predicted to bind at a known ligand or cofactor-binding site within the protein, it can be assumed that the peptide modulates the ligand or cofactor-binding. Moreover, we predict that several peptides bind at protein-protein interfaces, which could therefore regulate the formation of the respective complexes. Protein-peptide binding analysis further revealed that peptides employ both their backbone and side chain atoms when binding to the protein, forming predominantly hydrophobic interactions and hydrogen bonds. In this study, we have generated novel predictions on the potential protein-peptide interactions in A. thaliana, which will help in further experimental validation.
Affiliation(s)
- Rashmi R. Hazarika
- Department of Microbial and Molecular Systems, KU Leuven, Leuven, Belgium
- Nikolina Sostaric
- Department of Microbial and Molecular Systems, KU Leuven, Leuven, Belgium
- Yifeng Sun
- Department of Microbial and Molecular Systems, KU Leuven, Leuven, Belgium
- Faculty of Engineering Technology, Campus Group T, KU Leuven, Leuven, Belgium
- Vera van Noort
- Department of Microbial and Molecular Systems, KU Leuven, Leuven, Belgium
- Institute of Biology Leiden, Leiden University, Leiden, The Netherlands
31
Li Q, Ahsan MA, Chen H, Xue J, Chen M. Discovering Putative Peptides Encoded from Noncoding RNAs in Ribosome Profiling Data of Arabidopsis thaliana. ACS Synth Biol 2018; 7:655-663. [PMID: 29376339 DOI: 10.1021/acssynbio.7b00386] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]
Abstract
Most noncoding RNAs are considered to be expressed at low levels and to have a limited phylogenetic distribution in the cytoplasm, indicating that they may be involved only in specific biological processes. However, recent studies showed the protein-coding potential of ncRNAs, indicating that they might be a source of some special proteins. Although an increasing number of noncoding RNAs have been found to encode proteins, it is challenging to distinguish coding RNAs from previously annotated ncRNAs, and to detect the proteins from their translation. In this article, we designed a pipeline to identify noncoding RNAs with coding potential in Arabidopsis thaliana from three NCBI GEO data sets and to predict their translation products. A total of 31,311 noncoding RNAs were predicted to be translated into peptides, and they showed a lower conservation rate than common proteins. In addition, we built an interaction network between these peptides and annotated Arabidopsis proteins using BIPS, which included 69 peptides from noncoding RNAs. Peptides in the interaction network showed different characteristics from other noncoding RNA-derived peptides, and they participated in several crucial biological processes, such as photorespiration and stress responses. All the information on putative ncPEPs and their predicted interactions with proteins is integrated in a database, PncPEPDB (http://bis.zju.edu.cn/PncPEPDB). These results show that peptides derived from noncoding RNAs may play important roles in noncoding RNA regulation, suggesting the hypothesis that noncoding RNAs may regulate metabolism via their translation products.
Affiliation(s)
- Qilin Li
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- Md. Asif Ahsan
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- Hongjun Chen
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- Jitong Xue
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- James D. Watson Institute of Genome Sciences, Zhejiang University, Hangzhou 310058, China
- Ming Chen
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- James D. Watson Institute of Genome Sciences, Zhejiang University, Hangzhou 310058, China
32
Olexiouk V, Van Criekinge W, Menschaert G. An update on sORFs.org: a repository of small ORFs identified by ribosome profiling. Nucleic Acids Res 2018; 46:D497-D502. [PMID: 29140531 PMCID: PMC5753181 DOI: 10.1093/nar/gkx1130] [Citation(s) in RCA: 123] [Impact Index Per Article: 20.5] [Received: 09/14/2017] [Revised: 10/25/2017] [Accepted: 10/26/2017] [Indexed: 12/13/2022] Open
Abstract
sORFs.org (http://www.sorfs.org) is a public repository of small open reading frames (sORFs) identified by ribosome profiling (RIBO-seq). This update elaborates on the major improvements implemented since its initial release. sORFs.org now additionally supports three more species (zebrafish, rat and Caenorhabditis elegans) and currently includes 78 RIBO-seq datasets, a vast increase compared to the three that were processed in the initial release. Therefore, a novel pipeline was constructed that also enables sORF detection in RIBO-seq datasets comprising solely elongating RIBO-seq data while previously, matching initiating RIBO-seq data was necessary to delineate the sORFs. Furthermore, a novel noise filtering algorithm was designed, able to distinguish sORFs with true ribosomal activity from simulated noise, consequently reducing the false positive identification rate. The inclusion of other species also led to the development of an inner BLAST pipeline, assessing sequence similarity between sORFs in the repository. Building on the proof of concept model in the initial release of sORFs.org, a full PRIDE-ReSpin pipeline was now released, reprocessing publicly available MS-based proteomics PRIDE datasets, reporting on true translation events. Next to reporting those identified peptides, sORFs.org allows visual inspection of the annotated spectra within the Lorikeet MS/MS viewer, thus enabling detailed manual inspection and interpretation.
Affiliation(s)
- Volodimir Olexiouk
- Lab of Bioinformatics and Computational Genomics (BioBix), Department of Mathematical Modelling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, 9000 Ghent, Belgium
- Wim Van Criekinge
- Lab of Bioinformatics and Computational Genomics (BioBix), Department of Mathematical Modelling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, 9000 Ghent, Belgium
- Gerben Menschaert
- Lab of Bioinformatics and Computational Genomics (BioBix), Department of Mathematical Modelling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, 9000 Ghent, Belgium
33
Hsu PY, Benfey PN. Small but Mighty: Functional Peptides Encoded by Small ORFs in Plants. Proteomics 2017; 18:e1700038. [PMID: 28759167 DOI: 10.1002/pmic.201700038] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Received: 04/19/2017] [Revised: 07/26/2017] [Indexed: 12/18/2022]
Abstract
Peptides encoded by small open reading frames (sORFs, usually <100 codons) play critical regulatory roles in plant development and environmental responses. Despite their importance, only a small number of these peptides have been identified and characterized. Genomic studies have revealed that many plant genomes contain thousands of possible sORFs, which could potentially encode small peptides. The challenge is to distinguish translated sORFs from nontranslated ones. Here, we highlight advances in methodologies for identifying these hidden sORFs in plant genomes, including ribosome profiling and proteomics. We also examine the evidence for new peptides arising from sORFs and discuss their functions in plant development, environmental responses, and translational control.
Affiliation(s)
- Philip N Benfey
- Department of Biology, Duke University, Durham, NC, USA; Howard Hughes Medical Institute, Duke University, Durham, NC, USA
34
Abstract
A large body of evidence indicates that genome annotation pipelines have biased our view of coding sequences because they generally undersample small proteins and peptides. The recent development of genome-wide translation profiling reveals the prevalence of small/short open reading frames (smORFs or sORFs), which are scattered over all classes of transcripts, including both mRNAs and presumptive long noncoding RNAs. Proteomic approaches further confirm an unexpected variety of smORF-encoded peptides (SEPs), representing an overlooked reservoir of bioactive molecules. Indeed, functional studies in a broad range of species from yeast to humans demonstrate that SEPs can harbor key activities for the control of development, differentiation, and physiology. Here we summarize recent advances in the discovery and functional characterization of smORF/SEPs and discuss why these small players can no longer be ignored with regard to genome function.
Affiliation(s)
- Serge Plaza
- Laboratoire de Recherches en Sciences Végétales, Université de Toulouse, Université Paul Sabatier, 31326 Castanet Tolosan, France; CNRS, UMR5546, Laboratoire de Recherches en Sciences Végétales, 31326 Castanet Tolosan, France
- Gerben Menschaert
- Department of Mathematical Modeling, Statistics and Bioinformatics, University of Ghent, 9000 Gent, Belgium
- François Payre
- Centre de Biologie du Développement, Centre de Biologie Intégrative, Université de Toulouse, CNRS, Université Paul Sabatier, 31062 Toulouse, France