1. Huang J, Yang P, Pan W, Wu F, Qiu J, Ma Z. The role of polypeptides encoded by ncRNAs in cancer. Gene 2024; 928:148817. [PMID: 39098512 DOI: 10.1016/j.gene.2024.148817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2024] [Revised: 07/22/2024] [Accepted: 07/31/2024] [Indexed: 08/06/2024]
Abstract
It was previously thought that ncRNA could not encode polypeptides, but recent reports have challenged this notion. As research into ncRNA progresses, it is increasingly clear that it serves roles beyond traditional mechanisms, playing significant regulatory roles in various diseases, notably cancer, which is responsible for 70% of human deaths. Numerous studies have highlighted the diverse regulatory mechanisms of ncRNA that are pivotal in cancer initiation and progression. The role of ncRNA-encoded polypeptides in cancer regulation has gained prominence. This article explores the newly identified regulatory functions of these polypeptides in three types of ncRNA: lncRNA, pri-miRNA, and circRNA. These polypeptides can interact with proteins, influence signaling pathways, enhance miRNA stability, and regulate cancer progression, malignancy, resistance, and other clinical challenges. Furthermore, we discuss the evolutionary significance of these polypeptides in the transition from RNA to protein, examining their emergence and conservation throughout evolution.
Affiliation(s)
- Jiayuan Huang
- Lab for Noncoding RNA & Cancer, School of Life Sciences, Shanghai University, Shanghai 200444, China
- Ping Yang
- Department of Gynecology, The Third Affiliated Hospital of Kunming Medical University, Yunnan Cancer Hospital, Kunming 650118, China
- Wei Pan
- Lab for Noncoding RNA & Cancer, School of Life Sciences, Shanghai University, Shanghai 200444, China
- Fan Wu
- Lab for Noncoding RNA & Cancer, School of Life Sciences, Shanghai University, Shanghai 200444, China
- Jianhua Qiu
- Department of Anesthesiology, Ruijin Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 201800, China.
- Zhongliang Ma
- Lab for Noncoding RNA & Cancer, School of Life Sciences, Shanghai University, Shanghai 200444, China.
2. Tian H, Tang L, Yang Z, Xiang Y, Min Q, Yin M, You H, Xiao Z, Shen J. Current understanding of functional peptides encoded by lncRNA in cancer. Cancer Cell Int 2024; 24:252. [PMID: 39030557 PMCID: PMC11265036 DOI: 10.1186/s12935-024-03446-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Accepted: 07/09/2024] [Indexed: 07/21/2024] Open
Abstract
Dysregulated gene expression and imbalance of transcriptional regulation are typical features of cancer. RNA always plays a key role in these processes. Human transcripts contain many RNAs that are more than 200 bp in length and lack long open reading frames (ORFs, > 100 aa). They are usually regarded as long non-coding RNAs (lncRNAs), which play an important role in cancer regulation, including chromatin remodeling, transcriptional regulation, translational regulation, and acting as miRNA sponges. With the advancement of ribosome profiling and sequencing technologies, increasing evidence has revealed that some ORFs in lncRNAs can also encode peptides and participate in the regulation of tumors in multiple organs, which undoubtedly opens a new chapter in the field of lncRNA and oncology research. In this review, we discuss the biological function of lncRNA in tumors, the current methods to evaluate their coding potential and the role of functional small peptides encoded by lncRNA in cancers. Investigating the small peptides encoded by lncRNA and understanding the regulatory mechanisms of these functional peptides may contribute to a deeper understanding of cancer and the development of new targeted anticancer therapies.
Affiliation(s)
- Hua Tian
- Laboratory of Molecular Pharmacology, Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, 646000, China
- Cell Therapy and Cell Drugs of Luzhou Key Laboratory, Luzhou, 646000, China
- South Sichuan Institute of Translational Medicine, Luzhou, 646000, China
- School of Nursing, Chongqing College of Humanities, Science & Technology, Chongqing, China
- Lu Tang
- Laboratory of Molecular Pharmacology, Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, 646000, China
- Cell Therapy and Cell Drugs of Luzhou Key Laboratory, Luzhou, 646000, China
- South Sichuan Institute of Translational Medicine, Luzhou, 646000, China
- Zihan Yang
- Department of Pathology, The Affiliated Hospital of Southwest Medical University, Luzhou, China, 646000
- Yanxi Xiang
- Laboratory of Molecular Pharmacology, Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, 646000, China
- Cell Therapy and Cell Drugs of Luzhou Key Laboratory, Luzhou, 646000, China
- South Sichuan Institute of Translational Medicine, Luzhou, 646000, China
- Qi Min
- Laboratory of Molecular Pharmacology, Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, 646000, China
- Cell Therapy and Cell Drugs of Luzhou Key Laboratory, Luzhou, 646000, China
- South Sichuan Institute of Translational Medicine, Luzhou, 646000, China
- Mengshuang Yin
- Laboratory of Molecular Pharmacology, Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, 646000, China
- Cell Therapy and Cell Drugs of Luzhou Key Laboratory, Luzhou, 646000, China
- South Sichuan Institute of Translational Medicine, Luzhou, 646000, China
- Huili You
- Laboratory of Molecular Pharmacology, Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, 646000, China
- Cell Therapy and Cell Drugs of Luzhou Key Laboratory, Luzhou, 646000, China
- South Sichuan Institute of Translational Medicine, Luzhou, 646000, China
- Zhangang Xiao
- Laboratory of Molecular Pharmacology, Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, 646000, China.
- Cell Therapy and Cell Drugs of Luzhou Key Laboratory, Luzhou, 646000, China.
- South Sichuan Institute of Translational Medicine, Luzhou, 646000, China.
- Gulin Traditional Chinese Medicine Hospital, Luzhou, China.
- Department of Pharmacology, School of Pharmacy, Sichuan College of Traditional Chinese Medicine, Mianyang, China.
- Jing Shen
- Laboratory of Molecular Pharmacology, Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, 646000, China.
- Cell Therapy and Cell Drugs of Luzhou Key Laboratory, Luzhou, 646000, China.
- South Sichuan Institute of Translational Medicine, Luzhou, 646000, China.
3. Santos-Júnior CD, Torres MDT, Duan Y, Rodríguez Del Río Á, Schmidt TSB, Chong H, Fullam A, Kuhn M, Zhu C, Houseman A, Somborski J, Vines A, Zhao XM, Bork P, Huerta-Cepas J, de la Fuente-Nunez C, Coelho LP. Discovery of antimicrobial peptides in the global microbiome with machine learning. Cell 2024; 187:3761-3778.e16. [PMID: 38843834 DOI: 10.1016/j.cell.2024.05.013] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/14/2023] [Revised: 04/11/2024] [Accepted: 05/06/2024] [Indexed: 06/25/2024]
Abstract
Novel antibiotics are urgently needed to combat the antibiotic-resistance crisis. We present a machine-learning-based approach to predict antimicrobial peptides (AMPs) within the global microbiome and leverage a vast dataset of 63,410 metagenomes and 87,920 prokaryotic genomes from environmental and host-associated habitats to create the AMPSphere, a comprehensive catalog comprising 863,498 non-redundant peptides, few of which match existing databases. AMPSphere provides insights into the evolutionary origins of peptides, including by duplication or gene truncation of longer sequences, and we observed that AMP production varies by habitat. To validate our predictions, we synthesized and tested 100 AMPs against clinically relevant drug-resistant pathogens and human gut commensals both in vitro and in vivo. A total of 79 peptides were active, with 63 targeting pathogens. These active AMPs exhibited antibacterial activity by disrupting bacterial membranes. In conclusion, our approach identified nearly one million prokaryotic AMP sequences, an open-access resource for antibiotic discovery.
Affiliation(s)
- Célio Dias Santos-Júnior
- Institute of Science and Technology for Brain-Inspired Intelligence - ISTBI, Fudan University, Shanghai 200433, China; Laboratory of Microbial Processes & Biodiversity - LMPB, Department of Hydrobiology, Universidade Federal de São Carlos - UFSCar, São Carlos, São Paulo 13565-905, Brazil
- Marcelo D T Torres
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA; Department of Chemistry, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA; Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
- Yiqian Duan
- Institute of Science and Technology for Brain-Inspired Intelligence - ISTBI, Fudan University, Shanghai 200433, China
- Álvaro Rodríguez Del Río
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Campus de Montegancedo-UPM, Pozuelo de Alarcón, 28223 Madrid, Spain
- Thomas S B Schmidt
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany; APC Microbiome & School of Medicine, University College Cork, Cork, Ireland
- Hui Chong
- Institute of Science and Technology for Brain-Inspired Intelligence - ISTBI, Fudan University, Shanghai 200433, China
- Anthony Fullam
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Michael Kuhn
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Chengkai Zhu
- Institute of Science and Technology for Brain-Inspired Intelligence - ISTBI, Fudan University, Shanghai 200433, China
- Amy Houseman
- Institute of Science and Technology for Brain-Inspired Intelligence - ISTBI, Fudan University, Shanghai 200433, China
- Jelena Somborski
- Institute of Science and Technology for Brain-Inspired Intelligence - ISTBI, Fudan University, Shanghai 200433, China
- Anna Vines
- Institute of Science and Technology for Brain-Inspired Intelligence - ISTBI, Fudan University, Shanghai 200433, China
- Xing-Ming Zhao
- Institute of Science and Technology for Brain-Inspired Intelligence - ISTBI, Fudan University, Shanghai 200433, China; Department of Neurology, Zhongshan Hospital, Fudan University, Shanghai, China; State Key Laboratory of Medical Neurobiology, Institutes of Brain Science, Fudan University, Shanghai, China; MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
- Peer Bork
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany; Max Delbrück Centre for Molecular Medicine, Berlin, Germany; Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany
- Jaime Huerta-Cepas
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Campus de Montegancedo-UPM, Pozuelo de Alarcón, 28223 Madrid, Spain
- Cesar de la Fuente-Nunez
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA; Department of Chemistry, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA; Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA.
- Luis Pedro Coelho
- Institute of Science and Technology for Brain-Inspired Intelligence - ISTBI, Fudan University, Shanghai 200433, China; Centre for Microbiome Research, School of Biomedical Sciences, Queensland University of Technology, Translational Research Institute, Woolloongabba, QLD, Australia.
4. Coelho LP, Santos-Júnior CD, de la Fuente-Nunez C. Challenges in computational discovery of bioactive peptides in 'omics data. Proteomics 2024; 24:e2300105. [PMID: 38458994 DOI: 10.1002/pmic.202300105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Revised: 02/06/2024] [Accepted: 02/06/2024] [Indexed: 03/10/2024]
Abstract
Peptides have a plethora of activities in biological systems that can potentially be exploited biotechnologically. Several peptides are used clinically, as well as in industry and agriculture. The increase in available 'omics data has recently provided a large opportunity for mining novel enzymes, biosynthetic gene clusters, and molecules. While these data primarily consist of DNA sequences, other types of data provide important complementary information. Due to their size, the approaches proven successful at discovering novel proteins of canonical size cannot be naïvely applied to the discovery of peptides. Peptides can be encoded directly in the genome as short open reading frames (smORFs), or they can be derived from larger proteins by proteolysis. Both of these peptide classes pose challenges as simple methods for their prediction result in large numbers of false positives. Similarly, functional annotation of larger proteins, traditionally based on sequence similarity to infer orthology and then transferring functions between characterized proteins and uncharacterized ones, cannot be applied for short sequences. The use of these techniques is much more limited and alternative approaches based on machine learning are used instead. Here, we review the limitations of traditional methods as well as the alternative methods that have recently been developed for discovering novel bioactive peptides with a focus on prokaryotic genomes and metagenomes.
Affiliation(s)
- Luis Pedro Coelho
- Centre for Microbiome Research, School of Biomedical Sciences, Queensland University of Technology, Woolloongabba, Queensland, Australia
- Institute of Science and Technology for Brain-Inspired Intelligence - ISTBI, Fudan University, Shanghai, China
- Célio Dias Santos-Júnior
- Institute of Science and Technology for Brain-Inspired Intelligence - ISTBI, Fudan University, Shanghai, China
- Laboratory of Microbial Processes & Biodiversity - LMPB, Hydrobiology Department, Federal University of São Carlos - UFSCar, São Paulo, Brazil
- Cesar de la Fuente-Nunez
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Chemistry, School of Arts and Sciences, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA
5. Das D, Podder S. Microscale marvels: unveiling the macroscopic significance of micropeptides in human health. Brief Funct Genomics 2024:elae018. [PMID: 38706311 DOI: 10.1093/bfgp/elae018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Revised: 04/07/2024] [Accepted: 04/15/2024] [Indexed: 05/07/2024] Open
Abstract
Non-coding RNA encodes micropeptides from small open reading frames located within the RNA. Interestingly, these micropeptides are involved in a variety of functions within the body. They are emerging as the resolving piece of the puzzle for complex biomolecular signaling pathways within the body. Recent studies highlight the pivotal role of small peptides in regulating important biological processes like DNA repair, gene expression, muscle regeneration, immune responses, etc. On the contrary, altered expression of micropeptides also plays a pivotal role in the progression of various diseases like cardiovascular diseases, neurological disorders and several types of cancer, including colorectal cancer, hepatocellular cancer, lung cancer, etc. This review delves into the dual impact of micropeptides on health and pathology, exploring their pivotal role in preserving normal physiological homeostasis and probing their involvement in the triggering and progression of diseases.
Affiliation(s)
- Deepyaman Das
- Computational and Systems Biology Laboratory, Department of Microbiology, Raiganj University, Raiganj, Uttar Dinajpur, West Bengal-733134, India
- Soumita Podder
- Computational and Systems Biology Laboratory, Department of Microbiology, Raiganj University, Raiganj, Uttar Dinajpur, West Bengal-733134, India
6. Fesenko I, Sahakyan H, Shabalina SA, Koonin EV. The Cryptic Bacterial Microproteome. bioRxiv 2024:2024.02.17.580829. [PMID: 38903115 PMCID: PMC11188072 DOI: 10.1101/2024.02.17.580829] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/22/2024]
Abstract
Microproteins encoded by small open reading frames (smORFs) comprise the "dark matter" of proteomes. Although functional microproteins were identified in diverse organisms from all three domains of life, bacterial smORFs remain poorly characterized. In this comprehensive study of intergenic smORFs (ismORFs, 15-70 codons) in 5,668 bacterial genomes of the family Enterobacteriaceae, we identified 67,297 clusters of ismORFs subject to purifying selection. The ismORFs mainly code for hydrophobic, potentially transmembrane, unstructured, or minimally structured microproteins. Using AlphaFold Multimer, we predicted interactions of some of the predicted microproteins encoded by transcribed ismORFs with proteins encoded by neighboring genes, revealing the potential of microproteins to regulate the activity of various proteins, particularly, under stress. We compiled a catalog of predicted microprotein families with different levels of evidence from synteny analysis, structure prediction, and transcription and translation data. This study offers a resource for investigation of biological functions of microproteins.
Affiliation(s)
- Igor Fesenko
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Harutyun Sahakyan
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Svetlana A. Shabalina
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Eugene V. Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
7. Rodríguez Del Río Á, Giner-Lamia J, Cantalapiedra CP, Botas J, Deng Z, Hernández-Plaza A, Munar-Palmer M, Santamaría-Hernando S, Rodríguez-Herva JJ, Ruscheweyh HJ, Paoli L, Schmidt TSB, Sunagawa S, Bork P, López-Solanilla E, Coelho LP, Huerta-Cepas J. Functional and evolutionary significance of unknown genes from uncultivated taxa. Nature 2024; 626:377-384. [PMID: 38109938 PMCID: PMC10849945 DOI: 10.1038/s41586-023-06955-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2022] [Accepted: 12/08/2023] [Indexed: 12/20/2023]
Abstract
Many of the Earth's microbes remain uncultured and understudied, limiting our understanding of the functional and evolutionary aspects of their genetic material, which remain largely overlooked in most metagenomic studies [1]. Here we analysed 149,842 environmental genomes from multiple habitats [2-6] and compiled a curated catalogue of 404,085 functionally and evolutionarily significant novel (FESNov) gene families exclusive to uncultivated prokaryotic taxa. All FESNov families span multiple species, exhibit strong signals of purifying selection and qualify as new orthologous groups, thus nearly tripling the number of bacterial and archaeal gene families described to date. The FESNov catalogue is enriched in clade-specific traits, including 1,034 novel families that can distinguish entire uncultivated phyla, classes and orders, probably representing synapomorphies that facilitated their evolutionary divergence. Using genomic context analysis and structural alignments we predicted functional associations for 32.4% of FESNov families, including 4,349 high-confidence associations with important biological processes. These predictions provide a valuable hypothesis-driven framework that we used for experimental validation of a new gene family involved in cell motility and a novel set of antimicrobial peptides. We also demonstrate that the relative abundance profiles of novel families can discriminate between environments and clinical conditions, leading to the discovery of potentially new biomarkers associated with colorectal cancer. We expect this work to enhance future metagenomics studies and expand our knowledge of the genetic repertory of uncultivated organisms.
Affiliation(s)
- Álvaro Rodríguez Del Río
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
- Joaquín Giner-Lamia
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
- Departamento de Biotecnología-Biología Vegetal, Escuela Técnica Superior de Ingeniería Agronómica, Alimentaria y de Biosistemas, Universidad Politécnica de Madrid (UPM), Madrid, Spain
- Departamento de Bioquímica Vegetal y Biología Molecular, Facultad de Biología, Instituto de Bioquímica Vegetal y Fotosíntesis (IBVF), Universidad de Sevilla-CSIC, Seville, Spain
- Carlos P Cantalapiedra
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
- Jorge Botas
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
- Ziqi Deng
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
- Ana Hernández-Plaza
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
- Martí Munar-Palmer
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
- Saray Santamaría-Hernando
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
- José J Rodríguez-Herva
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
- Departamento de Biotecnología-Biología Vegetal, Escuela Técnica Superior de Ingeniería Agronómica, Alimentaria y de Biosistemas, Universidad Politécnica de Madrid (UPM), Madrid, Spain
- Hans-Joachim Ruscheweyh
- Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Zürich, Switzerland
- Lucas Paoli
- Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Zürich, Switzerland
- Thomas S B Schmidt
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Shinichi Sunagawa
- Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Zürich, Switzerland
- Peer Bork
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Max Delbrück Centre for Molecular Medicine, Berlin, Germany
- Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany
- Emilia López-Solanilla
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
- Departamento de Biotecnología-Biología Vegetal, Escuela Técnica Superior de Ingeniería Agronómica, Alimentaria y de Biosistemas, Universidad Politécnica de Madrid (UPM), Madrid, Spain
- Luis Pedro Coelho
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Shanghai, China
- Centre for Microbiome Research, School of Biomedical Sciences, Queensland University of Technology, Translational Research Institute, Woolloongabba, Queensland, Australia
- Jaime Huerta-Cepas
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain.
8. Eggenhofer F, Höner Zu Siederdissen C. Evolutionary Structure Conservation and Covariance Scores. Methods Mol Biol 2024; 2726:255-284. [PMID: 38780735 DOI: 10.1007/978-1-0716-3519-3_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2024]
Abstract
Effective homology search for non-coding RNAs is frequently not possible via sequence similarity alone. Current methods leverage evolutionary information like structure conservation or covariance scores to identify homologs in organisms that are phylogenetically more distant. In this chapter, we introduce the theoretical background of evolutionary structure conservation and covariance score, and we show hands-on how current methods in the field are applied on example datasets.
Affiliation(s)
- Florian Eggenhofer
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
- Christian Höner Zu Siederdissen
- Bioinformatics Group, Department of Computer Science, University of Leipzig, Leipzig, Germany.
- Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany.
- Bioinformatics/High-Throughput Analysis, Faculty of Mathematics and Computer Science, Friedrich Schiller University Jena, Jena, Germany.
9. Fuchs S, Engelmann S. Small proteins in bacteria - Big challenges in prediction and identification. Proteomics 2023; 23:e2200421. [PMID: 37609810 DOI: 10.1002/pmic.202200421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Revised: 08/03/2023] [Accepted: 08/10/2023] [Indexed: 08/24/2023]
Abstract
Proteins with up to 100 amino acids have been largely overlooked due to the challenges associated with predicting and identifying them using traditional methods. Recent advances in bioinformatics and machine learning, DNA sequencing, RNA and Ribo-seq technologies, and mass spectrometry (MS) have greatly facilitated the detection and characterisation of these elusive proteins in recent years. This has revealed their crucial role in various cellular processes including regulation, signalling and transport, as toxins and as folding helpers for protein complexes. Consequently, the systematic identification and characterisation of these proteins in bacteria have emerged as a prominent field of interest within the microbial research community. This review provides an overview of different strategies for predicting and identifying these proteins on a large scale, leveraging the power of these advanced technologies. Furthermore, the review offers insights into the future developments that may be expected in this field.
Affiliation(s)
- Stephan Fuchs
- Genome Competence Center (MF1), Department MFI, Robert-Koch-Institut, Berlin, Germany
- Susanne Engelmann
- Institute for Microbiology, Technische Universität Braunschweig, Braunschweig, Germany
- Microbial Proteomics, Helmholtzzentrum für Infektionsforschung GmbH, Braunschweig, Germany
10. Fremin BJ, Bhatt AS, Kyrpides NC. Identification of over ten thousand candidate structured RNAs in viruses and phages. Comput Struct Biotechnol J 2023; 21:5630-5639. [PMID: 38047235 PMCID: PMC10690425 DOI: 10.1016/j.csbj.2023.11.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Revised: 11/03/2023] [Accepted: 11/03/2023] [Indexed: 12/05/2023] Open
Abstract
Structured RNAs play crucial roles in viruses, exerting influence over both viral and host gene expression. However, the extensive diversity of structured RNAs and their ability to act in cis or trans positions pose challenges for predicting and assigning their functions. While comparative genomics approaches have successfully predicted candidate structured RNAs in microbes on a large scale, similar efforts for viruses have been lacking. In this study, we screened over 5 million DNA and RNA viral sequences, resulting in the prediction of 10,006 novel candidate structured RNAs. These predictions are widely distributed across taxonomy and ecosystem. We found transcriptional evidence for 206 of these candidate structured RNAs in the human fecal microbiome. These candidate RNAs exhibited evidence of nucleotide covariation, indicative of selective pressure maintaining the predicted secondary structures. Our analysis revealed a diverse repertoire of candidate structured RNAs, encompassing a substantial number of putative tRNAs or tRNA-like structures, Rho-independent transcription terminators, and potentially cis-regulatory structures consistently positioned upstream of genes. In summary, our findings shed light on the extensive diversity of structured RNAs in viruses, offering a valuable resource for further investigations into their functional roles and implications in viral gene expression and pave the way for a deeper understanding of the intricate interplay between viruses and their hosts at the molecular level.
Affiliation(s)
- Brayon J. Fremin
- Department of Energy, Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Ami S. Bhatt
- Department of Medicine (Hematology, Blood and Marrow Transplantation) and Genetics, Stanford University, Stanford, CA, USA
- Department of Genetics, Stanford University, Stanford, CA, USA
| | - Nikos C. Kyrpides
- Department of Energy, Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| |
|
11
|
Ballarino M, Pepe G, Helmer-Citterich M, Palma A. Exploring the landscape of tools and resources for the analysis of long non-coding RNAs. Comput Struct Biotechnol J 2023; 21:4706-4716. [PMID: 37841333 PMCID: PMC10568309 DOI: 10.1016/j.csbj.2023.09.041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2023] [Revised: 09/28/2023] [Accepted: 09/28/2023] [Indexed: 10/17/2023] [Open Access]
Abstract
In recent years, research on long non-coding RNAs (lncRNAs) has gained considerable attention due to the increasing number of newly identified transcripts. Several characteristics make their functional evaluation challenging, which calls for combining molecular biology with other disciplines, including bioinformatics. Indeed, the recent development of computational pipelines and resources has greatly facilitated both the discovery of lncRNAs and the study of their mechanisms of action. In this review, we present a curated collection of the most recent computational resources, which have been categorized into distinct groups: databases and annotation, identification and classification, interaction prediction, and structure prediction. As the repertoire of lncRNAs and their analysis tools continues to expand over the years, standardizing the computational pipelines and improving the existing annotation of lncRNAs will be crucial to facilitate functional genomics studies.
Affiliation(s)
- Monica Ballarino
- Department of Biology and Biotechnologies “Charles Darwin”, Sapienza University of Rome, Piazzale Aldo Moro 5, 00161 Rome, Italy
| | - Gerardo Pepe
- Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica, 1, 00133 Rome, Italy
| | - Manuela Helmer-Citterich
- Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica, 1, 00133 Rome, Italy
| | - Alessandro Palma
- Department of Biology and Biotechnologies “Charles Darwin”, Sapienza University of Rome, Piazzale Aldo Moro 5, 00161 Rome, Italy
| |
|
12
|
Santos-Júnior CD, Der Torossian Torres M, Duan Y, del Río ÁR, Schmidt TS, Chong H, Fullam A, Kuhn M, Zhu C, Houseman A, Somborski J, Vines A, Zhao XM, Bork P, Huerta-Cepas J, de la Fuente-Nunez C, Coelho LP. Computational exploration of the global microbiome for antibiotic discovery. bioRxiv 2023:2023.08.31.555663. [PMID: 37693522 PMCID: PMC10491242 DOI: 10.1101/2023.08.31.555663] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 09/12/2023]
Abstract
Novel antibiotics are urgently needed to combat the antibiotic-resistance crisis. We present a machine learning-based approach to predict prokaryotic antimicrobial peptides (AMPs) by leveraging a vast dataset of 63,410 metagenomes and 87,920 microbial genomes. This led to the creation of AMPSphere, a comprehensive catalog comprising 863,498 non-redundant peptides, the majority of which were previously unknown. We observed that AMP production varies by habitat, with animal-associated samples displaying the highest proportion of AMPs compared to other habitats. Furthermore, within different human-associated microbiota, strain-level differences were evident. To validate our predictions, we synthesized and experimentally tested 50 AMPs, demonstrating their efficacy against clinically relevant drug-resistant pathogens both in vitro and in vivo. These AMPs exhibited antibacterial activity by targeting the bacterial membrane. Additionally, AMPSphere provides valuable insights into the evolutionary origins of peptides. In conclusion, our approach identified AMP sequences within prokaryotic microbiomes, opening up new avenues for the discovery of antibiotics.
Affiliation(s)
- Célio Dias Santos-Júnior
- Institute of Science and Technology for Brain-Inspired Intelligence - ISTBI, Fudan University, Shanghai, China
| | - Marcelo Der Torossian Torres
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania; Philadelphia, Pennsylvania, United States of America
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania; Philadelphia, Pennsylvania, United States of America
- Penn Institute for Computational Science, University of Pennsylvania; Philadelphia, Pennsylvania, United States of America
| | - Yiqian Duan
- Institute of Science and Technology for Brain-Inspired Intelligence - ISTBI, Fudan University, Shanghai, China
| | - Álvaro Rodríguez del Río
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Campus de Montegancedo-UPM, 28223 Pozuelo de Alarcón, Madrid, Spain
| | - Thomas S.B. Schmidt
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Hui Chong
- Institute of Science and Technology for Brain-Inspired Intelligence - ISTBI, Fudan University, Shanghai, China
| | - Anthony Fullam
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Michael Kuhn
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Chengkai Zhu
- Institute of Science and Technology for Brain-Inspired Intelligence - ISTBI, Fudan University, Shanghai, China
| | - Amy Houseman
- Institute of Science and Technology for Brain-Inspired Intelligence - ISTBI, Fudan University, Shanghai, China
| | - Jelena Somborski
- Institute of Science and Technology for Brain-Inspired Intelligence - ISTBI, Fudan University, Shanghai, China
| | - Anna Vines
- Institute of Science and Technology for Brain-Inspired Intelligence - ISTBI, Fudan University, Shanghai, China
| | - Xing-Ming Zhao
- Institute of Science and Technology for Brain-Inspired Intelligence - ISTBI, Fudan University, Shanghai, China
- Department of Neurology, Zhongshan Hospital, Fudan University, Shanghai, China
- State Key Laboratory of Medical Neurobiology, Institutes of Brain Science, Fudan University, Shanghai, China
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence
- MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
- International Human Phenome Institute, Shanghai, China
| | - Peer Bork
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Max Delbrück Centre for Molecular Medicine, Berlin, Germany
- Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany
| | - Jaime Huerta-Cepas
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Campus de Montegancedo-UPM, 28223 Pozuelo de Alarcón, Madrid, Spain
| | - Cesar de la Fuente-Nunez
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania; Philadelphia, Pennsylvania, United States of America
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania; Philadelphia, Pennsylvania, United States of America
- Penn Institute for Computational Science, University of Pennsylvania; Philadelphia, Pennsylvania, United States of America
| | - Luis Pedro Coelho
- Institute of Science and Technology for Brain-Inspired Intelligence - ISTBI, Fudan University, Shanghai, China
| |
|
13
|
Anders J, Stadler PF. RNAcode_Web - Convenient identification of evolutionary conserved protein coding regions. J Integr Bioinform 2023; 20:jib-2022-0046. [PMID: 37615674 PMCID: PMC10757073 DOI: 10.1515/jib-2022-0046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2022] [Accepted: 02/15/2023] [Indexed: 08/25/2023] [Open Access]
Abstract
The differentiation of regions with coding potential from non-coding regions remains a key task in computational biology. Methods such as RNAcode that exploit patterns of sequence conservation for this task have a substantial advantage in classification accuracy, in particular for short coding sequences, compared to methods that rely on a single input sequence. However, they require sequence alignments as input. Frequently, suitable multiple sequence alignments are not readily available and are tedious and sometimes difficult to construct. We therefore introduce here a new web service that provides access to the well-known coding sequence detector RNAcode with minimal user overhead. It requires as input only a single target nucleotide sequence. The service automates the collection, selection, and preparation of homologous sequences from the NCBI database, as well as the construction of the multiple sequence alignments that are needed as input for RNAcode. The service automates the entire pre- and postprocessing and thus makes the investigation of specific genomic regions for previously unannotated coding regions, such as small peptides or additional introns, a simple task that is easily accessible to non-expert users. RNAcode_Web is accessible online at rnacode.bioinf.uni-leipzig.de.
Affiliation(s)
- John Anders
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16–18, D-04107 Leipzig, Germany
| | - Peter F. Stadler
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16–18, D-04107 Leipzig, Germany
- Max-Planck-Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany
- Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria
- Facultad de Ciencias, Universidad Nacional de Colombia, Sede Bogotá, Colombia
- Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA
| |
|
14
|
Klapproth C, Zötzsche S, Kühnl F, Fallmann J, Stadler P, Findeiß S. Tailored machine learning models for functional RNA detection in genome-wide screens. NAR Genom Bioinform 2023; 5:lqad072. [PMID: 37608800 PMCID: PMC10440787 DOI: 10.1093/nargab/lqad072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Revised: 06/28/2023] [Accepted: 07/30/2023] [Indexed: 08/24/2023] [Open Access]
Abstract
The in silico prediction of non-coding and protein-coding genetic loci has received considerable attention in comparative genomics aiming in particular at the identification of properties of nucleotide sequences that are informative of their biological role in the cell. We present here a software framework for the alignment-based training, evaluation and application of machine learning models with user-defined parameters. Instead of focusing on the one-size-fits-all approach of pervasive in silico annotation pipelines, we offer a framework for the structured generation and evaluation of models based on arbitrary features and input data, focusing on stable and explainable results. Furthermore, we showcase the usage of our software package in a full-genome screen of Drosophila melanogaster and evaluate our results against the well-known but much less flexible program RNAz.
Affiliation(s)
- Christopher Klapproth
- Leipzig University, Department of Computer Science and Interdisciplinary Center of Bioinformatics, Bioinformatics Group, Härtelstrasse 16-18, D-04107 Leipzig, Germany
- ScaDS.AI Leipzig (Center for Scalable Data Analytics and Artificial Intelligence), Humboldtstraße 25, D-04105 Leipzig, Germany
| | - Siegfried Zötzsche
- Leipzig University, Department of Computer Science and Interdisciplinary Center of Bioinformatics, Bioinformatics Group, Härtelstrasse 16-18, D-04107 Leipzig, Germany
| | - Felix Kühnl
- Leipzig University, Department of Computer Science and Interdisciplinary Center of Bioinformatics, Bioinformatics Group, Härtelstrasse 16-18, D-04107 Leipzig, Germany
| | - Jörg Fallmann
- Leipzig University, Department of Computer Science and Interdisciplinary Center of Bioinformatics, Bioinformatics Group, Härtelstrasse 16-18, D-04107 Leipzig, Germany
| | - Peter F Stadler
- Leipzig University, Department of Computer Science and Interdisciplinary Center of Bioinformatics, Bioinformatics Group, Härtelstrasse 16-18, D-04107 Leipzig, Germany
- Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany
- University of Vienna, Institute for Theoretical Chemistry, Währingerstraße 17, A-1090 Vienna, Austria
- Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe NM 87501, USA
- Universidad Nacional de Colombia, Facultad de Ciencias, Bogotá, D.C., Colombia
| | - Sven Findeiß
- Leipzig University, Department of Computer Science and Interdisciplinary Center of Bioinformatics, Bioinformatics Group, Härtelstrasse 16-18, D-04107 Leipzig, Germany
| |
|
15
|
Dong X, Zhang K, Xun C, Chu T, Liang S, Zeng Y, Liu Z. Small Open Reading Frame-Encoded Micro-Peptides: An Emerging Protein World. Int J Mol Sci 2023; 24:10562. [PMID: 37445739 DOI: 10.3390/ijms241310562] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/13/2023] [Revised: 06/20/2023] [Accepted: 06/21/2023] [Indexed: 07/15/2023] [Open Access]
Abstract
Small open reading frames (sORFs) are often overlooked features in genomes. In the past, they were labeled as noncoding or "transcriptional noise". However, accumulating evidence from recent years suggests that sORFs may be transcribed and translated to produce sORF-encoded polypeptides (SEPs) with fewer than 100 amino acids. The vigorous development of computational algorithms, ribosome profiling, and peptidomics has facilitated the prediction and identification of many new SEPs. These SEPs were revealed to be involved in a wide range of basic biological processes, such as gene expression regulation, embryonic development, cellular metabolism, inflammation, and even carcinogenesis. To effectively understand the potential biological functions of SEPs, we discuss the history and development of the newly emerging research on sORFs and SEPs. In particular, we review a range of recently developed bioinformatics tools for identifying, predicting, and validating SEPs as well as a variety of biochemical experiments for characterizing SEP functions. Lastly, this review underlines the challenges and future directions in identifying and validating sORFs and their encoded micropeptides, providing a significant reference for upcoming research on sORF-encoded peptides.
Affiliation(s)
- Xiaoping Dong
- National & Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, China
- Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha 410081, China
| | - Kun Zhang
- The State Key Laboratory of Developmental Biology of Freshwater Fish, College of Life Science, Hunan Normal University, Changsha 410081, China
| | - Chengfeng Xun
- National & Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, China
- Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha 410081, China
| | - Tianqi Chu
- National & Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, China
- Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha 410081, China
| | - Songping Liang
- National & Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, China
- Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha 410081, China
| | - Yong Zeng
- National & Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, China
- Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha 410081, China
- The State Key Laboratory of Developmental Biology of Freshwater Fish, College of Life Science, Hunan Normal University, Changsha 410081, China
| | - Zhonghua Liu
- National & Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, China
- Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha 410081, China
| |
|
16
|
Asemi R, Rajabpoor Nikoo N, Asemi Z, Shafabakhsh R, Hajijafari M, Sharifi M, Homayoonfal M, Davoodvandi A, Hakamifard A. Modulation of long non-coding RNAs by resveratrol as a potential therapeutic approach in cancer: A comprehensive review. Pathol Res Pract 2023; 246:154507. [PMID: 37196467 DOI: 10.1016/j.prp.2023.154507] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 02/22/2023] [Revised: 05/05/2023] [Accepted: 05/05/2023] [Indexed: 05/19/2023]
Abstract
LncRNAs, or long non-coding RNAs, are a subset of RNAs that play a regulatory role in a wide range of biological functions, including RNA processing, epigenetic regulation, and signal transduction. Recent research indicates that lncRNAs play a key role in the development and spread of cancer by being dysregulated in the disease. In addition, lncRNAs have been linked to the overexpression of certain proteins that are involved in tumor development and progression. Resveratrol has anti-inflammatory and anti-cancer properties that it exerts through regulating different lncRNAs. By regulating tumor-supportive and tumor-suppressive lncRNAs, resveratrol acts as an anti-cancer agent. By downregulating the tumor-supportive lncRNAs DANCR, MALAT1, CCAT1, CRNDE, HOTAIR, PCAT1, PVT1, SNHG16, AK001796, DIO3OS, GAS5 and H19, and upregulating MEG3, PTTG3P, BISPR, PCAT29, GAS5, LOC146880, HOTAIR, PCA3, NBR2, this herbal remedy causes apoptosis and cytotoxicity. For the purpose of using polyphenols in cancer therapy, it would be helpful to have more in-depth knowledge about lncRNA modulation via resveratrol. Here, we discuss the current knowledge and future promise of resveratrol as a modulator of lncRNAs in different cancers.
Affiliation(s)
- Reza Asemi
- Department of Internal Medicine, School of Medicine, Cancer Prevention Research Center, Seyyed Al-Shohada Hospital, Isfahan University of Medical Sciences, Isfahan, Islamic Republic of Iran.
| | - Nesa Rajabpoor Nikoo
- Department of Gynecology and Obstetrics, Tehran University of Medical Sciences, Tehran, Islamic Republic of Iran.
| | - Zatollah Asemi
- Research Center for Biochemistry and Nutrition in Metabolic Diseases, Institute for Basic Sciences, Kashan University of Medical Sciences, Kashan, Islamic Republic of Iran.
| | - Rana Shafabakhsh
- Research Center for Biochemistry and Nutrition in Metabolic Diseases, Institute for Basic Sciences, Kashan University of Medical Sciences, Kashan, Islamic Republic of Iran.
| | - Mohammad Hajijafari
- Department of Anesthesiology, School of Medicine, Kashan University of Medical Sciences, Kashan, Islamic Republic of Iran.
| | - Mehran Sharifi
- Department of Internal Medicine, School of Medicine, Cancer Prevention Research Center, Seyyed Al-Shohada Hospital, Isfahan University of Medical Sciences, Isfahan, Islamic Republic of Iran.
| | - Mina Homayoonfal
- Research Center for Biochemistry and Nutrition in Metabolic Diseases, Institute for Basic Sciences, Kashan University of Medical Sciences, Kashan, Islamic Republic of Iran.
| | - Amirhossein Davoodvandi
- Cancer Immunology Project (CIP), Universal Scientific Education and Research Network (USERN), Tehran, Islamic Republic of Iran.
| | - Atousa Hakamifard
- Department of Infectious Diseases, School of Medicine, Isfahan University of Medical Sciences, Isfahan, Islamic Republic of Iran.
| |
|
17
|
Long Non-Coding RNAs of Plants in Response to Abiotic Stresses and Their Regulating Roles in Promoting Environmental Adaption. Cells 2023; 12:cells12050729. [PMID: 36899864 PMCID: PMC10001313 DOI: 10.3390/cells12050729] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Received: 01/09/2023] [Revised: 02/10/2023] [Accepted: 02/21/2023] [Indexed: 03/03/2023] [Open Access]
Abstract
Abiotic stresses triggered by climate change and human activity cause substantial agricultural and environmental problems which hamper plant growth. Plants have evolved sophisticated mechanisms in response to abiotic stresses, such as stress perception, epigenetic modification, and regulation of transcription and translation. Over the past decade, a large body of literature has revealed the various regulatory roles of long non-coding RNAs (lncRNAs) in the plant response to abiotic stresses and their irreplaceable functions in environmental adaptation. LncRNAs are recognized as a class of ncRNAs that are longer than 200 nucleotides, influencing a variety of biological processes. In this review, we mainly focus on recent progress on plant lncRNAs, outlining their features, evolution, and functions in response to drought, low or high temperature, salt, and heavy metal stress. The approaches used to characterize the function of lncRNAs and the mechanisms by which they regulate plant responses to abiotic stresses are further reviewed. Moreover, we discuss the accumulating discoveries regarding the biological functions of lncRNAs in plant stress memory. The present review provides updated information and directions for characterizing the potential functions of lncRNAs in abiotic stresses in the future.
|
18
|
Chothani S, Ho L, Schafer S, Rackham O. Discovering microproteins: making the most of ribosome profiling data. RNA Biol 2023; 20:943-954. [PMID: 38013207 PMCID: PMC10730196 DOI: 10.1080/15476286.2023.2279845] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/30/2023] [Indexed: 11/29/2023] [Open Access]
Abstract
Building a reference set of protein-coding open reading frames (ORFs) has revolutionized biological process discovery and understanding. Traditionally, gene models have been confirmed using cDNA sequencing, and encoded translated regions have been inferred using sequence-based detection of start and stop combinations longer than 100 amino acids to prevent false positives. This has led to small ORFs (smORFs) and their encoded proteins being left unannotated. Ribo-seq allows translated regions to be distinguished from untranslated ones irrespective of length. In this review, we describe the power of Ribo-seq data in the detection of smORFs while discussing the major challenge posed by data quality, depth and sparseness in identifying the start and end of smORF translation. In particular, we outline smORF cataloguing efforts in humans and the large differences that have arisen due to variation in data, methods and assumptions. Although current versions of smORF reference sets can already be used as a powerful tool for hypothesis generation, we recommend that future editions should consider these data limitations and adopt unified processing for the community to establish a canonical catalogue of translated smORFs.
Affiliation(s)
- Sonia Chothani
- Program in Cardiovascular and Metabolic Disorders, Duke-National University of Singapore, Singapore
| | - Lena Ho
- Program in Cardiovascular and Metabolic Disorders, Duke-National University of Singapore, Singapore
| | - Sebastian Schafer
- Program in Cardiovascular and Metabolic Disorders, Duke-National University of Singapore, Singapore
| | - Owen Rackham
- Program in Cardiovascular and Metabolic Disorders, Duke-National University of Singapore, Singapore
- School of Biological Sciences, University of Southampton, Southampton, UK
- The Alan Turing Institute, The British Library, London, UK
| |
|
19
|
Olo Ndela E, Roux S, Henke C, Sczyrba A, Sime Ngando T, Varsani A, Enault F. Reekeekee- and roodoodooviruses, two different Microviridae clades constituted by the smallest DNA phages. Virus Evol 2022; 9:veac123. [PMID: 36694818 PMCID: PMC9865509 DOI: 10.1093/ve/veac123] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/15/2022] [Revised: 10/19/2022] [Accepted: 12/22/2022] [Indexed: 12/25/2022] [Open Access]
Abstract
Small circular single-stranded DNA viruses of the Microviridae family are both prevalent and diverse in all ecosystems. They usually harbor a genome between 4.3 and 6.3 kb, with a microvirus recently isolated from a marine Alphaproteobacteria being the smallest known genome of a DNA phage (4.248 kb). A subfamily, Amoyvirinae, has been proposed to classify this virus and other related small Alphaproteobacteria-infecting phages. Here, we report the discovery, in meta-omics data sets from various aquatic ecosystems, of sixteen complete microvirus genomes significantly smaller (2.991-3.692 kb) than known ones. Phylogenetic analysis reveals that these sixteen genomes represent two related, yet distinct and diverse, novel groups of microviruses, with amoyviruses being their closest known relatives. We propose that these small microviruses are members of two tentatively named subfamilies Reekeekeevirinae and Roodoodoovirinae. As known microvirus genomes encode many overlapping and overprinted genes that are not identified by gene prediction software, we developed a new methodology to identify all genes based on protein conservation, amino acid composition, and selection pressure estimations. Surprisingly, only four to five genes could be identified per genome, with the number of overprinted genes lower than that in phiX174. These small genomes thus tend to have both a lower number of genes and a shorter length for each gene, leaving no place for variable gene regions that could harbor overprinted genes. Even more surprisingly, these two Microviridae groups had specific and different gene content, and major differences in their conserved protein sequences, highlighting that these two related groups of small genome microviruses use very different strategies to fulfill their lifecycle with such a small number of genes. The discovery of these genomes and the detailed prediction and annotation of their genome content expand our understanding of ssDNA phages in nature and are further evidence that these viruses have explored a wide range of possibilities during their long evolution.
Affiliation(s)
| | | | - Christian Henke
- Computational Metagenomics, Bielefeld University, Universitätsstraße 27, Bielefeld 30501, Germany
- Center for Biotechnology, Bielefeld University, Universitätsstraße 27, Bielefeld 33615, Germany
| | - Alexander Sczyrba
- Computational Metagenomics, Bielefeld University, Universitätsstraße 27, Bielefeld 30501, Germany
- Center for Biotechnology, Bielefeld University, Universitätsstraße 27, Bielefeld 33615, Germany
| | - Télesphore Sime Ngando
- Université Clermont Auvergne, CNRS, Laboratoire Microorganismes: Genome et Environnement, Clermont-Ferrand F-63000, France
| | | | | |
|
20
|
Vakirlis N, Vance Z, Duggan KM, McLysaght A. De novo birth of functional microproteins in the human lineage. Cell Rep 2022; 41:111808. [PMID: 36543139 PMCID: PMC10073203 DOI: 10.1016/j.celrep.2022.111808] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 10/18/2021] [Revised: 06/21/2022] [Accepted: 11/18/2022] [Indexed: 12/24/2022] [Open Access]
Abstract
Small open reading frames (sORFs) can encode functional "microproteins" that perform crucial biological tasks. However, their size makes them less amenable to genomic analysis, and their origins and conservation are poorly understood. Given their short length, it is plausible that some of these functional microproteins have recently originated entirely de novo from noncoding sequences. Here we sought to identify such cases in the human lineage by reconstructing the evolutionary origins of human microproteins previously found to have measurable, statistically significant fitness effects. By tracing the formation of each ORF and its transcriptional activation, we show that novel microproteins with significant phenotypic effects have emerged de novo throughout animal evolution, including two after the human-chimpanzee split. Notably, traditional methods for assessing coding potential would miss most of these cases. This evidence demonstrates that the functional potential intrinsic to sORFs can be relatively rapidly and frequently realized through de novo gene emergence.
Affiliation(s)
- Nikolaos Vakirlis
- Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center "Alexander Fleming", Vari, Greece.
| | - Zoe Vance
- Smurfit Institute of Genetics, Trinity College Dublin, University of Dublin, Dublin, Ireland
| | - Kate M Duggan
- Smurfit Institute of Genetics, Trinity College Dublin, University of Dublin, Dublin, Ireland
| | - Aoife McLysaght
- Smurfit Institute of Genetics, Trinity College Dublin, University of Dublin, Dublin, Ireland.
| |
|
21
|
Kerachian MA, Azghandi M. Identification of long non-coding RNA using single nucleotide epimutation analysis: a novel gene discovery approach. Cancer Cell Int 2022; 22:337. [PMID: 36333783 PMCID: PMC9636742 DOI: 10.1186/s12935-022-02752-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2021] [Accepted: 10/12/2022] [Indexed: 11/06/2022] [Open Access]
Abstract
BACKGROUND Long non-coding RNAs (lncRNAs) are involved in a variety of mechanisms related to tumorigenesis, functioning as oncogenes, as tumor suppressors, or even as both, and they represent a new class of cancer biomarkers and therapeutic targets. More than 35,000 ncRNAs, especially lncRNAs, are predicted to lie in the intergenic regions of the human genome. Emerging research indicates that epigenetic regulation is one of the key pathways controlling lncRNA expression and tissue specificity. METHODS This article proposes a novel approach to lncRNA discovery based on the intergenic position of most lncRNAs and on the methylation level of a single CpG site as an epigenetic characteristic. RESULTS Using this method, a novel antisense lncRNA named LINC02892 was found; it has three transcripts with no protein-coding capacity and shows nuclear, cytoplasmic, and exosomal distributions. CONCLUSION This discovery strategy could be applied to identify novel non-coding RNAs influenced by methylation aberrations.
Affiliation(s)
- Mohammad Amin Kerachian
- Medical Genetics Research Center, Mashhad University of Medical Sciences, Mashhad, Iran.
- Department of Medical Genetics, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran.
- Cancer Genetics Research Unit, Reza Radiotherapy and Oncology Center, Mashhad, Iran.
- Department of Chemistry and Biology, Toronto Metropolitan University, Toronto, ON, Canada.
| | - Marjan Azghandi
- Cancer Genetics Research Unit, Reza Radiotherapy and Oncology Center, Mashhad, Iran
- Department of Animal Science, Faculty of Agriculture, Ferdowsi University of Mashhad, Mashhad, Iran
|
22
|
Cohen D. General Designs Reveal Distinct Codes in Protein-Coding and Non-Coding Human DNA. Genes (Basel) 2022; 13:1970. [PMID: 36360206 PMCID: PMC9690640 DOI: 10.3390/genes13111970] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/21/2022] [Revised: 10/19/2022] [Accepted: 10/22/2022] [Indexed: 08/27/2023] Open
Abstract
This study seeks to investigate distinct signatures and codes within different genomic sequence locations of the human genome. The promoter and other non-coding regions contain sites for the binding of biological particles, for processes such as transcription regulation. The specific rules and sequence codes that govern this remain poorly understood. To derive these (codes), the general designs of sequence are investigated. Genomic signatures are a powerful tool for assessing the general designs of sequence, and cross-comparing different genomic regions for their distinct sequence properties. Through these genomic signatures, the relative non-random properties of sequences are also assessed. Furthermore, a binary components analysis is carried out making use of information theory ideas, to study the RY (purine/pyrimidine), WS (weak/strong) and KM (keto/amino) signatures in the sequences. From this comparison, it is possible to identify the relative importance of these properties within the various protein-coding and non-coding genomic locations. The results show that coding DNA has a strongly non-random WS signature, which reflects the genetic code, and the hydrogen-bond base pairing of codon-anti-codon interactions. In contrast, non-coding locations, such as the promoter, contain a distinct genomic signature. A prominent feature throughout non-coding DNA is a highly non-random RY signature, which is very different in nature to coding DNA, and suggests a structural-based RY code. This marks progress towards deciphering the unknown code(s) in non-protein-coding DNA, and a further understanding of the coding DNA. Additionally, it unravels how DNA carries information. These findings have implications for the most fundamental principles of biology, including knowledge of gene regulation, development and disease.
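As a rough illustration of the binary recodings named in this abstract, the sketch below maps a DNA sequence onto the RY, WS and KM two-letter alphabets using the standard IUPAC groupings, then computes a Shannon entropy over k-mers as one simple information-theoretic summary. The function names, the choice of k-mer entropy and the default `k` are assumptions made for illustration; the abstract does not specify the statistic used in the paper.

```python
from collections import Counter
from math import log2

# Standard IUPAC two-letter groupings of the four bases.
BINARY_ALPHABETS = {
    "RY": {"A": "R", "G": "R", "C": "Y", "T": "Y"},  # purine / pyrimidine
    "WS": {"A": "W", "T": "W", "C": "S", "G": "S"},  # weak / strong pairing
    "KM": {"G": "K", "T": "K", "A": "M", "C": "M"},  # keto / amino
}


def recode(seq, scheme):
    """Recode a DNA string into one binary alphabet; non-ACGT symbols are skipped."""
    table = BINARY_ALPHABETS[scheme]
    return "".join(table[base] for base in seq.upper() if base in table)


def kmer_entropy(symbols, k=3):
    """Shannon entropy in bits of the overlapping k-mers of a symbol string.

    Hypothetical summary statistic: lower values mean a less random signature.
    """
    kmers = [symbols[i:i + k] for i in range(len(symbols) - k + 1)]
    if not kmers:
        return 0.0
    total = len(kmers)
    counts = Counter(kmers)
    return -sum((c / total) * log2(c / total) for c in counts.values())


if __name__ == "__main__":
    example = "ATGGCGTACGCTTAA"  # made-up sequence, not from the paper
    for scheme in BINARY_ALPHABETS:
        coded = recode(example, scheme)
        print(scheme, coded, round(kmer_entropy(coded, k=2), 3))
```

Comparing the three entropies for the same region gives a crude sense of which binary property is least random there, which is the kind of cross-region comparison the abstract describes.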
Affiliation(s)
- Dana Cohen
- Ronin Institute, 127 Haddon Pl, Montclair, NJ 07043-2314, USA
|
23
|
Flnc: Machine Learning Improves the Identification of Novel Long Noncoding RNAs from Stand-Alone RNA-Seq Data. Noncoding RNA 2022; 8:ncrna8050070. [DOI: 10.3390/ncrna8050070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2022] [Revised: 10/01/2022] [Accepted: 10/06/2022] [Indexed: 11/16/2022] Open
Abstract
Long noncoding RNAs (lncRNAs) play critical regulatory roles in human development and disease. Although there are over 100,000 samples with available RNA sequencing (RNA-seq) data, many lncRNAs have yet to be annotated. The conventional approach to identifying novel lncRNAs from RNA-seq data is to find transcripts without coding potential but this approach has a false discovery rate of 30–75%. Other existing methods either identify only multi-exon lncRNAs, missing single-exon lncRNAs, or require transcriptional initiation profiling data (such as H3K4me3 ChIP-seq data), which is unavailable for many samples with RNA-seq data. Because of these limitations, current methods cannot accurately identify novel lncRNAs from existing RNA-seq data. To address this problem, we have developed software, Flnc, to accurately identify both novel and annotated full-length lncRNAs, including single-exon lncRNAs, directly from RNA-seq data without requiring transcriptional initiation profiles. Flnc integrates machine learning models built by incorporating four types of features: transcript length, promoter signature, multiple exons, and genomic location. Flnc achieves state-of-the-art prediction power with an AUROC score over 0.92. Flnc significantly improves the prediction accuracy from less than 50% using the conventional approach to over 85%. Flnc is available via GitHub platform.
|
24
|
Grzejda D, Mach J, Schweizer JA, Hummel B, Rezansoff AM, Eggenhofer F, Panhale A, Lalioti ME, Cabezas Wallscheid N, Backofen R, Felsenberg J, Hilgers V. The long noncoding RNA mimi scaffolds neuronal granules to maintain nervous system maturity. SCIENCE ADVANCES 2022; 8:eabo5578. [PMID: 36170367 PMCID: PMC9519039 DOI: 10.1126/sciadv.abo5578] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/11/2022] [Accepted: 08/15/2022] [Indexed: 05/29/2023]
Abstract
RNA binding proteins and messenger RNAs (mRNAs) assemble into ribonucleoprotein granules that regulate mRNA trafficking, local translation, and turnover. The dysregulation of RNA-protein condensation disturbs synaptic plasticity and neuron survival and has been widely associated with human neurological disease. Neuronal granules are thought to condense around particular proteins that dictate the identity and composition of each granule type. Here, we show in Drosophila that a previously uncharacterized long noncoding RNA, mimi, is required to scaffold large neuronal granules in the adult nervous system. Neuronal ELAV-like proteins directly bind mimi and mediate granule assembly, while Staufen maintains condensate integrity. mimi granules contain mRNAs and proteins involved in synaptic processes; granule loss in mimi mutant flies impairs nervous system maturity and neuropeptide-mediated signaling and causes phenotypes of neurodegeneration. Our work reports an architectural RNA for a neuronal granule and provides a handle to interrogate functions of a condensate independently of those of its constituent proteins.
Affiliation(s)
- Dominika Grzejda
- Max-Planck-Institute of Immunobiology and Epigenetics, Freiburg 79108, Germany
- Faculty of Biology, Albert Ludwig University of Freiburg, Freiburg 79104, Germany
- International Max Planck Research School for Molecular and Cellular Biology (IMPRS-MCB), Freiburg 79108, Germany
| | - Jana Mach
- Max-Planck-Institute of Immunobiology and Epigenetics, Freiburg 79108, Germany
| | - Johanna Aurelia Schweizer
- Friedrich Miescher Institute for Biomedical Research (FMI), Basel 4058, Switzerland
- University of Basel, Basel 4001, Switzerland
| | - Barbara Hummel
- Max-Planck-Institute of Immunobiology and Epigenetics, Freiburg 79108, Germany
| | | | - Florian Eggenhofer
- Department of Computer Science, Albert Ludwig University of Freiburg, Freiburg 79110, Germany
| | - Amol Panhale
- Max-Planck-Institute of Immunobiology and Epigenetics, Freiburg 79108, Germany
| | - Maria-Eleni Lalioti
- Max-Planck-Institute of Immunobiology and Epigenetics, Freiburg 79108, Germany
| | | | - Rolf Backofen
- Department of Computer Science, Albert Ludwig University of Freiburg, Freiburg 79110, Germany
- BIOSS and CIBSS Centres for Biological Signalling Studies, University of Freiburg, Freiburg 79104, Germany
| | - Johannes Felsenberg
- Friedrich Miescher Institute for Biomedical Research (FMI), Basel 4058, Switzerland
| | - Valérie Hilgers
- Max-Planck-Institute of Immunobiology and Epigenetics, Freiburg 79108, Germany
- CIBSS Centre for Integrative Biological Signalling Studies, University of Freiburg, Freiburg 79104, Germany
|
25
|
Chothani SP, Adami E, Widjaja AA, Langley SR, Viswanathan S, Pua CJ, Zhihao NT, Harmston N, D'Agostino G, Whiffin N, Mao W, Ouyang JF, Lim WW, Lim S, Lee CQE, Grubman A, Chen J, Kovalik JP, Tryggvason K, Polo JM, Ho L, Cook SA, Rackham OJL, Schafer S. A high-resolution map of human RNA translation. Mol Cell 2022; 82:2885-2899.e8. [PMID: 35841888 DOI: 10.1016/j.molcel.2022.06.023] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 09/19/2021] [Revised: 03/10/2022] [Accepted: 06/15/2022] [Indexed: 10/17/2022]
Abstract
Translated small open reading frames (smORFs) can have important regulatory roles and encode microproteins, yet their genome-wide identification has been challenging. We determined the ribosome locations across six primary human cell types and five tissues and detected 7,767 smORFs with translational profiles matching those of known proteins. The human genome was found to contain highly cell-type- and tissue-specific smORFs and a subset that encodes highly conserved amino acid sequences. Changes in the translational efficiency of upstream-encoded smORFs (uORFs) and the corresponding main ORFs predominantly occur in the same direction. Integration with 456 mass-spectrometry datasets confirms the presence of 603 small peptides at the protein level in humans and provides insights into the subcellular localization of these small proteins. This study provides a comprehensive atlas of high-confidence translated smORFs derived from primary human cells and tissues in order to provide a more complete understanding of the translated human genome.
Affiliation(s)
- Sonia P Chothani
- Program in Cardiovascular and Metabolic Disorders, Duke-National University of Singapore, Singapore 169857, Singapore
| | - Eleonora Adami
- Program in Cardiovascular and Metabolic Disorders, Duke-National University of Singapore, Singapore 169857, Singapore; Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
| | - Anissa A Widjaja
- Program in Cardiovascular and Metabolic Disorders, Duke-National University of Singapore, Singapore 169857, Singapore
| | - Sarah R Langley
- Lee Kong Chian School of Medicine, Nanyang Technological University, Clinical Sciences Building, Singapore 308232, Singapore
| | - Sivakumar Viswanathan
- Program in Cardiovascular and Metabolic Disorders, Duke-National University of Singapore, Singapore 169857, Singapore
| | - Chee Jian Pua
- National Heart Research Institute Singapore (NHRIS), National Heart Centre Singapore, Singapore 169609, Singapore
| | - Nevin Tham Zhihao
- Lee Kong Chian School of Medicine, Nanyang Technological University, Clinical Sciences Building, Singapore 308232, Singapore
| | - Nathan Harmston
- Program in Cancer and Stem Cell Biology, Duke-NUS Medical School, Singapore 169857, Singapore; Science Division, Yale-NUS College, Singapore 138527, Singapore
| | - Giuseppe D'Agostino
- Lee Kong Chian School of Medicine, Nanyang Technological University, Clinical Sciences Building, Singapore 308232, Singapore
| | - Nicola Whiffin
- Wellcome Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK
| | - Wang Mao
- Program in Cardiovascular and Metabolic Disorders, Duke-National University of Singapore, Singapore 169857, Singapore
| | - John F Ouyang
- Program in Cardiovascular and Metabolic Disorders, Duke-National University of Singapore, Singapore 169857, Singapore
| | - Wei Wen Lim
- Program in Cardiovascular and Metabolic Disorders, Duke-National University of Singapore, Singapore 169857, Singapore; National Heart Research Institute Singapore (NHRIS), National Heart Centre Singapore, Singapore 169609, Singapore
| | - Shiqi Lim
- National Heart Research Institute Singapore (NHRIS), National Heart Centre Singapore, Singapore 169609, Singapore
| | - Cheryl Q E Lee
- Program in Cardiovascular and Metabolic Disorders, Duke-National University of Singapore, Singapore 169857, Singapore
| | - Alexandra Grubman
- Department of Anatomy and Developmental Biology, Monash University, Wellington Road, Clayton, VIC 3800, Australia; Development and Stem Cells Program, Monash Biomedicine Discovery Institute, Wellington Road, Clayton, VIC 3800, Australia; Australian Regenerative Medicine Institute, Monash University, Wellington Road, Clayton, VIC 3800, Australia
| | - Joseph Chen
- Department of Anatomy and Developmental Biology, Monash University, Wellington Road, Clayton, VIC 3800, Australia; Development and Stem Cells Program, Monash Biomedicine Discovery Institute, Wellington Road, Clayton, VIC 3800, Australia; Australian Regenerative Medicine Institute, Monash University, Wellington Road, Clayton, VIC 3800, Australia
| | - J P Kovalik
- Program in Cardiovascular and Metabolic Disorders, Duke-National University of Singapore, Singapore 169857, Singapore
| | - Karl Tryggvason
- Program in Cardiovascular and Metabolic Disorders, Duke-National University of Singapore, Singapore 169857, Singapore
| | - Jose M Polo
- Department of Anatomy and Developmental Biology, Monash University, Wellington Road, Clayton, VIC 3800, Australia; Development and Stem Cells Program, Monash Biomedicine Discovery Institute, Wellington Road, Clayton, VIC 3800, Australia; Australian Regenerative Medicine Institute, Monash University, Wellington Road, Clayton, VIC 3800, Australia
| | - Lena Ho
- Program in Cardiovascular and Metabolic Disorders, Duke-National University of Singapore, Singapore 169857, Singapore
| | - Stuart A Cook
- Program in Cardiovascular and Metabolic Disorders, Duke-National University of Singapore, Singapore 169857, Singapore; National Heart Research Institute Singapore (NHRIS), National Heart Centre Singapore, Singapore 169609, Singapore; London Institute of Medical Sciences, London W12 ONN, UK
| | - Owen J L Rackham
- Program in Cardiovascular and Metabolic Disorders, Duke-National University of Singapore, Singapore 169857, Singapore; School of Biological Sciences, University of Southampton, Southampton, UK.
| | - Sebastian Schafer
- Program in Cardiovascular and Metabolic Disorders, Duke-National University of Singapore, Singapore 169857, Singapore; National Heart Research Institute Singapore (NHRIS), National Heart Centre Singapore, Singapore 169609, Singapore.
|
26
|
Identification and analysis of smORFs in Chlamydomonas reinhardtii. Genomics 2022; 114:110444. [PMID: 35933072 DOI: 10.1016/j.ygeno.2022.110444] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/22/2022] [Revised: 07/06/2022] [Accepted: 07/31/2022] [Indexed: 11/24/2022]
Abstract
Small open reading frames (smORFs) are recognized as important contributors to organismal function, from bacteria to higher eukaryotes. However, smORFs have been little investigated in green algae, despite the importance of these organisms in ecology and evolution. We applied bioinformatic analysis, ribosome profiling, and small-peptide proteomics to build a genome-wide, high-confidence smORF database for the model green alga Chlamydomonas reinhardtii. The whole genome was first screened to mine potential coding smORFs. Conservation analysis, ribosome profiling, and proteomics data were then processed to identify conserved smORFs and generate translation evidence. Combining these procedures yielded 2014 smORFs that might exist in the C. reinhardtii genome. The expression of smORFs under Cd treatment suggested that two smORFs might participate in redox reactions, three in inorganic phosphate transport, and one in DNA repair under stress. Our study built a genome-wide database for C. reinhardtii, providing target smORFs for further research.
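The first step described in this abstract, screening a genome for candidate smORFs, can be pictured with a minimal six-frame scan. This is only a sketch under stated assumptions: it considers canonical ATG starts and the three standard stop codons, and the `min_codons` and `max_codons` bounds are hypothetical defaults, not the thresholds used in the study. Real pipelines add the conservation, ribosome profiling and proteomics filters mentioned in the abstract.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_smorfs(seq, min_codons=10, max_codons=100):
    """List candidate smORFs as (strand, start, end, orf_sequence).

    Coordinates are 0-based, end-exclusive, and refer to the scanned strand
    (the reverse complement for strand "-"). The codon count excludes the stop.
    Length bounds are illustrative assumptions.
    """
    seq = seq.upper()
    candidates = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand_seq) - 2, 3):
                codon = strand_seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i - start) // 3
                    if min_codons <= n_codons <= max_codons:
                        candidates.append(
                            (strand, start, i + 3, strand_seq[start:i + 3])
                        )
                    start = None  # look for the next ORF in this frame
    return candidates


if __name__ == "__main__":
    toy = "CC" + "ATG" + "GCT" * 3 + "TAA" + "GG"  # made-up sequence
    print(find_smorfs(toy, min_codons=2, max_codons=10))
```

Taking the first in-frame ATG keeps only the longest ORF per stop codon, a common simplification; nested or non-ATG starts would need a different scan.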
|
27
|
Bonidia RP, Santos APA, de Almeida BLS, Stadler PF, da Rocha UN, Sanches DS, de Carvalho ACPLF. BioAutoML: automated feature engineering and metalearning to predict noncoding RNAs in bacteria. Brief Bioinform 2022; 23:6618238. [PMID: 35753697 PMCID: PMC9294424 DOI: 10.1093/bib/bbac218] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/02/2022] [Revised: 05/06/2022] [Accepted: 05/09/2022] [Indexed: 01/19/2023] Open
Abstract
Recent technological advances have led to an exponential expansion of biological sequence data and extraction of meaningful information through Machine Learning (ML) algorithms. This knowledge has improved the understanding of mechanisms related to several fatal diseases, e.g. Cancer and coronavirus disease 2019, helping to develop innovative solutions, such as CRISPR-based gene editing, coronavirus vaccine and precision medicine. These advances benefit our society and economy, directly impacting people’s lives in various areas, such as health care, drug discovery, forensic analysis and food processing. Nevertheless, ML-based approaches to biological data require representative, quantitative and informative features. Many ML algorithms can handle only numerical data, and therefore sequences need to be translated into a numerical feature vector. This process, known as feature extraction, is a fundamental step for developing high-quality ML-based models in bioinformatics, by allowing the feature engineering stage, with design and selection of suitable features. Feature engineering, ML algorithm selection and hyperparameter tuning are often manual and time-consuming processes, requiring extensive domain knowledge. To deal with this problem, we present a new package: BioAutoML. BioAutoML automatically runs an end-to-end ML pipeline, extracting numerical and informative features from biological sequence databases, using the MathFeature package, and automating the feature selection, ML algorithm(s) recommendation and tuning of the selected algorithm(s) hyperparameters, using Automated ML (AutoML). BioAutoML has two components, divided into four modules: (1) automated feature engineering (feature extraction and selection modules) and (2) Metalearning (algorithm recommendation and hyper-parameter tuning modules). 
We experimentally evaluate BioAutoML in two different scenarios: (i) prediction of the three main classes of noncoding RNAs (ncRNAs) and (ii) prediction of the eight categories of ncRNAs in bacteria, including housekeeping and regulatory types. To assess BioAutoML predictive performance, it is experimentally compared with two other AutoML tools (RECIPE and TPOT). According to the experimental results, BioAutoML can accelerate new studies, reducing the cost of feature engineering processing and either keeping or improving predictive performance. BioAutoML is freely available at https://github.com/Bonidia/BioAutoML.
Affiliation(s)
- Robson P Bonidia
- Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos 13566-590, Brazil
| | - Anderson P Avila Santos
- Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos 13566-590, Brazil; Department of Environmental Microbiology, Helmholtz Centre for Environmental Research-UFZ GmbH, Leipzig, Saxony, Germany
| | - Breno L S de Almeida
- Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos 13566-590, Brazil
| | - Peter F Stadler
- Department of Computer Science and Interdisciplinary Center of Bioinformatics, University of Leipzig, Leipzig, Saxony, Germany
| | - Ulisses N da Rocha
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research-UFZ GmbH, Leipzig, Saxony, Germany
| | - Danilo S Sanches
- Department of Computer Science, Federal University of Technology - Paraná, UTFPR, Cornélio Procópio 86300-000, Brazil
| | - André C P L F de Carvalho
- Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos 13566-590, Brazil
|
28
|
Thousands of small, novel genes predicted in global phage genomes. Cell Rep 2022; 39:110984. [PMID: 35732113 DOI: 10.1016/j.celrep.2022.110984] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 10/15/2021] [Revised: 02/14/2022] [Accepted: 05/27/2022] [Indexed: 11/22/2022] Open
Abstract
Small genes (<150 nucleotides) have been systematically overlooked in phage genomes. We employ a large-scale comparative genomics approach to predict >40,000 small-gene families in ∼2.3 million phage genome contigs. We find that small genes in phage genomes are approximately 3-fold more prevalent than in host prokaryotic genomes. Our approach enriches for small genes that are translated in microbiomes, suggesting the small genes identified are coding. More than 9,000 families encode potentially secreted or transmembrane proteins, more than 5,000 families encode predicted anti-CRISPR proteins, and more than 500 families encode predicted antimicrobial proteins. By combining homology and genomic-neighborhood analyses, we reveal substantial novelty and diversity within phage biology, including small phage genes found in multiple host phyla, small genes encoding proteins that play essential roles in host infection, and small genes that share genomic neighborhoods and whose encoded proteins may share related functions.
|
29
|
Pan J, Wang R, Shang F, Ma R, Rong Y, Zhang Y. Functional Micropeptides Encoded by Long Non-Coding RNAs: A Comprehensive Review. Front Mol Biosci 2022; 9:817517. [PMID: 35769907 PMCID: PMC9234465 DOI: 10.3389/fmolb.2022.817517] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 11/18/2021] [Accepted: 05/24/2022] [Indexed: 12/03/2022] Open
Abstract
Long non-coding RNAs (lncRNAs) were originally defined as non-coding RNAs (ncRNAs) that lack protein-coding ability. However, with the emergence of technologies such as ribosome profiling sequencing and ribosome-nascent chain complex sequencing, it has been demonstrated that most lncRNAs have short open reading frames and hence the potential to encode functional micropeptides. Such micropeptides have been reported to be widely involved in life-sustaining activities in several organisms, including homeostasis regulation, disease, tumor occurrence and development, and the morphological development of animals and plants. In this review, we focus on the latest developments in the field of lncRNA-encoded micropeptides and describe the relevant computational tools and techniques for micropeptide prediction and identification. This review aims to serve as a reference for future research on lncRNA-encoded micropeptides.
Affiliation(s)
- Jianfeng Pan
- College of Animal Science, Inner Mongolia Agricultural University, Hohhot, China
| | - Ruijun Wang
- College of Animal Science, Inner Mongolia Agricultural University, Hohhot, China
- Key Laboratory of Mutton Sheep Genetics and Breeding, Ministry of Agriculture, Hohhot, China
- Key Laboratory of Animal Genetics, Breeding and Reproduction, Hohhot, China
- Engineering Research Center for Goat Genetics and Breeding, Hohhot, China
| | - Fangzheng Shang
- College of Animal Science, Inner Mongolia Agricultural University, Hohhot, China
| | - Rong Ma
- College of Animal Science, Inner Mongolia Agricultural University, Hohhot, China
| | - Youjun Rong
- College of Animal Science, Inner Mongolia Agricultural University, Hohhot, China
| | - Yanjun Zhang
- College of Animal Science, Inner Mongolia Agricultural University, Hohhot, China
- Key Laboratory of Mutton Sheep Genetics and Breeding, Ministry of Agriculture, Hohhot, China
- Key Laboratory of Animal Genetics, Breeding and Reproduction, Hohhot, China
- Engineering Research Center for Goat Genetics and Breeding, Hohhot, China
- *Correspondence: Yanjun Zhang,
|
30
|
Cancer-related micropeptides encoded by ncRNAs: Promising drug targets and prognostic biomarkers. Cancer Lett 2022; 547:215723. [DOI: 10.1016/j.canlet.2022.215723] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/17/2022] [Revised: 04/14/2022] [Accepted: 05/01/2022] [Indexed: 02/07/2023]
|
31
|
Li M, Sun C, Xu N, Bian P, Tian X, Wang X, Wang Y, Jia X, Heller R, Wang M, Wang F, Dai X, Luo R, Guo Y, Wang X, Yang P, Hu D, Liu Z, Fu W, Zhang S, Li X, Wen C, Lan F, Siddiki AZ, Suwannapoom C, Zhao X, Nie Q, Hu X, Jiang Y, Yang N. De novo assembly of 20 chicken genomes reveals the undetectable phenomenon for thousands of core genes on micro-chromosomes and sub-telomeric regions. Mol Biol Evol 2022; 39:6553873. [PMID: 35325213 PMCID: PMC9021737 DOI: 10.1093/molbev/msac066] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Indexed: 11/29/2022] Open
Abstract
The gene numbers and evolutionary rates of birds were assumed to be much lower than those of mammals, which is in sharp contrast to the huge species number and morphological diversity of birds. It is, therefore, necessary to construct a complete avian genome and analyze its evolution. We constructed a chicken pan-genome from 20 de novo assembled genomes with high sequencing depth, and identified 1,335 protein-coding genes and 3,011 long noncoding RNAs not found in GRCg6a. The majority of these novel genes were detected across most individuals in the examined transcriptomes but were seldom captured in the DNA sequencing data, whether from Illumina or PacBio technology. Furthermore, unlike in previous pan-genome models, most of these novel genes were overrepresented in chromosomal subtelomeric regions and on microchromosomes, surrounded by extremely high proportions of tandem repeats, which strongly hinder DNA sequencing. These hidden genes were shown to be shared by all chicken genomes, included many housekeeping genes, and were enriched in immune pathways. Comparative genomics revealed that the novel genes had substitution rates 3-fold higher than those of known genes, updating the knowledge about evolutionary rates in birds. Our study provides a framework for constructing a better chicken genome, which will contribute toward the understanding of avian evolution and the improvement of poultry breeding.
Affiliation(s)
- Ming Li
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
| | - Congjiao Sun
- National Engineering Laboratory for Animal Breeding and Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, China Agricultural University, Beijing 100193, China
| | - Naiyi Xu
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
| | - Peipei Bian
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
| | - Xiaomeng Tian
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
| | - Xihong Wang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
| | - Yuzhe Wang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China; National Research Facility for Phenotypic and Genotypic Analysis of Model Animals (Beijing), China Agricultural University, Beijing 100193, China
| | - Xinzheng Jia
- Department of Animal Science, Iowa State University, Ames, IA 50011, USA; School of Life Science and Engineering, Foshan University, Foshan 528225, China
| | - Rasmus Heller
- Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen N 2200, Denmark
| | - Mingshan Wang
- Howard Hughes Medical Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA; Department of Ecology and Evolutionary Biology, University of California Santa Cruz, CA 95064, USA
| | - Fei Wang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
| | - Xuelei Dai
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
| | - Rongsong Luo
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
| | - Yingwei Guo
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
| | - Xiangnan Wang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
| | - Peng Yang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
| | - Dexiang Hu
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
| | - Zhenyu Liu
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
| | - Weiwei Fu
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
| | - Shunjin Zhang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
| | - Xiaochang Li
- National Engineering Laboratory for Animal Breeding and Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, China Agricultural University, Beijing 100193, China
| | - Chaoliang Wen
- National Engineering Laboratory for Animal Breeding and Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, China Agricultural University, Beijing 100193, China
| | - Fangren Lan
- National Engineering Laboratory for Animal Breeding and Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, China Agricultural University, Beijing 100193, China
| | - Amam Zonaed Siddiki
- Department of Pathology and Parasitology, Faculty of Veterinary Medicine, Chittagong Veterinary and Animal Sciences University, Chittagong-4202, Bangladesh
| | | | - Xin Zhao
- Department of Animal Science, McGill University, Montreal, Quebec, Canada
| | - Qinghua Nie
- Department of Animal Genetics, Breeding and Reproduction, College of Animal Science, South China Agricultural University, Guangzhou, 510642, Guangdong, China
| | - Xiaoxiang Hu
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
| | - Yu Jiang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China; Center for Functional Genomics, Institute of Future Agriculture, Northwest A&F University
| | - Ning Yang
- National Engineering Laboratory for Animal Breeding and Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, China Agricultural University, Beijing 100193, China
| |
|
32
|
Coelho LP, Alves R, del Río ÁR, Myers PN, Cantalapiedra CP, Giner-Lamia J, Schmidt TS, Mende DR, Orakov A, Letunic I, Hildebrand F, Van Rossum T, Forslund SK, Khedkar S, Maistrenko OM, Pan S, Jia L, Ferretti P, Sunagawa S, Zhao XM, Nielsen HB, Huerta-Cepas J, Bork P. Towards the biogeography of prokaryotic genes. Nature 2022; 601:252-256. [PMID: 34912116 PMCID: PMC7613196 DOI: 10.1038/s41586-021-04233-4] [Citation(s) in RCA: 70] [Impact Index Per Article: 35.0] [Received: 06/16/2019] [Accepted: 11/12/2021] [Indexed: 12/19/2022]
Abstract
Microbial genes encode the majority of the functional repertoire of life on earth. However, despite increasing efforts in metagenomic sequencing of various habitats [1-3], little is known about the distribution of genes across the global biosphere, with implications for human and planetary health. Here we constructed a non-redundant gene catalogue of 303 million species-level genes (clustered at 95% nucleotide identity) from 13,174 publicly available metagenomes across 14 major habitats and used it to show that most genes are specific to a single habitat. The small fraction of genes found in multiple habitats is enriched in antibiotic-resistance genes and markers for mobile genetic elements. By further clustering these species-level genes into 32 million protein families, we observed that a small fraction of these families contain the majority of the genes (0.6% of families account for 50% of the genes). The majority of species-level genes and protein families are rare. Furthermore, species-level genes, and in particular the rare ones, show low rates of positive (adaptive) selection, supporting a model in which most genetic variability observed within each protein family is neutral or nearly neutral.
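The skew reported above (0.6% of families accounting for 50% of the genes) is a cumulative-share statistic. The following is a minimal sketch of how such a figure can be computed from a list of family sizes; the function name and the toy sizes are hypothetical illustrations and are not taken from the paper or its catalogue.

```python
def fraction_of_families_covering(family_sizes, share=0.5):
    """Return the smallest fraction of families (largest first) whose
    genes add up to at least `share` of all genes."""
    sizes = sorted(family_sizes, reverse=True)
    total = sum(sizes)
    running = 0
    for count, size in enumerate(sizes, start=1):
        running += size
        if running >= share * total:
            return count / len(sizes)
    return 1.0  # empty input: nothing to rank


# Hypothetical toy data: three large families and 97 singletons.
toy_sizes = [500, 300, 100] + [1] * 97
print(fraction_of_families_covering(toy_sizes))
```

In this toy example a single family out of 100 already holds more than half of the genes, so the function reports a fraction of 0.01.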
Affiliation(s)
- Luis Pedro Coelho
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China; MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Shanghai, China; Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Renato Alves
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Álvaro Rodríguez del Río
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
| | - Pernille Neve Myers
- Department of Biotechnology and Biomedicine, Technical University of Denmark, Kongens Lyngby, Denmark
| | - Carlos P. Cantalapiedra
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
| | - Joaquín Giner-Lamia
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain; Departamento de Biotecnología-Biología Vegetal, Escuela Técnica Superior de Ingeniería Agronómica, Alimentaria y de Biosistemas, Universidad Politécnica de Madrid (UPM), Madrid, Spain
| | - Thomas Sebastian Schmidt
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Daniel R. Mende
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany; Daniel K. Inouye Center for Microbial Oceanography: Research and Education, University of Hawai’i at Mānoa, Honolulu, HI, USA
| | - Askarbek Orakov
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | | | - Falk Hildebrand
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany; Earlham Institute, Norwich Research Park, Norwich, UK; Gut Health and Microbes Programme, Quadram Institute, Norwich Research Park, Norwich, UK
| | - Thea Van Rossum
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Sofia K. Forslund
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany; Experimental and Clinical Research Center (ECRC), a joint venture of the Max Delbrück Centre (MDC) and Charité University Hospital, Berlin, Germany; Berlin Initiative of Health, Berlin, Germany
| | - Supriya Khedkar
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Oleksandr M. Maistrenko
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Shaojun Pan
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China; MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Shanghai, China
| | - Longhao Jia
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China; MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Shanghai, China
| | - Pamela Ferretti
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Shinichi Sunagawa
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany; Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Zürich, Switzerland
| | - Xing-Ming Zhao
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China; MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Shanghai, China
| | | | - Jaime Huerta-Cepas
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany; Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
| | - Peer Bork
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany; Max Delbrück Centre for Molecular Medicine, Berlin, Germany; Yonsei Frontier Lab (YFL), Yonsei University, Seoul, South Korea; Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany
| |
|
33
|
Klapproth C, Sen R, Stadler PF, Findeiß S, Fallmann J. Common Features in lncRNA Annotation and Classification: A Survey. Noncoding RNA 2021; 7:77. [PMID: 34940758 PMCID: PMC8708962 DOI: 10.3390/ncrna7040077] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 11/12/2021] [Revised: 12/03/2021] [Accepted: 12/06/2021] [Indexed: 12/29/2022] [Open Access]
Abstract
Long non-coding RNAs (lncRNAs) are widely recognized as important regulators of gene expression. Their molecular functions range from miRNA sponging to chromatin-associated mechanisms, leading to effects in disease progression and establishing them as diagnostic and therapeutic targets. Still, only a few representatives of this diverse class of RNAs are well studied, while the vast majority is poorly described beyond the existence of their transcripts. In this review we survey common in silico approaches for lncRNA annotation. We focus on the well-established sets of features used for classification and discuss their specific advantages and weaknesses. While the available tools perform very well for the task of distinguishing coding sequence from other RNAs, we find that current methods are not well suited to distinguish lncRNAs or parts thereof from other non-protein-coding input sequences. We conclude that the distinction of lncRNAs from intronic sequences and untranslated regions of coding mRNAs remains a pressing research gap.
Affiliation(s)
- Christopher Klapproth
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
| | - Rituparno Sen
- Helmholtz Institute for RNA-Based Infection Research (HIRI), Helmholtz-Center for Infection Research (HZI), D-97080 Würzburg, Germany
| | - Peter F. Stadler
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Competence Center for Scalable Data Services and Solutions, and Leipzig Research Center for Civilization Diseases, University Leipzig, D-04103 Leipzig, Germany
- Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany
- Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Vienna, Austria
- Facultad de Ciencias, Universidad Nacional de Colombia, Bogotá CO-111321, Colombia
- Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA
| | - Sven Findeiß
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
| | - Jörg Fallmann
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
| |
|
34
|
Chen L, Yang Y, Zhang Y, Li K, Cai H, Wang H, Zhao Q. The Small Open Reading Frame-Encoded Peptides: Advances in Methodologies and Functional Studies. Chembiochem 2021; 23:e202100534. [PMID: 34862721 DOI: 10.1002/cbic.202100534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2021] [Revised: 11/15/2021] [Indexed: 11/07/2022]
Abstract
Small open reading frames (sORFs) are an important class of genes with fewer than 100 codons. They were historically annotated as noncoding or even junk sequences. In recent years, accumulating evidence has suggested that sORFs could encode a considerable number of polypeptides, many of which play important roles in both physiology and disease pathology. However, it has been technically challenging to directly detect sORF-encoded peptides (SEPs). Here, we discuss the latest advances in methodologies for identifying SEPs with mass spectrometry, as well as the progress on functional studies of SEPs.
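To make the sORF definition above concrete, here is a minimal sketch that scans the forward strand of a DNA string for ATG-initiated open reading frames shorter than 100 codons. It is an illustration only, not a method from the review: the function name, the choice to exclude the stop codon from the codon count, the `min_codons` floor, and the restriction to the forward strand and the standard stop codons are all assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorfs(dna, max_codons=100, min_codons=2):
    """Return (start, end, n_codons) for forward-strand ORFs that begin
    with ATG, end at an in-frame stop codon, and have fewer than
    `max_codons` codons (stop codon not counted; an assumption)."""
    dna = dna.upper()
    hits = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                j = i
                # Walk codon by codon until a stop codon or the sequence end.
                while j + 3 <= len(dna) and dna[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 <= len(dna):  # an in-frame stop codon was found
                    n_codons = (j - i) // 3
                    if min_codons <= n_codons < max_codons:
                        hits.append((i, j + 3, n_codons))
                    i = j + 3  # resume after the stop codon
                    continue
            i += 3
    return hits


# Hypothetical toy sequence: ATG AAA TTT TAG embedded in frame 2.
print(find_sorfs("CCATGAAATTTTAGGG"))
```

Coordinates are 0-based with an exclusive end, so the toy sequence yields one hit spanning positions 2 to 14 with three codons. Candidates found this way are only predictions; as the abstract notes, direct detection of the encoded peptides requires experimental evidence such as mass spectrometry.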
Affiliation(s)
- Lei Chen
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China; Laboratory for Synthetic Chemistry and Chemical Biology Limited, Hong Kong Science and Technology Park, New Territories, Hong Kong SAR, 999077, P. R. China
| | - Ying Yang
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China
| | - Yuanliang Zhang
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China
| | - Kecheng Li
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China
| | - Hongmin Cai
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, 510623, P. R. China
| | - Hongwei Wang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangzhou, 510623, P. R. China
| | - Qian Zhao
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China
| |
|
35
|
A putative long noncoding RNA-encoded micropeptide maintains cellular homeostasis in pancreatic β cells. Molecular Therapy. Nucleic Acids 2021; 26:307-320. [PMID: 34513312 PMCID: PMC8416971 DOI: 10.1016/j.omtn.2021.06.027] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 11/06/2020] [Accepted: 06/30/2021] [Indexed: 12/12/2022]
Abstract
Micropeptides (microproteins) encoded by transcripts previously annotated as long noncoding RNAs (lncRNAs) are emerging as important mediators of fundamental biological processes in health and disease. Here, we applied two computational tools to identify putative micropeptides encoded by lncRNAs that are expressed in the human pancreas. We experimentally verified one such micropeptide encoded by a β cell- and neural cell-enriched lncRNA TCL1 Upstream Neural Differentiation-Associated RNA (TUNAR, also known as TUNA, HI-LNC78, or LINC00617). We named this highly conserved 48-amino-acid micropeptide beta cell- and neural cell-regulin (BNLN). BNLN contains a single-pass transmembrane domain and localizes at the endoplasmic reticulum (ER) in pancreatic β cells. Overexpression of BNLN lowered ER calcium levels, maintained ER homeostasis, and elevated glucose-stimulated insulin secretion in pancreatic β cells. We further assessed the BNLN expression in islets from mice fed a high-fat diet and a regular diet and found that BNLN is suppressed by diet-induced obesity (DIO). Conversely, overexpression of BNLN enhanced insulin secretion in islets from lean and obese mice as well as from humans. Taken together, our study provides the first evidence that lncRNA-encoded micropeptides play a critical role in pancreatic β cell functions and provides a foundation for future comprehensive analyses of micropeptide function and pathophysiological impact on diabetes.
|
36
|
Song K, Baumgartner D, Hagemann M, Muro-Pastor AM, Maaß S, Becher D, Hess WR. AtpΘ is an inhibitor of F0F1 ATP synthase to arrest ATP hydrolysis during low-energy conditions in cyanobacteria. Curr Biol 2021; 32:136-148.e5. [PMID: 34762820 DOI: 10.1016/j.cub.2021.10.051] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/19/2021] [Revised: 10/14/2021] [Accepted: 10/22/2021] [Indexed: 10/19/2022]
Abstract
Biological processes in all living cells are powered by ATP, a nearly universal molecule of energy transfer. ATP synthases produce ATP utilizing proton gradients that are usually generated by either respiration or photosynthesis. However, cyanobacteria are unique in combining photosynthetic and respiratory electron transport chains in the same membrane system, the thylakoids. How cyanobacteria prevent the futile reverse operation of ATP synthase under unfavorable conditions, in which it pumps protons while hydrolyzing ATP, is mostly unclear. Here, we provide evidence that the small protein AtpΘ, which is widely conserved in cyanobacteria, mainly fulfills this task. The expression of AtpΘ is induced under conditions such as darkness or heat shock, which can lead to a weakening of the proton gradient. Translational fusions of AtpΘ to the green fluorescent protein revealed targeting to the thylakoid membrane. Immunoprecipitation assays followed by mass spectrometry and far western blots identified subunits of ATP synthase as interacting partners of AtpΘ. ATP hydrolysis assays with isolated membrane fractions, as well as purified ATP synthase complexes, demonstrated that AtpΘ inhibits ATPase activity in a dose-dependent manner similar to the F0F1-ATP synthase inhibitor N,N'-dicyclohexylcarbodiimide. The results show that, even in a well-investigated process, crucial new players can be discovered if small proteins are taken into consideration and indicate that ATP synthase activity can be controlled in surprisingly different ways.
Affiliation(s)
- Kuo Song
- University of Freiburg, Faculty of Biology, Genetics and Experimental Bioinformatics, Schänzlestr. 1, 79104 Freiburg, Germany
| | - Desirée Baumgartner
- University of Freiburg, Faculty of Biology, Genetics and Experimental Bioinformatics, Schänzlestr. 1, 79104 Freiburg, Germany
| | - Martin Hagemann
- University of Rostock, Institute of Biosciences, Plant Physiology Department, Albert-Einstein-Str. 3, 18059 Rostock, Germany
| | - Alicia M Muro-Pastor
- Instituto de Bioquímica Vegetal y Fotosíntesis, Consejo Superior de Investigaciones Científicas and Universidad de Sevilla, 41092 Sevilla, Spain
| | - Sandra Maaß
- University of Greifswald, Department of Microbial Proteomics, Institute of Microbiology, 17489 Greifswald, Germany
| | - Dörte Becher
- University of Greifswald, Department of Microbial Proteomics, Institute of Microbiology, 17489 Greifswald, Germany
| | - Wolfgang R Hess
- University of Freiburg, Faculty of Biology, Genetics and Experimental Bioinformatics, Schänzlestr. 1, 79104 Freiburg, Germany.
| |
|
37
|
Brewer KI, Gaffield GJ, Puri M, Breaker RR. DIMPL: a bioinformatics pipeline for the discovery of structured noncoding RNA motifs in bacteria. Bioinformatics 2021; 38:533-535. [PMID: 34524415 PMCID: PMC8723152 DOI: 10.1093/bioinformatics/btab624] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/22/2020] [Revised: 05/07/2021] [Accepted: 09/10/2021] [Indexed: 02/03/2023] [Open Access]
Abstract
SUMMARY: Recent efforts to identify novel bacterial structured noncoding RNA (ncRNA) motifs through searching long, GC-rich intergenic regions (IGRs) have revealed several new classes, including the recently validated HMP-PP riboswitch. The DIMPL (Discovery of Intergenic Motifs PipeLine) discovery pipeline described herein enables rapid extraction and selection of bacterial IGRs that are enriched for structured ncRNAs. Moreover, DIMPL automates the subsequent computational steps necessary for their functional identification. AVAILABILITY AND IMPLEMENTATION: The DIMPL pipeline is freely available as a Docker image with an accompanying set of Jupyter notebooks. Full instructions for download and use are available at https://github.com/breakerlab/dimpl. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Kenneth I Brewer
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520-8103, USA
| | - Glenn J Gaffield
- Howard Hughes Medical Institute, Yale University, New Haven, CT 06520-8103, USA
| | - Malavika Puri
- Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT 06520-8103, USA
| | | |
|
38
|
Weinberg CE, Olzog VJ, Eckert I, Weinberg Z. Identification of over 200-fold more hairpin ribozymes than previously known in diverse circular RNAs. Nucleic Acids Res 2021; 49:6375-6388. [PMID: 34096583 PMCID: PMC8216279 DOI: 10.1093/nar/gkab454] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 04/11/2021] [Accepted: 05/12/2021] [Indexed: 11/18/2022] [Open Access]
Abstract
Self-cleaving ribozymes are catalytic RNAs that cut themselves at a specific inter-nucleotide linkage. They serve as a model of RNA catalysis, and as an important tool in biotechnology. For most of the nine known structural classes of self-cleaving ribozymes, at least hundreds of examples are known, and some are present in multiple domains of life. By contrast, only four unique examples of the hairpin ribozyme class are known, despite its discovery in 1986. We bioinformatically predicted 941 unique hairpin ribozymes of a different permuted form from the four previously known hairpin ribozymes, and experimentally confirmed several diverse predictions. These results profoundly expand the number of natural hairpin ribozymes, enabling biochemical analysis based on natural sequences, and suggest that a distinct permuted form is more biologically relevant. Moreover, all novel hairpins were discovered in metatranscriptomes. They apparently reside in RNA molecules that vary both in size—from 381 to 5170 nucleotides—and in protein content. The RNA molecules likely replicate as circular single-stranded RNAs, and potentially provide a dramatic increase in diversity of such RNAs. Moreover, these organisms have eluded previous attempts to isolate RNA viruses from metatranscriptomes—suggesting a significant untapped universe of viruses or other organisms hidden within metatranscriptome sequences.
Affiliation(s)
- Christina E Weinberg
- Institute for Biochemistry, Leipzig University, Brüderstraße 34, 04103 Leipzig, Germany
| | - V Janett Olzog
- Institute for Biochemistry, Leipzig University, Brüderstraße 34, 04103 Leipzig, Germany
| | - Iris Eckert
- Bioinformatics Group, Department of Computer Science and Interdisciplinary Centre for Bioinformatics, Leipzig University, Härtelstraße 16-18, 04107 Leipzig, Germany
| | - Zasha Weinberg
- Bioinformatics Group, Department of Computer Science and Interdisciplinary Centre for Bioinformatics, Leipzig University, Härtelstraße 16-18, 04107 Leipzig, Germany
| |
|
39
|
Gao W, Jones TA, Rivas E. Discovery of 17 conserved structural RNAs in fungi. Nucleic Acids Res 2021; 49:6128-6143. [PMID: 34086938 PMCID: PMC8216456 DOI: 10.1093/nar/gkab355] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/03/2021] [Revised: 03/25/2021] [Accepted: 04/21/2021] [Indexed: 11/13/2022] [Open Access]
Abstract
Many non-coding RNAs with known functions are structurally conserved: their intramolecular secondary and tertiary interactions are maintained across evolutionary time. Consequently, the presence of conserved structure in multiple sequence alignments can be used to identify candidate functional non-coding RNAs. Here, we present a bioinformatics method that couples iterative homology search with covariation analysis to assess whether a genomic region has evidence of conserved RNA structure. We used this method to examine all unannotated regions of five well-studied fungal genomes (Saccharomyces cerevisiae, Candida albicans, Neurospora crassa, Aspergillus fumigatus, and Schizosaccharomyces pombe). We identified 17 novel structurally conserved non-coding RNA candidates, which include four H/ACA box small nucleolar RNAs, four intergenic RNAs and nine RNA structures located within the introns and untranslated regions (UTRs) of mRNAs. For the two structures in the 3' UTRs of the metabolic genes GLY1 and MET13, we performed experiments that provide evidence against them being eukaryotic riboswitches.
Affiliation(s)
- William Gao
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, USA
| | - Thomas A Jones
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, USA
- Howard Hughes Medical Institute, Harvard University, Cambridge, USA
| | - Elena Rivas
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, USA
| |
|
40
|
Fuchs S, Kucklick M, Lehmann E, Beckmann A, Wilkens M, Kolte B, Mustafayeva A, Ludwig T, Diwo M, Wissing J, Jänsch L, Ahrens CH, Ignatova Z, Engelmann S. Towards the characterization of the hidden world of small proteins in Staphylococcus aureus, a proteogenomics approach. PLoS Genet 2021; 17:e1009585. [PMID: 34061833 PMCID: PMC8195425 DOI: 10.1371/journal.pgen.1009585] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 11/24/2020] [Revised: 06/11/2021] [Accepted: 05/07/2021] [Indexed: 01/08/2023] [Open Access]
Abstract
Small proteins play essential roles in bacterial physiology and virulence; however, automated algorithms for genome annotation are often not yet able to accurately predict the corresponding genes. The accuracy and reliability of genome annotations, particularly for small open reading frames (sORFs), can be significantly improved by integrating protein evidence from experimental approaches. Here we present a highly optimized and flexible bioinformatics workflow for bacterial proteogenomics covering all steps from (i) generation of protein databases, (ii) database searches and (iii) peptide-to-genome mapping to (iv) visualization of results. We used the workflow to identify high-quality peptide spectrum matches (PSMs) for small proteins (≤ 100 aa, SP100) in Staphylococcus aureus Newman. Protein extracts from S. aureus were subjected to different experimental workflows for protein digestion and prefractionation and measured with highly sensitive mass spectrometers. In total, 175 proteins with up to 100 aa (SP100) were identified. Of these, 24 (ranging from 9 to 99 aa) were novel and not contained in the genome annotation used. 144 SP100 are highly conserved and were found in at least 50% of the publicly available S. aureus genomes, while 127 are additionally conserved in other staphylococci. Almost half of the identified SP100 were basic, suggesting a role in binding to more acidic molecules such as nucleic acids or phospholipids.
Affiliation(s)
- Stephan Fuchs
- Robert Koch Institute, Methodenentwicklung und Forschungsinfrastruktur (MF), Berlin, Germany
| | - Martin Kucklick
- University of Technical Sciences Braunschweig, Institute for Microbiology, Braunschweig, Germany
- Helmholtz Center for Infection Research GmbH, Microbial Proteomics, Braunschweig, Germany
| | - Erik Lehmann
- University of Technical Sciences Braunschweig, Institute for Microbiology, Braunschweig, Germany
- Helmholtz Center for Infection Research GmbH, Microbial Proteomics, Braunschweig, Germany
| | - Alexander Beckmann
- University of Technical Sciences Braunschweig, Institute for Microbiology, Braunschweig, Germany
- Helmholtz Center for Infection Research GmbH, Microbial Proteomics, Braunschweig, Germany
| | - Maya Wilkens
- Robert Koch Institute, Methodenentwicklung und Forschungsinfrastruktur (MF), Berlin, Germany
- University of Technical Sciences Braunschweig, Institute for Microbiology, Braunschweig, Germany
- Helmholtz Center for Infection Research GmbH, Microbial Proteomics, Braunschweig, Germany
| | - Baban Kolte
- University of Hamburg, Institute of Biochemistry and Molecular Biology, Hamburg, Germany
| | - Ayten Mustafayeva
- University of Technical Sciences Braunschweig, Institute for Microbiology, Braunschweig, Germany
- Helmholtz Center for Infection Research GmbH, Microbial Proteomics, Braunschweig, Germany
| | - Tobias Ludwig
- University of Technical Sciences Braunschweig, Institute for Microbiology, Braunschweig, Germany
- Helmholtz Center for Infection Research GmbH, Microbial Proteomics, Braunschweig, Germany
| | - Maurice Diwo
- University of Technical Sciences Braunschweig, Institute for Microbiology, Braunschweig, Germany
- Helmholtz Center for Infection Research GmbH, Microbial Proteomics, Braunschweig, Germany
| | - Josef Wissing
- Helmholtz Center for Infection Research GmbH, Cellular Proteomics, Braunschweig, Germany
| | - Lothar Jänsch
- Helmholtz Center for Infection Research GmbH, Cellular Proteomics, Braunschweig, Germany
| | - Christian H Ahrens
- Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Zoya Ignatova
- University of Hamburg, Institute of Biochemistry and Molecular Biology, Hamburg, Germany
| | - Susanne Engelmann
- University of Technical Sciences Braunschweig, Institute for Microbiology, Braunschweig, Germany
- Helmholtz Center for Infection Research GmbH, Microbial Proteomics, Braunschweig, Germany
| |
|
41
|
A workflow to identify novel proteins based on the direct mapping of peptide-spectrum-matches to genomic locations. BMC Bioinformatics 2021; 22:277. [PMID: 34039272 PMCID: PMC8157683 DOI: 10.1186/s12859-021-04159-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/02/2021] [Accepted: 04/27/2021] [Indexed: 02/06/2023] [Open Access]
Abstract
Background: Small proteins have received increasing attention in recent years. They have in particular been implicated as signals contributing to the coordination of bacterial communities. In genome annotations they are often missing or hidden among large numbers of hypothetical proteins because genome annotation pipelines often exclude short open reading frames or over-predict hypothetical proteins based on simple models. The validation of novel proteins, and in particular of small proteins (sProteins), therefore requires additional evidence. Proteogenomics is considered the gold standard for this purpose. It extends beyond established annotations and includes all possible open reading frames (ORFs) as potential sources of peptides, thus allowing the discovery of novel, unannotated proteins. Typically this results in large numbers of putative novel small proteins fraught with large fractions of false-positive predictions. Results: We observe that the number and quality of the peptide-spectrum matches (PSMs) that map to a candidate ORF can be highly informative for the purpose of distinguishing proteins from spurious ORF annotations. We report here on a workflow that aggregates PSM quality information and local context into simple descriptors and reliably separates likely proteins from the large pool of false-positive, i.e., most likely untranslated ORFs. We investigated the artificial gut microbiome model SIHUMIx, comprising eight different species, for which we validate 5114 proteins that have previously been annotated only as hypothetical ORFs. In addition, we identified 37 non-annotated protein candidates for which we found evidence at the proteomic and transcriptomic level. Half (19) of these candidates have close functional homologs in other species. Another 12 candidates have homologs designated as hypothetical proteins in other species. The remaining six candidates are short (< 100 AA) and are most likely bona fide novel proteins.
Conclusions: The aggregation of PSM quality information for predicted ORFs provides a robust and efficient method to identify novel proteins in proteomics data. The workflow is in particular capable of identifying small proteins and frameshift variants. Since PSMs are explicitly mapped to genomic locations, it furthermore facilitates the integration of transcriptomics data and other sources of genome-level information. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04159-8.
|
42
|
Fremin BJ, Bhatt AS. Comparative genomics identifies thousands of candidate structured RNAs in human microbiomes. Genome Biol 2021; 22:100. [PMID: 33845850 PMCID: PMC8040213 DOI: 10.1186/s13059-021-02319-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/27/2020] [Accepted: 03/19/2021] [Indexed: 02/02/2023] Open
Abstract
BACKGROUND: Structured RNAs play varied bioregulatory roles within microbes. To date, hundreds of candidate structured RNAs have been predicted using informatic approaches that search for motif structures in genomic sequence data. The human microbiome contains thousands of species and strains of microbes. Yet, much of the metagenomic data from the human microbiome remains unmined for structured RNA motifs primarily due to computational limitations. RESULTS: We sought to apply a large-scale, comparative genomics approach to these organisms to identify candidate structured RNAs. With a carefully constructed, though computationally intensive automated analysis, we identify 3161 conserved candidate structured RNAs in intergenic regions, as well as 2022 additional candidate structured RNAs that may overlap coding regions. We validate the RNA expression of 177 of these candidate structures by analyzing small fragment RNA-seq data from four human fecal samples. CONCLUSIONS: This approach identifies a wide variety of candidate structured RNAs, including tmRNAs, antitoxins, and likely ribosome protein leaders, from a wide variety of taxa. Overall, our pipeline enables conservative predictions of thousands of novel candidate structured RNAs from human microbiomes.
Affiliation(s)
- Brayon J Fremin
- Department of Genetics, Stanford University, Stanford, CA, 94305, USA
| | - Ami S Bhatt
- Department of Genetics, Stanford University, Stanford, CA, 94305, USA.
- Department of Medicine (Hematology), Stanford University, Stanford, CA, 94305, USA.
|
43
|
Hou L, Xie J, Wu Y, Wang J, Duan A, Ao Y, Liu X, Yu X, Yan H, Perreault J, Li S. Identification of 11 candidate structured noncoding RNA motifs in humans by comparative genomics. BMC Genomics 2021; 22:164. [PMID: 33750298 PMCID: PMC7941889 DOI: 10.1186/s12864-021-07474-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/09/2020] [Accepted: 02/24/2021] [Indexed: 11/12/2022] Open
Abstract
Background: Only 1.5% of the human genome encodes proteins, while a large part of the remainder encodes noncoding RNAs (ncRNAs). Many ncRNAs form structures and perform many important functions. Accurately identifying structured ncRNAs in the human genome and discovering their biological functions remain a major challenge. Results: Here, we have established a pipeline (CM-line) with the following features for analyzing the large genomes of humans and other animals. First, we selected species with larger genetic distances to facilitate the discovery of covariations and compatible mutations. Second, we used CMfinder, which can generate useful alignments even with low sequence conservation. Third, we removed repetitive sequences and known structured ncRNAs to reduce the workload of CMfinder. Fourth, we used Infernal to find more representatives and refine the structure. We reported 11 classes of structured ncRNA candidates with significant covariations in humans. Functional analysis showed that these ncRNAs may have variable functions. Some may regulate circadian clock genes through poly(A) signals (PAS); some may regulate the elongation factor (EEF1A) and the T-cell receptor signaling pathway by cooperating with RNA binding proteins. Conclusions: By searching for important features of RNA structure from large genomes, the CM-line has revealed the existence of a variety of novel structured ncRNAs. Functional analysis suggests that some newly discovered ncRNA motifs may have biological functions. The pipeline we have established for the discovery of structured ncRNAs and the identification of their functions can also be applied to analyze other large genomes. Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-021-07474-9.
Affiliation(s)
- Lijuan Hou
- Medical School, Molecular Medicine Engineering and Research Center of Ministry of Education, Key Laboratory of Precision Medicine and Molecular Diagnosis of Fujian Universities, Institute of Genomics, School of Biomedical Sciences, Huaqiao University, Xiamen, 361021, P. R. China
| | - Jin Xie
- Medical School, Molecular Medicine Engineering and Research Center of Ministry of Education, Key Laboratory of Precision Medicine and Molecular Diagnosis of Fujian Universities, Institute of Genomics, School of Biomedical Sciences, Huaqiao University, Xiamen, 361021, P. R. China
| | - Yaoyao Wu
- Medical School, Molecular Medicine Engineering and Research Center of Ministry of Education, Key Laboratory of Precision Medicine and Molecular Diagnosis of Fujian Universities, Institute of Genomics, School of Biomedical Sciences, Huaqiao University, Xiamen, 361021, P. R. China
| | - Jiaojiao Wang
- Medical School, Molecular Medicine Engineering and Research Center of Ministry of Education, Key Laboratory of Precision Medicine and Molecular Diagnosis of Fujian Universities, Institute of Genomics, School of Biomedical Sciences, Huaqiao University, Xiamen, 361021, P. R. China
| | - Anqi Duan
- Medical School, Molecular Medicine Engineering and Research Center of Ministry of Education, Key Laboratory of Precision Medicine and Molecular Diagnosis of Fujian Universities, Institute of Genomics, School of Biomedical Sciences, Huaqiao University, Xiamen, 361021, P. R. China
| | - Yaqi Ao
- Medical School, Molecular Medicine Engineering and Research Center of Ministry of Education, Key Laboratory of Precision Medicine and Molecular Diagnosis of Fujian Universities, Institute of Genomics, School of Biomedical Sciences, Huaqiao University, Xiamen, 361021, P. R. China
| | - Xuejiao Liu
- Medical School, Molecular Medicine Engineering and Research Center of Ministry of Education, Key Laboratory of Precision Medicine and Molecular Diagnosis of Fujian Universities, Institute of Genomics, School of Biomedical Sciences, Huaqiao University, Xiamen, 361021, P. R. China
| | - Xinmei Yu
- Medical School, Molecular Medicine Engineering and Research Center of Ministry of Education, Key Laboratory of Precision Medicine and Molecular Diagnosis of Fujian Universities, Institute of Genomics, School of Biomedical Sciences, Huaqiao University, Xiamen, 361021, P. R. China
| | - Hui Yan
- Medical School, Molecular Medicine Engineering and Research Center of Ministry of Education, Key Laboratory of Precision Medicine and Molecular Diagnosis of Fujian Universities, Institute of Genomics, School of Biomedical Sciences, Huaqiao University, Xiamen, 361021, P. R. China
| | - Jonathan Perreault
- INRS - Institut Armand-Frappier, 531 boul des Prairies, Laval, Québec, H7V1B7, Canada
| | - Sanshu Li
- Medical School, Molecular Medicine Engineering and Research Center of Ministry of Education, Key Laboratory of Precision Medicine and Molecular Diagnosis of Fujian Universities, Institute of Genomics, School of Biomedical Sciences, Huaqiao University, Xiamen, 361021, P. R. China.
|
44
|
Schlesinger D, Elsässer SJ. Revisiting sORFs: overcoming challenges to identify and characterize functional microproteins. FEBS J 2021; 289:53-74. [PMID: 33595896 DOI: 10.1111/febs.15769] [Citation(s) in RCA: 54] [Impact Index Per Article: 18.0] [Received: 08/21/2020] [Revised: 01/17/2021] [Accepted: 02/15/2021] [Indexed: 02/07/2023]
Abstract
Short ORFs (sORFs), that is, occurrences of a start and stop codon within 100 codons or less, can be found in organisms of all domains of life, outnumbering annotated protein-coding ORFs by orders of magnitude. Even though functional proteins smaller than 100 amino acids are known, the coding potential of sORFs has often been overlooked, as it is not trivial to predict and test for functionality within the large number of sORFs. Recent advances in ribosome profiling and mass spectrometry approaches, together with refined bioinformatic predictions, have enabled a huge leap forward in this field and identified thousands of likely coding sORFs. A relatively low number of small proteins or microproteins produced from these sORFs have been characterized so far on the molecular, structural, and/or mechanistic level. These however display versatile and, in some cases, essential cellular functions, allowing for the exciting possibility that many more, previously unknown small proteins might be encoded in the genome, waiting to be discovered. This review will give an overview of the steadily growing microprotein field, focusing on eukaryotic small proteins. We will discuss emerging themes in the molecular action of microproteins, as well as advances and challenges in microprotein identification and characterization.
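The sORF definition in this abstract (a start and stop codon within 100 codons or less) can be illustrated with a minimal sketch. The code below is not taken from the cited review; the function name `find_sorfs`, the forward-strand-only scan, the ATG-only start codon, and the choice to count codons from the start codon up to but excluding the stop are all assumptions made for illustration.

```python
# Minimal sketch (assumptions: forward strand only, ATG as the only start codon,
# standard stop codons, codon count excludes the stop codon).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorfs(seq, max_codons=100, min_codons=2):
    """Return (start, end, frame) tuples for ATG..stop ORFs of at most max_codons codons.

    start is the 0-based index of the A in ATG; end is the index just past the stop codon.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        start = None
        # Walk the sequence codon by codon in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3  # codons from ATG up to, not including, the stop
                if min_codons <= n_codons <= max_codons:
                    hits.append((start, i + 3, frame))
                start = None  # close the ORF and look for the next start
    return hits


if __name__ == "__main__":
    demo = "CCATGAAATAGCC"
    for start, end, frame in find_sorfs(demo):
        print(start, end, frame, demo[start:end])
```

A scan like this only enumerates candidates; as the abstract notes, deciding which sORFs are actually translated requires further evidence such as ribosome profiling or mass spectrometry.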
Affiliation(s)
- Dörte Schlesinger
- Science for Life Laboratory, Division of Genome Biology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden; Ming Wai Lau Centre for Reparative Medicine, Stockholm node, Karolinska Institutet, Stockholm, Sweden
| | - Simon J Elsässer
- Science for Life Laboratory, Division of Genome Biology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden; Ming Wai Lau Centre for Reparative Medicine, Stockholm node, Karolinska Institutet, Stockholm, Sweden
|
45
|
Duan Y, Zhang W, Cheng Y, Shi M, Xia XQ. A systematic evaluation of bioinformatics tools for identification of long noncoding RNAs. RNA (NEW YORK, N.Y.) 2021; 27:80-98. [PMID: 33055239 PMCID: PMC7749630 DOI: 10.1261/rna.074724.120] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 01/07/2020] [Accepted: 10/07/2020] [Indexed: 06/11/2023]
Abstract
High-throughput RNA sequencing unveiled the complexity of transcriptome and significantly increased the records of long noncoding RNAs (lncRNAs), which were reported to participate in a variety of biological processes. Identification of lncRNAs is a key step in lncRNA analysis, and a bunch of bioinformatics tools have been developed for this purpose in recent years. While these tools allow us to identify lncRNA more efficiently and accurately, they may produce inconsistent results, making selection a confusing issue. We compared the performance of 41 analysis models based on 14 software packages and different data sets, including high-quality data and low-quality data from 33 species. In addition, computational efficiency, robustness, and joint prediction of the models were explored. As a practical guidance, key points for lncRNA identification under different situations were summarized. In this investigation, no one of these models could be superior to others under all test conditions. The performance of a model relied to a great extent on the source of transcripts and the quality of assemblies. As general references, FEELnc_all_cl, CPC, and CPAT_mouse work well in most species while COME, CNCI, and lncScore are good choices for model organisms. Since these tools are sensitive to different factors such as the species involved and the quality of assembly, researchers must carefully select the appropriate tool based on the actual data. Alternatively, our test suggests that joint prediction could behave better than any single model if proper models were chosen. All scripts/data used in this research can be accessed at http://bioinfo.ihb.ac.cn/elit.
Affiliation(s)
- You Duan
- Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Wanting Zhang
- Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- The Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing 100101, China
| | - Yingyin Cheng
- Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- The Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing 100101, China
| | - Mijuan Shi
- Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- The Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing 100101, China
| | - Xiao-Qin Xia
- Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- The Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing 100101, China
|
46
|
Vivek AT, Kumar S. Computational methods for annotation of plant regulatory non-coding RNAs using RNA-seq. Brief Bioinform 2020; 22:6041165. [PMID: 33333550 DOI: 10.1093/bib/bbaa322] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 08/14/2020] [Revised: 10/19/2020] [Accepted: 10/20/2020] [Indexed: 12/19/2022] Open
Abstract
Plant transcriptome encompasses numerous endogenous, regulatory non-coding RNAs (ncRNAs) that play a major biological role in regulating key physiological mechanisms. While studies have shown that ncRNAs are extremely diverse and ubiquitous, the functions of the vast majority of ncRNAs are still unknown. With ever-increasing ncRNAs under study, it is essential to identify, categorize and annotate these ncRNAs on a genome-wide scale. The use of high-throughput RNA sequencing (RNA-seq) technologies provides a broader picture of the non-coding component of transcriptome, enabling the comprehensive identification and annotation of all major ncRNAs across samples. However, the detection of known and emerging class of ncRNAs from RNA-seq data demands complex computational methods owing to their unique as well as similar characteristics. Here, we discuss major plant endogenous, regulatory ncRNAs in an RNA sample followed by computational strategies applied to discover each class of ncRNAs using RNA-seq. We also provide a collection of relevant software packages and databases to present a comprehensive bioinformatics toolbox for plant ncRNA researchers. We assume that the discussions in this review will provide a rationale for the discovery of all major categories of plant ncRNAs.
Affiliation(s)
- A T Vivek
- National Institute of Plant Genome Research in New Delhi, India
| | - Shailesh Kumar
- National Institute of Plant Genome Research in New Delhi
|
47
|
Automated Prediction and Annotation of Small Open Reading Frames in Microbial Genomes. Cell Host Microbe 2020; 29:121-131.e4. [PMID: 33290720 DOI: 10.1016/j.chom.2020.11.002] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 07/27/2020] [Revised: 09/26/2020] [Accepted: 11/07/2020] [Indexed: 01/21/2023]
Abstract
Small open reading frames (smORFs) and their encoded microproteins play central roles in microbes. However, there is a vast unexplored space of smORFs within human-associated microbes. A recent bioinformatic analysis used evolutionary conservation signals to enhance prediction of small protein families. To facilitate the annotation of specific smORFs, we introduce SmORFinder. This tool combines profile hidden Markov models of each smORF family and deep learning models that better generalize to smORF families not seen in the training set, resulting in predictions enriched for Ribo-seq translation signals. Feature importance analysis reveals that the deep learning models learn to identify Shine-Dalgarno sequences, deprioritize the wobble position in each codon, and group codon synonyms found in the codon table. A core-genome analysis of 26 bacterial species identifies several core smORFs of unknown function. We pre-compute smORF annotations for thousands of RefSeq isolate genomes and Human Microbiome Project metagenomes and provide these data through a public web portal.
|
48
|
The Small Toxic Salmonella Protein TimP Targets the Cytoplasmic Membrane and Is Repressed by the Small RNA TimR. mBio 2020; 11:mBio.01659-20. [PMID: 33172998 PMCID: PMC7667032 DOI: 10.1128/mbio.01659-20] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 12/12/2022] Open
Abstract
Next-generation sequencing (NGS) has enabled the revelation of a vast number of genomes from organisms spanning all domains of life. To reduce complexity when new genome sequences are annotated, open reading frames (ORFs) shorter than 50 codons in length are generally omitted. However, it has recently become evident that this procedure sorts away ORFs encoding small proteins of high biological significance. For instance, tailored small protein identification approaches have shown that bacteria encode numerous small proteins with important physiological functions. As the number of predicted small ORFs increase, it becomes important to characterize the corresponding proteins. In this study, we discovered a conserved but previously overlooked small enterobacterial protein. We show that this protein, which we dubbed TimP, is a potent toxin that inhibits bacterial growth by targeting the cell membrane. Toxicity is relieved by a small regulatory RNA, which binds the toxin mRNA to inhibit toxin synthesis. Small proteins are gaining increased attention due to their important functions in major biological processes throughout the domains of life. However, their small size and low sequence conservation make them difficult to identify. It is therefore not surprising that enterobacterial ryfA has escaped identification as a small protein coding gene for nearly 2 decades. Since its identification in 2001, ryfA has been thought to encode a noncoding RNA and has been implicated in biofilm formation in Escherichia coli and pathogenesis in Shigella dysenteriae. Although a recent ribosome profiling study suggested ryfA to be translated, the corresponding protein product was not detected. In this study, we provide evidence that ryfA encodes a small toxic inner membrane protein, TimP, overexpression of which causes cytoplasmic membrane leakage. TimP carries an N-terminal signal sequence, indicating that its membrane localization is Sec-dependent. 
Expression of TimP is repressed by the small RNA (sRNA) TimR, which base pairs with the timP mRNA to inhibit its translation. In contrast to overexpression, endogenous expression of TimP upon timR deletion permits cell growth, possibly indicating a toxicity-independent function in the bacterial membrane.
|
49
|
Zhou B, Yang H, Yang C, Bao YL, Yang SM, Liu J, Xiao YF. Translation of noncoding RNAs and cancer. Cancer Lett 2020; 497:89-99. [PMID: 33038492 DOI: 10.1016/j.canlet.2020.10.002] [Citation(s) in RCA: 83] [Impact Index Per Article: 20.8] [Received: 04/24/2020] [Revised: 09/30/2020] [Accepted: 10/01/2020] [Indexed: 02/07/2023]
Abstract
The human genome contains thousands of noncoding RNAs (ncRNAs), which are thought to lack open reading frames (ORFs) and cannot be translated. Some ncRNAs reportedly have important functions, including epigenetic regulation, chromatin remodeling, protein modification, and RNA degradation, but the functions of most ncRNAs remain elusive. Through the application and development of ribosome profiling and sequencing technologies, an increasing number of studies have discovered the translation of ncRNAs. Although ncRNAs were initially defined as noncoding RNAs, a number of ncRNAs actually contain ORFs that are translated into peptides. Here, we summarize the available methods, tools, and databases for identifying and validating ncRNA-encoded peptides/proteins, and the recent findings regarding ncRNA-encoded small peptides/proteins in cancer are compiled and synthesized. Importantly, the role of ncRNA-encoding peptides/proteins has application prospects in cancer research, but some potential challenges remain unresolved. The aim of this review is to provide a theoretical basis that might promote the discovery of more peptides/proteins encoded by ncRNAs and aid the further development of novel diagnostic and prognostic cancer markers and therapeutic targets.
Affiliation(s)
- Bo Zhou
- Department of Gastroenterology, Xinqiao Hospital, Chongqing, 400037, China
| | - Huan Yang
- Department of Gastroenterology, Xinqiao Hospital, Chongqing, 400037, China
| | - Chuan Yang
- Department of Gastroenterology, Xinqiao Hospital, Chongqing, 400037, China
| | - Yu-Lu Bao
- Department of Gastroenterology, Xinqiao Hospital, Chongqing, 400037, China
| | - Shi-Ming Yang
- Department of Gastroenterology, Xinqiao Hospital, Chongqing, 400037, China
| | - Jiao Liu
- Department of Endoscope, General Hospital of Northern Theater Command, Shenyang, 110016, Liaoning, China.
| | - Yu-Feng Xiao
- Department of Gastroenterology, Xinqiao Hospital, Chongqing, 400037, China.
|
50
|
Choi SW, Kim HW, Nam JW. The small peptide world in long noncoding RNAs. Brief Bioinform 2020; 20:1853-1864. [PMID: 30010717 PMCID: PMC6917221 DOI: 10.1093/bib/bby055] [Citation(s) in RCA: 173] [Impact Index Per Article: 43.3] [Received: 04/09/2018] [Revised: 05/08/2018] [Indexed: 02/07/2023] Open
Abstract
Long noncoding RNAs (lncRNAs) are a group of transcripts that are longer than 200 nucleotides (nt) without coding potential. Over the past decade, tens of thousands of novel lncRNAs have been annotated in animal and plant genomes because of advanced high-throughput RNA sequencing technologies and with the aid of coding transcript classifiers. Further, a considerable number of reports have revealed the existence of stable, functional small peptides (also known as micropeptides), translated from lncRNAs. In this review, we discuss the methods of lncRNA classification, the investigations regarding their coding potential and the functional significance of the peptides they encode.
Affiliation(s)
- Seo-Won Choi
- Department of Life Science, College of Natural Sciences, Hanyang University, Seoul 04763, Republic of Korea
| | - Hyun-Woo Kim
- Department of Life Science, College of Natural Sciences, Hanyang University, Seoul 04763, Republic of Korea
| | - Jin-Wu Nam
- Department of Life Science, College of Natural Sciences, Hanyang University, Seoul 04763, Republic of Korea
|