1
Naidu P, Holford M. Microscopic marvels: Decoding the role of micropeptides in innate immunity. Immunology 2024; 173:605-621. [PMID: 39188052 DOI: 10.1111/imm.13850] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Accepted: 07/30/2024] [Indexed: 08/28/2024] Open
Abstract
The innate immune response is under selection pressures from changing environments and pathogens. While sequence evolution can be studied by comparing rates of amino acid mutations within and between species, how a gene's birth and death contribute to the evolution of immunity is less known. Short open reading frames, once regarded as untranslated or transcriptional noise, can often produce micropeptides of <100 amino acids with a wide array of biological functions. Some micropeptide sequences are well conserved, whereas others have no evolutionary conservation, potentially representing new functional compounds that arise from species-specific adaptations. To date, few reports have described the discovery of novel micropeptides of the innate immune system. The diversity of immune-related micropeptides is a blind spot for gene and functional annotation. Immune-related micropeptides represent a potential reservoir of untapped compounds for understanding and treating disease. This review consolidates what is currently known about the evolution and function of innate immune-related micropeptides to facilitate their investigation.
Affiliation(s)
- Praveena Naidu
- Graduate Center, Programs in Biology, Biochemistry, Chemistry, City University of New York, New York, New York, USA
- Department of Chemistry and Biochemistry, City University of New York, Hunter College, Belfer Research Building, New York, New York, USA
- Mandë Holford
- Graduate Center, Programs in Biology, Biochemistry, Chemistry, City University of New York, New York, New York, USA
- Department of Chemistry and Biochemistry, City University of New York, Hunter College, Belfer Research Building, New York, New York, USA
- American Museum of Natural History, Invertebrate Zoology, Sackler Institute for Comparative Genomics, New York, New York, USA
- Weill Cornell Medicine, Department of Biochemistry, New York, New York, USA
2
Dikmen F, Dabak T, Özgişi BD, Özenirler Ç, Kuralay SC, Çay SB, Çınar YU, Obut O, Balcı MA, Akbaba P, Aksel EG, Zararsız G, Solares E, Eldem V. Transcriptome-wide analysis uncovers regulatory elements of the antennal transcriptome repertoire of bumblebee at different life stages. Insect Mol Biol 2024; 33:571-588. [PMID: 38676460 DOI: 10.1111/imb.12914] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2024] [Accepted: 04/09/2024] [Indexed: 04/29/2024]
Abstract
Bumblebees are crucial pollinators, providing essential ecosystem services and supporting global food production. The success of pollination services relies on the interaction between sensory organs and the environment. The antenna functions as a versatile multi-sensory organ, pivotal in mediating chemosensory/olfactory information, and governs adaptive responses to environmental changes. Despite an increasing number of RNA-sequencing studies on insect antennae, antennal transcriptomes at different life stages have not been systematically characterized. Here, we quantified the expression profiles and dynamics of coding and microRNA genes in larval head and antennal tissues from early- and late-stage pupae to adults of Bombus terrestris, a suitable model organism among pollinators. We further performed Pearson correlation analyses on the gene expression profiles of the antennal transcriptome from larval head tissue to adult stages, exploring both positive and negative expression trends. The positively correlated coding genes were primarily enriched in sensory perception of chemical stimuli, ion transport, transmembrane transport processes and olfactory receptor activity. Negatively correlated genes were mainly enriched in organic substance biosynthesis and regulatory mechanisms underlying larval body patterning and the formation of juvenile antennal structures. As post-transcriptional regulators, miR-1000-5p, miR-13b-3p, miR-263-5p and miR-252-5p showed positive correlations, whereas miR-315-5p, miR-92b-3p, miR-137-3p, miR-11-3p and miR-10-3p exhibited negative correlations in antennal tissue. Notably, based on the inverse expression relationship, positively and negatively correlated microRNA (miRNA)-mRNA target pairs revealed that differentially expressed miRNAs were predicted to target genes involved in antennal development, shaping antennal structures and regulating antenna-specific functions. Our data serve as a foundation for understanding stage-specific antennal transcriptomes and large-scale comparative analysis of transcriptomes in different insects.
Affiliation(s)
- Fatih Dikmen
- Department of Biology, Istanbul University, İstanbul, Turkey
- Tunç Dabak
- Department of Biology, The Pennsylvania State University, State College, Pennsylvania, USA
- Onur Obut
- Department of Biology, Istanbul University, İstanbul, Turkey
- Pınar Akbaba
- Department of Biology, Istanbul University, İstanbul, Turkey
- Esma Gamze Aksel
- Faculty of Veterinary Medicine, Department of Genetics, Erciyes University, Kayseri, Turkey
- Gökmen Zararsız
- Department of Biostatistics, Erciyes University, Kayseri, Turkey
- Drug Application and Research Center (ERFARMA), Erciyes University, Kayseri, Turkey
- Edwin Solares
- Computer Science & Engineering Department, University of California, San Diego, California, USA
- Vahap Eldem
- Department of Biology, Istanbul University, İstanbul, Turkey
3
Vrbnjak K, Sewduth RN. Recent Advances in Peptide Drug Discovery: Novel Strategies and Targeted Protein Degradation. Pharmaceutics 2024; 16:1486. [PMID: 39598608 PMCID: PMC11597556 DOI: 10.3390/pharmaceutics16111486] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2024] [Revised: 11/19/2024] [Accepted: 11/20/2024] [Indexed: 11/29/2024] Open
Abstract
Recent technological advancements, including computer-assisted drug discovery, gene-editing techniques, and high-throughput screening approaches, have greatly expanded the palette of methods for the discovery of peptides available to researchers. These emerging strategies, driven by recent advances in bioinformatics and multi-omics, have significantly improved the efficiency of peptide drug discovery when compared with traditional in vitro and in vivo methods, cutting costs and improving their reliability. An added benefit of peptide-based drugs is the ability to precisely target protein-protein interactions, which are normally a particularly challenging aspect of drug discovery. Another recent breakthrough in this field is targeted protein degradation through proteolysis-targeting chimeras. These revolutionary compounds represent a noteworthy advancement over traditional small-molecule inhibitors due to their unique mechanism of action, which allows for the degradation of specific proteins with unprecedented specificity. The inclusion of a peptide as a protein-of-interest-targeting moiety allows for improved versatility and the possibility of targeting otherwise undruggable proteins. In this review, we discuss various novel wet-lab and computational multi-omic methods for peptide drug discovery, provide an overview of therapeutic agents discovered through these cutting-edge techniques, and discuss the potential for the therapeutic delivery of peptide-based drugs.
Affiliation(s)
- Katarina Vrbnjak
- VIB-KU Leuven Center for Cancer Biology (VIB), 3000 Leuven, Belgium
4
Taveira IC, Carraro CB, Nogueira KMV, Pereira LMS, Bueno JGR, Fiamenghi MB, dos Santos LV, Silva RN. Structural and biochemical insights of xylose MFS and SWEET transporters in microbial cell factories: challenges to lignocellulosic hydrolysates fermentation. Front Microbiol 2024; 15:1452240. [PMID: 39397797 PMCID: PMC11466781 DOI: 10.3389/fmicb.2024.1452240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2024] [Accepted: 09/16/2024] [Indexed: 10/15/2024] Open
Abstract
The production of bioethanol from lignocellulosic biomass requires the efficient conversion of glucose and xylose to ethanol, a process that depends on the ability of microorganisms to internalize these sugars. Although glucose transporters exist in several species, xylose transporters are less common. Several types of transporters have been identified in diverse microorganisms, including members of the Major Facilitator Superfamily (MFS) and Sugars Will Eventually be Exported Transporter (SWEET) families. Considering that Saccharomyces cerevisiae lacks an effective xylose transport system, engineered yeast strains capable of efficiently consuming this sugar are critical for obtaining high ethanol yields. This article reviews the structure-function relationship of sugar transporters from the MFS and SWEET families. It provides information on several tools and approaches used to identify and characterize them to optimize xylose consumption and, consequently, second-generation ethanol production.
Affiliation(s)
- Iasmin Cartaxo Taveira
- Molecular Biotechnology Laboratory, Department of Biochemistry and Immunology, Ribeirao Preto Medical School (FMRP), University of São Paulo, São Paulo, Brazil
- Cláudia Batista Carraro
- Molecular Biotechnology Laboratory, Department of Biochemistry and Immunology, Ribeirao Preto Medical School (FMRP), University of São Paulo, São Paulo, Brazil
- Karoline Maria Vieira Nogueira
- Molecular Biotechnology Laboratory, Department of Biochemistry and Immunology, Ribeirao Preto Medical School (FMRP), University of São Paulo, São Paulo, Brazil
- Lucas Matheus Soares Pereira
- Molecular Biotechnology Laboratory, Department of Biochemistry and Immunology, Ribeirao Preto Medical School (FMRP), University of São Paulo, São Paulo, Brazil
- João Gabriel Ribeiro Bueno
- Genetics and Molecular Biology Graduate Program, Institute of Biology, University of Campinas (UNICAMP), Campinas, Brazil
- Mateus Bernabe Fiamenghi
- Genetics and Molecular Biology Graduate Program, Institute of Biology, University of Campinas (UNICAMP), Campinas, Brazil
- Leandro Vieira dos Santos
- Genetics and Molecular Biology Graduate Program, Institute of Biology, University of Campinas (UNICAMP), Campinas, Brazil
- Manchester Institute of Biotechnology, University of Manchester, Manchester, United Kingdom
- Roberto N. Silva
- Molecular Biotechnology Laboratory, Department of Biochemistry and Immunology, Ribeirao Preto Medical School (FMRP), University of São Paulo, São Paulo, Brazil
5
Bravo S, Zarate P, Cari I, Clavijo L, Lopez I, Phillips NM, Vidal R. Comparative Tissue Identification and Characterization of Long Non-Coding RNAs in the Globally Distributed Blue Shark Prionace glauca. Life (Basel) 2024; 14:1144. [PMID: 39337927 PMCID: PMC11433378 DOI: 10.3390/life14091144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2024] [Revised: 08/24/2024] [Accepted: 08/27/2024] [Indexed: 09/30/2024] Open
Abstract
Long non-coding RNAs (lncRNAs) are involved in numerous biological processes and serve crucial regulatory functions in both animals and plants. Nevertheless, there is limited understanding of lncRNAs and their patterns of expression and roles in sharks. In the current study, we systematically identified and characterized lncRNAs in the blue shark (Prionace glauca) from four tissues (liver, spleen, muscle, and kidney) using high-throughput sequencing and bioinformatics tools. A total of 21,932 high-confidence lncRNAs were identified, of which 8984 were stably expressed and 3067 were tissue-specific. In addition, a total of 45,007 differentially expressed (DE) lncRNAs were obtained among tissues, with the kidney versus muscle comparison yielding the largest number. Trans-target protein-coding genes of the DE lncRNAs were predicted, and functional gene ontology enrichment of these genes showed GO terms such as muscle system processes, cellular/metabolic processes, and stress and immune responses, all of which correspond with the specific biological functions of each tissue analyzed. These results advance our knowledge of lncRNAs in sharks and present novel data on tissue-specific lncRNAs, providing key information to support future functional shark investigations.
Affiliation(s)
- Scarleth Bravo
- Laboratory of Genomics, Molecular Ecology and Evolutionary Studies, Department of Biology, Universidad de Santiago de Chile, Santiago 9160000, Chile
- Patricia Zarate
- Departamento de Oceanografía y Medio Ambiente, División de Investigación Pesquera, Instituto de Fomento Pesquero, Valparaíso 2361827, Chile
- Ilia Cari
- Departamento de Oceanografía y Medio Ambiente, División de Investigación Pesquera, Instituto de Fomento Pesquero, Valparaíso 2361827, Chile
- Ljubitza Clavijo
- Departamento de Oceanografía y Medio Ambiente, División de Investigación Pesquera, Instituto de Fomento Pesquero, Valparaíso 2361827, Chile
- Ignacio Lopez
- Laboratory of Genomics, Molecular Ecology and Evolutionary Studies, Department of Biology, Universidad de Santiago de Chile, Santiago 9160000, Chile
- Nicole M. Phillips
- School of Biological, Environmental, and Earth Sciences, University of Southern Mississippi, Hattiesburg, MS 39406, USA
- Rodrigo Vidal
- Laboratory of Genomics, Molecular Ecology and Evolutionary Studies, Department of Biology, Universidad de Santiago de Chile, Santiago 9160000, Chile
6
Zhang J, Lu H, Jiang Y, Ma Y, Deng L. ncRNA Coding Potential Prediction Using BiLSTM and Transformer Encoder-Based Model. J Chem Inf Model 2024; 64:6712-6722. [PMID: 39120528 DOI: 10.1021/acs.jcim.4c01097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/10/2024]
Abstract
Many noncoding RNAs (ncRNAs) have been identified, and many of them play vital roles in various biological processes, including gene expression regulation, epigenetic regulation, transcription, and control. Recently, a few observations revealed that ncRNAs can be translated into functional peptides. Moreover, many computational methods have been developed to predict the coding potential of these transcripts, which contributes to a deeper investigation of their functions. However, most of these are used to distinguish ncRNAs from mRNAs. It is important to develop a highly accurate computational tool for identifying the coding potential of ncRNAs, thereby contributing to the discovery of novel peptides. In this Article, we propose a novel BiLSTM And Transformer encoder-based model (nBAT) with intrinsic features encoded for ncRNA coding potential prediction. In nBAT, we introduce a learnable position encoding mechanism to better obtain the embeddings of the ncRNA sequence. Moreover, we extract 43 intrinsic features from different perspectives and encode these features into the Transformer encoder by calculating their distances. Our performance comparisons show that nBAT outperforms the state-of-the-art methods for coding potential prediction on different datasets. We also apply the method to new ncRNAs for identifying the coding potential, and the results further indicate the competitive performance of nBAT. We expect that the method can be exploited as a useful tool for high-throughput coding potential prediction for ncRNAs.
Affiliation(s)
- Jingpu Zhang
- School of Computer and Data Science, Henan University of Urban Construction, Pingdingshan 467000, China
- Hao Lu
- School of Computer and Data Science, Henan University of Urban Construction, Pingdingshan 467000, China
- Ying Jiang
- School of Computer Science and Engineering, Central South University, Changsha 410018, China
- Yuanyuan Ma
- School of Computer Engineering, Hubei University of Arts and Science, Xiangyang 441053, China
- Lei Deng
- School of Computer Science and Engineering, Central South University, Changsha 410018, China
7
Zhang Y. LncRNA-encoded peptides in cancer. J Hematol Oncol 2024; 17:66. [PMID: 39135098 PMCID: PMC11320871 DOI: 10.1186/s13045-024-01591-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2024] [Accepted: 08/05/2024] [Indexed: 08/15/2024] Open
Abstract
Long non-coding RNAs (lncRNAs), once considered transcriptional noise, have emerged as critical regulators of gene expression and key players in cancer biology. Recent breakthroughs have revealed that certain lncRNAs can encode small open reading frame (sORF)-derived peptides, which are now understood to contribute to the pathogenesis of various cancers. This review synthesizes current knowledge on the detection, functional roles, and clinical implications of lncRNA-encoded peptides in cancer. We discuss technological advancements in the detection and validation of sORFs, including ribosome profiling and mass spectrometry, which have facilitated the discovery of these peptides. The functional roles of lncRNA-encoded peptides in cancer processes such as gene transcription, translation regulation, signal transduction, and metabolic reprogramming are explored in various types of cancer. The clinical potential of these peptides is highlighted, with a focus on their utility as diagnostic biomarkers, prognostic indicators, and therapeutic targets. The challenges and future directions in translating these findings into clinical practice are also discussed, including the need for large-scale validation, development of sensitive detection methods, and optimization of peptide stability and delivery.
Affiliation(s)
- Yaguang Zhang
- Laboratory of Gastrointestinal Tumor Epigenetics and Genomics, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, People's Republic of China.
8
Schettini GP, Morozyuk M, Biase FH. Identification of novel cattle (Bos taurus) genes and biological insights of their function in pre-implantation embryo development. BMC Genomics 2024; 25:775. [PMID: 39118001 PMCID: PMC11313146 DOI: 10.1186/s12864-024-10685-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2024] [Accepted: 08/02/2024] [Indexed: 08/10/2024] Open
Abstract
BACKGROUND Appropriate regulation of genes expressed in oocytes and embryos is essential for acquisition of developmental competence in mammals. Here, we hypothesized that several genes expressed in oocytes and pre-implantation embryos remain unknown. Our goal was to reconstruct the transcriptome of oocytes (germinal vesicle and metaphase II) and pre-implantation cattle embryos (blastocysts) using short-read and long-read sequences to identify putative new genes. RESULTS We identified 274,342 transcript sequences and 3,033 of those loci do not match a gene present in official annotations and thus are potential new genes. Notably, 63.67% (1,931/3,033) of potential novel genes exhibited coding potential. Also noteworthy, 97.92% of the putative novel genes overlapped annotation with transposable elements. Comparative analysis of transcript abundance identified that 1,840 novel genes (recently added to the annotation) or potential new genes were differentially expressed between developmental stages (FDR < 0.01). We also determined that 522 novel or potential new genes (448 and 34, respectively) were upregulated at eight-cell embryos compared to oocytes (FDR < 0.01). In eight-cell embryos, 102 novel or putative new genes were co-expressed (|r| > 0.85, P < 1 × 10⁻⁸) with several genes annotated with gene ontology biological processes related to pluripotency maintenance and embryo development. CRISPR-Cas9 genome editing confirmed that the disruption of one of the novel genes highly expressed in eight-cell embryos reduced blastocyst development (ENSBTAG00000068261, P = 1.55 × 10⁻⁷). CONCLUSIONS Our results revealed several putative new genes that need careful annotation. Many of the putative new genes have dynamic regulation during pre-implantation development and are important components of gene regulatory networks involved in pluripotency and blastocyst formation.
Affiliation(s)
- Gustavo P Schettini
- School of Animal Sciences, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
- Michael Morozyuk
- School of Animal Sciences, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
- Fernando H Biase
- School of Animal Sciences, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA.
9
Camargo AP, Roux S, Schulz F, Babinski M, Xu Y, Hu B, Chain PSG, Nayfach S, Kyrpides NC. Identification of mobile genetic elements with geNomad. Nat Biotechnol 2024; 42:1303-1312. [PMID: 37735266 PMCID: PMC11324519 DOI: 10.1038/s41587-023-01953-y] [Citation(s) in RCA: 125] [Impact Index Per Article: 125.0] [Received: 03/06/2023] [Accepted: 08/17/2023] [Indexed: 09/23/2023]
Abstract
Identifying and characterizing mobile genetic elements in sequencing data is essential for understanding their diversity, ecology, biotechnological applications and impact on public health. Here we introduce geNomad, a classification and annotation framework that combines information from gene content and a deep neural network to identify sequences of plasmids and viruses. geNomad uses a dataset of more than 200,000 marker protein profiles to provide functional gene annotation and taxonomic assignment of viral genomes. Using a conditional random field model, geNomad also detects proviruses integrated into host genomes with high precision. In benchmarks, geNomad achieved high classification performance for diverse plasmids and viruses (Matthews correlation coefficient of 77.8% and 95.3%, respectively), substantially outperforming other tools. Leveraging geNomad's speed and scalability, we processed over 2.7 trillion base pairs of sequencing data, leading to the discovery of millions of viruses and plasmids that are available through the IMG/VR and IMG/PR databases. geNomad is available at https://portal.nersc.gov/genomad.
Affiliation(s)
- Antonio Pedro Camargo
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
- Simon Roux
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Frederik Schulz
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Michal Babinski
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Yan Xu
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Bin Hu
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Patrick S G Chain
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Stephen Nayfach
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Nikos C Kyrpides
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
10
Hwang H, Jeon H, Yeo N, Baek D. Big data and deep learning for RNA biology. Exp Mol Med 2024; 56:1293-1321. [PMID: 38871816 PMCID: PMC11263376 DOI: 10.1038/s12276-024-01243-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/31/2024] [Revised: 02/27/2024] [Accepted: 03/05/2024] [Indexed: 06/15/2024] Open
Abstract
The exponential growth of big data in RNA biology (RB) has led to the development of deep learning (DL) models that have driven crucial discoveries. As constantly evidenced by DL studies in other fields, the successful implementation of DL in RB depends heavily on the effective utilization of large-scale datasets from public databases. In achieving this goal, data encoding methods, learning algorithms, and techniques that align well with biological domain knowledge have played pivotal roles. In this review, we provide guiding principles for applying these DL concepts to various problems in RB by demonstrating successful examples and associated methodologies. We also discuss the remaining challenges in developing DL models for RB and suggest strategies to overcome these challenges. Overall, this review aims to illuminate the compelling potential of DL for RB and ways to apply this powerful technology to investigate the intriguing biology of RNA more effectively.
Affiliation(s)
- Hyeonseo Hwang
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Hyeonseong Jeon
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Genome4me Inc., Seoul, Republic of Korea
- Nagyeong Yeo
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Daehyun Baek
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea.
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea.
- Genome4me Inc., Seoul, Republic of Korea.
11
Yu D, Zhou M, Chen W, Ding Z, Wang C, Qian Y, Liu Y, He S, Yang L. Characterization of transcriptome changes in saline stress adaptation on Leuciscus merzbacheri using PacBio Iso-Seq and RNA-Seq. DNA Res 2024; 31:dsae019. [PMID: 38807352 PMCID: PMC11161863 DOI: 10.1093/dnares/dsae019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Revised: 05/22/2024] [Accepted: 05/28/2024] [Indexed: 05/30/2024] Open
Abstract
Leuciscus merzbacheri is a native fish species found exclusively in the Junggar Basin in Xinjiang. It exhibits remarkable adaptability, thriving in varying water conditions such as saline, semi-saline, and fresh water. Despite its significant economic and ecological value, the underlying mechanisms of its remarkable salinity tolerance remain elusive. Our study marks the first time the full-length transcriptome of L. merzbacheri has been reported, utilizing RNA-Seq and PacBio Iso-Seq technologies. We found that the average length of the full-length transcriptome is 1,780 bp, with an N50 length of 2,358 bp. We collected RNA-Seq data from gill, liver, and kidney tissues of L. merzbacheri from both saline water and freshwater environments and conducted comparative analyses across these tissues. Further analysis revealed significant enrichment in several key functional gene categories and signalling pathways related to stress response and environmental adaptation. The findings provide a valuable genetic resource for further investigation into saline-responsive candidate genes, which will deepen our understanding of teleost adaptation to extreme environmental stress. This knowledge is crucial for the future breeding and conservation of native fish species.
Affiliation(s)
- Dan Yu
- School of Ecology and Environment, Tibet University, Lhasa, 850000, China
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, 430072, China
- Min Zhou
- School of Life Sciences, Jianghan University, Wuhan 430056, China
- Wenjun Chen
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, 430072, China
- College of Advanced Agricultural Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Zufa Ding
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, 430072, China
- College of Fisheries, Huazhong Agricultural University, Wuhan, 430070, China
- Cheng Wang
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, 430072, China
- College of Advanced Agricultural Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Yuting Qian
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, 430072, China
- College of Advanced Agricultural Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Yang Liu
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, 430072, China
- College of Advanced Agricultural Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Shunping He
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, 430072, China
- Liandong Yang
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, 430072, China
12
Hafezqorani S, Nip KM, Birol I. ntEmbd: Deep learning embedding for nucleotide sequences. bioRxiv 2024:2024.04.30.591806. [PMID: 38746190 PMCID: PMC11092672 DOI: 10.1101/2024.04.30.591806] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
Enabled by the explosion of data and substantial increase in computational power, deep learning has transformed fields such as computer vision and natural language processing (NLP), and it has been applied successfully to many transcriptomic analysis tasks. A core advantage of deep learning is its inherent capability to incorporate feature computation within the machine learning models. This results in a comprehensive and machine-readable representation of sequences, facilitating the downstream classification and clustering tasks. Compared to machine translation problems in NLP, feature embedding is particularly challenging for transcriptomic studies, as the sequences are strings thousands of nucleotides in length, which makes the long-term dependencies between features from different parts of the sequence even more difficult to capture. This highlights the need for nucleotide sequence embedding methods that are capable of learning input sequence features implicitly. Here we introduce ntEmbd, a deep learning embedding tool that captures dependencies between different features of the sequences and learns a latent representation for given nucleotide sequences. We further provide two sample use cases, describing how learned RNA features can be used in downstream analysis. The first use case demonstrates ntEmbd's utility in classifying coding and noncoding RNA benchmarked against existing tools, and the second one explores the utility of learned representations in identifying adapter sequences in nanopore RNA-seq reads. The tool as well as the trained models are freely available on GitHub at https://github.com/bcgsc/ntEmbd.
Affiliation(s)
- Saber Hafezqorani
- 570 W 7 Ave, Michael Smith Genome Sciences Centre, BC Cancer, V5Z 4S6, Vancouver, BC, Canada
- Bioinformatics Graduate Program, University of British Columbia, Vancouver, BC, Canada
| | - Ka Ming Nip
- 570 W 7 Ave, Michael Smith Genome Sciences Centre, BC Cancer, V5Z 4S6, Vancouver, BC, Canada
- Bioinformatics Graduate Program, University of British Columbia, Vancouver, BC, Canada
| | - Inanc Birol
- 570 W 7 Ave, Michael Smith Genome Sciences Centre, BC Cancer, V5Z 4S6, Vancouver, BC, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
| |
|
13
|
da Silva RH, Silva MDD, Ferreira-Neto JRC, Souza BDB, de Araújo FN, Oliveira EJDS, Benko-Iseppon AM, da Costa AF, Kido ÉA. DEAD-Box RNA Helicase Family in Physic Nut (Jatropha curcas L.): Structural Characterization and Response to Salinity. Plants (Basel) 2024; 13:905. [PMID: 38592921 PMCID: PMC10974417 DOI: 10.3390/plants13060905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Revised: 03/16/2024] [Accepted: 03/18/2024] [Indexed: 04/11/2024]
Abstract
Helicases, motor proteins present in both prokaryotes and eukaryotes, play a direct role in various steps of RNA metabolism. Specifically, SF2 RNA helicases, a subset of the DEAD-box family, are essential players in plant developmental processes and responses to biotic and abiotic stresses. Despite this, information on this family in the physic nut (Jatropha curcas L.) remains limited, spanning from structural patterns to stress responses. We identified 79 genes encoding DEAD-box RNA helicases (JcDHX) in the J. curcas genome. These genes were further categorized into three subfamilies: DEAD (42 genes), DEAH (30 genes), and DExH/D (seven genes). Characterization of the encoded proteins revealed a remarkable diversity, with observed patterns in domains, motifs, and exon-intron structures suggesting that the DEAH and DExH/D subfamilies in J. curcas likely contribute to the overall versatility of the family. Three-dimensional modeling of the candidates showed characteristic hallmarks, highlighting the expected functional performance of these enzymes. The promoter regions of the JcDHX genes revealed potential cis-elements such as Dof-type, BBR-BPC, and AP2-ERF, indicating their potential involvement in the response to abiotic stresses. Analysis of RNA-Seq data from the roots of physic nut accessions exposed to 150 mM of NaCl for 3 h showed most of the JcDHX candidates repressed. The protein-protein interaction network indicated that JcDHX proteins occupy central positions, connecting events associated with RNA metabolism. Quantitative PCR analysis validated the expression of nine DEAD-box RNA helicase transcripts, showing significant associations with key components of the stress response, including RNA turnover, ribosome biogenesis, DNA repair, clathrin-mediated vesicular transport, phosphatidyl 3,5-inositol synthesis, and mitochondrial translation. 
Furthermore, the induced expression of one transcript (JcDHX44) was confirmed, suggesting that it is a potential candidate for future functional analyses to better understand its role in salinity stress tolerance. This study represents the first global report on the DEAD-box family of RNA helicases in physic nuts and displays structural characteristics compatible with their functions, likely serving as a critical component of the plant's response pathways.
Affiliation(s)
- Rahisa Helena da Silva
- Plant Molecular Genetics Laboratory, Genetics Department, Center of Biosciences, Federal University of Pernambuco, Recife CEP 50670-901, PE, Brazil
| | - Manassés Daniel da Silva
- Plant Molecular Genetics Laboratory, Genetics Department, Center of Biosciences, Federal University of Pernambuco, Recife CEP 50670-901, PE, Brazil
| | - José Ribamar Costa Ferreira-Neto
- Plant Genetics and Biotechnology Laboratory, Genetics Department, Center of Biosciences, Federal University of Pernambuco, Recife CEP 50670-901, PE, Brazil
| | - Bruna de Brito Souza
- Plant Molecular Genetics Laboratory, Genetics Department, Center of Biosciences, Federal University of Pernambuco, Recife CEP 50670-901, PE, Brazil
| | - Francielly Negreiros de Araújo
- Plant Molecular Genetics Laboratory, Genetics Department, Center of Biosciences, Federal University of Pernambuco, Recife CEP 50670-901, PE, Brazil
| | - Elvia Jéssica da Silva Oliveira
- Plant Molecular Genetics Laboratory, Genetics Department, Center of Biosciences, Federal University of Pernambuco, Recife CEP 50670-901, PE, Brazil
| | - Ana Maria Benko-Iseppon
- Plant Genetics and Biotechnology Laboratory, Genetics Department, Center of Biosciences, Federal University of Pernambuco, Recife CEP 50670-901, PE, Brazil
| | | | - Éderson Akio Kido
- Plant Molecular Genetics Laboratory, Genetics Department, Center of Biosciences, Federal University of Pernambuco, Recife CEP 50670-901, PE, Brazil
| |
|
14
|
Valdivia-Francia F, Sendoel A. No country for old methods: New tools for studying microproteins. iScience 2024; 27:108972. [PMID: 38333695 PMCID: PMC10850755 DOI: 10.1016/j.isci.2024.108972] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/10/2024] Open
Abstract
Microproteins encoded by small open reading frames (sORFs) have emerged as a fascinating frontier in genomics. Although sORFs were traditionally overlooked due to their small size, recent technological advancements such as ribosome profiling, mass spectrometry-based strategies and advanced computational approaches have led to the annotation of more than 7000 sORFs in the human genome. Despite this progress, only a tiny portion of these microproteins have been characterized, and an important challenge in the field lies in identifying functionally relevant microproteins and understanding their role in different cellular contexts. In this review, we explore the recent advancements in sORF research, focusing on the new methodologies and computational approaches that have facilitated their identification and functional characterization. Leveraging these new tools holds great promise for dissecting the diverse cellular roles of microproteins and will ultimately pave the way for understanding their role in the pathogenesis of diseases and identifying new therapeutic targets.
Affiliation(s)
- Fabiola Valdivia-Francia
- University of Zurich, Institute for Regenerative Medicine (IREM), Wagistrasse 12, 8952 Schlieren-Zurich, Switzerland
- Life Science Zurich Graduate School, Molecular Life Science Program, University of Zurich/ ETH Zurich, Schlieren-Zurich, Switzerland
| | - Ataman Sendoel
- University of Zurich, Institute for Regenerative Medicine (IREM), Wagistrasse 12, 8952 Schlieren-Zurich, Switzerland
| |
|
15
|
Bonilauri B, Ribeiro AL, Spangenberg L, Dallagiovanna B. Unveiling Polysomal Long Non-Coding RNA Expression on the First Day of Adipogenesis and Osteogenesis in Human Adipose-Derived Stem Cells. Int J Mol Sci 2024; 25:2013. [PMID: 38396700 PMCID: PMC10888724 DOI: 10.3390/ijms25042013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Revised: 01/06/2024] [Accepted: 01/11/2024] [Indexed: 02/25/2024] Open
Abstract
Understanding the intricate molecular mechanisms governing the fate of human adipose-derived stem cells (hASCs) is essential for elucidating the delicate balance between adipogenic and osteogenic differentiation in both healthy and pathological conditions. Long non-coding RNAs (lncRNAs) have emerged as key regulators involved in lineage commitment and differentiation of stem cells, operating at various levels of gene regulation, including transcriptional, post-transcriptional, and post-translational processes. To gain deeper insights into the role of lncRNAs in hASC differentiation, we conducted a comprehensive analysis of the lncRNA transcriptome (RNA-seq) and translatome (polysomal-RNA-seq) during a 24 h period of adipogenesis and osteogenesis. Our findings revealed distinct expression patterns between the transcriptome and translatome during both differentiation processes, highlighting 90 lncRNAs that are exclusively regulated in the polysomal fraction. These findings underscore the significance of investigating lncRNAs associated with ribosomes, considering their unique expression patterns and potential mechanisms of action, such as translational regulation and potential coding capacity for microproteins. Additionally, we identified specific lncRNA gene expression programs associated with adipogenesis and osteogenesis during the early stages of cell differentiation. By shedding light on the expression and potential functions of these polysome-associated lncRNAs, we aim to deepen our understanding of their involvement in the regulation of adipogenic and osteogenic differentiation, ultimately paving the way for novel therapeutic strategies and insights into regenerative medicine.
Affiliation(s)
- Bernardo Bonilauri
- Stem Cell Basic Biology Laboratory (LABCET), Carlos Chagas Institute—Fiocruz/PR, Curitiba 81350-010, PR, Brazil;
- Stanford Cardiovascular Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Annanda Lyra Ribeiro
- Stem Cell Basic Biology Laboratory (LABCET), Carlos Chagas Institute—Fiocruz/PR, Curitiba 81350-010, PR, Brazil;
| | - Lucía Spangenberg
- Bioinformatics Unit, Institut Pasteur de Montevideo, Montevideo 11400, Uruguay;
| | - Bruno Dallagiovanna
- Stem Cell Basic Biology Laboratory (LABCET), Carlos Chagas Institute—Fiocruz/PR, Curitiba 81350-010, PR, Brazil;
| |
|
16
|
Mohsen JJ, Martel AA, Slavoff SA. Microproteins-Discovery, structure, and function. Proteomics 2023; 23:e2100211. [PMID: 37603371 PMCID: PMC10841188 DOI: 10.1002/pmic.202100211] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/04/2023] [Revised: 08/03/2023] [Accepted: 08/10/2023] [Indexed: 08/22/2023]
Abstract
Advances in proteogenomic technologies have revealed hundreds to thousands of translated small open reading frames (sORFs) that encode microproteins in genomes across evolutionary space. While many microproteins have now been shown to play critical roles in biology and human disease, a majority of recently identified microproteins have little or no experimental evidence regarding their functionality. Computational tools have some limitations for analysis of short, poorly conserved microprotein sequences, so additional approaches are needed to determine the role of each member of this recently discovered polypeptide class. A currently underexplored avenue in the study of microproteins is structure prediction and determination, which delivers a depth of functional information. In this review, we provide a brief overview of microprotein discovery methods, then examine examples of microprotein structures (and, conversely, intrinsic disorder) that have been experimentally determined using crystallography, cryo-electron microscopy, and NMR, which provide insight into their molecular functions and mechanisms. Additionally, we discuss examples of predicted microprotein structures that have provided insight or context regarding their function. Analysis of microprotein structure at the angstrom level, and confirmation of predicted structures, therefore, has potential to identify translated microproteins that are of biological importance and to provide molecular mechanism for their in vivo roles.
Affiliation(s)
- Jessica J. Mohsen
- Department of Chemistry, Yale University, New Haven, CT, USA
- Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT, USA
| | - Alina A. Martel
- Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT, USA
| | - Sarah A. Slavoff
- Department of Chemistry, Yale University, New Haven, CT, USA
- Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| |
|
17
|
Wang Y, Pan Z, Mou M, Xia W, Zhang H, Zhang H, Liu J, Zheng L, Luo Y, Zheng H, Yu X, Lian X, Zeng Z, Li Z, Zhang B, Zheng M, Li H, Hou T, Zhu F. A task-specific encoding algorithm for RNAs and RNA-associated interactions based on convolutional autoencoder. Nucleic Acids Res 2023; 51:e110. [PMID: 37889083 PMCID: PMC10682500 DOI: 10.1093/nar/gkad929] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Received: 05/10/2022] [Revised: 08/01/2023] [Accepted: 10/10/2023] [Indexed: 10/28/2023] Open
Abstract
RNAs play essential roles in diverse physiological and pathological processes by interacting with other molecules (RNA/protein/compound), and various computational methods are available for identifying these interactions. However, the encoding features provided by existing methods are limited, and the existing tools do not offer an effective way to integrate the interacting partners. In this study, a task-specific encoding algorithm for RNAs and RNA-associated interactions was therefore developed. This new algorithm is unique in (a) realizing comprehensive RNA feature encoding by introducing a great many novel features and (b) enabling task-specific integration of interacting partners using convolutional autoencoder-directed feature embedding. Compared with existing methods/tools, this novel algorithm demonstrated superior performance in diverse benchmark tests. The algorithm, together with its source code, can be readily accessed by all users at https://idrblab.org/corain/ and https://github.com/idrblab/corain/.
Affiliation(s)
- Yunxia Wang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
| | - Ziqi Pan
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
| | - Minjie Mou
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
| | - Weiqi Xia
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
| | - Hongning Zhang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
| | - Hanyu Zhang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
| | - Jin Liu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
| | - Lingyan Zheng
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-ZJU Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| | - Yongchao Luo
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
| | - Hanqi Zheng
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
| | - Xinyuan Yu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
| | - Xichen Lian
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
| | - Zhenyu Zeng
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-ZJU Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| | - Zhaorong Li
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-ZJU Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| | - Bing Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-ZJU Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| | - Mingyue Zheng
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
| | - Honglin Li
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
- School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Tingjun Hou
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
| | - Feng Zhu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Polytechnic Institute, Zhejiang University, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-ZJU Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang, China
| |
|
18
|
Valencia JD, Hendrix DA. Improving deep models of protein-coding potential with a Fourier-transform architecture and machine translation task. PLoS Comput Biol 2023; 19:e1011526. [PMID: 37824580 PMCID: PMC10597526 DOI: 10.1371/journal.pcbi.1011526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Revised: 10/24/2023] [Accepted: 09/18/2023] [Indexed: 10/14/2023] Open
Abstract
Ribosomes are information-processing macromolecular machines that integrate complex sequence patterns in messenger RNA (mRNA) transcripts to synthesize proteins. Studies of the sequence features that distinguish mRNAs from long noncoding RNAs (lncRNAs) may yield insight into the information that directs and regulates translation. Computational methods for calculating protein-coding potential are important for distinguishing mRNAs from lncRNAs during genome annotation, but most machine learning methods for this task rely on previously known rules to define features. Sequence-to-sequence (seq2seq) models, particularly ones using transformer networks, have proven capable of learning complex grammatical relationships between words to perform natural language translation. Seeking to leverage these advancements in the biological domain, we present a seq2seq formulation for predicting protein-coding potential with deep neural networks and demonstrate that simultaneously learning translation from RNA to protein improves classification performance relative to a classification-only training objective. Inspired by classical signal processing methods for gene discovery and Fourier-based image-processing neural networks, we introduce LocalFilterNet (LFNet). LFNet is a network architecture with an inductive bias for modeling the three-nucleotide periodicity apparent in coding sequences. We incorporate LFNet within an encoder-decoder framework to test whether the translation task improves the classification of transcripts and the interpretation of their sequence features. We use the resulting model to compute nucleotide-resolution importance scores, revealing sequence patterns that could assist the cellular machinery in distinguishing mRNAs and lncRNAs. Finally, we develop a novel approach for estimating mutation effects from Integrated Gradients, a backpropagation-based feature attribution, and characterize the difficulty of efficient approximations in this setting.
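The three-nucleotide periodicity mentioned in this abstract can be illustrated with a classical spectral measure. The sketch below is not LFNet and is not taken from the paper; it only computes the Fourier power at frequency 1/3 over the four base-indicator sequences, which is the kind of signal that classical gene-discovery methods exploit. The function name `period3_power` and the normalization by sequence length are assumptions made for illustration.

```python
import cmath


def period3_power(seq: str) -> float:
    """Fourier power at frequency 1/3, summed over the A, C, G, T indicator
    sequences and divided by the sequence length (hypothetical helper)."""
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA alike
    n = len(seq)
    if n == 0:
        return 0.0
    total = 0.0
    for base in "ACGT":
        # Fourier sum of the 0/1 indicator for this base, evaluated at
        # frequency 1/3 (one cycle every three nucleotides).
        coeff = sum(
            cmath.exp(-2j * cmath.pi * i / 3)
            for i, ch in enumerate(seq)
            if ch == base
        )
        total += abs(coeff) ** 2
    return total / n


if __name__ == "__main__":
    # A strictly codon-periodic toy sequence scores high; a homopolymer
    # scores close to zero.
    print(period3_power("ATG" * 30))
    print(period3_power("A" * 90))
```

A strictly periodic toy sequence scores high and a homopolymer scores near zero; real coding sequences fall in between, which is why learned architectures with a periodicity bias are of interest.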
Affiliation(s)
- Joseph D. Valencia
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon, United States of America
| | - David A. Hendrix
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon, United States of America
- Department of Biochemistry and Biophysics, Oregon State University, Corvallis, Oregon, United States of America
| |
|
19
|
Klapproth C, Zötzsche S, Kühnl F, Fallmann J, Stadler P, Findeiß S. Tailored machine learning models for functional RNA detection in genome-wide screens. NAR Genom Bioinform 2023; 5:lqad072. [PMID: 37608800 PMCID: PMC10440787 DOI: 10.1093/nargab/lqad072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Revised: 06/28/2023] [Accepted: 07/30/2023] [Indexed: 08/24/2023] Open
Abstract
The in silico prediction of non-coding and protein-coding genetic loci has received considerable attention in comparative genomics aiming in particular at the identification of properties of nucleotide sequences that are informative of their biological role in the cell. We present here a software framework for the alignment-based training, evaluation and application of machine learning models with user-defined parameters. Instead of focusing on the one-size-fits-all approach of pervasive in silico annotation pipelines, we offer a framework for the structured generation and evaluation of models based on arbitrary features and input data, focusing on stable and explainable results. Furthermore, we showcase the usage of our software package in a full-genome screen of Drosophila melanogaster and evaluate our results against the well-known but much less flexible program RNAz.
Affiliation(s)
- Christopher Klapproth
- Leipzig University, Department of Computer Science and Interdisciplinary Center of Bioinformatics, Bioinformatics Group, Härtelstrasse 16-18, D-04107 Leipzig, Germany
- ScaDS.AI Leipzig (Center for Scalable Data Analytics and Artificial Intelligence), Humboldtstraße 25, D-04105 Leipzig, Germany
| | - Siegfried Zötzsche
- Leipzig University, Department of Computer Science and Interdisciplinary Center of Bioinformatics, Bioinformatics Group, Härtelstrasse 16-18, D-04107 Leipzig, Germany
| | - Felix Kühnl
- Leipzig University, Department of Computer Science and Interdisciplinary Center of Bioinformatics, Bioinformatics Group, Härtelstrasse 16-18, D-04107 Leipzig, Germany
| | - Jörg Fallmann
- Leipzig University, Department of Computer Science and Interdisciplinary Center of Bioinformatics, Bioinformatics Group, Härtelstrasse 16-18, D-04107 Leipzig, Germany
| | - Peter F Stadler
- Leipzig University, Department of Computer Science and Interdisciplinary Center of Bioinformatics, Bioinformatics Group, Härtelstrasse 16-18, D-04107 Leipzig, Germany
- Max Planck Institute for Mathematics in the Science, Inselstraße 22, D-04103 Leipzig, Germany
- University of Vienna, Institute for Theoretical Chemistry, Währingerstraße 17, A-1090 Vienna, Austria
- Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe NM 97501, USA
- Universidad Nacional de Colombia, Facultad de Ciencias, Bogotá, D.C., Colombia
| | - Sven Findeiß
- Leipzig University, Department of Computer Science and Interdisciplinary Center of Bioinformatics, Bioinformatics Group, Härtelstrasse 16-18, D-04107 Leipzig, Germany
| |
|
20
|
Batista da Silva I, Aciole Barbosa D, Kavalco KF, Nunes LR, Pasa R, Menegidio FB. Discovery of putative long non-coding RNAs expressed in the eyes of Astyanax mexicanus (Actinopterygii: Characidae). Sci Rep 2023; 13:12051. [PMID: 37491348 PMCID: PMC10368750 DOI: 10.1038/s41598-023-34198-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2023] [Accepted: 04/25/2023] [Indexed: 07/27/2023] Open
Abstract
Astyanax mexicanus is a well-known model species with two morphotypes: cavefish, from subterranean rivers, and surface fish, from surface rivers. They are morphologically distinct due to many troglomorphic traits in the cavefish, such as the absence of eyes. Most studies on A. mexicanus focus on eye development and the protein-coding genes involved in the process. However, lncRNAs have not received the same attention, and very little is known about them. This study aimed to fill this knowledge gap by identifying, describing, classifying, and annotating lncRNAs expressed in the embryonic eye tissue of cavefish and surface fish. To do so, we constructed a concise workflow to assemble and evaluate transcriptomes, annotate protein-coding genes and ncRNA families, predict coding potential, identify putative lncRNAs, map them, and predict interactions. This approach resulted in the identification of 33,069 and 19,493 putative lncRNAs mapped in cavefish and surface fish, respectively. Thousands of these lncRNAs were annotated and identified as conserved in human and several species of fish. Hundreds of them were validated in silico through ESTs. We identified lncRNAs associated with genes related to eye development. This is the case for a few lncRNAs associated with sox2, which we suggest are isomorphs of SOX2-OT, a lncRNA that can regulate the expression of sox2. This work is one of the first studies to focus on the description of lncRNAs in A. mexicanus, highlighting several lncRNA targets and setting an important precedent for future studies focusing on lncRNAs expressed in A. mexicanus.
Affiliation(s)
- Iuri Batista da Silva
- Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, MG, 31270-901, Brazil
- Laboratory of Ecological and Evolutionary Genetics, Institute of Biological and Health Sciences, Federal University of Viçosa Campus Rio Paranaíba, Rio Paranaíba, MG, 38810-000, Brazil
| | - David Aciole Barbosa
- Integrated Biotechnology Center, University of Mogi das Cruzes (UMC), Av. Dr. Cândido X. de Almeida and Souza, 200 - Centro Cívico, Mogi das Cruzes, SP, 08780-911, Brazil
| | - Karine Frehner Kavalco
- Laboratory of Ecological and Evolutionary Genetics, Institute of Biological and Health Sciences, Federal University of Viçosa Campus Rio Paranaíba, Rio Paranaíba, MG, 38810-000, Brazil
| | - Luiz R Nunes
- Center for Natural and Human Sciences, Federal University of ABC, São Bernardo do Campo, SP, 09606-045, Brazil
| | - Rubens Pasa
- Laboratory of Ecological and Evolutionary Genetics, Institute of Biological and Health Sciences, Federal University of Viçosa Campus Rio Paranaíba, Rio Paranaíba, MG, 38810-000, Brazil.
| | - Fabiano B Menegidio
- Integrated Biotechnology Center, University of Mogi das Cruzes (UMC), Av. Dr. Cândido X. de Almeida and Souza, 200 - Centro Cívico, Mogi das Cruzes, SP, 08780-911, Brazil.
| |
|
21
|
Dong X, Zhang K, Xun C, Chu T, Liang S, Zeng Y, Liu Z. Small Open Reading Frame-Encoded Micro-Peptides: An Emerging Protein World. Int J Mol Sci 2023; 24:10562. [PMID: 37445739 DOI: 10.3390/ijms241310562] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/13/2023] [Revised: 06/20/2023] [Accepted: 06/21/2023] [Indexed: 07/15/2023] Open
Abstract
Small open reading frames (sORFs) are often overlooked features in genomes. In the past, they were labeled as noncoding or "transcriptional noise". However, accumulating evidence from recent years suggests that sORFs may be transcribed and translated to produce sORF-encoded polypeptides (SEPs) of fewer than 100 amino acids. The vigorous development of computational algorithms, ribosome profiling, and peptidome analysis has facilitated the prediction and identification of many new SEPs. These SEPs have been shown to be involved in a wide range of basic biological processes, such as gene expression regulation, embryonic development, cellular metabolism, inflammation, and even carcinogenesis. To effectively understand the potential biological functions of SEPs, we discuss the history and development of the newly emerging research on sORFs and SEPs. In particular, we review a range of recently developed bioinformatics tools for identifying, predicting, and validating SEPs as well as a variety of biochemical experiments for characterizing SEP functions. Lastly, this review underlines the challenges and future directions in identifying and validating sORFs and their encoded micropeptides, providing a significant reference for upcoming research on sORF-encoded peptides.
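The definition of an sORF used in this abstract (an open reading frame whose product is shorter than 100 amino acids) can be made concrete with a small scan. The sketch below is only an illustration and is not one of the tools reviewed in the paper: it assumes an ATG start codon, the three standard stop codons, forward-strand frames only, and a hypothetical lower bound `min_aa` that does not come from the source.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorfs(seq: str, max_aa: int = 100, min_aa: int = 10):
    """Return (start, end, n_aa) for forward-strand ORFs with
    min_aa <= n_aa < max_aa. Coordinates are 0-based, end-exclusive,
    and the end includes the stop codon. Hypothetical helper."""
    seq = seq.upper().replace("U", "T")
    hits = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3  # residues, counting the initiator Met
                if min_aa <= n_aa < max_aa:
                    hits.append((start, i + 3, n_aa))
                start = None  # look for the next ORF in this frame
    return hits


if __name__ == "__main__":
    toy = "CC" + "ATG" + "GCT" * 11 + "TAA" + "G"
    print(find_sorfs(toy))
```

A sequence-only scan like this yields candidates rather than evidence of translation, which is why the abstract pairs computational prediction with ribosome profiling and peptide-level validation.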
Affiliation(s)
- Xiaoping Dong
- National & Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, China
- Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha 410081, China
| | - Kun Zhang
- The State Key Laboratory of Developmental Biology of Freshwater Fish, College of Life Science, Hunan Normal University, Changsha 410081, China
| | - Chengfeng Xun
- National & Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, China
- Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha 410081, China
| | - Tianqi Chu
- National & Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, China
- Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha 410081, China
| | - Songping Liang
- National & Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, China
- Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha 410081, China
| | - Yong Zeng
- National & Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, China
- Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha 410081, China
- The State Key Laboratory of Developmental Biology of Freshwater Fish, College of Life Science, Hunan Normal University, Changsha 410081, China
| | - Zhonghua Liu
- National & Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, China
- Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha 410081, China
| |
|
22
|
Deng L, Jiang Y, Hu X, Zheng R, Huang Z, Zhang J. ABLNCPP: Attention Mechanism-Based Bidirectional Long Short-Term Memory for Noncoding RNA Coding Potential Prediction. J Chem Inf Model 2023. [PMID: 37294848 DOI: 10.1021/acs.jcim.3c00366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
With the continuous development of ribosome profiling, sequencing technology, and proteomics, evidence is mounting that noncoding RNA (ncRNA) may be a novel source of peptides or proteins. These peptides and proteins play crucial roles in inhibiting tumor progression and interfering with cancer metabolism and other essential physiological processes. Therefore, identifying ncRNAs with coding potential is vital to ncRNA functional research. Existing methods perform well at separating ncRNAs from mRNAs, but none has been explicitly designed to determine whether ncRNA transcripts themselves have coding potential. For this reason, we propose an attention mechanism-based bidirectional LSTM network called ABLNCPP to assess the coding potential of ncRNA sequences. Considering the loss of sequential information in previous methods, we introduce a novel nonoverlapping trinucleotide embedding (NOLTE) method for ncRNAs to obtain embeddings containing sequential features. Extensive evaluations show that ABLNCPP outperforms other state-of-the-art models. In general, ABLNCPP overcomes the bottleneck of ncRNA coding potential prediction and is expected to provide valuable contributions to cancer discovery and treatment in the future. The source code and data sets are freely available at https://github.com/YinggggJ/ABLNCPP.
Affiliation(s)
- Lei Deng
- School of Computer Science and Engineering, Central South University, Changsha 410018, China
| | - Ying Jiang
- School of Computer Science and Engineering, Central South University, Changsha 410018, China
| | - Xiaowen Hu
- School of Computer Science and Engineering, Central South University, Changsha 410018, China
| | - Rongtao Zheng
- School of Computer Science and Engineering, Central South University, Changsha 410018, China
| | - Zhijian Huang
- School of Computer Science and Engineering, Central South University, Changsha 410018, China
| | - Jingpu Zhang
- School of Computer and Data Science, Henan University of Urban Construction, Pingdingshan 467000, China
| |
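The nonoverlapping trinucleotide idea mentioned in the ABLNCPP abstract above can be illustrated with a small tokenizer. This is only a sketch under stated assumptions: the abstract does not describe how NOLTE builds its embeddings, so the function names, the integer vocabulary, the reserved index 0, and the `offset` parameter below are all hypothetical and are not taken from the ABLNCPP code base.

```python
from itertools import product

# Hypothetical vocabulary: the 64 trinucleotides over A/C/G/U mapped to 1..64.
# Index 0 is reserved here for unknown tokens (an assumption of this sketch).
VOCAB = {"".join(p): i + 1 for i, p in enumerate(product("ACGU", repeat=3))}


def nonoverlapping_trinucleotides(seq, offset=0):
    """Split a sequence into consecutive, non-overlapping 3-mers.

    Trailing bases that do not fill a complete trinucleotide are dropped.
    """
    seq = seq.upper().replace("T", "U")
    remaining = len(seq) - offset
    usable = remaining - remaining % 3
    return [seq[i:i + 3] for i in range(offset, offset + usable, 3)]


def encode(seq, offset=0):
    """Map each trinucleotide to an integer id; unknown tokens map to 0."""
    return [VOCAB.get(tok, 0) for tok in nonoverlapping_trinucleotides(seq, offset)]


tokens = nonoverlapping_trinucleotides("ATGGCCTAAG")
ids = encode("ATGGCCTAAG")
```

Because the tokens do not overlap, token order follows the sequence directly, and the resulting id list could be passed to an embedding layer in front of a recurrent model.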
|
23
|
Ste-Croix DT, Bélanger RR, Mimee B. Single Nematode Transcriptomic Analysis, Using Long-Read Technology, Reveals Two Novel Virulence Gene Candidates in the Soybean Cyst Nematode, Heterodera glycines. Int J Mol Sci 2023; 24:ijms24119440. [PMID: 37298400 DOI: 10.3390/ijms24119440] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Revised: 05/18/2023] [Accepted: 05/24/2023] [Indexed: 06/12/2023] Open
Abstract
The soybean cyst nematode (SCN; Heterodera glycines) is the most damaging pest of soybean in North America. While management of this pest using resistant soybean is generally still effective, prolonged exposure to cultivars derived from the same source of resistance (PI 88788) has led to the emergence of virulence. Currently, the underlying mechanisms responsible for resistance breakdown remain unknown. In this study, we combined a single nematode transcriptomic profiling approach with long-read sequencing to reannotate the SCN genome. This resulted in the annotation of 1932 novel transcripts and 281 novel gene features. Using a transcript-level quantification approach, we identified eight novel effector candidates overexpressed in PI 88788 virulent nematodes in the late infection stage. Among these were the novel gene Hg-CPZ-1 and a pioneer effector transcript generated through the alternative splicing of the non-effector gene Hetgly21698. While our results demonstrate that alternative splicing in effectors does occur, we found limited evidence of its direct involvement in the breakdown of resistance. However, our analysis highlighted a distinct pattern of effector upregulation in response to PI 88788 resistance, indicative of a possible adaptation process by SCN to host resistance.
Affiliation(s)
- Dave T Ste-Croix
- Saint-Jean-sur-Richelieu Research and Development Centre, Agriculture and Agri-Food Canada, Saint-Jean-sur-Richelieu, QC J3B 3E6, Canada
- Département de Phytologie, Université Laval, Québec, QC G1V 0A6, Canada
| | - Richard R Bélanger
- Département de Phytologie, Université Laval, Québec, QC G1V 0A6, Canada
- Centre de Recherche et d'Innovation sur les Végétaux (CRIV), Université Laval, Québec, QC G1V 0A6, Canada
| | - Benjamin Mimee
- Saint-Jean-sur-Richelieu Research and Development Centre, Agriculture and Agri-Food Canada, Saint-Jean-sur-Richelieu, QC J3B 3E6, Canada
| |
|
24
|
Valencia JD, Hendrix DA. Improving deep models of protein-coding potential with a Fourier-transform architecture and machine translation task. bioRxiv 2023:2023.04.03.535488. [PMID: 37066250 PMCID: PMC10104019 DOI: 10.1101/2023.04.03.535488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/05/2023]
Abstract
Ribosomes are information-processing macromolecular machines that integrate complex sequence patterns in messenger RNA (mRNA) transcripts to synthesize proteins. Studies of the sequence features that distinguish mRNAs from long noncoding RNAs (lncRNAs) may yield insight into the information that directs and regulates translation. Computational methods for calculating protein-coding potential are important for distinguishing mRNAs from lncRNAs during genome annotation, but most machine learning methods for this task rely on previously known rules to define features. Sequence-to-sequence (seq2seq) models, particularly ones using transformer networks, have proven capable of learning complex grammatical relationships between words to perform natural language translation. Seeking to leverage these advancements in the biological domain, we present a seq2seq formulation for predicting protein-coding potential with deep neural networks and demonstrate that simultaneously learning translation from RNA to protein improves classification performance relative to a classification-only training objective. Inspired by classical signal processing methods for gene discovery and Fourier-based image-processing neural networks, we introduce LocalFilterNet (LFNet). LFNet is a network architecture with an inductive bias for modeling the three-nucleotide periodicity apparent in coding sequences. We incorporate LFNet within an encoder-decoder framework to test whether the translation task improves the classification of transcripts and the interpretation of their sequence features. We use the resulting model to compute nucleotide-resolution importance scores, revealing sequence patterns that could assist the cellular machinery in distinguishing mRNAs and lncRNAs. Finally, we develop a novel approach for estimating mutation effects from Integrated Gradients, a backpropagation-based feature attribution, and characterize the difficulty of efficient approximations in this setting.
Affiliation(s)
- Joseph D. Valencia
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
| | - David A. Hendrix
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
- Department of Biochemistry and Biophysics, Oregon State University, Corvallis, OR, USA
| |
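The three-nucleotide periodicity that the LFNet abstract above refers to can be measured with the classical Fourier approach to gene discovery that the authors cite as inspiration. The sketch below is not LFNet and does not reproduce any part of that model: it evaluates the discrete Fourier transform of the four base-indicator sequences at frequency 1/3 and sums the power. The function name and the normalization by the squared sequence length are choices made for this illustration.

```python
import cmath


def period3_power(seq):
    """Normalized spectral power at frequency 1/3, summed over the four bases.

    For each base, an indicator sequence (1 where the base occurs, else 0)
    is projected onto the complex exponential with period 3.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0
    total = 0.0
    for base in "ACGT":
        component = sum(
            cmath.exp(-2j * cmath.pi * k / 3)
            for k, b in enumerate(seq)
            if b == base
        )
        total += abs(component) ** 2
    return total / n ** 2


periodic = period3_power("ATG" * 10)   # perfectly codon-periodic toy sequence
flat = period3_power("A" * 30)         # homopolymer with no period-3 signal
```

A strongly codon-structured sequence gives a markedly higher value than a sequence without positional base preferences, which is the signal an architecture with a period-3 inductive bias is meant to pick up.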
|
25
|
Katsushima K, Pokhrel R, Mahmud I, Yuan M, Murad R, Baral P, Zhou R, Chapagain P, Garrett T, Stapleton S, Jallo G, Bettegowda C, Raabe E, Wechsler-Reya RJ, Eberhart CG, Perera RJ. The oncogenic circular RNA circ_63706 is a potential therapeutic target in sonic hedgehog-subtype childhood medulloblastomas. Acta Neuropathol Commun 2023; 11:38. [PMID: 36899402 PMCID: PMC10007801 DOI: 10.1186/s40478-023-01521-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2023] [Accepted: 01/24/2023] [Indexed: 03/12/2023] Open
Abstract
Medulloblastoma (MB) develops through various genetic, epigenetic, and non-coding (nc) RNA-related mechanisms, but the roles played by ncRNAs, particularly circular RNAs (circRNAs), remain poorly defined. CircRNAs are increasingly recognized as stable non-coding RNA therapeutic targets in many cancers, but little is known about their function in MBs. To determine medulloblastoma subgroup-specific circRNAs, publicly available RNA sequencing (RNA-seq) data from 175 MB patients were interrogated to identify circRNAs that differentiate between MB subgroups. circ_63706 was identified as sonic hedgehog (SHH) group-specific, with its expression confirmed by RNA-FISH analysis in clinical tissue samples. The oncogenic function of circ_63706 was characterized in vitro and in vivo. Further, circ_63706-depleted cells were subjected to RNA-seq and lipid profiling to identify its molecular function. Finally, we mapped the circ_63706 secondary structure using an advanced random forest classification model and modeled a 3D structure to identify its interacting miRNA partner molecules. circ_63706 is regulated independently of its host coding gene pericentrin (PCNT), and its expression is specific to the SHH subgroup. circ_63706-deleted cells implanted into mice produced smaller tumors, and these mice lived longer than mice implanted with parental cells. At the molecular level, circ_63706-deleted cells showed elevated total ceramide and oxidized lipids and reduced total triglyceride. Our study implicates a novel oncogenic circular RNA in the SHH medulloblastoma subgroup and establishes its molecular function and potential as a future therapeutic target.
Affiliation(s)
- Keisuke Katsushima
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, School of Medicine, Johns Hopkins University, 1650 Orleans St., Baltimore, MD, 21231, USA; Johns Hopkins All Children's Hospital, St. Petersburg, USA
| | - Rudramani Pokhrel
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, School of Medicine, Johns Hopkins University, 1650 Orleans St., Baltimore, MD, 21231, USA; Johns Hopkins All Children's Hospital, St. Petersburg, USA
| | - Iqbal Mahmud
- Department of Pathology, Immunology and Laboratory Medicine, College of Medicine, University of Florida, Gainesville, USA; Department of Bioinformatics and Computational Biology, University of Texas MD Anderson Cancer Center, Houston, USA
| | - Menglang Yuan
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, School of Medicine, Johns Hopkins University, 1650 Orleans St., Baltimore, MD, 21231, USA; Johns Hopkins All Children's Hospital, St. Petersburg, USA
| | - Rabi Murad
- Sanford Burnham Prebys Medical Discovery Institute, La Jolla, USA
| | - Prabin Baral
- Department of Physics, Florida International University, Miami, USA
| | - Rui Zhou
- Johns Hopkins All Children's Hospital, St. Petersburg, USA
| | - Prem Chapagain
- Department of Physics, Florida International University, Miami, USA; Biomolecular Sciences Institute, Florida International University, Miami, USA
| | - Timothy Garrett
- Department of Pathology, Immunology and Laboratory Medicine, College of Medicine, University of Florida, Gainesville, USA
| | | | - George Jallo
- Johns Hopkins All Children's Hospital, St. Petersburg, USA; Department of Neurosurgery, Johns Hopkins University School of Medicine, Baltimore, USA
| | - Chetan Bettegowda
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, School of Medicine, Johns Hopkins University, 1650 Orleans St., Baltimore, MD, 21231, USA; Department of Neurosurgery, Johns Hopkins University School of Medicine, Baltimore, USA
| | - Eric Raabe
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, School of Medicine, Johns Hopkins University, 1650 Orleans St., Baltimore, MD, 21231, USA; Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, USA
| | | | - Charles G Eberhart
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, School of Medicine, Johns Hopkins University, 1650 Orleans St., Baltimore, MD, 21231, USA; Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, USA
| | - Ranjan J Perera
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, School of Medicine, Johns Hopkins University, 1650 Orleans St., Baltimore, MD, 21231, USA; Johns Hopkins All Children's Hospital, St. Petersburg, USA; Department of Neurosurgery, Johns Hopkins University School of Medicine, Baltimore, USA
| |
|
26
|
Feng H, Wang S, Wang Y, Ni X, Yang Z, Hu X, Yang S. LncCat: An ORF attention model to identify LncRNA based on ensemble learning strategy and fused sequence information. Comput Struct Biotechnol J 2023; 21:1433-1447. [PMID: 36824229 PMCID: PMC9941877 DOI: 10.1016/j.csbj.2023.02.012] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/11/2022] [Revised: 02/06/2023] [Accepted: 02/06/2023] [Indexed: 02/10/2023] Open
Abstract
Background: Long non-coding RNA (lncRNA) is one of the most essential forms of transcripts, playing crucial regulatory roles in the development of cancers and other diseases without protein-coding ability. It was long assumed that short ORFs (sORFs) in lncRNAs were unlikely to be translated into proteins. However, recent research has shown that sORFs can encode peptides, which makes lncRNAs harder to identify. Therefore, identifying lncRNAs with sORFs facilitates finding novel regulatory factors. Results: In this paper, we propose LncCat for identifying lncRNA based on category boosting (CatBoost) and ORF-attention features. LncCat combines five types of features to encode transcript sequences and employs CatBoost to build a prediction model. In addition, a visual comparison reveals that the ORF-attention features of lncRNAs and protein-coding transcripts are significantly distinct. The comparison results show that LncCat outperforms competing methods on several benchmark datasets. For the Matthews correlation coefficient (MCC), LncCat achieves 0.9503, 0.9219, 0.8591, 0.8672, and 0.9047 on the human, mouse, zebrafish, wheat, and chicken datasets, with improvements of 1.90-7.82%, 1.49-17.63%, 6.11-21.50%, 3.02-51.64%, and 5.35-26.90%, respectively. Moreover, LncCat improves the MCC by at least 11.90%, 12.96%, and 42.61% on the sORF test datasets of human, mouse, and zebrafish, respectively. Conclusions: Experiments indicate that LncCat performs better on both long ORF and sORF datasets, and that ORF-attention features have a positive effect on lncRNA prediction. In brief, LncCat is a reliable method for identifying lncRNA. Additionally, a user-friendly web server is available for academics at http://cczubio.top/lnccat.
Affiliation(s)
- Hongqi Feng
- School of Computer Science and Artificial Intelligence Aliyun School of Big Data School of Software, Changzhou University, Changzhou 213164, China
| | - Shaocong Wang
- School of Computer Science and Artificial Intelligence Aliyun School of Big Data School of Software, Changzhou University, Changzhou 213164, China
| | - Yan Wang
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- School of Artificial Intelligence, Jilin University, Changchun 130012, China
| | - Xinye Ni
- The Affiliated Changzhou No.2 People’s Hospital of Nanjing Medical University, Changzhou 213164, China
| | - Zexi Yang
- School of Computer Science and Artificial Intelligence Aliyun School of Big Data School of Software, Changzhou University, Changzhou 213164, China
| | - Xuemei Hu
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
| | - Sen Yang
- School of Computer Science and Artificial Intelligence Aliyun School of Big Data School of Software, Changzhou University, Changzhou 213164, China
- The Affiliated Changzhou No.2 People’s Hospital of Nanjing Medical University, Changzhou 213164, China
| |
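The LncCat abstract above reports its results as Matthews correlation coefficient (MCC) values. For readers unfamiliar with the metric, the standard definition from a binary confusion matrix is MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below implements that formula directly; the function name and the convention of returning 0.0 when the denominator is zero are choices of this illustration, and the counts in the usage line are made-up values, not results from the paper.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns a value in [-1, 1]; 0.0 is returned when any marginal total is
    zero and the coefficient is undefined (a common convention).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical counts, with lncRNA as the positive class.
score = mcc(tp=90, tn=85, fp=15, fn=10)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the lncRNA and protein-coding classes are imbalanced.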
|
27
|
Transcriptomic Analysis of Long Non-Coding RNA during Candida albicans Infection. Genes (Basel) 2023; 14:genes14020251. [PMID: 36833177 PMCID: PMC9956080 DOI: 10.3390/genes14020251] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/01/2022] [Revised: 01/07/2023] [Accepted: 01/16/2023] [Indexed: 01/20/2023] Open
Abstract
Candida albicans is one of the most commonly found species in fungal infections. Due to its clinical importance, molecular aspects of the host immune defense against the fungus are of interest to biomedical sciences. Long non-coding RNAs (lncRNAs) have been investigated in different pathologies and have gained widespread attention for their role as gene regulators. However, the biological processes in which most lncRNAs perform their function are still unclear. This study investigates the association of lncRNAs with the host response to C. albicans using a public RNA-Seq dataset from lung samples of female C57BL/6J wild-type Mus musculus with induced C. albicans infection. The animals were exposed to the fungus for 24 h before sample collection. We selected lncRNAs and protein-coding genes related to the host immune response by combining the results from different computational approaches used for gene selection: differential gene expression analysis, gene co-expression network analysis, and machine learning-based gene selection. Using a guilt-by-association strategy, we inferred connections between 41 lncRNAs and 25 biological processes. Our results indicated that nine up-regulated lncRNAs were associated with biological processes derived from the response to wounding: 1200007C13Rik, 4833418N02Rik, Gm12840, Gm15832, Gm20186, Gm38037, Gm45774, Gm4610, Mir22hg, and Mirt1. Additionally, 29 lncRNAs were related to genes involved in immune response, while 22 lncRNAs were associated with processes related to reactive species production. These results support the participation of lncRNAs during C. albicans infection, and may contribute to new studies investigating lncRNA functions in the immune response.
|
28
|
Nabi A, Dilekoglu B, Adebali O, Tastan O. Discovering misannotated lncRNAs using deep learning training dynamics. Bioinformatics 2023; 39:6960922. [PMID: 36571493 PMCID: PMC9825752 DOI: 10.1093/bioinformatics/btac821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2022] [Revised: 10/05/2022] [Accepted: 12/23/2022] [Indexed: 12/27/2022] Open
Abstract
Motivation: Recent experimental evidence has shown that some long non-coding RNAs (lncRNAs) contain small open reading frames (sORFs) that are translated into functional micropeptides, suggesting that these lncRNAs are misannotated as non-coding. Current methods to detect misannotated lncRNAs rely on ribosome-profiling (Ribo-Seq) and mass-spectrometry experiments, which are cell-type dependent and expensive. Results: Here, we propose a computational method to identify possible misannotated lncRNAs from sequence information alone. Our approach first builds deep learning models to discriminate coding and non-coding transcripts and leverages these models' training dynamics to identify misannotated lncRNAs, i.e. lncRNAs with coding potential. The set of misannotated lncRNAs we identified significantly overlaps with experimentally validated ones and closely resembles protein-coding sequences, as evidenced by significant BLAST hits. Our analysis on a subset of misannotated lncRNA candidates also shows that some ORFs they contain yield high-confidence folded structures as predicted by AlphaFold2. This methodology offers promising potential for assisting experimental efforts in characterizing the hidden proteome encoded by misannotated lncRNAs and for curating better datasets for building coding potential predictors. Availability and implementation: Source code is available at https://github.com/nabiafshan/DetectingMisannotatedLncRNAs. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Afshan Nabi
- Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul 34956, Turkey
| | - Berke Dilekoglu
- Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul 34956, Turkey
| | - Ogun Adebali
- Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul 34956, Turkey
| | | |
|
29
|
Singh D, Roy J. A large-scale benchmark study of tools for the classification of protein-coding and non-coding RNAs. Nucleic Acids Res 2022; 50:12094-12111. [PMID: 36420898 PMCID: PMC9757047 DOI: 10.1093/nar/gkac1092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2022] [Revised: 10/22/2022] [Accepted: 10/28/2022] [Indexed: 11/27/2022] Open
Abstract
Identification of protein-coding and non-coding transcripts is paramount for understanding their biological roles. Computational approaches have been addressing this task for over a decade; however, generalized and high-performance models are still unreliable. This benchmark study assessed the performance of 24 tools producing >55 models on the datasets covering a wide range of species. We have collected 135 small and large transcriptomic datasets from existing studies for comparison and identified the potential bottlenecks hampering the performance of current tools. The key insights of this study include lack of standardized training sets, reliance on homogeneous training data, gradual changes in annotated data, lack of augmentation with homology searches, the presence of false positives and negatives in datasets and the lower performance of end-to-end deep learning models. We also derived a new dataset, RNAChallenge, from the benchmark considering hard instances that may include potential false alarms. The best and least well performing models under- and overfit the dataset, respectively, thereby serving a dual purpose. For computational approaches, it will be valuable to develop accurate and unbiased models. The identification of false alarms will be of interest for genome annotators, and experimental study of hard RNAs will help to untangle the complexity of the RNA world.
Affiliation(s)
- Dalwinder Singh
- To whom correspondence should be addressed. Tel: +91 172 5221206;
| | - Joy Roy
- Correspondence may also be addressed to Joy Roy.
| |
|
30
|
Zhang M, Zhao J, Li C, Ge F, Wu J, Jiang B, Song J, Song X. csORF-finder: an effective ensemble learning framework for accurate identification of multi-species coding short open reading frames. Brief Bioinform 2022; 23:bbac392. [PMID: 36094083 PMCID: PMC9677467 DOI: 10.1093/bib/bbac392] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/02/2022] [Revised: 08/03/2022] [Accepted: 08/11/2022] [Indexed: 12/14/2022] Open
Abstract
Short open reading frames (sORFs) are small nucleic acid fragments no longer than 303 nt that may encode small peptides. To date, translatable sORFs have been found in both the untranslated regions of messenger ribonucleic acids (RNAs; mRNAs) and long non-coding RNAs (lncRNAs), playing vital roles in a myriad of biological processes. As not all sORFs are translated or essentially translatable, it is important to develop a highly accurate computational tool for characterizing the coding potential of sORFs, thereby facilitating discovery of novel functional peptides. In light of this, we designed a series of ensemble models by integrating Efficient-CapsNet and LightGBM, collectively termed csORF-finder, to differentiate coding sORFs (csORFs) from non-coding sORFs in Homo sapiens, Mus musculus and Drosophila melanogaster, respectively. To improve the performance of csORF-finder, we introduced a novel feature encoding scheme named trinucleotide deviation from expected mean (TDE) and computed all types of in-frame sequence-based features, such as i-framed-3mer, i-framed-CKSNAP and i-framed-TDE. Benchmarking results showed that these features could significantly boost performance compared to the original 3-mer, CKSNAP and TDE features. Our performance comparisons showed that csORF-finder outperformed state-of-the-art methods for csORF prediction on multi-species and non-ATG initiation independent test datasets. Furthermore, we applied csORF-finder to screen lncRNA datasets for potential csORFs. The resulting data serve as an important computational repository for further experimental validation. We hope that csORF-finder can be exploited as a powerful platform for high-throughput identification of csORFs and functional characterization of csORF-encoded peptides.
Affiliation(s)
- Meng Zhang
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
| | - Jian Zhao
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
| | - Chen Li
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
| | - Fang Ge
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
| | - Jing Wu
- School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China
| | - Bin Jiang
- College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
| | - Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
| | - Xiaofeng Song
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
| |
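Two ideas from the csORF-finder abstract above, the 303-nt length limit on sORFs and features computed on in-frame trinucleotides, can be made concrete with a toy example. This is a sketch under loudly stated assumptions: the abstract does not define i-framed-3mer precisely, so `in_frame_3mer_freq` below is only one plausible reading (codon frequencies counted from the first base of the ORF). The scanner also assumes ATG starts on the forward strand only and counts the stop codon toward the length, although the abstract notes that non-ATG initiation exists. None of these names come from the csORF-finder code.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}
MAX_SORF_NT = 303  # length limit quoted in the abstract


def find_sorfs(seq, max_len=MAX_SORF_NT):
    """Return (start, end) pairs of ATG..stop ORFs no longer than max_len nt.

    Scans the three forward reading frames; 'end' is exclusive and includes
    the stop codon. The first ATG after a stop opens each candidate ORF.
    """
    seq = seq.upper()
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if end - start <= max_len:
                    found.append((start, end))
                start = None
    return found


def in_frame_3mer_freq(orf):
    """Relative frequency of each trinucleotide read in the ORF's own frame."""
    codons = [orf[i:i + 3] for i in range(0, len(orf) - 2, 3)]
    counts = Counter(codons)
    total = len(codons)
    return {codon: n / total for codon, n in counts.items()} if total else {}


transcript = "CCATGGCCTAAGG"
orfs = find_sorfs(transcript)
features = [in_frame_3mer_freq(transcript[s:e]) for s, e in orfs]
```

Counting 3-mers only at in-frame positions keeps codon-level composition separate from the out-of-frame trinucleotides that an ordinary sliding-window 3-mer count would mix in.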
|
31
|
Lin R, Wichadakul D. Interpretable Deep Learning Model Reveals Subsequences of Various Functions for Long Non-Coding RNA Identification. Front Genet 2022; 13:876721. [PMID: 35685437 PMCID: PMC9173695 DOI: 10.3389/fgene.2022.876721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2022] [Accepted: 04/11/2022] [Indexed: 11/13/2022] Open
Abstract
Long non-coding RNAs (lncRNAs) play crucial roles in many biological processes and are implicated in several diseases. With next-generation sequencing technologies, a substantial number of unannotated transcripts have been discovered. Classifying unannotated transcripts using biological experiments is more time-consuming and expensive than using computational approaches. Several tools are available for identifying long non-coding RNAs. These tools, however, do not explain which features contributed to their prediction results. Here, we present Xlnc1DCNN, a tool for distinguishing long non-coding RNAs (lncRNAs) from protein-coding transcripts (PCTs) using a one-dimensional convolutional neural network with prediction explanations. The evaluation results on the human test set showed that Xlnc1DCNN outperformed other state-of-the-art tools in terms of accuracy and F1-score. The explanation results revealed that lncRNA transcripts were mainly identified as sequences with no conserved regions, short patterns with unknown functions, or only regions of transmembrane helices, while protein-coding transcripts were mostly classified by conserved protein domains or families. The explanation results also pointed to probable annotation inconsistencies among public databases: lncRNA transcripts that contain protein domains, protein families, or intrinsically disordered regions (IDRs). Xlnc1DCNN is freely available at https://github.com/cucpbioinfo/Xlnc1DCNN.
Affiliation(s)
- Rattaphon Lin
- Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Pathumwan, Thailand
| | - Duangdao Wichadakul
- Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Pathumwan, Thailand
- Center of Excellence in Systems Biology, Faculty of Medicine, Chulalongkorn University, Pathumwan, Thailand
| |
|
32
|
Žarković M, Hufsky F, Markert UR, Marz M. The Role of Non-Coding RNAs in the Human Placenta. Cells 2022; 11:1588. [PMID: 35563893 PMCID: PMC9104507 DOI: 10.3390/cells11091588] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/08/2022] [Revised: 05/01/2022] [Accepted: 05/03/2022] [Indexed: 12/11/2022] Open
Abstract
Non-coding RNAs (ncRNAs) play a central and regulatory role in almost all cells, organs, and species, which has been broadly recognized since the human ENCODE project and several other genome projects. Nevertheless, only a small fraction of ncRNAs have been identified, and in the placenta they have been investigated only marginally. To date, most examples of ncRNAs identified as specific for fetal tissues, including placenta, are members of the group of microRNAs (miRNAs). Given their number, the considerably larger group of other ncRNAs can be expected to exert far stronger effects than miRNAs. The syncytiotrophoblast of fetal origin forms the interface between fetus and mother, and continuously releases extracellular vesicles (EVs) into the maternal circulation, which contain fetal proteins and RNA, including ncRNA, for communication with neighboring and distant maternal cells. Disorders of ncRNA in placental tissue, especially in trophoblast cells, and in EVs seem to be involved in pregnancy disorders, potentially as a cause or consequence. This review summarizes the current knowledge on placental ncRNAs, their transport in EVs, and their involvement in pregnancy pathologies, as well as their potential for novel diagnostic tools.
Affiliation(s)
- Milena Žarković
- RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, Leutragraben 1, 07743 Jena, Germany; (M.Ž.); (F.H.)
- European Virus Bioinformatics Center, Leutragraben 1, 07743 Jena, Germany
- Placenta Lab, Department of Obstetrics, University Hospital Jena, Am Klinikum 1, 07747 Jena, Germany;
| | - Franziska Hufsky
- RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, Leutragraben 1, 07743 Jena, Germany; (M.Ž.); (F.H.)
- European Virus Bioinformatics Center, Leutragraben 1, 07743 Jena, Germany
| | - Udo R. Markert
- Placenta Lab, Department of Obstetrics, University Hospital Jena, Am Klinikum 1, 07747 Jena, Germany
| | - Manja Marz
- RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, Leutragraben 1, 07743 Jena, Germany
- European Virus Bioinformatics Center, Leutragraben 1, 07743 Jena, Germany
- FLI Leibniz Institute for Age Research, Beutenbergstraße 11, 07745 Jena, Germany
- Aging Research Center (ARC), 07745 Jena, Germany
| |
|
33
|
Jiménez-Gómez I, Valdés-Muñoz G, Moreno-Ulloa A, Pérez-Llano Y, Moreno-Perlín T, Silva-Jiménez H, Barreto-Curiel F, Sánchez-Carbente MDR, Folch-Mallol JL, Gunde-Cimerman N, Lago-Lestón A, Batista-García RA. Surviving in the Brine: A Multi-Omics Approach for Understanding the Physiology of the Halophile Fungus Aspergillus sydowii at Saturated NaCl Concentration. Front Microbiol 2022; 13:840408. [PMID: 35586858 PMCID: PMC9108488 DOI: 10.3389/fmicb.2022.840408] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/21/2021] [Accepted: 04/07/2022] [Indexed: 11/30/2022] Open
Abstract
Although various studies have investigated the osmoadaptations of halophilic fungi to saline conditions, only a few have analyzed the fungal mechanisms occurring at saturated NaCl concentrations. Halophilic Aspergillus sydowii is a model organism for the study of molecular adaptations of filamentous fungi to hyperosmolarity. For the first time, a multi-omics approach (i.e., transcriptomics and metabolomics) was used to compare A. sydowii at saturated concentration (5.13 M NaCl) to optimal salinity (1 M NaCl). Analysis revealed 1,842 differentially expressed genes, of which 704 were overexpressed. Most differentially expressed genes were involved in metabolism and signal transduction. A gene ontology multi-scale network showed that ATP binding constituted the main network node, with direct interactions to phosphorelay signal transduction, polysaccharide metabolism, and transferase activity. Free amino acids significantly decreased and amino acid metabolism was reprogrammed at 5.13 M NaCl. mRNA transcriptional analysis revealed upregulation of genes involved in methionine and cysteine biosynthesis at extreme water deprivation by NaCl. No modifications of membrane fatty acid composition occurred. Upregulated genes were involved in high-osmolarity glycerol signal transduction pathways, biosynthesis of β-1,3-glucans, and cross-membrane ion transporters. Downregulated genes were related to the synthesis of chitin, mannose, cell wall proteins, starvation, pheromone synthesis, and cell cycle. Non-coding RNAs represented 20% of the total transcripts, with 7% classified as long non-coding RNAs (lncRNAs). Of the total lncRNAs and RNAs encoding transcription factors, 42% and 69%, respectively, were differentially expressed. A network analysis showed that differentially expressed lncRNAs and RNAs encoding transcription factors were mainly related to the regulation of metabolic processes, protein phosphorylation, protein kinase activity, and plasma membrane composition.
Metabolomic analyses revealed more complex and unknown metabolites at saturated NaCl concentration than at optimal salinity. This study is the first attempt to unravel the molecular ecology of an ascomycetous fungus at extreme water deprivation by NaCl (5.13 M). This work also represents a pioneering study of the importance of lncRNAs and transcription factors in the transcriptomic response to high NaCl stress in halophilic fungi.
Affiliation(s)
- Irina Jiménez-Gómez
- Centro de Investigación en Dinámica Celular, Instituto de Investigación en Ciencias Básicas y Aplicadas, Universidad Autónoma del Estado de Morelos, Cuernavaca, Mexico
| | - Gisell Valdés-Muñoz
- Centro de Investigación en Dinámica Celular, Instituto de Investigación en Ciencias Básicas y Aplicadas, Universidad Autónoma del Estado de Morelos, Cuernavaca, Mexico
| | - Aldo Moreno-Ulloa
- Departamento de Innovación Biomédica, Centro de Investigación Científica y de Educación Superior de Ensenada, Ensenada, Mexico
| | - Yordanis Pérez-Llano
- Centro de Investigación en Dinámica Celular, Instituto de Investigación en Ciencias Básicas y Aplicadas, Universidad Autónoma del Estado de Morelos, Cuernavaca, Mexico
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
| | - Tonatiuh Moreno-Perlín
- Centro de Investigación en Dinámica Celular, Instituto de Investigación en Ciencias Básicas y Aplicadas, Universidad Autónoma del Estado de Morelos, Cuernavaca, Mexico
| | - Hortencia Silva-Jiménez
- Instituto de Investigaciones Oceanológicas, Universidad Autónoma de Baja California, Ensenada, Mexico
| | | | | | - Jorge Luis Folch-Mallol
- Centro de Investigación en Biotecnología, Universidad Autónoma del Estado de Morelos, Cuernavaca, Mexico
| | - Nina Gunde-Cimerman
- Department of Biology, Biotechnical Faculty, University of Ljubljana, Ljubljana, Slovenia
| | - Asunción Lago-Lestón
- Departamento de Innovación Biomédica, Centro de Investigación Científica y de Educación Superior de Ensenada, Ensenada, Mexico
| | - Ramón Alberto Batista-García
- Centro de Investigación en Dinámica Celular, Instituto de Investigación en Ciencias Básicas y Aplicadas, Universidad Autónoma del Estado de Morelos, Cuernavaca, Mexico
- *Correspondence: Ramón Alberto Batista-García
| |
|
34
|
Aciole Barbosa D, Araújo BC, Branco GS, Simeone AS, Hilsdorf AWS, Jabes DL, Nunes LR, Moreira RG, Menegidio FB. Transcriptomic Profiling and Microsatellite Identification in Cobia (Rachycentron canadum), Using High-Throughput RNA Sequencing. Mar Biotechnol (NY) 2022; 24:255-262. [PMID: 34855031 DOI: 10.1007/s10126-021-10081-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/10/2021] [Accepted: 11/03/2021] [Indexed: 06/13/2023]
Abstract
Cobia (Rachycentron canadum) is a marine teleost species with great productive potential worldwide. However, the genomic information currently available for this species in public databases is limited. Such lack of information hinders gene expression assessments that might bring forward novel insights into the physiology, ecology, evolution, and genetics of this potential aquaculture species. In this study, we report the first de novo transcriptome assembly of R. canadum liver, improving the availability of novel gene sequences for this species. Illumina sequencing of liver transcripts generated 1,761,965,794 raw reads, which were filtered into 1,652,319,304 high-quality reads. De novo assembly resulted in 101,789 unigenes and 163,096 isoforms, with an average length of 950.61 and 1,617.34 nt, respectively. Moreover, we found that 126,013 of these transcripts bear potentially coding sequences, and 125,993 of these elements (77.3%) correspond to functionally annotated genes found in six different databases. We also identified 701 putative ncRNA and 35,414 putative lncRNA. Interestingly, homologues for 410 of these putative lncRNAs have already been observed in previous analyses with Danio rerio, Lates calcarifer, Seriola lalandi dorsalis, Seriola dumerili, or Echeneis naucrates. Finally, we identified 7894 microsatellites related to cobia's putative lncRNAs. Thus, the information derived from the transcriptome assembly described herein will likely assist future nutrigenomics and breeding programs involving this important fish farming species.
Affiliation(s)
- David Aciole Barbosa
- Center of Biotechnology, University of Mogi das Cruzes, Av. Dr. Cândido X. de Almeida e Souza, 200 - Centro Cívico, Mogi das Cruzes, SP, 08780-911, Brazil
| | | | - Giovana Souza Branco
- Department of Physiology, Bioscience Institute, University of São Paulo, São Paulo, SP, 05508-090, Brazil
| | - Alexandre S Simeone
- Center of Biotechnology, University of Mogi das Cruzes, Av. Dr. Cândido X. de Almeida e Souza, 200 - Centro Cívico, Mogi das Cruzes, SP, 08780-911, Brazil
| | - Alexandre W S Hilsdorf
- Center of Biotechnology, University of Mogi das Cruzes, Av. Dr. Cândido X. de Almeida e Souza, 200 - Centro Cívico, Mogi das Cruzes, SP, 08780-911, Brazil
| | - Daniela L Jabes
- Center of Biotechnology, University of Mogi das Cruzes, Av. Dr. Cândido X. de Almeida e Souza, 200 - Centro Cívico, Mogi das Cruzes, SP, 08780-911, Brazil
| | - Luiz R Nunes
- Center for Natural and Human Sciences, Federal University of ABC, Santo André, SP, 09210-580, Brazil
| | - Renata G Moreira
- Department of Physiology, Bioscience Institute, University of São Paulo, São Paulo, SP, 05508-090, Brazil
| | - Fabiano B Menegidio
- Center of Biotechnology, University of Mogi das Cruzes, Av. Dr. Cândido X. de Almeida e Souza, 200 - Centro Cívico, Mogi das Cruzes, SP, 08780-911, Brazil.
| |
|
35
|
Matsuno Y, Kusama K, Imakawa K. Characterization of lncRNA functioning in ovine conceptuses and endometria during the peri-implantation period. Biochem Biophys Res Commun 2022; 594:22-30. [PMID: 35066376 DOI: 10.1016/j.bbrc.2022.01.064] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/09/2022] [Accepted: 01/16/2022] [Indexed: 11/26/2022]
Abstract
In ruminants, RNA-sequence analyses have revealed many characteristics of transcripts expressed in conceptuses (embryo and extraembryonic membrane) during peri-implantation periods; however, lncRNA profiles have yet to be characterized. In this study, we aimed to characterize the lncRNA expression profile in conceptuses during peri-implantation periods in sheep. We analyzed the RNA-sequence data of ovine conceptuses and endometria obtained from pregnant animals on days 15, 17, 19 and 21 (day 0 = day of estrus, n = 3 or 4/day). We predicted the protein coding ability of the assembled transcripts to identify the lncRNA candidates. This analysis identified 8808 lncRNAs, 3423 of which were novel lncRNAs. Gene ontology analysis revealed that lncRNA target genes were enriched for biological processes involved in the respiratory electron transport chain (RETC). qPCR analysis demonstrated that the expression levels of transcripts encoding RETC components, such as mitochondrially encoded cytochrome c oxidase II (MTCO2), and mitochondrial DNA copy number in conceptuses were not increased on P21, although western blotting analysis and immunohistochemistry demonstrated that MTCO2 protein in conceptuses was increased on P21. An NAD/NADH assay revealed that the NADH level in conceptuses was increased on P21. These results indicate that lncRNAs could regulate the RETC at the post-transcriptional level in the conceptuses. Therefore, lncRNA is a potential new regulator in ovine conceptus development during peri-implantation periods.
Affiliation(s)
- Yuta Matsuno
- Laboratory of Molecular Reproduction, Research Institute of Agriculture, Tokai University, Kumamoto, Kumamoto, Japan
| | - Kazuya Kusama
- Department of Endocrine Pharmacology, Tokyo University of Pharmacy and Life Sciences, Hachioji, Tokyo, Japan
| | - Kazuhiko Imakawa
- Laboratory of Molecular Reproduction, Research Institute of Agriculture, Tokai University, Kumamoto, Kumamoto, Japan.
| |
|
36
|
Contreras-Moreira B, Del Río ÁR, Cantalapiedra CP, Sancho R, Vinuesa P. Pangenome Analysis of Plant Transcripts and Coding Sequences. Methods Mol Biol 2022; 2512:121-152. [PMID: 35818004 DOI: 10.1007/978-1-0716-2429-6_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
The pangenome of a species is the sum of the genomes of its individuals. As coding sequences often represent only a small fraction of each genome, analyzing the pangene set can be a cost-effective strategy for plants with large genomes or highly heterozygous species. Here, we describe a step-by-step protocol to analyze plant pangene sets with the software GET_HOMOLOGUES-EST. After a short introduction, where the main concepts are illustrated, the remaining sections cover the installation and typical operations required to analyze and annotate pantranscriptomes and gene sets of plants. The recipes include instructions on how to call core and accessory genes, how to compute a presence-absence pangenome matrix, and how to identify and analyze private genes, present only in some genotypes. Downstream phylogenetic analyses are also discussed.
Affiliation(s)
| | | | | | - Rubén Sancho
- Estación Experimental de Aula Dei-CSIC, Zaragoza, Spain
- Escuela Politécnica Superior, Universidad de Zaragoza, Huesca, Spain
| | - Pablo Vinuesa
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
| |
|
37
|
Lohse K, García-Berro A, Talavera G. The genome sequence of the red admiral, Vanessa atalanta (Linnaeus, 1758). Wellcome Open Res 2021. [DOI: 10.12688/wellcomeopenres.17524.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open
Abstract
We present a genome assembly from an individual female Vanessa atalanta (the red admiral; Arthropoda; Insecta; Lepidoptera; Nymphalidae). The genome sequence is 370 megabases in span. The majority of the assembly (99.44%) is scaffolded into 32 chromosomal pseudomolecules, with the W and Z sex chromosome assembled. Gene annotation of this assembly on Ensembl has identified 12,493 protein coding genes.
|
38
|
Zhang Y, Long Y, Kwoh CK. Class similarity network for coding and long non-coding RNA classification. BMC Bioinformatics 2021; 22:609. [PMID: 34930120 PMCID: PMC8691036 DOI: 10.1186/s12859-021-04517-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/19/2021] [Accepted: 12/06/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Long non-coding RNAs (lncRNAs) play significant roles in a variety of physiological and pathological processes. Functional studies of lncRNAs presuppose that the lncRNAs are identified correctly. Recently, deep learning methods such as convolutional neural networks (CNNs) have been successfully applied to identify lncRNAs. However, a traditional CNN considers few relationships among samples, and only in an indirect way. RESULTS Inspired by the Siamese Neural Network (SNN), here we propose a novel network named Class Similarity Network for coding RNA and lncRNA classification. Class Similarity Network considers more relationships among input samples in a direct way. It focuses on exploring the potential relationships between input samples and samples from both the same class and different classes. To achieve this, Class Similarity Network trains parameters specific to each class to obtain high-level features and represents the general similarity to each class in a node. Comparison results on the validation dataset under the same conditions illustrate the superiority of our Class Similarity Network over the baseline CNN. In addition, our method performs effectively and achieves state-of-the-art performance on two test datasets. CONCLUSIONS We construct Class Similarity Network for coding RNA and lncRNA classification, which is shown to work effectively on two different datasets, achieving accuracy, precision, and F1-score of 98.43%, 0.9247, 0.9374 and 97.54%, 0.9990, 0.9860, respectively.
Affiliation(s)
- Yu Zhang
- School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore, 639798, Singapore; Wellcome Trust - Medical Research Council Cambridge Stem Cell Institute, Cambridge, CB2 0AW, UK
| | - Yahui Long
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410000, China
| | - Chee Keong Kwoh
- School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore, 639798, Singapore.
| |
|
39
|
Hayward A, Wright C. The genome sequence of the holly blue, Celastrina argiolus (Linnaeus, 1758). Wellcome Open Res 2021; 6:340. [PMID: 35028429 PMCID: PMC8729184 DOI: 10.12688/wellcomeopenres.17478.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/07/2021] [Indexed: 11/22/2022] Open
Abstract
We present a genome assembly from an individual male Celastrina argiolus (the holly blue; Arthropoda; Insecta; Lepidoptera; Lycaenidae). The genome sequence is 499 megabases in span. The majority (99.99%) of the assembly is scaffolded into 26 chromosomal pseudomolecules, with the Z sex chromosome assembled. Gene annotation of this assembly on Ensembl has identified 12,199 protein coding genes.
Affiliation(s)
- Alex Hayward
- College of Life and Environmental Sciences, Department of Biosciences, University of Exeter, Penryn, UK
| | - Charlotte Wright
- Tree of Life, Wellcome Sanger Institute, Cambridge, CB10 1SA, UK
| | - Darwin Tree of Life Barcoding collective
- College of Life and Environmental Sciences, Department of Biosciences, University of Exeter, Penryn, UK
- Tree of Life, Wellcome Sanger Institute, Cambridge, CB10 1SA, UK
| | | | | | - Tree of Life Core Informatics collective
- College of Life and Environmental Sciences, Department of Biosciences, University of Exeter, Penryn, UK
- Tree of Life, Wellcome Sanger Institute, Cambridge, CB10 1SA, UK
| | | |
|
40
|
Chen L, Yang Y, Zhang Y, Li K, Cai H, Wang H, Zhao Q. The Small Open Reading Frame-Encoded Peptides: Advances in Methodologies and Functional Studies. Chembiochem 2021; 23:e202100534. [PMID: 34862721 DOI: 10.1002/cbic.202100534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2021] [Revised: 11/15/2021] [Indexed: 11/07/2022]
Abstract
Small open reading frames (sORFs) are an important class of genes with less than 100 codons. They were historically annotated as noncoding or even junk sequences. In recent years, accumulating evidence suggests that sORFs could encode a considerable number of polypeptides, many of which play important roles in both physiology and disease pathology. However, it has been technically challenging to directly detect sORF-encoded peptides (SEPs). Here, we discuss the latest advances in methodologies for identifying SEPs with mass spectrometry, as well as the progress on functional studies of SEPs.
Affiliation(s)
- Lei Chen
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China; Laboratory for Synthetic Chemistry and Chemical Biology Limited, Hong Kong Science and Technology Park, New Territories, Hong Kong SAR, 999077, P. R. China
| | - Ying Yang
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China
| | - Yuanliang Zhang
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China
| | - Kecheng Li
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China
| | - Hongmin Cai
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, 510623, P. R. China
| | - Hongwei Wang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangzhou, 510623, P. R. China
| | - Qian Zhao
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China
| |
|
41
|
Lohse K, Wright C, Talavera G, García-Berro A. The genome sequence of the painted lady, Vanessa cardui Linnaeus 1758. Wellcome Open Res 2021; 6:324. [PMID: 37008186 PMCID: PMC10061037 DOI: 10.12688/wellcomeopenres.17358.1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Accepted: 11/02/2021] [Indexed: 11/20/2022] Open
Abstract
We present a genome assembly from an individual female Vanessa cardui (the painted lady; Arthropoda; Insecta; Lepidoptera; Nymphalidae). The genome sequence is 425 megabases in span. The majority of the assembly is scaffolded into 32 chromosomal pseudomolecules, with the W and Z sex chromosome assembled. Gene annotation of this assembly on Ensembl has identified 12,821 protein coding genes.
Affiliation(s)
- Konrad Lohse
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK
| | | | - Gerard Talavera
- Institut Botànic de Barcelona (IBB, CSIC-Ajuntament de Barcelona), Barcelona, Spain
| | - Aurora García-Berro
- Institut Botànic de Barcelona (IBB, CSIC-Ajuntament de Barcelona), Barcelona, Spain
| | - Darwin Tree of Life Barcoding collective
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK
- Tree of Life, Wellcome Sanger Institute, Cambridge, UK
- Institut Botànic de Barcelona (IBB, CSIC-Ajuntament de Barcelona), Barcelona, Spain
| | - Wellcome Sanger Institute Tree of Life programme
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK
- Tree of Life, Wellcome Sanger Institute, Cambridge, UK
- Institut Botànic de Barcelona (IBB, CSIC-Ajuntament de Barcelona), Barcelona, Spain
| | | | - Tree of Life Core Informatics collective
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK
- Tree of Life, Wellcome Sanger Institute, Cambridge, UK
- Institut Botànic de Barcelona (IBB, CSIC-Ajuntament de Barcelona), Barcelona, Spain
| | | |
|
42
|
Hayward A, Vila R, Laetsch DR, Lohse K, Baril T. The genome sequence of the heath fritillary, Melitaea athalia (Rottemburg, 1775). Wellcome Open Res 2021; 6:304. [PMID: 35136843 PMCID: PMC8796007 DOI: 10.12688/wellcomeopenres.17280.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/06/2021] [Indexed: 11/23/2022] Open
Abstract
We present a genome assembly from an individual female Melitaea athalia (also known as Mellicta athalia; the heath fritillary; Arthropoda; Insecta; Lepidoptera; Nymphalidae). The genome sequence is 610 megabases in span. In total, 99.98% of the assembly is scaffolded into 32 chromosomal pseudomolecules, with the W and Z sex chromosome assembled. Gene annotation of this assembly on Ensembl has identified 12,824 protein coding genes.
Affiliation(s)
| | - Roger Vila
- Institut de Biologia Evolutiva (CSIC - Universitat Pompeu Fabra), Barcelona, Spain
| | - Dominik R. Laetsch
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK
| | - Konrad Lohse
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK
| | | | | | | | | | | | | |
|
43
|
Kim P, Tan H, Liu J, Lee H, Jung H, Kumar H, Zhou X. FusionGDB 2.0: fusion gene annotation updates aided by deep learning. Nucleic Acids Res 2021; 50:D1221-D1230. [PMID: 34755868 PMCID: PMC8728198 DOI: 10.1093/nar/gkab1056] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 09/14/2021] [Revised: 10/10/2021] [Accepted: 11/03/2021] [Indexed: 01/08/2023] Open
Abstract
A knowledgebase of the systematic functional annotation of fusion genes is critical for understanding genomic breakage context and developing therapeutic strategies. FusionGDB is a unique functional annotation database of human fusion genes and has been widely used for studies with diverse aims. In this study, we report fusion gene annotation updates aided by deep learning (FusionGDB 2.0), available at https://compbio.uth.edu/FusionGDB2/. FusionGDB 2.0 has substantial content updates, such as up-to-date human fusion genes; a fusion gene breakage tendency score from the FusionAI deep learning model based on the 20 kb DNA sequence around the BP; an investigation of the overlap between fusion breakpoints and 44 human genomic features across five categories of cellular roles; transcribed chimeric sequences and subsequent open reading frame analysis, with coding potential based on a deep learning approach using Ribo-seq read features; and a rigorous investigation of the protein feature retention of individual fusion partner genes at the protein level. Among ∼102k fusion genes, about 15k kept their ORF in-frame, twice as many as in the previous version, FusionGDB. FusionGDB 2.0 will be used as the reference knowledgebase of fusion gene annotations. It provides eight categories of annotations and will be helpful for diverse human genomic studies.
Affiliation(s)
- Pora Kim
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
| | - Hua Tan
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
| | - Jiajia Liu
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
| | - Haeseung Lee
- Intellectual Information Team, Future Medicine Division, Korea Institute of Oriental Medicine, Daejeon, South Korea
| | - Hyesoo Jung
- Department of Neurology, Asan Medical Center, Seoul, Korea
| | - Himanshu Kumar
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
| | - Xiaobo Zhou
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA; McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA; School of Dentistry, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
| |
|
44
|
Ji Z, Tang T, Chen M, Dong B, Sun W, Wu N, Chen H, Feng Q, Yang X, Jin R, Jiang L. C-Myc-activated long non-coding RNA LINC01050 promotes gastric cancer growth and metastasis by sponging miR-7161-3p to regulate SPZ1 expression. J Exp Clin Cancer Res 2021; 40:351. [PMID: 34749766 PMCID: PMC8573944 DOI: 10.1186/s13046-021-02155-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 04/12/2021] [Accepted: 10/25/2021] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND Growing evidence shows that long non-coding RNAs (lncRNAs) play significant roles in cancer development. However, the functions of most lncRNAs in human gastric cancer are still not fully understood. Here, we explored the role of a novel c-Myc-activated lncRNA, LINC01050, in gastric cancer progression. METHODS The expression of LINC01050 in the context of gastric cancer was assessed using The Cancer Genome Atlas datasets. Its functions in gastric cancer were investigated through gain- and loss-of-function experiments combined with the Cell Counting Kit-8 assays, colony-forming assays, Transwell assays, flow cytometry, Western blot analyses, and xenograft tumor and mouse metastasis models. Potential LINC01050 transcription activators were screened via bioinformatics and validated by chromatin immunoprecipitation and luciferase assays. The interaction between LINC01050 and miR-7161-3p and the targets of miR-7161-3p were predicted by bioinformatics analysis and confirmed by a luciferase assay, RNA immunoprecipitation, RNA pull-down, and rescue experiments. RESULTS LINC01050 was significantly up-regulated in gastric cancer, and its high expression was positively correlated with a poor prognosis. The transcription factor c-Myc was found to directly bind to the LINC01050 promoter region and activate its transcription. Furthermore, overexpression of LINC01050 was confirmed to promote gastric cancer cell proliferation, migration, invasion, and epithelial-mesenchymal transition in vitro and tumor growth in vivo. At the same time, its knockdown inhibited gastric cancer cell proliferation, migration, invasion, and epithelial-mesenchymal transition in vitro along with tumor growth and metastasis in vivo. Moreover, mechanistic investigations revealed that LINC01050 functions as a molecular sponge to absorb cytosolic miR-7161-3p, which reduces the miR-7161-3p-mediated translational repression of SPZ1, thus contributing to gastric cancer progression. 
CONCLUSIONS Taken together, our results identified a novel gastric cancer-associated lncRNA, LINC01050, which is activated by c-Myc. LINC01050 may be considered a potential therapeutic target for gastric cancer.
Affiliation(s)
- Ziwei Ji
- Department of Gastroenterology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China
| | - Tianbin Tang
- Central Laboratory, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China
| | - Mengxia Chen
- Department of Gastroenterology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China
| | - Buyuan Dong
- Department of Gastroenterology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China
| | - Wenjing Sun
- Central Laboratory, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China
| | - Nan Wu
- Department of Gastroenterology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China
| | - Hao Chen
- Department of Gastroenterology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China
| | - Qian Feng
- Department of Gastroenterology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China
| | - Xingyi Yang
- Department of Gastroenterology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China
| | - Rong Jin
- Department of Gastroenterology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China.
| | - Lei Jiang
- Central Laboratory, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China.
| |
|
45
|
Lohse K, Weir J. The genome sequence of the meadow brown, Maniola jurtina (Linnaeus, 1758). Wellcome Open Res 2021; 6:296. [PMID: 36866280 PMCID: PMC9971652 DOI: 10.12688/wellcomeopenres.17304.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 10/22/2021] [Indexed: 11/20/2022] Open
Abstract
We present a genome assembly from an individual female Maniola jurtina (the meadow brown; Arthropoda; Insecta; Lepidoptera; Nymphalidae). The genome sequence is 402 megabases in span. The complete assembly is scaffolded into 30 chromosomal pseudomolecules, with the W and Z sex chromosome assembled. Gene annotation of this assembly on Ensembl has identified 12,502 protein coding genes.
Affiliation(s)
- Konrad Lohse
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK
- Jamie Weir
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK
|
46
|
Lohse K, Laetsch DR, Vila R. The genome sequence of the small copper, Lycaena phlaeas (Linnaeus, 1760). Wellcome Open Res 2021. [DOI: 10.12688/wellcomeopenres.17289.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Abstract
We present a genome assembly from an individual male Lycaena phlaeas (the small copper; Arthropoda; Insecta; Lepidoptera; Lycaenidae). The genome sequence is 420 megabases in span. The whole of the assembly is scaffolded into 24 chromosomal pseudomolecules, with the Z sex chromosome assembled. Gene annotation of this assembly on Ensembl has identified 12,147 protein coding genes.
|
47
|
Lohse K, Hayward A, Ebdon S. The genome sequences of the male and female green-veined white, Pieris napi (Linnaeus, 1758). Wellcome Open Res 2021; 6:288. [PMID: 35846179 PMCID: PMC9257262 DOI: 10.12688/wellcomeopenres.17277.1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Accepted: 10/06/2021] [Indexed: 11/20/2022]
Abstract
We present genome assemblies from a male and female Pieris napi (the green-veined white; Arthropoda; Insecta; Lepidoptera; Pieridae). The genome sequences of the male and female are 320 and 319 megabases in span, respectively. The majority of the assembly (99.79% of the male assembly, 99.88% of the female) is scaffolded into 24 autosomal pseudomolecules, with the Z sex chromosome assembled for the male and Z and W chromosomes assembled for the female. Gene annotation of the male assembly on Ensembl has identified 13,221 protein coding genes.
Affiliation(s)
- Konrad Lohse
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK
- Sam Ebdon
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK
|
48
|
Lohse K, Taylor-Cox E. The genome sequence of the speckled wood butterfly, Pararge aegeria (Linnaeus, 1758). Wellcome Open Res 2021. [DOI: 10.12688/wellcomeopenres.17278.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/13/2023]
Abstract
We present a genome assembly from an individual female Pararge aegeria (the speckled wood butterfly; Arthropoda; Insecta; Lepidoptera; Nymphalidae). The genome sequence is 517 megabases in span. The majority of the assembly (99.68%) is scaffolded into 29 chromosomal pseudomolecules, with the W and Z sex chromosomes assembled. Gene annotation of this assembly on Ensembl has identified 12,288 protein coding genes.
|
49
|
Ebdon S, Mackintosh A, Hayward A, Wotton K. The genome sequence of the clouded yellow, Colias crocea (Geoffroy, 1785). Wellcome Open Res 2021; 6:284. [PMID: 36157970 PMCID: PMC9490288 DOI: 10.12688/wellcomeopenres.17292.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/13/2021] [Indexed: 11/28/2022]
Abstract
We present a genome assembly from an individual female Colias crocea (also known as Colias croceus; the clouded yellow; Arthropoda; Insecta; Lepidoptera; Pieridae). The genome sequence is 325 megabases in span. The complete assembly is scaffolded into 32 chromosomal pseudomolecules, with the W and Z sex chromosomes assembled. Gene annotation of this assembly on Ensembl has identified 13,803 protein coding genes.
Affiliation(s)
- Sam Ebdon
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK
- Alex Mackintosh
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK
|
50
|
Lohse K, Ebdon S, Vila R. The genome sequence of the small white, Pieris rapae (Linnaeus, 1758). Wellcome Open Res 2021. [DOI: 10.12688/wellcomeopenres.17288.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Abstract
We present a genome assembly from an individual female Pieris rapae (the small white; Arthropoda; Insecta; Lepidoptera; Pieridae). The genome sequence is 256 megabases in span. The majority of the assembly is scaffolded into 26 chromosomal pseudomolecules, with the W and Z sex chromosomes assembled. Gene annotation of this assembly on Ensembl has identified 12,390 protein coding genes.
|