1
Wong EWP, Sahin M, Yang R, Lee U, Zhan YA, Misra R, Tomas F, Alomran N, Polyzos A, Lee CJ, Trieu T, Fundichely AM, Wiesner T, Rosowicz A, Cheng S, Liu C, Lallo M, Merghoub T, Hamard PJ, Koche R, Khurana E, Apostolou E, Zheng D, Chen Y, Leslie CS, Chi P. TAD hierarchy restricts poised LTR activation and loss of TAD hierarchy promotes LTR co-option in cancer. bioRxiv: the preprint server for biology 2024:2024.05.31.596845. [PMID: 38895201 PMCID: PMC11185511 DOI: 10.1101/2024.05.31.596845] [Indexed: 06/21/2024]
Abstract
Transposable elements (TEs) are abundant in the human genome, and they provide sources of genetic and functional diversity. The regulation of TE expression and its functional consequences in physiological conditions and cancer development remain to be fully elucidated. Previous studies suggested that TEs are repressed by DNA methylation and chromatin modifications. The effect of 3D chromatin topology on TE regulation remains elusive. Here, by integrating transcriptome and 3D genome architecture studies, we showed that haploinsufficient loss of NIPBL selectively activates alternative promoters at the long terminal repeats (LTRs) of the TE subclasses. This activation occurs through the reorganization of topologically associating domain (TAD) hierarchical structures and recruitment of proximal enhancers. These observations indicate that TAD hierarchy restricts transcriptional activation of LTRs that already possess open chromatin features. In cancer, perturbation of the hierarchical chromatin topology can lead to co-option of LTRs as functional alternative promoters in a context-dependent manner and drive aberrant transcriptional activation of novel oncogenes and other divergent transcripts. These data uncovered a new layer of regulation of TE expression beyond DNA and chromatin modification in the human genome. They also posit TAD hierarchy dysregulation as a novel mechanism for alternative promoter-mediated oncogene activation and transcriptional diversity in cancer, which may be exploited therapeutically.
2
Chenna S, Ivanov M, Nielsen TK, Chalenko K, Olsen E, Jørgensen K, Sandelin A, Marquardt S. A data-driven genome annotation approach for cassava. The Plant Journal: for Cell and Molecular Biology 2024. [PMID: 38831668 DOI: 10.1111/tpj.16856] [Received: 06/30/2023] [Revised: 05/13/2024] [Accepted: 05/17/2024] [Indexed: 06/05/2024]
Abstract
Genome annotation files play a critical role in dictating the quality of downstream analyses by providing essential predictions for gene positions and structures. These files are pivotal in decoding the complex information encoded within DNA sequences. Here, we generated experimental data resolving RNA 5'- and 3'-ends as well as full-length RNAs for cassava TME12 sticklings in ambient temperature and cold. We used these data to generate genome annotation files using the TranscriptomeReconstructoR (TR) tool. A careful comparison to high-quality genome annotations suggests that our new TR genome annotations identified additional genes, resolved the transcript boundaries more accurately and identified additional RNA isoforms. We enhanced existing cassava genome annotation files with the information from TR that maintained the different transcript models as RNA isoforms. The resultant merged annotation was subsequently utilized for comprehensive analysis. To examine the effects of genome annotation files on gene expression studies, we compared the detection of differentially expressed genes during cold using the same RNA-seq data but alternative genome annotation files. We found that our merged genome annotation that included cold-specific TR gene models identified about twice as many cold-induced genes. These data indicate that environmentally induced genes may be missing in off-the-shelf genome annotation files. In conclusion, TR offers the opportunity to enhance crop genome annotations with implications for the discovery of differentially expressed candidate genes during plant-environment interactions.
Affiliation(s)
- Swetha Chenna
- Department of Plant and Environmental Sciences, Copenhagen Plant Science Centre, University of Copenhagen, Thorvaldsensvej 40, Frederiksberg C, 1871, Denmark
- Maxim Ivanov
- Department of Plant and Environmental Sciences, Copenhagen Plant Science Centre, University of Copenhagen, Thorvaldsensvej 40, Frederiksberg C, 1871, Denmark
- Tue Kjærgaard Nielsen
- Department of Plant and Environmental Sciences, Copenhagen Plant Science Centre, University of Copenhagen, Thorvaldsensvej 40, Frederiksberg C, 1871, Denmark
- Karina Chalenko
- Department of Plant and Environmental Sciences, Copenhagen Plant Science Centre, University of Copenhagen, Thorvaldsensvej 40, Frederiksberg C, 1871, Denmark
- Evy Olsen
- Department of Plant and Environmental Sciences, Copenhagen Plant Science Centre, University of Copenhagen, Thorvaldsensvej 40, Frederiksberg C, 1871, Denmark
- Kirsten Jørgensen
- Department of Plant and Environmental Sciences, Copenhagen Plant Science Centre, University of Copenhagen, Thorvaldsensvej 40, Frederiksberg C, 1871, Denmark
- Albin Sandelin
- Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, Copenhagen N, DK2200, Denmark
- Biotech Research and Innovation Centre, University of Copenhagen, Ole Maaløes Vej 5, Copenhagen N, DK2200, Denmark
- Sebastian Marquardt
- Department of Plant and Environmental Sciences, Copenhagen Plant Science Centre, University of Copenhagen, Thorvaldsensvej 40, Frederiksberg C, 1871, Denmark
3
Patowary A, Zhang P, Jops C, Vuong CK, Ge X, Hou K, Kim M, Gong N, Margolis M, Vo D, Wang X, Liu C, Pasaniuc B, Li JJ, Gandal MJ, de la Torre-Ubieta L. Developmental isoform diversity in the human neocortex informs neuropsychiatric risk mechanisms. Science 2024; 384:eadh7688. [PMID: 38781356 DOI: 10.1126/science.adh7688] [Received: 03/15/2023] [Accepted: 03/13/2024] [Indexed: 05/25/2024]
Abstract
RNA splicing is highly prevalent in the brain and has strong links to neuropsychiatric disorders; yet, the role of cell type-specific splicing and transcript-isoform diversity during human brain development has not been systematically investigated. In this work, we leveraged single-molecule long-read sequencing to deeply profile the full-length transcriptome of the germinal zone and cortical plate regions of the developing human neocortex at tissue and single-cell resolution. We identified 214,516 distinct isoforms, of which 72.6% were novel (not previously annotated in Gencode version 33), and uncovered a substantial contribution of transcript-isoform diversity, regulated by RNA-binding proteins, in defining cellular identity in the developing neocortex. We leveraged this comprehensive isoform-centric gene annotation to reprioritize thousands of rare de novo risk variants and elucidate genetic risk mechanisms for neuropsychiatric disorders.
Affiliation(s)
- Ashok Patowary
- Department of Psychiatry and Biobehavioral Sciences, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
- Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA 90095, USA
- Intellectual and Developmental Disabilities Research Center, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
- Pan Zhang
- Department of Psychiatry and Biobehavioral Sciences, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
- Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA 90095, USA
- Intellectual and Developmental Disabilities Research Center, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
- Connor Jops
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Lifespan Brain Institute at Penn Med and the Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Celine K Vuong
- Department of Psychiatry and Biobehavioral Sciences, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
- Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA 90095, USA
- Intellectual and Developmental Disabilities Research Center, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA 90095, USA
- Xinzhou Ge
- Department of Statistics, University of California Los Angeles, Los Angeles, CA 90095, USA
- Kangcheng Hou
- Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA 90095, USA
- Minsoo Kim
- Department of Psychiatry and Biobehavioral Sciences, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
- Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
- Naihua Gong
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Michael Margolis
- Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
- Daniel Vo
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Lifespan Brain Institute at Penn Med and the Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Xusheng Wang
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38103, USA
- Center for Proteomics and Metabolomics, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
- Chunyu Liu
- Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY 13210, USA
- Center for Medical Genetics and Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan 410008, China
- Bogdan Pasaniuc
- Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
- Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA 90095, USA
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
- Institute for Precision Health, University of California Los Angeles, Los Angeles, CA 90095, USA
- Department of Computational Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
- Jingyi Jessica Li
- Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
- Department of Statistics, University of California Los Angeles, Los Angeles, CA 90095, USA
- Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA 90095, USA
- Department of Computational Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
- Department of Biostatistics, University of California Los Angeles, Los Angeles, CA 90095, USA
- Michael J Gandal
- Department of Psychiatry and Biobehavioral Sciences, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
- Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA 90095, USA
- Intellectual and Developmental Disabilities Research Center, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Lifespan Brain Institute at Penn Med and the Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Luis de la Torre-Ubieta
- Department of Psychiatry and Biobehavioral Sciences, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
- Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA 90095, USA
- Intellectual and Developmental Disabilities Research Center, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA 90095, USA
4
Huang S, Shi W, Li S, Fan Q, Yang C, Cao J, Wu L. Advanced sequencing-based high-throughput and long-read single-cell transcriptome analysis. Lab on a Chip 2024; 24:2601-2621. [PMID: 38669201 DOI: 10.1039/d4lc00105b] [Indexed: 04/28/2024]
Abstract
Cells are the fundamental building blocks of living systems, exhibiting significant heterogeneity. The transcriptome connects the cellular genotype and phenotype, and profiling single-cell transcriptomes is critical for uncovering distinct cell types, states, and the interplay between cells in development, health, and disease. Nevertheless, single-cell transcriptome analysis faces daunting challenges due to the low abundance and diverse nature of RNAs in individual cells, as well as their heterogeneous expression. The advent and continuous advancements of next-generation sequencing (NGS) and third-generation sequencing (TGS) technologies have solved these problems and facilitated the high-throughput, sensitive, full-length, and rapid profiling of single-cell RNAs. In this review, we provide a broad introduction to current methodologies for single-cell transcriptome sequencing. First, state-of-the-art advancements in high-throughput and full-length single-cell RNA sequencing (scRNA-seq) platforms using NGS are reviewed. Next, TGS-based long-read scRNA-seq methods are summarized. Finally, a brief conclusion and perspectives for comprehensive single-cell transcriptome analysis are discussed.
Affiliation(s)
- Shanqing Huang
- Discipline of Intelligent Instrument and Equipment, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Weixiong Shi
- Discipline of Intelligent Instrument and Equipment, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Shiyu Li
- Discipline of Intelligent Instrument and Equipment, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Qian Fan
- Institute of Molecular Medicine, Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200127, China
- Chaoyong Yang
- Institute of Molecular Medicine, Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200127, China
- Discipline of Intelligent Instrument and Equipment, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Jiao Cao
- Institute of Molecular Medicine, Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200127, China
- Lingling Wu
- Institute of Molecular Medicine, Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200127, China
5
Nagy G, Bojcsuk D, Tzerpos P, Cseh T, Nagy L. Lineage-determining transcription factor-driven promoters regulate cell type-specific macrophage gene expression. Nucleic Acids Res 2024; 52:4234-4256. [PMID: 38348998 PMCID: PMC11077085 DOI: 10.1093/nar/gkae088] [Received: 03/06/2023] [Revised: 01/18/2024] [Accepted: 01/29/2024] [Indexed: 05/09/2024] Open
Abstract
Mammalian promoters consist of multifarious elements, which make them unique and support the selection of the proper transcript variants required under diverse conditions in distinct cell types. However, their direct DNA-transcription factor (TF) interactions are mostly unidentified. Murine bone marrow-derived macrophages (BMDMs) are a widely used model for studying gene expression regulation. Thus, this model serves as a rich source of various next-generation sequencing data sets, including a large number of TF cistromes. By processing and integrating the available cistromic, epigenomic and transcriptomic data from BMDMs, we characterized the macrophage-specific direct DNA-TF interactions, with a particular emphasis on those specific for promoters. Whilst active promoters are enriched for certain types of typically methylatable elements, more than half of them contain non-methylatable and prototypically promoter-distal elements. In addition, circa 14% of promoters, including that of Csf1r, are composed exclusively of 'distal' elements that provide cell type-specific gene regulation by specialized TFs. Similar to CG-rich promoters, these also contain methylatable CG sites that are demethylated in a significant portion and show high polymerase activity. We conclude that this unusual class of promoters regulates cell type-specific gene expression in macrophages, and such a mechanism might exist in other cell types too.
Affiliation(s)
- Gergely Nagy
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, University of Debrecen, Debrecen, Hungary
- Dóra Bojcsuk
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, University of Debrecen, Debrecen, Hungary
- Petros Tzerpos
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, University of Debrecen, Debrecen, Hungary
- Tímea Cseh
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, University of Debrecen, Debrecen, Hungary
- László Nagy
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, University of Debrecen, Debrecen, Hungary
- Departments of Medicine and Biological Chemistry, Johns Hopkins University School of Medicine, Institute for Fundamental Biomedical Research, Johns Hopkins All Children's Hospital, St. Petersburg, FL, USA
6
Uemura K, Ohyama T. Distinctive physical properties of DNA shared by RNA polymerase II gene promoters and 5'-flanking regions of tRNA genes. J Biochem 2024; 175:395-404. [PMID: 38102732 PMCID: PMC11005993 DOI: 10.1093/jb/mvad111] [Received: 10/30/2023] [Revised: 10/30/2023] [Accepted: 11/26/2023] [Indexed: 12/17/2023] Open
Abstract
Numerous noncoding (nc)RNAs have been identified. Similar to the transcription of protein-coding (mRNA) genes, long noncoding (lnc)RNA genes and most micro (mi)RNA genes are transcribed by RNA polymerase II (Pol II). In the transcription of mRNA genes, core promoters play an indispensable role; they support the assembly of the preinitiation complex (PIC). However, the structural and/or physical properties of the core promoters of lncRNA and miRNA genes remain largely unexplored, in contrast with those of mRNA genes. Using the core promoters of human genes, we analyzed the repertoire and population ratios of residing core promoter elements (CPEs) and calculated the following five DNA physical properties (DPPs): duplex DNA free energy, base stacking energy, protein-induced deformability, rigidity and stabilizing energy of Z-DNA. Here, we show that their CPE and DPP profiles are similar to those of mRNA gene promoters. Importantly, the core promoters of these three classes of genes have two highly distinctive sites in their DPP profiles around the TSS and position -27. Similar characteristics in DPPs are also found in the 5'-flanking regions of tRNA genes, indicating their common essential roles in transcription initiation over the kingdom of RNA polymerases.
Affiliation(s)
- Kohei Uemura
- Major in Integrative Bioscience and Biomedical Engineering, Graduate School of Science and Engineering, Waseda University, 2-2 Wakamatsu-cho, Shinjuku-ku, Tokyo 162-8480, Japan
- Takashi Ohyama
- Major in Integrative Bioscience and Biomedical Engineering, Graduate School of Science and Engineering, Waseda University, 2-2 Wakamatsu-cho, Shinjuku-ku, Tokyo 162-8480, Japan
- Department of Biology, Faculty of Education and Integrated Arts and Sciences, Waseda University, 2-2 Wakamatsu-cho, Shinjuku-ku, Tokyo 162-8480, Japan
7
Peng Y, Huang Q, Liu D, Kong S, Kamada R, Ozato K, Zhang Y, Zhu J. A single-cell genomic strategy for alternative transcript start sites identification. Biotechnol J 2024; 19:e2300516. [PMID: 38472100 DOI: 10.1002/biot.202300516] [Received: 09/29/2023] [Revised: 01/20/2024] [Accepted: 01/30/2024] [Indexed: 03/14/2024]
Abstract
Alternative transcription start site (TSS) usage plays a critical role in gene transcription regulation in mammals. However, precisely identifying alternative TSSs remains challenging at the genome-wide level. We report a single-cell genomic technology for alternative TSS annotation and cell heterogeneity detection. In the method, we utilize the Fluidigm C1 system to capture individual cells of interest, the SMARTer cDNA synthesis kit to recover full-length cDNAs, and then a dual priming oligonucleotide system to specifically enrich TSSs for genomic analysis. We apply this method to a genome-wide study of alternative TSS identification in two different IFN-β stimulated mouse embryonic fibroblasts (MEFs). The data clearly discriminate the two IFN-β stimulated MEFs. Moreover, our results indicate that 81% of expressed genes in these two cell types contain multiple TSSs, which is much higher than previous predictions based on Cap-Analysis Gene Expression (CAGE) (58%) or empirical determination (54%) in various cell types. This indicates that alternative TSSs are more pervasive than expected and implies that our strategy could position them at an unprecedented sensitivity. It would be helpful for elucidating their biological significance in the future.
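As a rough illustration of how a figure such as the 81% of expressed genes with multiple TSSs might be tallied, the sketch below counts genes with two or more distinct TSS positions. It is not the authors' pipeline: the function name `fraction_multi_tss`, the input layout (gene name mapped to a list of TSS coordinates), and the toy values are all hypothetical, and real analyses would first cluster nearby positions and filter by read support.

```python
def fraction_multi_tss(tss_by_gene):
    """Return the fraction of genes with two or more distinct TSS positions.

    tss_by_gene: dict mapping a gene name to a list of TSS coordinates
    (hypothetical layout chosen for this sketch).
    """
    if not tss_by_gene:
        return 0.0
    # A gene counts as multi-TSS only if it has at least two distinct positions.
    multi = sum(
        1 for positions in tss_by_gene.values() if len(set(positions)) >= 2
    )
    return multi / len(tss_by_gene)


# Hypothetical toy input, not data from the study.
toy_calls = {
    "geneA": [100, 450],
    "geneB": [300],
    "geneC": [700, 700],     # duplicate call, counts as a single TSS
    "geneD": [20, 90, 400],
}

print(fraction_multi_tss(toy_calls))  # 0.5
```

Treating duplicate coordinates as a single TSS is a design choice of this sketch; a stricter definition could also require a minimum distance between positions.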
Affiliation(s)
- Yanling Peng
- Animal Functional Genomics Group, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Qitong Huang
- Animal Functional Genomics Group, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Animal Breeding and Genomics, Wageningen University & Research, Wageningen, Netherlands
- Danli Liu
- Animal Functional Genomics Group, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Siyuan Kong
- Animal Functional Genomics Group, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Rui Kamada
- Department of Chemistry, Faculty of Science, Hokkaido University, Sapporo, Japan
- Keiko Ozato
- Division of Developmental Biology, National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, USA
- Yubo Zhang
- Animal Functional Genomics Group, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Kunpeng Institute of Modern Agriculture at Foshan, Foshan, China
- Jun Zhu
- DNA Sequencing and Genomics Core, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland, USA
8
Seki M, Kuze Y, Zhang X, Kurotani KI, Notaguchi M, Nishio H, Kudoh H, Suzaki T, Yoshida S, Sugano S, Matsushita T, Suzuki Y. An improved method for the highly specific detection of transcription start sites. Nucleic Acids Res 2024; 52:e7. [PMID: 37994784 PMCID: PMC10810191 DOI: 10.1093/nar/gkad1116] [Received: 04/17/2023] [Revised: 10/17/2023] [Accepted: 11/06/2023] [Indexed: 11/24/2023] Open
Abstract
Precise detection of the transcriptional start site (TSS) is key for characterizing transcriptional regulation of genes and for annotation of newly sequenced genomes. Here, we describe the development of an improved method, designated 'TSS-seq2.' This method is an iterative improvement of TSS-seq, a previously published enzymatic cap-structure conversion method to detect TSSs in base sequences. By modifying the original procedure, including by introducing split ligation at the key cap-selection step, the yield and the accuracy of the reaction have been substantially improved. For example, TSS-seq2 can be conducted using as little as 5 ng of total RNA with an overall accuracy of 96%; this yields a less biased and more precise detection of TSSs. We then applied TSS-seq2 for TSS analysis of four plant species that had not yet been analyzed by any previous TSS method.
Affiliation(s)
- Masahide Seki
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
- Yuta Kuze
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
- Xiang Zhang
- Division of Biological Science, Graduate School of Science and Technology, Nara Institute of Science and Technology, Nara, Japan
- Ken-ichi Kurotani
- Bioscience and Biotechnology Center, Nagoya University, Aichi, Japan
- Michitaka Notaguchi
- Bioscience and Biotechnology Center, Nagoya University, Aichi, Japan
- Department of Botany, Graduate School of Science, Kyoto University, Kyoto, Japan
- Graduate School of Bioagricultural Sciences, Nagoya University, Aichi, Nagoya, Japan
- Haruki Nishio
- Data Science and AI Innovation Research Promotion Center, Shiga University, Shiga, Japan
- Hiroshi Kudoh
- Center for Ecological Research, Kyoto University, Shiga, Japan
- Takuya Suzaki
- Faculty of Life and Environmental Sciences, University of Tsukuba, Ibaraki, Japan
- Tsukuba Plant-Innovation Research Center, University of Tsukuba, Ibaraki, Japan
- Satoko Yoshida
- Division of Biological Science, Graduate School of Science and Technology, Nara Institute of Science and Technology, Nara, Japan
- Sumio Sugano
- Institute of Kashiwa-no-ha Omics Gate, Chiba, Japan
- Future Medicine Education and Research Organization, Chiba University, Chiba, Japan
- Tomonao Matsushita
- Department of Botany, Graduate School of Science, Kyoto University, Kyoto, Japan
- Yutaka Suzuki
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
9
Uemura K, Ohyama T. Physical Peculiarity of Two Sites in Human Promoters: Universality and Diverse Usage in Gene Function. Int J Mol Sci 2024; 25:1487. [PMID: 38338773 PMCID: PMC10855393 DOI: 10.3390/ijms25031487] [Received: 12/08/2023] [Revised: 01/15/2024] [Accepted: 01/18/2024] [Indexed: 02/12/2024] Open
Abstract
Since the discovery of physical peculiarities around transcription start sites (TSSs) and a site corresponding to the TATA box, research has revealed only the average features of these sites. Unsettled enigmas include the individual genes with these features and whether they relate to gene function. Herein, using 10 physical properties of DNA, including duplex DNA free energy, base stacking energy, protein-induced deformability, and stabilizing energy of Z-DNA, we clarified for the first time that approximately 97% of the promoters of 21,056 human protein-coding genes have distinctive physical properties around the TSS and/or position -27; of these, nearly 65% exhibited such properties at both sites. Furthermore, about 55% of the 21,056 genes had a minimum value of regional duplex DNA free energy within TSS-centered ±300 bp regions. Notably, distinctive physical properties within the promoters and free energies of the surrounding regions separated human protein-coding genes into five groups; each contained specific gene ontology (GO) terms. The group represented by immune response genes differed distinctly from the other four regarding the parameter of the free energies of the surrounding regions. A vital suggestion from this study is that physical-feature-based analyses of genomes may reveal new aspects of the organization and regulation of genes.
Affiliation(s)
- Kohei Uemura
- Major in Integrative Bioscience and Biomedical Engineering, Graduate School of Science and Engineering, Waseda University, 2-2 Wakamatsu-cho, Shinjuku-ku, Tokyo 162-8480, Japan
- Takashi Ohyama
- Major in Integrative Bioscience and Biomedical Engineering, Graduate School of Science and Engineering, Waseda University, 2-2 Wakamatsu-cho, Shinjuku-ku, Tokyo 162-8480, Japan
- Department of Biology, Faculty of Education and Integrated Arts and Sciences, Waseda University, 2-2 Wakamatsu-cho, Shinjuku-ku, Tokyo 162-8480, Japan
10
Ruggiero V, Fagioli C, de Pretis S, Di Carlo V, Landsberger N, Zacchetti D. Complex CDKL5 translational regulation and its potential role in CDKL5 deficiency disorder. Front Cell Neurosci 2023; 17:1231493. [PMID: 37964795 PMCID: PMC10642286 DOI: 10.3389/fncel.2023.1231493] [Received: 05/31/2023] [Accepted: 10/10/2023] [Indexed: 11/16/2023] Open
Abstract
CDKL5 is a kinase with relevant functions in correct neuronal development and in the shaping of synapses. A decrease in its expression or activity leads to a severe neurodevelopmental condition known as CDKL5 deficiency disorder (CDD). CDD arises from CDKL5 mutations that lie in the coding region of the gene. However, the identification of a SNP in the CDKL5 5'UTR in a patient with symptoms consistent with CDD, together with the complexity of the CDKL5 transcript leader, points toward a relevant translational regulation of CDKL5 expression with important consequences in physiological processes as well as in the pathogenesis of CDD. We performed a bioinformatics and molecular analysis of the 5'UTR of CDKL5 to identify translational regulatory features. We propose an important role for structural cis-acting elements, with the involvement of the eukaryotic translational initiation factor eIF4B. By evaluating both cap-dependent and cap-independent translation initiation, we suggest the presence of an IRES supporting the translation of CDKL5 mRNA and propose a pathogenic effect of the C>T -189 SNP in decreasing the translation of the downstream protein.
Affiliation(s)
- Valeria Ruggiero
  - Vita-Salute San Raffaele University, Milan, Italy
  - Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Milan, Italy
- Claudio Fagioli
  - Vita-Salute San Raffaele University, Milan, Italy
  - Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Milan, Italy
- Stefano de Pretis
  - Vita-Salute San Raffaele University, Milan, Italy
  - Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Milan, Italy
- Valerio Di Carlo
  - Department of Medical Biotechnology and Translational Medicine, University of Milan, Segrate, Italy
- Nicoletta Landsberger
  - Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Milan, Italy
  - Department of Medical Biotechnology and Translational Medicine, University of Milan, Segrate, Italy
- Daniele Zacchetti
  - Vita-Salute San Raffaele University, Milan, Italy
  - Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Milan, Italy
|
11
|
Patowary A, Zhang P, Jops C, Vuong CK, Ge X, Hou K, Kim M, Gong N, Margolis M, Vo D, Wang X, Liu C, Pasaniuc B, Li JJ, Gandal MJ, de la Torre-Ubieta L. Developmental isoform diversity in the human neocortex informs neuropsychiatric risk mechanisms. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.03.25.534016. [PMID: 36993726 PMCID: PMC10055310 DOI: 10.1101/2023.03.25.534016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
RNA splicing is highly prevalent in the brain and has strong links to neuropsychiatric disorders, yet the role of cell-type-specific splicing or transcript-isoform diversity during human brain development has not been systematically investigated. Here, we leveraged single-molecule long-read sequencing to deeply profile the full-length transcriptome of the germinal zone (GZ) and cortical plate (CP) regions of the developing human neocortex at tissue and single-cell resolution. We identified 214,516 unique isoforms, of which 72.6% are novel (unannotated in Gencode-v33), and uncovered a substantial contribution of transcript-isoform diversity, regulated by RNA binding proteins, in defining cellular identity in the developing neocortex. We leveraged this comprehensive isoform-centric gene annotation to re-prioritize thousands of rare de novo risk variants and elucidate genetic risk mechanisms for neuropsychiatric disorders.
One-Sentence Summary: A cell-specific atlas of gene isoform expression helps shape our understanding of brain development and disease.
Structured Abstract
INTRODUCTION: The development of the human brain is regulated by precise molecular and genetic mechanisms driving spatio-temporal and cell-type-specific transcript expression programs. Alternative splicing, a major mechanism increasing transcript diversity, is highly prevalent in the human brain, influences many aspects of brain development, and has strong links to neuropsychiatric disorders. Despite this, the cell-type-specific transcript-isoform diversity of the developing human brain has not been systematically investigated.
RATIONALE: Understanding splicing patterns and isoform diversity across the developing neocortex has translational relevance and can elucidate genetic risk mechanisms in neurodevelopmental disorders. However, short-read sequencing, the prevalent technology for transcriptome profiling, is not well suited to capturing alternative splicing and isoform diversity. To address this, we employed third-generation long-read sequencing, which enables capture and sequencing of complete individual RNA molecules, to deeply profile the full-length transcriptome of the germinal zone (GZ) and cortical plate (CP) regions of the developing human neocortex at tissue and single-cell resolution.
RESULTS: We profiled microdissected GZ and CP regions of post-conception week (PCW) 15-17 human neocortex in bulk and at single-cell resolution across six subjects using high-fidelity long-read sequencing (PacBio IsoSeq). We identified 214,516 unique isoforms, of which 72.6% were novel (unannotated in Gencode), and >7,000 novel exons, expanding the proteome by 92,422 putative proteoforms. We uncovered thousands of isoform switches during cortical neurogenesis predicted to impact RNA regulatory domains or protein structure and implicating previously uncharacterized RNA-binding proteins in cellular identity and neuropsychiatric disease. At the single-cell level, early-stage excitatory neurons exhibited the greatest isoform diversity, and isoform-centric single-cell clustering led to the identification of previously uncharacterized cell states. We systematically assessed the contribution of transcriptomic features, and localized cell and spatio-temporal transcript expression signatures across neuropsychiatric disorders, revealing predominant enrichments in dynamic isoform expression and utilization patterns and that the number and complexity of isoforms per gene is strongly predictive of disease. Leveraging this resource, we re-prioritized thousands of rare de novo risk variants associated with autism spectrum disorders (ASD), intellectual disability (ID), and neurodevelopmental disorders (NDDs), more broadly, to potentially more severe consequences and revealed a larger proportion of cryptic splice variants with the expanded transcriptome annotation provided in this study.
CONCLUSION: Our study offers a comprehensive landscape of isoform diversity in the human neocortex during development. This extensive cataloging of novel isoforms and splicing events sheds light on the underlying mechanisms of neurodevelopmental disorders and presents an opportunity to explore rare genetic variants linked to these conditions. The implications of our findings extend beyond fundamental neuroscience, as they provide crucial insights into the molecular basis of developmental brain disorders and pave the way for targeted therapeutic interventions. To facilitate exploration of this dataset we developed an online portal (https://sciso.gandallab.org/).
|
12
|
Kharytonchyk S, Burnett C, Gc K, Telesnitsky A. Transcription start site heterogeneity and its role in RNA fate determination distinguish HIV-1 from other retroviruses and are mediated by core promoter elements. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.22.541776. [PMID: 37292892 PMCID: PMC10245945 DOI: 10.1101/2023.05.22.541776] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
HIV-1 uses heterogeneous transcription start sites (TSSs) to generate two RNA 5' isoforms that adopt radically different structures and perform distinct replication functions. Although these RNAs differ in length by only two bases, exclusively the shorter RNA is encapsidated while the longer RNA is excluded from virions and provides intracellular functions. The current study examined TSS usage and packaging selectivity for a broad range of retroviruses and found that heterogeneous TSS usage was a conserved feature of all tested HIV-1 strains, but all other retroviruses examined displayed unique TSSs. Phylogenetic comparisons and chimeric viruses' properties provided evidence that this mechanism of RNA fate determination was an innovation of the HIV-1 lineage, with determinants mapping to core promoter elements. Fine-tuning differences between HIV-1 and HIV-2, which uses a unique TSS, implicated purine residue positioning plus a specific TSS-adjacent dinucleotide in specifying multiplicity of TSS usage. Based on these findings, HIV-1 expression constructs were generated that differed from the parental strain by only two point mutations yet each expressed only one of HIV-1's two RNAs. Replication defects of the variant with only the presumptive founder TSS were less severe than those for the virus with only the secondary start site.
Affiliation(s)
- Siarhei Kharytonchyk
  - Department of Microbiology and Immunology, University of Michigan Medical School, Ann Arbor, Michigan 48109-5620, USA
- Cleo Burnett
  - Department of Microbiology and Immunology, University of Michigan Medical School, Ann Arbor, Michigan 48109-5620, USA
- Keshav Gc
  - Department of Microbiology and Immunology, University of Michigan Medical School, Ann Arbor, Michigan 48109-5620, USA
- Alice Telesnitsky
  - Department of Microbiology and Immunology, University of Michigan Medical School, Ann Arbor, Michigan 48109-5620, USA
|
13
|
Ogi DA, Jin S. Transcriptome-Powered Pluripotent Stem Cell Differentiation for Regenerative Medicine. Cells 2023; 12:1442. [PMID: 37408278 DOI: 10.3390/cells12101442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2023] [Revised: 05/15/2023] [Accepted: 05/18/2023] [Indexed: 07/07/2023] Open
Abstract
Pluripotent stem cells are endless sources for in vitro engineering human tissues for regenerative medicine. Extensive studies have demonstrated that transcription factors are the key to stem cell lineage commitment and differentiation efficacy. As the transcription factor profile varies depending on the cell type, global transcriptome analysis through RNA sequencing (RNAseq) has been a powerful tool for measuring and characterizing the success of stem cell differentiation. RNAseq has been utilized to comprehend how gene expression changes as cells differentiate and provide a guide to inducing cellular differentiation based on promoting the expression of specific genes. It has also been utilized to determine the specific cell type. This review highlights RNAseq techniques, tools for RNAseq data interpretation, RNAseq data analytic methods and their utilities, and transcriptomics-enabled human stem cell differentiation. In addition, the review outlines the potential benefits of the transcriptomics-aided discovery of intrinsic factors influencing stem cell lineage commitment, transcriptomics applied to disease physiology studies using patients' induced pluripotent stem cell (iPSC)-derived cells for regenerative medicine, and the future outlook on the technology and its implementation.
Affiliation(s)
- Derek A Ogi
  - Department of Biomedical Engineering, Thomas J. Watson College of Engineering and Applied Sciences, State University of New York at Binghamton, Binghamton, NY 13902, USA
- Sha Jin
  - Department of Biomedical Engineering, Thomas J. Watson College of Engineering and Applied Sciences, State University of New York at Binghamton, Binghamton, NY 13902, USA
  - Center of Biomanufacturing for Regenerative Medicine, State University of New York at Binghamton, Binghamton, NY 13902, USA
|
14
|
Inchingolo MA, Diman A, Adamczewski M, Humphreys T, Jaquier-Gubler P, Curran JA. TP53BP1, a dual-coding gene, uses promoter switching and translational reinitiation to express a smORF protein. iScience 2023; 26:106757. [PMID: 37216125 PMCID: PMC10193022 DOI: 10.1016/j.isci.2023.106757] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2022] [Revised: 03/07/2023] [Accepted: 04/24/2023] [Indexed: 05/24/2023] Open
Abstract
The complexity of the metazoan proteome is significantly increased by the expression of small proteins (<100 aa) derived from smORFs within lncRNAs, uORFs, 3' UTRs, and reading frames overlapping the CDS. These smORF encoded proteins (SEPs) have diverse roles, ranging from the regulation of cellular physiology to essential developmental functions. We report the characterization of a new member of this protein family, SEP53BP1, derived from a small internal ORF that overlaps the CDS encoding 53BP1. Its expression is coupled to the utilization of an alternative, cell-type specific promoter coupled to translational reinitiation events mediated by a uORF in the alternative 5' TL of the mRNA. This uORF-mediated reinitiation at an internal ORF is also observed in zebrafish. Interactome studies indicate that the human SEP53BP1 associates with components of the protein turnover pathway including the proteasome, and the TRiC/CCT chaperonin complex, suggesting that it may play a role in cellular proteostasis.
Affiliation(s)
- Marta A. Inchingolo
  - Department of Microbiology and Molecular Medicine, Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Aurélie Diman
  - Department of Microbiology and Molecular Medicine, Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Maxime Adamczewski
  - Department of Microbiology and Molecular Medicine, Faculty of Medicine, University of Geneva, Geneva, Switzerland
  - Faculté de Médecine et Pharmacie, Université Grenoble Alpes, Grenoble, France
- Tom Humphreys
  - Department of Microbiology and Molecular Medicine, Faculty of Medicine, University of Geneva, Geneva, Switzerland
  - Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
- Pascale Jaquier-Gubler
  - Department of Microbiology and Molecular Medicine, Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Joseph A. Curran
  - Department of Microbiology and Molecular Medicine, Faculty of Medicine, University of Geneva, Geneva, Switzerland
  - Institute of Genetics and Genomics of Geneva (iGE3), University of Geneva, Geneva, Switzerland
|
15
|
Li Y, Huang Z, Zhang Z, Wang Q, Li F, Wang S, Ji X, Shu S, Fang X, Jiang L. FIPRESCI: droplet microfluidics based combinatorial indexing for massive-scale 5'-end single-cell RNA sequencing. Genome Biol 2023; 24:70. [PMID: 37024957 PMCID: PMC10078054 DOI: 10.1186/s13059-023-02893-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/31/2022] [Accepted: 03/01/2023] [Indexed: 04/08/2023] Open
Abstract
Single-cell RNA sequencing methods focusing on the 5'-end of transcripts can reveal promoter and enhancer activity and efficiently profile immune receptor repertoire. However, ultra-high-throughput 5'-end single-cell RNA sequencing methods have not been described. We introduce FIPRESCI, 5'-end single-cell combinatorial indexing RNA-Seq, enabling massive sample multiplexing and increasing the throughput of the droplet microfluidics system by over tenfold. We demonstrate FIPRESCI enables the generation of approximately 100,000 single-cell transcriptomes from E10.5 whole mouse embryos in a single-channel experiment, and simultaneous identification of subpopulation differences and T cell receptor signatures of peripheral blood T cells from 12 cancer patients.
Affiliation(s)
- Yun Li
  - China National Center for Bioinformation, Beijing, 100101, China
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Zheng Huang
  - China National Center for Bioinformation, Beijing, 100101, China
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Zhaojun Zhang
  - China National Center for Bioinformation, Beijing, 100101, China
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
  - Sino-Danish College, University of Chinese Academy of Sciences, Beijing, 100049, China
- Qifei Wang
  - China National Center for Bioinformation, Beijing, 100101, China
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Fengxian Li
  - The Blood Transfusion Department, First Medical Center of Chinese PLA General Hospital, Beijing, 100853, China
- Shufang Wang
  - The Blood Transfusion Department, First Medical Center of Chinese PLA General Hospital, Beijing, 100853, China
- Xin Ji
  - Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Gastrointestinal Cancer Center, Peking University Cancer Hospital & Institute, No. 52 Fucheng Road, Beijing, 100142, China
- Shaokun Shu
  - Peking University International Cancer Institute & Peking University Cancer Hospital & Institute, Beijing, 100191, China
- Xiangdong Fang
  - China National Center for Bioinformation, Beijing, 100101, China
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
  - Institute for Stem Cell and Regeneration, Chinese Academy of Sciences, Beijing, 100101, China
  - Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing, 100101, China
- Lan Jiang
  - China National Center for Bioinformation, Beijing, 100101, China
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
  - Sino-Danish College, University of Chinese Academy of Sciences, Beijing, 100049, China
  - Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing, 100101, China
  - College of Future Technology College, University of Chinese Academy of Sciences, Beijing, 100049, China
|
16
|
Conlon FL, Arnold AP. Sex chromosome mechanisms in cardiac development and disease. NATURE CARDIOVASCULAR RESEARCH 2023; 2:340-350. [PMID: 37808586 PMCID: PMC10558115 DOI: 10.1038/s44161-023-00256-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2022] [Accepted: 02/13/2023] [Indexed: 10/10/2023]
Abstract
Many human diseases, including cardiovascular disease, show differences between men and women in pathology and treatment outcomes. In the case of cardiac disease, sex differences are exemplified by differences in the frequency of specific types of congenital and adult-onset heart disease. Clinical studies have suggested that gonadal hormones are a factor in sex bias. However, recent research has shown that gene and protein networks under non-hormonal control also account for cardiac sex differences. In this review, we describe the sex chromosome pathways that lead to sex differences in the development and function of the heart and highlight how these findings affect future care and treatment of cardiac disease.
Affiliation(s)
- Frank L Conlon
  - Departments of Biology and Genetics, McAllister Heart Institute, UNC-Chapel Hill, Chapel Hill, NC 27599, USA
- Arthur P Arnold
  - Department of Integrative Biology & Physiology, University of California, Los Angeles, CA, 90095, USA
|
17
|
Murray A, Mendieta JP, Vollmers C, Schmitz RJ. Simple and accurate transcriptional start site identification using Smar2C2 and examination of conserved promoter features. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2022; 112:583-596. [PMID: 36030508 PMCID: PMC9827901 DOI: 10.1111/tpj.15957] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 07/11/2022] [Revised: 08/12/2022] [Accepted: 08/22/2022] [Indexed: 06/15/2023]
Abstract
The precise and accurate identification and quantification of transcriptional start sites (TSSs) is key to understanding the control of transcription. The core promoter consists of the TSS and proximal non-coding sequences, which are critical in transcriptional regulation. Therefore, the accurate identification of TSSs is important for understanding the molecular regulation of transcription. Existing protocols for TSS identification are challenging and expensive, leaving high-quality data available for a small subset of organisms. This sparsity of data impairs study of TSS usage across tissues or in an evolutionary context. To address these shortcomings, we developed Smart-Seq2 Rolling Circle to Concatemeric Consensus (Smar2C2), which identifies and quantifies TSSs and transcription termination sites. Smar2C2 incorporates unique molecular identifiers that allowed for the identification of as many as 70 million sites, with no known upper limit. We have also generated TSS data sets from as little as 40 pg of total RNA, which was the smallest input tested. In this study, we used Smar2C2 to identify TSSs in Glycine max (soybean), Oryza sativa (rice), Sorghum bicolor (sorghum), Triticum aestivum (wheat) and Zea mays (maize) across multiple tissues. This wide panel of plant TSSs facilitated the identification of evolutionarily conserved features, such as novel patterns in the dinucleotides that compose the initiator element (Inr), that correlated with promoter expression levels across all species examined. We also discovered sequence variations in known promoter motifs that are positioned reliably close to the TSS, such as differences in the TATA box and in the Inr that may prove significant to our understanding and control of transcription initiation. Smar2C2 allows for the easy study of these critical sequences, providing a tool to facilitate discovery.
Affiliation(s)
- Andrew Murray
  - Department of Plant Biology, University of Georgia, Athens, GA 30602, USA
- Chris Vollmers
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
|
18
|
Moody J, Kouno T, Chang JC, Ando Y, Carninci P, Shin JW, Hon CC. SCAFE: a software suite for analysis of transcribed cis-regulatory elements in single cells. Bioinformatics 2022; 38:5126-5128. [PMID: 36173306 PMCID: PMC9665856 DOI: 10.1093/bioinformatics/btac644] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Revised: 08/30/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION: Cell type-specific activities of cis-regulatory elements (CRE) are central to understanding gene regulation and disease predisposition. Single-cell RNA 5'end sequencing (sc-end5-seq) captures the transcription start sites (TSS) which can be used as a proxy to measure the activity of transcribed CREs (tCREs). However, a substantial fraction of TSS identified from sc-end5-seq data may not be genuine due to various artifacts, hindering the use of sc-end5-seq for de novo discovery of tCREs.
RESULTS: We developed SCAFE (Single-Cell Analysis of Five-prime Ends), a software suite that processes sc-end5-seq data to de novo identify TSS clusters based on multiple logistic regression. It annotates tCREs based on the identified TSS clusters and generates a tCRE-by-cell count matrix for downstream analyses. The software suite consists of a set of flexible tools that could either be run independently or as pre-configured workflows.
AVAILABILITY AND IMPLEMENTATION: SCAFE is implemented in Perl and R. The source code and documentation are freely available for download under the MIT License from https://github.com/chung-lab/SCAFE. Docker images are available from https://hub.docker.com/r/cchon/scafe. The submitted software version and test data are archived at https://doi.org/10.5281/zenodo.7023163 and https://doi.org/10.5281/zenodo.7024060, respectively.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jen-Chien Chang
  - RIKEN Center for Integrative Medical Sciences, Yokohama City, Kanagawa 230-0045, Japan
- Yoshinari Ando
  - RIKEN Center for Integrative Medical Sciences, Yokohama City, Kanagawa 230-0045, Japan
- Piero Carninci
  - RIKEN Center for Integrative Medical Sciences, Yokohama City, Kanagawa 230-0045, Japan
  - Human Technopole, Milan 20157, Italy
- Jay W Shin
  - To whom correspondence should be addressed.
|
19
|
Ueno D, Yamasaki S, Sadakiyo Y, Teruyama T, Demura T, Kato K. Sequence features around cleavage sites are highly conserved among different species and a critical determinant for RNA cleavage position across eukaryotes. J Biosci Bioeng 2022; 134:450-461. [PMID: 36137896 DOI: 10.1016/j.jbiosc.2022.08.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2022] [Revised: 07/18/2022] [Accepted: 08/05/2022] [Indexed: 10/14/2022]
Abstract
RNA degradation is one of the critical steps for control of gene expression, and endonucleolytic cleavage-dependent RNA degradation is conserved among eukaryotes. Some cleavage sites are secondarily capped in the cytoplasm and identified using the Cap analysis of gene expression (CAGE) method. Although uncapped cleavage sites are widespread in eukaryotes, comparatively little information has been obtained about these sites using CAGE-based degradome analysis. Previously, we developed the truncated RNA-end sequencing (TREseq) method in plant species and used it to acquire comprehensive information about uncapped cleavage sites; we observed G-rich sequences near cleavage sites. However, it remains unclear whether this finding is general to other eukaryotes. In this study, we conducted TREseq analyses in fruit flies (Drosophila melanogaster) and budding yeast (Saccharomyces cerevisiae). The results revealed specific sequence features related to RNA cleavage in D. melanogaster and S. cerevisiae that were similar to sequence patterns in Arabidopsis thaliana. Although previous studies suggest that ribosome movements are important for determining cleavage position, feature selection using a random forest classifier showed that sequences around cleavage sites were a major determinant of cleaved versus uncleaved sites. Together, our results suggest that sequence features around cleavage sites are critical for determining cleavage position, and that sequence-specific endonucleolytic cleavage-dependent RNA degradation is highly conserved across eukaryotes.
Affiliation(s)
- Daishin Ueno
  - Graduate School of Biological Sciences, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara 630-0192, Japan
- Shotaro Yamasaki
  - Graduate School of Biological Sciences, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara 630-0192, Japan
- Yuta Sadakiyo
  - Graduate School of Biological Sciences, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara 630-0192, Japan
- Takumi Teruyama
  - Graduate School of Biological Sciences, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara 630-0192, Japan
- Taku Demura
  - Graduate School of Biological Sciences, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara 630-0192, Japan
- Ko Kato
  - Graduate School of Biological Sciences, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara 630-0192, Japan
|
20
|
Wu C, Chaw S. Evolution of mitochondrial RNA editing in extant gymnosperms. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2022; 111:1676-1687. [PMID: 35877596 PMCID: PMC9545813 DOI: 10.1111/tpj.15916] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/29/2021] [Revised: 07/18/2022] [Accepted: 07/21/2022] [Indexed: 06/01/2023]
Abstract
To unveil the evolution of mitochondrial RNA editing in gymnosperms, we characterized mitochondrial genomes (mitogenomes), plastid genomes, RNA editing sites, and pentatricopeptide repeat (PPR) proteins from 10 key taxa representing four of the five extant gymnosperm clades. The assembled mitogenomes vary in gene content due to massive gene losses in Gnetum and Conifer II clades. Mitochondrial gene expression levels also vary according to protein function, with the most highly expressed genes involved in the respiratory complex. We identified 9132 mitochondrial C-to-U editing sites, as well as 2846 P-class and 8530 PLS-class PPR proteins. Regains of editing sites were demonstrated in Conifer II rps3 transcripts whose corresponding mitogenomic sequences lack introns due to retroprocessing. Our analyses reveal that non-synonymous editing is efficient and results in more codons encoding hydrophobic amino acids. In contrast, synonymous editing, although performed with variable efficiency, can increase the number of U-ending codons that are preferentially utilized in gymnosperm mitochondria. The inferred loss-to-gain ratio of mitochondrial editing sites in gymnosperms is 2.1:1, of which losses of non-synonymous editing are mainly due to genomic C-to-T substitutions. However, such substitutions only explain a small fraction of synonymous editing site losses, indicating distinct evolutionary mechanisms. We show that gymnosperms have experienced multiple lineage-specific duplications in PLS-class PPR proteins. These duplications likely contribute to accumulated RNA editing sites, as a mechanistic correlation between RNA editing and PLS-class PPR proteins is statistically supported.
Affiliation(s)
- Chung‐Shien Wu
  - Biodiversity Research Center, Academia Sinica, Taipei 11529, Taiwan
- Shu‐Miaw Chaw
  - Biodiversity Research Center, Academia Sinica, Taipei 11529, Taiwan
|
21
|
Schon MA, Lutzmayer S, Hofmann F, Nodine MD. Bookend: precise transcript reconstruction with end-guided assembly. Genome Biol 2022; 23:143. [PMID: 35768836 PMCID: PMC9245221 DOI: 10.1186/s13059-022-02700-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/06/2021] [Accepted: 06/05/2022] [Indexed: 12/29/2022] Open
Abstract
We developed Bookend, a package for transcript assembly that incorporates data from different RNA-seq techniques, with a focus on identifying and utilizing RNA 5′ and 3′ ends. We demonstrate that correct identification of transcript start and end sites is essential for precise full-length transcript assembly. Utilization of end-labeled reads present in full-length single-cell RNA-seq datasets dramatically improves the precision of transcript assembly in single cells. Finally, we show that hybrid assembly across short-read, long-read, and end-capture RNA-seq datasets from Arabidopsis thaliana, as well as meta-assembly of RNA-seq from single mouse embryonic stem cells, can produce reference-quality end-to-end transcript annotations.
Affiliation(s)
- Michael A Schon
- Cluster of Plant Developmental Biology, Laboratory of Molecular Biology, Wageningen University & Research, Wageningen, 6708 PB, The Netherlands; Gregor Mendel Institute (GMI), Austrian Academy of Sciences, Vienna Biocenter (VBC), Dr. Bohr-Gasse 3, 1030, Vienna, Austria.
- Stefan Lutzmayer
- Gregor Mendel Institute (GMI), Austrian Academy of Sciences, Vienna Biocenter (VBC), Dr. Bohr-Gasse 3, 1030, Vienna, Austria
- Falko Hofmann
- Gregor Mendel Institute (GMI), Austrian Academy of Sciences, Vienna Biocenter (VBC), Dr. Bohr-Gasse 3, 1030, Vienna, Austria
- Michael D Nodine
- Cluster of Plant Developmental Biology, Laboratory of Molecular Biology, Wageningen University & Research, Wageningen, 6708 PB, The Netherlands; Gregor Mendel Institute (GMI), Austrian Academy of Sciences, Vienna Biocenter (VBC), Dr. Bohr-Gasse 3, 1030, Vienna, Austria.
22
Ueno D, Yamasaki S, Kato K. Methods for detecting RNA degradation intermediates in plants. PLANT SCIENCE : AN INTERNATIONAL JOURNAL OF EXPERIMENTAL PLANT BIOLOGY 2022; 318:111241. [PMID: 35351296 DOI: 10.1016/j.plantsci.2022.111241] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/22/2021] [Revised: 01/12/2022] [Accepted: 02/26/2022] [Indexed: 06/14/2023]
Abstract
RNA degradation is an important process for controlling gene expression and is mediated by decapping / deadenylation-dependent or endonucleolytic cleavage-dependent RNA degradation mechanisms. High-throughput sequencing of RNA degradation intermediates was initially developed in Arabidopsis thaliana and similar RNA degradome sequencing methods were conducted in other eukaryotes. However, interpreting results obtained by these sequencing methods is fragmented, and an overview is needed. Here we review the findings and limitations of these sequencing methods and discuss the missing experiments needed to understand RNA degradation intermediates accurately. This review provides direction for future research on RNA degradation and is a reference for RNA degradome studies in other species.
Affiliation(s)
- Daishin Ueno
- Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara 630-0192, Japan
- Shotaro Yamasaki
- Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara 630-0192, Japan
- Ko Kato
- Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara 630-0192, Japan.
23
Exogenous artificial DNA forms chromatin structure with active transcription in yeast. SCIENCE CHINA. LIFE SCIENCES 2022; 65:851-860. [PMID: 34970711 DOI: 10.1007/s11427-021-2044-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/02/2021] [Accepted: 12/10/2021] [Indexed: 12/25/2022]
Abstract
Yeast artificial chromosomes (YACs) are important tools for sequencing, gene cloning, and transferring large quantities of genetic information. However, the structure and activity of YAC chromatin, as well as the unintended impacts of introducing foreign DNA sequences on DNA-associated biochemical events, have not been widely explored. Here, we showed that abundant genetic elements like TATA box and transcription factor-binding motifs occurred unintentionally in a previously reported data-carrying chromosome (dChr). In addition, we used state-of-the-art sequencing technologies to comprehensively profile the genetic, epigenetic, transcriptional, and proteomic characteristics of the exogenous dChr. We found that the data-carrying DNA formed active chromatin with high chromatin accessibility and H3K4 tri-methylation levels. The dChr also displayed highly pervasive transcriptional ability and transcribed hundreds of noncoding RNAs. The results demonstrated that exogenous artificial chromosomes formed chromatin structures and did not remain as naked or loose plasmids. A better understanding of the YAC chromatin nature will improve our ability to design better data-storage chromosomes.
24
Ugolini C, Mulroney L, Leger A, Castelli M, Criscuolo E, Williamson MK, Davidson AD, Almuqrin A, Giambruno R, Jain M, Frigè G, Olsen H, Tzertzinis G, Schildkraut I, Wulf MG, Corrêa IR, Ettwiller L, Clementi N, Clementi M, Mancini N, Birney E, Akeson M, Nicassio F, Matthews D, Leonardi T. Nanopore ReCappable sequencing maps SARS-CoV-2 5' capping sites and provides new insights into the structure of sgRNAs. Nucleic Acids Res 2022; 50:3475-3489. [PMID: 35244721 PMCID: PMC8989550 DOI: 10.1093/nar/gkac144] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 11/25/2021] [Revised: 02/05/2022] [Accepted: 02/16/2022] [Indexed: 01/09/2023]
Abstract
The SARS-CoV-2 virus has a complex transcriptome characterised by multiple, nested subgenomic RNAs used to express structural and accessory proteins. Long-read sequencing technologies such as nanopore direct RNA sequencing can recover full-length transcripts, greatly simplifying the assembly of structurally complex RNAs. However, these techniques do not detect the 5' cap, thus preventing reliable identification and quantification of full-length, coding transcript models. Here we used Nanopore ReCappable Sequencing (NRCeq), a new technique that can identify capped full-length RNAs, to assemble a complete annotation of SARS-CoV-2 sgRNAs and annotate the location of capping sites across the viral genome. We obtained robust estimates of sgRNA expression across cell lines and viral isolates and identified novel canonical and non-canonical sgRNAs, including one that uses a previously un-annotated leader-to-body junction site. The data generated in this work constitute a useful resource for the scientific community and provide important insights into the mechanisms that regulate the transcription of SARS-CoV-2 sgRNAs.
Affiliation(s)
- Camilla Ugolini
- Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia, 20139 Milano, Italy
- Logan Mulroney
- Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia, 20139 Milano, Italy
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Biomolecular Engineering Department, UC Santa Cruz, CA 95064, USA
- Adrien Leger
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Matteo Castelli
- Laboratory of Microbiology and Virology, Vita-Salute San Raffaele University; via Olgettina 58, 20132 Milan, Italy
- Elena Criscuolo
- Laboratory of Microbiology and Virology, Vita-Salute San Raffaele University; via Olgettina 58, 20132 Milan, Italy
- Maia Kavanagh Williamson
- School of Cellular and Molecular Medicine, Faculty of Life Sciences, University Walk, University of Bristol, Bristol BS8 1TD, UK
- Andrew D Davidson
- School of Cellular and Molecular Medicine, Faculty of Life Sciences, University Walk, University of Bristol, Bristol BS8 1TD, UK
- Abdulaziz Almuqrin
- School of Cellular and Molecular Medicine, Faculty of Life Sciences, University Walk, University of Bristol, Bristol BS8 1TD, UK
- Department of Clinical Laboratory Sciences, King Saud University, Riyadh, Saudi Arabia
- Roberto Giambruno
- Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia, 20139 Milano, Italy
- Miten Jain
- Biomolecular Engineering Department, UC Santa Cruz, CA 95064, USA
- Gianmaria Frigè
- Department of Experimental Oncology, IEO European Institute of Oncology IRCCS, 20139 Milano, Italy
- Hugh Olsen
- Biomolecular Engineering Department, UC Santa Cruz, CA 95064, USA
- Nicola Clementi
- Laboratory of Microbiology and Virology, Vita-Salute San Raffaele University; via Olgettina 58, 20132 Milan, Italy
- Laboratory of Medical Microbiology and Virology, IRCCS San Raffaele Scientific Institute; via Olgettina 60, 20132 Milan, Italy
- Massimo Clementi
- Laboratory of Microbiology and Virology, Vita-Salute San Raffaele University; via Olgettina 58, 20132 Milan, Italy
- Laboratory of Medical Microbiology and Virology, IRCCS San Raffaele Scientific Institute; via Olgettina 60, 20132 Milan, Italy
- Nicasio Mancini
- Laboratory of Microbiology and Virology, Vita-Salute San Raffaele University; via Olgettina 58, 20132 Milan, Italy
- Laboratory of Medical Microbiology and Virology, IRCCS San Raffaele Scientific Institute; via Olgettina 60, 20132 Milan, Italy
- Ewan Birney
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Mark Akeson
- Biomolecular Engineering Department, UC Santa Cruz, CA 95064, USA
- Francesco Nicassio
- Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia, 20139 Milano, Italy
- David A Matthews
- School of Cellular and Molecular Medicine, Faculty of Life Sciences, University Walk, University of Bristol, Bristol BS8 1TD, UK
- Tommaso Leonardi
- Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia, 20139 Milano, Italy
25
Kidaka T, Sugi T, Hayashida K, Suzuki Y, Xuan X, Dubey JP, Yamagishi J. TSS-seq of Toxoplasma gondii sporozoites revealed a novel motif in stage-specific promoters. INFECTION, GENETICS AND EVOLUTION : JOURNAL OF MOLECULAR EPIDEMIOLOGY AND EVOLUTIONARY GENETICS IN INFECTIOUS DISEASES 2022; 98:105213. [PMID: 35041968 DOI: 10.1016/j.meegid.2022.105213] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/17/2021] [Revised: 01/06/2022] [Accepted: 01/11/2022] [Indexed: 06/14/2023]
Abstract
Toxoplasma gondii is one of the most common zoonotic protozoan parasites. It has three major infectious stages: rapidly multiplying tachyzoites (Tz), slowly replicating bradyzoites (Bz) and a resting/free-living stage, sporozoites (Sz). The regulatory mechanisms governing stage-specific gene expression are not fully understood. Few transcriptional start sites (TSS) are known for Sz. In this study, we obtained TSS of Sz using an oligo-capping method and RNA-seq analysis. We identified 1,043,503 TSS in the Sz transcriptome. These defined 38,973 TSS clusters, of which 11,925 were expressed in Sz and 1535 were differentially expressed in Sz. Based on these data, we defined promoter regions and novel sporozoite stage-specific motifs using MEME. TGTANNTACA was distributed around -55 to -75 regions from each TSS. Interestingly, the same motif was reported in another apicomplexan, Plasmodium berghei, as a cis-element of female-specific gametocyte genes, implying the presence of common regulatory machinery. Further comparative analysis should better define the distribution and function of these elements in other members of this important parasitic phylum.
Affiliation(s)
- Taishi Kidaka
- Division of Collaboration and Education, International Institute for Zoonosis Control, Hokkaido University, Sapporo, Hokkaido 001-0020, Japan
- Tatsuki Sugi
- Division of Collaboration and Education, International Institute for Zoonosis Control, Hokkaido University, Sapporo, Hokkaido 001-0020, Japan
- Kyoko Hayashida
- Division of Collaboration and Education, International Institute for Zoonosis Control, Hokkaido University, Sapporo, Hokkaido 001-0020, Japan; International Collaboration Unit, International Institute for Zoonosis Control, Hokkaido University, Sapporo, Hokkaido 001-0020, Japan
- Yutaka Suzuki
- Department of Medical Genome Sciences, University of Tokyo, Kashiwa, Chiba 277-8562, Japan
- Xuenan Xuan
- National Research Center for Protozoan Diseases, Obihiro University of Agriculture and Veterinary Medicine, Obihiro, Hokkaido 080-8555, Japan
- Jitender P Dubey
- United States Department of Agriculture, Agricultural Research Service, Animal Parasitic Diseases Laboratory, Beltsville, MD, 20705-2350, USA
- Junya Yamagishi
- Division of Collaboration and Education, International Institute for Zoonosis Control, Hokkaido University, Sapporo, Hokkaido 001-0020, Japan; International Collaboration Unit, International Institute for Zoonosis Control, Hokkaido University, Sapporo, Hokkaido 001-0020, Japan.
26
A comparison of experimental assays and analytical methods for genome-wide identification of active enhancers. Nat Biotechnol 2022; 40:1056-1065. [PMID: 35177836 PMCID: PMC9288987 DOI: 10.1038/s41587-022-01211-7] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 06/03/2021] [Accepted: 01/06/2022] [Indexed: 01/15/2023]
Abstract
Mounting evidence supports the idea that transcriptional patterns serve as more specific identifiers of active enhancers than histone marks; however, the optimal strategy to identify active enhancers both experimentally and computationally has not been determined. Here, we compared 13 genome-wide RNA sequencing assays in K562 cells and showed that the nuclear run-on followed by cap-selection assay (GRO/PRO-cap) has advantages in eRNA detection and active enhancer identification. We also introduced a tool, Peak Identifier for Nascent Transcript Starts (PINTS), to identify active promoters and enhancers genome-wide and pinpoint the precise location of the 5′ transcription start sites. Finally, we compiled a comprehensive enhancer candidate compendium based on the detected eRNA TSSs available in 120 cell and tissue types that can be accessed at https://pints.yulab.org. With the knowledge of the best available assays and pipelines, this large-scale annotation of candidate enhancers will pave the way for selection and characterization of their functions in a time- and labor-efficient manner in the future.
27
Mulroney L, Wulf MG, Schildkraut I, Tzertzinis G, Buswell J, Jain M, Olsen H, Diekhans M, Corrêa IR, Akeson M, Ettwiller L. Identification of high-confidence human poly(A) RNA isoform scaffolds using nanopore sequencing. RNA (NEW YORK, N.Y.) 2022; 28:162-176. [PMID: 34728536 PMCID: PMC8906549 DOI: 10.1261/rna.078703.121] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/03/2021] [Accepted: 10/13/2021] [Indexed: 06/13/2023]
Abstract
Nanopore sequencing devices read individual RNA strands directly. This facilitates identification of exon linkages and nucleotide modifications; however, using conventional direct RNA nanopore sequencing, the 5' and 3' ends of poly(A) RNA cannot be identified unambiguously. This is due in part to RNA degradation in vivo and in vitro that can obscure transcription start and end sites. In this study, we aimed to identify individual full-length human RNA isoforms among ∼4 million nanopore poly(A)-selected RNA reads. First, to identify RNA strands bearing 5' m7G caps, we exchanged the biological cap for a modified cap attached to a 45-nt oligomer. This oligomer adaptation method improved 5' end sequencing and ensured correct identification of the 5' m7G capped ends. Second, among these 5'-capped nanopore reads, we screened for features consistent with a 3' polyadenylation site. Combining these two steps, we identified 294,107 individual high-confidence full-length RNA scaffolds from human GM12878 cells, most of which (257,721) aligned to protein-coding genes. Of these, 4876 scaffolds indicated unannotated isoforms that were often internal to longer, previously identified RNA isoforms. Orthogonal data for m7G caps and open chromatin, such as CAGE and DNase-HS seq, confirmed the validity of these high-confidence RNA scaffolds.
Affiliation(s)
- Logan Mulroney
- Biomolecular Engineering Department, UC Santa Cruz, California 95064, USA
- John Buswell
- New England Biolabs, Ipswich, Massachusetts 01938, USA
- Miten Jain
- Biomolecular Engineering Department, UC Santa Cruz, California 95064, USA
- Hugh Olsen
- Biomolecular Engineering Department, UC Santa Cruz, California 95064, USA
- Mark Diekhans
- Genomics Institute, UC Santa Cruz, California 95064, USA
- Ivan R Corrêa
- New England Biolabs, Ipswich, Massachusetts 01938, USA
- Mark Akeson
- Biomolecular Engineering Department, UC Santa Cruz, California 95064, USA
28
Kyzar EJ, Bohnsack JP, Pandey SC. Current and Future Perspectives of Noncoding RNAs in Brain Function and Neuropsychiatric Disease. Biol Psychiatry 2022; 91:183-193. [PMID: 34742545 PMCID: PMC8959010 DOI: 10.1016/j.biopsych.2021.08.013] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 02/18/2021] [Revised: 08/05/2021] [Accepted: 08/12/2021] [Indexed: 02/07/2023]
Abstract
Noncoding RNAs (ncRNAs) represent the majority of the transcriptome and play important roles in regulating neuronal functions. ncRNAs are exceptionally diverse in both structure and function and include enhancer RNAs, long ncRNAs, and microRNAs, all of which demonstrate specific temporal and regional expression in the brain. Here, we review recent studies demonstrating that ncRNAs modulate chromatin structure, act as chaperone molecules, and contribute to synaptic remodeling and behavior. In addition, we discuss ncRNA function within the context of neuropsychiatric diseases, particularly focusing on addiction and schizophrenia, and the recent methodological developments that allow for better understanding of ncRNA function in the brain. Overall, ncRNAs represent an underrecognized molecular contributor to complex neuronal processes underlying neuropsychiatric disorders.
Affiliation(s)
- Evan J Kyzar
- Center for Alcohol Research in Epigenetics, Department of Psychiatry, University of Illinois at Chicago, Chicago, Illinois; Department of Psychiatry, Columbia University Irving Medical Center, New York State Psychiatric Institute, New York, New York
- John Peyton Bohnsack
- Center for Alcohol Research in Epigenetics, Department of Psychiatry, University of Illinois at Chicago, Chicago, Illinois
- Subhash C Pandey
- Center for Alcohol Research in Epigenetics, Department of Psychiatry, University of Illinois at Chicago, Chicago, Illinois; Jesse Brown Veterans Affairs Medical Center, University of Illinois at Chicago, Chicago, Illinois; Department of Anatomy and Cell Biology, University of Illinois at Chicago, Chicago, Illinois.
29
Yan B, Tzertzinis G, Schildkraut I, Ettwiller L. Comprehensive determination of transcription start sites derived from all RNA polymerases using ReCappable-seq. Genome Res 2021; 32:162-174. [PMID: 34815308 PMCID: PMC8744680 DOI: 10.1101/gr.275784.121] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 05/17/2021] [Accepted: 11/19/2021] [Indexed: 11/24/2022]
Abstract
Determination of eukaryotic transcription start sites (TSSs) has been based on methods that require the cap structure at the 5' end of transcripts derived from Pol II RNA polymerase. Consequently, these methods do not reveal TSSs derived from the other RNA polymerases that also play critical roles in various cell functions. To address this limitation, we developed ReCappable-seq, which comprehensively identifies TSS for both Pol II and non-Pol II transcripts at single-nucleotide resolution. The method relies on specific enzymatic exchange of 5' m7G caps and 5' triphosphates with a selectable tag. When applied to human transcriptomes, ReCappable-seq identifies Pol II TSSs that are in agreement with orthogonal methods such as CAGE. Additionally, ReCappable-seq reveals a rich landscape of TSSs associated with Pol III transcripts that have not previously been amenable to study at genome-wide scale. Novel TSS from non-Pol II transcription can be located in the nuclear and mitochondrial genomes. ReCappable-seq interrogates the regulatory landscape of coding and noncoding RNA concurrently and enables the classification of epigenetic profiles associated with Pol II and non-Pol II TSS.
Affiliation(s)
- Bo Yan
- New England Biolabs Incorporated, Ipswich, Massachusetts 01938, USA
- Ira Schildkraut
- New England Biolabs Incorporated, Ipswich, Massachusetts 01938, USA
30
Spontaneous pulmonary emphysema in mice lacking all three nitric oxide synthase isoforms. Sci Rep 2021; 11:22088. [PMID: 34764368 PMCID: PMC8586362 DOI: 10.1038/s41598-021-01453-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/21/2021] [Accepted: 10/28/2021] [Indexed: 12/13/2022]
Abstract
The roles of endogenous nitric oxide (NO) derived from the entire NO synthases (NOSs) system have yet to be fully elucidated. We addressed this issue in mice in which all three NOS isoforms were deleted. Under basal conditions, the triple n/i/eNOSs−/− mice displayed significantly longer mean alveolar linear intercept length, increased alveolar destructive index, reduced lung elastic fiber content, lower lung field computed tomographic value, and greater end-expiratory lung volume as compared with wild-type (WT) mice. None of the single NOS−/− or double NOSs−/− genotypes showed such features. These findings were observed in the triple n/i/eNOSs−/− mice as early as 4 weeks after birth. Cyclopaedic and quantitative comparisons of mRNA expression levels between the lungs of WT and triple n/i/eNOSs−/− mice by cap analysis of gene expression (CAGE) revealed that mRNA expression levels of three Wnt ligands and ten Wnt/β-catenin signaling components were significantly reduced in the lungs of triple n/i/eNOSs−/− mice. These results provide the first direct evidence that complete disruption of all three NOS genes results in spontaneous pulmonary emphysema in juvenile mice in vivo, possibly through down-regulation of the Wnt/β-catenin signaling pathway, demonstrating a novel preventive role of the endogenous NO/NOS system in the occurrence of pulmonary emphysema.
31
Abstract
Transcription start site (TSS) selection influences transcript stability and translation as well as protein sequence. Alternative TSS usage is pervasive in organismal development, is a major contributor to transcript isoform diversity in humans, and is frequently observed in human diseases including cancer. In this review, we discuss the breadth of techniques that have been used to globally profile TSSs and the resulting insights into gene regulation, as well as future prospects in this area of inquiry.
Affiliation(s)
- Gabriel E. Zentner
- Department of Biology, Indiana University, Bloomington, IN 47401, USA
- Indiana University Melvin and Bren Simon Comprehensive Cancer Center, Indianapolis, IN 46202, USA
32
Shaw PJ, Piriyapongsa J, Kaewprommal P, Wongsombat C, Chaosrikul C, Teeravajanadet K, Boonbangyang M, Uthaipibull C, Kamchonwongpaisan S, Tongsima S. Identifying transcript 5' capped ends in Plasmodium falciparum. PeerJ 2021; 9:e11983. [PMID: 34527439 PMCID: PMC8401752 DOI: 10.7717/peerj.11983] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/29/2021] [Accepted: 07/26/2021] [Indexed: 12/15/2022]
Abstract
Background: The genome of the human malaria parasite Plasmodium falciparum is poorly annotated, in particular, the 5' capped ends of its mRNA transcripts. New approaches are needed to fully catalog P. falciparum transcripts for understanding gene function and regulation in this organism. Methods: We developed a transcriptomic method based on next-generation sequencing of complementary DNA (cDNA) enriched for full-length fragments using eIF4E, a 5' cap-binding protein, and an unenriched control. DNA sequencing adapter was added after enrichment of full-length cDNA using two different ligation protocols. From the mapped sequence reads, enrichment scores were calculated for all transcribed nucleotides and used to calculate P-values of 5' capped nucleotide enrichment. Sensitivity and accuracy were increased by combining P-values from replicate experiments. Data were obtained for P. falciparum ring, trophozoite and schizont stages of intra-erythrocytic development. Results: 5' capped nucleotide signals were mapped to 17,961 non-overlapping P. falciparum genomic intervals. Analysis of the dominant 5' capped nucleotide in these genomic intervals revealed the presence of two groups with distinctive epigenetic features and sequence patterns. A total of 4,512 transcripts were annotated as 5' capped based on the correspondence of 5' end with 5' capped nucleotide annotated from full-length cDNA data. Discussion: The presence of two groups of 5' capped nucleotides suggests that alternative mechanisms may exist for producing 5' capped transcript ends in P. falciparum. The 5' capped transcripts that are antisense, outside of, or partially overlapping coding regions may be important regulators of gene function in P. falciparum.
Affiliation(s)
- Philip J Shaw
- National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
- Jittima Piriyapongsa
- National Biobank of Thailand (NBT), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
- Pavita Kaewprommal
- National Biobank of Thailand (NBT), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
- Chayaphat Wongsombat
- National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
- Chadapohn Chaosrikul
- National Biobank of Thailand (NBT), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
- Krirkwit Teeravajanadet
- National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
- Manon Boonbangyang
- National Biobank of Thailand (NBT), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
- Chairat Uthaipibull
- National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
- Sumalee Kamchonwongpaisan
- National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
- Sissades Tongsima
- National Biobank of Thailand (NBT), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
33
Guerrini MM, Oguchi A, Suzuki A, Murakawa Y. Cap analysis of gene expression (CAGE) and noncoding regulatory elements. Semin Immunopathol 2021; 44:127-136. [PMID: 34468849 DOI: 10.1007/s00281-021-00886-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/01/2021] [Accepted: 08/13/2021] [Indexed: 01/06/2023]
Abstract
Cap analysis of gene expression (CAGE) was developed to detect the 5' end of RNA. Trapping of the RNA 5'-cap structure enables the enrichment and selective sequencing of complete transcripts. Upscaled high-throughput versions of CAGE have enabled the genome-wide identification of transcription start sites, including transcriptionally active promoters and enhancers. CAGE sequencing can be exploited to draw comprehensive maps of active genomic regulatory elements in a cell type- and activation-specific manner. The cells of the immune system are among the best candidates to be analyzed in humans, since they are easily accessible. In this review, we discuss how CAGE data are instrumental for integrative analyses with quantitative trait loci and omics data, and their usefulness in the mechanistic interpretation of the effects of genetic variations over the entire human genome. Integrating CAGE data with the currently available omics information will contribute to better understanding of the genome-wide association study variants that lie outside of annotated genes, deepening our knowledge on human diseases, and enabling the targeted design of more specific therapeutic interventions.
Affiliation(s)
- Matteo Maurizio Guerrini
- Laboratory for Autoimmune Diseases, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan.
- Akiko Oguchi
- RIKEN-IFOM Joint Laboratory for Cancer Genomics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Akari Suzuki
- Laboratory for Autoimmune Diseases, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
- Yasuhiro Murakawa
- RIKEN-IFOM Joint Laboratory for Cancer Genomics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- IFOM-the FIRC Institute of Molecular Oncology, Milan, Italy
34
Ibrahim F, Oppelt J, Maragkakis M, Mourelatos Z. TERA-Seq: true end-to-end sequencing of native RNA molecules for transcriptome characterization. Nucleic Acids Res 2021; 49:e115. [PMID: 34428294 PMCID: PMC8599856 DOI: 10.1093/nar/gkab713] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/06/2021] [Revised: 07/31/2021] [Accepted: 08/18/2021] [Indexed: 11/14/2022]
Abstract
Direct sequencing of single, native RNA molecules through nanopores has a strong potential to transform research in all aspects of RNA biology and clinical diagnostics. The existing platform from Oxford Nanopore Technologies is unable to sequence the very 5′ ends of RNAs and is limited to polyadenylated molecules. Here, we develop True End-to-end RNA Sequencing (TERA-Seq), a platform that addresses these limitations, permitting more thorough transcriptome characterization. TERA-Seq describes both poly- and non-polyadenylated RNA molecules and accurately identifies their native 5′ and 3′ ends by ligating uniquely designed adapters that are sequenced along with the transcript. We find that capped, full-length mRNAs in human cells show marked variation of poly(A) tail lengths at the single molecule level. We report prevalent capping downstream of canonical transcriptional start sites in otherwise fully spliced and polyadenylated molecules. We reveal RNA processing and decay at single molecule level and find that mRNAs decay cotranslationally, often from their 5′ ends, while frequently retaining poly(A) tails. TERA-Seq will prove useful in many applications where true end-to-end direct sequencing of single, native RNA molecules and their isoforms is desirable.
Affiliation(s)
- Fadia Ibrahim
- Department of Pathology and Laboratory Medicine, Division of Neuropathology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Department of Biochemistry and Molecular Biology, Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, PA 19107, USA
- Jan Oppelt
- Department of Pathology and Laboratory Medicine, Division of Neuropathology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Manolis Maragkakis
- Laboratory of Genetics and Genomics, National Institute on Aging, Intramural Research Program, National Institutes of Health, Baltimore, MD 21224, USA
- Zissimos Mourelatos
- Department of Pathology and Laboratory Medicine, Division of Neuropathology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| |
Collapse
|
35
|
Banerjee S, Bhandary P, Woodhouse M, Sen TZ, Wise RP, Andorf CM. FINDER: an automated software package to annotate eukaryotic genes from RNA-Seq data and associated protein sequences. BMC Bioinformatics 2021; 22:205. [PMID: 33879057 PMCID: PMC8056616 DOI: 10.1186/s12859-021-04120-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 02/05/2021] [Accepted: 04/07/2021] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND: Gene annotation in eukaryotes is a non-trivial task that requires meticulous analysis of accumulated transcript data. Challenges include transcriptionally active regions of the genome that contain overlapping genes, genes that produce numerous transcripts, transposable elements and numerous diverse sequence repeats. Currently available gene annotation software applications depend on pre-constructed full-length gene sequence assemblies which are not guaranteed to be error-free. The origins of these sequences are often uncertain, making it difficult to identify and rectify errors in them. This hinders the creation of an accurate and holistic representation of the transcriptomic landscape across multiple tissue types and experimental conditions. Therefore, to gauge the extent of diversity in gene structures, a comprehensive analysis of genome-wide expression data is imperative. RESULTS: We present FINDER, a fully automated computational tool that optimizes the entire process of annotating genes and transcript structures. Unlike current state-of-the-art pipelines, FINDER automates the RNA-Seq pre-processing step by working directly with raw sequence reads and optimizes gene prediction from BRAKER2 by supplementing these reads with associated proteins. The FINDER pipeline (1) reports transcripts and recognizes genes that are expressed under specific conditions, (2) generates all possible alternatively spliced transcripts from expressed RNA-Seq data, (3) analyzes read coverage patterns to modify existing transcript models and create new ones, and (4) scores genes as high- or low-confidence based on the available evidence across multiple datasets. We demonstrate the ability of FINDER to automatically annotate a diverse pool of genomes from eight species. CONCLUSIONS: FINDER takes a completely automated approach to annotate genes directly from raw expression data. It is capable of processing eukaryotic genomes of all sizes and requires no manual supervision, making it ideal for bench researchers with limited experience in handling computational tools.
Affiliation(s)
- Sagnik Banerjee
  - Program in Bioinformatics and Computational Biology, Iowa State University, Ames, IA, 50011, USA
  - Department of Statistics, Iowa State University, Ames, IA, 50011, USA
- Priyanka Bhandary
  - Program in Bioinformatics and Computational Biology, Iowa State University, Ames, IA, 50011, USA
  - Department of Genetics, Developmental and Cell Biology, Iowa State University, Ames, IA, 50011, USA
- Margaret Woodhouse
  - Corn Insects and Crop Genetics Research Unit, USDA-Agricultural Research Service, Ames, IA, 50011, USA
- Taner Z Sen
  - Crop Improvement and Genetics Research Unit, USDA-Agricultural Research Service, Albany, CA, 94710, USA
- Roger P Wise
  - Corn Insects and Crop Genetics Research Unit, USDA-Agricultural Research Service, Ames, IA, 50011, USA
  - Department of Plant Pathology and Microbiology, Iowa State University, Ames, IA, 50011, USA
- Carson M Andorf
  - Corn Insects and Crop Genetics Research Unit, USDA-Agricultural Research Service, Ames, IA, 50011, USA
  - Department of Computer Science, Iowa State University, Ames, IA, 50011, USA
|
36
|
Goszczynski DE, Halstead MM, Islas-Trejo AD, Zhou H, Ross PJ. Transcription initiation mapping in 31 bovine tissues reveals complex promoter activity, pervasive transcription, and tissue-specific promoter usage. Genome Res 2021; 31:732-744. [PMID: 33722934 PMCID: PMC8015843 DOI: 10.1101/gr.267336.120] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 06/12/2020] [Accepted: 02/01/2021] [Indexed: 01/04/2023]
Abstract
Characterizing transcription start sites is essential for understanding the regulatory mechanisms that control gene expression. Recently, a new bovine genome assembly (ARS-UCD1.2) with high continuity, accuracy, and completeness was released; however, the functional annotation of the bovine genome lacks precise transcription start sites and contains a low number of transcripts in comparison to human and mouse. By using the RAMPAGE approach, this study identified transcription start sites at high resolution in a large collection of bovine tissues. We found several known and novel transcription start sites attributed to promoters of protein-coding and lncRNA genes that were validated through experimental and in silico evidence. With these findings, the annotation of transcription start sites in cattle reached a level comparable to the mouse and human genome annotations. In addition, we identified and characterized transcription start sites for antisense transcripts derived from bidirectional promoters, potential lncRNAs, mRNAs, and pre-miRNAs. We also analyzed the quantitative aspects of RAMPAGE to produce a promoter activity atlas, reaching highly reproducible results comparable to traditional RNA-seq. Coexpression networks revealed considerable use of tissue-specific promoters, especially between brain and testicle, which expressed several genes in common from alternate loci. Furthermore, regions surrounding coexpressed modules were enriched in binding factor motifs representative of each tissue. The comprehensive annotation of promoters in such a large collection of tissues will substantially contribute to our understanding of gene expression in cattle and other mammalian species, shortening the gap between genotypes and phenotypes.
Affiliation(s)
- Daniel E Goszczynski
  - Department of Animal Science, University of California, Davis, California 95616, USA
- Michelle M Halstead
  - Department of Animal Science, University of California, Davis, California 95616, USA
- Alma D Islas-Trejo
  - Department of Animal Science, University of California, Davis, California 95616, USA
- Huaijun Zhou
  - Department of Animal Science, University of California, Davis, California 95616, USA
- Pablo J Ross
  - Department of Animal Science, University of California, Davis, California 95616, USA
|
37
|
Ueno D, Mikami M, Yamasaki S, Kaneko M, Mukuta T, Demura T, Kato K. Changes in mRNA Degradation Efficiencies under Varying Conditions Are Regulated by Multiple Determinants in Arabidopsis thaliana. PLANT & CELL PHYSIOLOGY 2021; 62:143-155. [PMID: 33289533 DOI: 10.1093/pcp/pcaa147] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/22/2020] [Accepted: 11/13/2020] [Indexed: 06/12/2023]
Abstract
Multiple mechanisms are involved in gene expression, with mRNA degradation being critical for the control of mRNA accumulation. In plants, although some trans-acting factors and motif sequences have been identified in deadenylation-dependent mRNA degradation, endonucleolytic cleavage-dependent mRNA degradation has not been studied in detail. Previously, we developed truncated RNA-end sequencing (TREseq) in Arabidopsis thaliana and detected G-rich sequence motifs around 5' degradation intermediates. However, it remained to be elucidated whether degradation efficiencies of 5' degradation intermediates in A. thaliana vary among growth conditions and developmental stages. To address this issue, we conducted TREseq of cultured cells under heat stress and at three developmental stages (seedlings, expanding leaves and expanded leaves) and compared 5' degradation intermediates data among the samples. Although some 5' degradation intermediates had almost identical degradation efficiencies, others differed among conditions. We focused on the genes and sites whose degradation efficiencies differed. Changes in degradation efficiencies at the gene and site levels revealed an effect on mRNA accumulation in all comparisons. These changes in degradation efficiencies involved multiple determinants, including mRNA length and translation efficiency. These results suggest that several determinants govern the efficiency of mRNA degradation in plants, helping the organism to adapt to varying conditions by controlling mRNA accumulation.
Affiliation(s)
- Daishin Ueno
  - Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, 630-0192 Japan
- Maki Mikami
  - Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, 630-0192 Japan
- Shotaro Yamasaki
  - Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, 630-0192 Japan
- Miho Kaneko
  - Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, 630-0192 Japan
- Takafumi Mukuta
  - Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, 630-0192 Japan
- Taku Demura
  - Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, 630-0192 Japan
- Ko Kato
  - Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, 630-0192 Japan
|
38
|
Young ND, Stroehlein AJ, Kinkar L, Wang T, Sohn WM, Chang BCH, Kaur P, Weisz D, Dudchenko O, Aiden EL, Korhonen PK, Gasser RB. High-quality reference genome for Clonorchis sinensis. Genomics 2021; 113:1605-1615. [PMID: 33677057 DOI: 10.1016/j.ygeno.2021.03.001] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 09/10/2020] [Revised: 01/18/2021] [Accepted: 03/01/2021] [Indexed: 12/13/2022]
Abstract
The Chinese liver fluke, Clonorchis sinensis, causes the disease clonorchiasis, affecting ~35 million people in regions of China, Vietnam, Korea and the Russian Far East. Chronic clonorchiasis causes cholangitis and can induce a malignant cancer, called cholangiocarcinoma, in the biliary system. Control in endemic regions is challenging, and often relies largely on chemotherapy with one anthelmintic, called praziquantel. Routine treatment carries a significant risk of inducing resistance to this anthelmintic in the fluke, such that the discovery of new interventions is considered important. It is hoped that the use of molecular technologies will assist this endeavour by enabling the identification of drug or vaccine targets involved in crucial biological processes and/or pathways in the parasite. Although draft genomes of C. sinensis have been published, their assemblies are fragmented. In the present study, we tackle this genome fragmentation issue by utilising, in an integrated way, advanced (second- and third-generation) DNA sequencing and informatic approaches to build a high-quality reference genome for C. sinensis, with chromosome-level contiguity and curated gene models. This substantially enhanced genome provides a resource that could accelerate fundamental and applied molecular investigations of C. sinensis, clonorchiasis and/or cholangiocarcinoma, and assist in the discovery of new interventions against what is a highly significant, but neglected, disease complex.
Affiliation(s)
- Neil D Young
  - Department of Veterinary Biosciences, Melbourne Veterinary School, Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, Victoria 3010, Australia
- Andreas J Stroehlein
  - Department of Veterinary Biosciences, Melbourne Veterinary School, Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, Victoria 3010, Australia
- Liina Kinkar
  - Department of Veterinary Biosciences, Melbourne Veterinary School, Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, Victoria 3010, Australia
- Tao Wang
  - Department of Veterinary Biosciences, Melbourne Veterinary School, Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, Victoria 3010, Australia
- Woon-Mok Sohn
  - Department of Parasitology and Institute of Health Sciences, School of Medicine, Gyeongsang National University, Jinju, Republic of Korea
- Bill C H Chang
  - Department of Veterinary Biosciences, Melbourne Veterinary School, Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, Victoria 3010, Australia
- Parwinder Kaur
  - UWA School of Agriculture and Environment, Faculty of Science, University of Western Australia, Perth, Western Australia 6009, Australia
- David Weisz
  - The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Olga Dudchenko
  - The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
  - Center for Theoretical Biological Physics, Rice University, Houston, TX 77005, USA
- Erez Lieberman Aiden
  - UWA School of Agriculture and Environment, Faculty of Science, University of Western Australia, Perth, Western Australia 6009, Australia
  - The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
  - Center for Theoretical Biological Physics, Rice University, Houston, TX 77005, USA
  - Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech, Pudong 201210, China
- Pasi K Korhonen
  - Department of Veterinary Biosciences, Melbourne Veterinary School, Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, Victoria 3010, Australia
- Robin B Gasser
  - Department of Veterinary Biosciences, Melbourne Veterinary School, Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, Victoria 3010, Australia
|
39
|
Westermann AJ, Vogel J. Cross-species RNA-seq for deciphering host-microbe interactions. Nat Rev Genet 2021; 22:361-378. [PMID: 33597744 DOI: 10.1038/s41576-021-00326-y] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Accepted: 01/05/2021] [Indexed: 02/08/2023]
Abstract
The human body is constantly exposed to microorganisms, which entails manifold interactions between human cells and diverse commensal or pathogenic bacteria. The cellular states of the interacting cells are decisive for the outcome of these encounters, such as whether bacterial virulence programmes and host defence or tolerance mechanisms are induced. This Review summarizes how next-generation RNA sequencing (RNA-seq) has become a primary technology to study host-microbe interactions with high resolution, improving our understanding of the physiological consequences and the mechanisms at play. We illustrate how the discriminatory power and sensitivity of RNA-seq help to dissect increasingly complex cellular interactions in time and space down to the single-cell level. We also outline how future transcriptomics may answer currently open questions in host-microbe interactions and inform treatment schemes for microbial disorders.
Affiliation(s)
- Alexander J Westermann
  - Helmholtz Institute for RNA-based Infection Research (HIRI), Helmholtz Centre for Infection Research (HZI), Würzburg, Germany
  - Institute for Molecular Infection Biology (IMIB), University of Würzburg, Würzburg, Germany
- Jörg Vogel
  - Helmholtz Institute for RNA-based Infection Research (HIRI), Helmholtz Centre for Infection Research (HZI), Würzburg, Germany
  - Institute for Molecular Infection Biology (IMIB), University of Würzburg, Würzburg, Germany
|
40
|
Altered visual processing in the mdx52 mouse model of Duchenne muscular dystrophy. Neurobiol Dis 2021; 152:105288. [PMID: 33556541 DOI: 10.1016/j.nbd.2021.105288] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/22/2020] [Revised: 01/26/2021] [Accepted: 02/03/2021] [Indexed: 02/06/2023] Open
Abstract
The mdx52 mouse model of Duchenne muscular dystrophy (DMD) is lacking exon 52 of the DMD gene that is located in a hotspot mutation region causing cognitive deficits and retinal anomalies in DMD patients. This deletion leads to the loss of the dystrophin proteins, Dp427, Dp260 and Dp140, while Dp71 is preserved. The flash electroretinogram (ERG) in mdx52 mice was previously characterized by delayed dark-adapted b-waves. A detailed description of functional ERG changes and visual performances in mdx52 mice is, however, lacking. Here an extensive full-field ERG repertoire was applied in mdx52 mice and WT littermates to analyze retinal physiology in scotopic, mesopic and photopic conditions in response to flash, sawtooth and/or sinusoidal stimuli. Behavioral contrast sensitivity was assessed using quantitative optomotor response (OMR) to sinusoidally modulated luminance gratings at 100% or 50% contrast. The mdx52 mice exhibited reduced amplitudes and delayed implicit times in dark-adapted ERG flash responses, particularly in their b-wave and oscillatory potentials, and diminished amplitudes of light-adapted flash ERGs. ERG responses to sawtooth stimuli were also diminished and delayed for both mesopic and photopic conditions in mdx52 mice and the first harmonic amplitudes to photopic sine-wave stimuli were smaller at all temporal frequencies. OMR indices were comparable between genotypes at 100% contrast but significantly reduced in mdx52 mice at 50% contrast. The complex ERG alterations and disturbed contrast vision in mdx52 mice include features observed in DMD patients and suggest altered photoreceptor-to-bipolar cell transmission possibly affecting contrast sensitivity. The mdx52 mouse is a relevant model to appraise the roles of retinal dystrophins and for preclinical studies related to DMD.
|
41
|
Phelps WA, Carlson AE, Lee MT. Optimized design of antisense oligomers for targeted rRNA depletion. Nucleic Acids Res 2021; 49:e5. [PMID: 33221877 PMCID: PMC7797071 DOI: 10.1093/nar/gkaa1072] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/28/2020] [Revised: 10/01/2020] [Accepted: 10/21/2020] [Indexed: 11/14/2022] Open
Abstract
RNA sequencing (RNA-seq) is extensively used to quantify gene expression transcriptome-wide. Although often paired with polyadenylate (poly(A)) selection to enrich for messenger RNA (mRNA), many applications require alternate approaches to counteract the high proportion of ribosomal RNA (rRNA) in total RNA. Recently, digestion using RNaseH and antisense DNA oligomers tiling target rRNAs has emerged as an alternative to commercial rRNA depletion kits. Here, we present a streamlined, more economical RNaseH-mediated rRNA depletion with substantially lower up-front costs, using shorter antisense oligos only sparsely tiled along the target RNA in a 5-min digestion reaction. We introduce a novel Web tool, Oligo-ASST, that simplifies oligo design to target regions with optimal thermodynamic properties, and additionally can generate compact, common oligo pools that simultaneously target divergent RNAs, e.g. across different species. We demonstrate the efficacy of these strategies by generating rRNA-depletion oligos for Xenopus laevis and for zebrafish, which expresses two distinct versions of rRNAs during embryogenesis. The resulting RNA-seq libraries reduce rRNA to <5% of aligned reads, on par with poly(A) selection, and also reveal expression of many non-adenylated RNA species. Oligo-ASST is freely available at https://mtleelab.pitt.edu/oligo to design antisense oligos for any taxon or to target any abundant RNA for depletion.
Affiliation(s)
- Wesley A Phelps
  - Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260, USA
- Anne E Carlson
  - Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260, USA
- Miler T Lee
  - Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260, USA
|
42
|
Markus BM, Waldman BS, Lorenzi HA, Lourido S. High-Resolution Mapping of Transcription Initiation in the Asexual Stages of Toxoplasma gondii. Front Cell Infect Microbiol 2021; 10:617998. [PMID: 33553008 PMCID: PMC7854901 DOI: 10.3389/fcimb.2020.617998] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/15/2020] [Accepted: 12/03/2020] [Indexed: 12/13/2022] Open
Abstract
Toxoplasma gondii is a common parasite of humans and animals, causing life-threatening disease in the immunocompromised, fetal abnormalities when contracted during gestation, and recurrent ocular lesions in some patients. Central to the prevalence and pathogenicity of this protozoan is its ability to adapt to a broad range of environments, and to differentiate between acute and chronic stages. These processes are underpinned by a major rewiring of gene expression, yet the mechanisms that regulate transcription in this parasite are only partially characterized. Deciphering these mechanisms requires a precise and comprehensive map of transcription start sites (TSSs); however, Toxoplasma TSSs have remained incompletely defined. To address this challenge, we used 5'-end RNA sequencing to genomically assess transcription initiation in both acute and chronic stages of Toxoplasma. Here, we report an in-depth analysis of transcription initiation at promoters, and provide empirically defined TSSs for 7603 (91%) protein-coding genes, of which only 1840 concur with existing gene models. Comparing data from acute and chronic stages, we identified instances of stage-specific alternative TSSs that putatively generate mRNA isoforms with distinct 5' termini. Analysis of the nucleotide content and nucleosome occupancy around TSSs allowed us to examine the determinants of TSS choice, and outline features of Toxoplasma promoter architecture. We also found pervasive divergent transcription at Toxoplasma promoters, clustered within the nucleosomes of highly symmetrical phased arrays, underscoring chromatin contributions to transcription initiation. Corroborating previous observations, we asserted that Toxoplasma 5' leaders are among the longest of any eukaryote studied thus far, displaying a median length of approximately 800 nucleotides. Further highlighting the utility of a precise TSS map, we pinpointed motifs associated with transcription initiation, including the binding sites of the master regulator of chronic-stage differentiation, BFD1, and a novel motif with a similar positional arrangement present at 44% of Toxoplasma promoters. This work provides a critical resource for functional genomics in Toxoplasma, and lays down a foundation to study the interactions between genomic sequences and the regulatory factors that control transcription in this parasite.
Affiliation(s)
- Benedikt M. Markus
  - Whitehead Institute for Biomedical Research, Cambridge, MA, United States
  - Faculty of Biology, University of Freiburg, Freiburg, Germany
- Benjamin S. Waldman
  - Whitehead Institute for Biomedical Research, Cambridge, MA, United States
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, United States
- Sebastian Lourido
  - Whitehead Institute for Biomedical Research, Cambridge, MA, United States
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, United States
|
43
|
Moya-Ramírez I, Bouton C, Kontoravdi C, Polizzi K. High resolution biosensor to test the capping level and integrity of mRNAs. Nucleic Acids Res 2021; 48:e129. [PMID: 33152073 PMCID: PMC7736790 DOI: 10.1093/nar/gkaa955] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/22/2020] [Revised: 09/22/2020] [Accepted: 10/08/2020] [Indexed: 11/21/2022] Open
Abstract
5′ Cap structures are ubiquitous on eukaryotic mRNAs, essential for post-transcriptional processing, translation initiation and stability. Here we describe a biosensor designed to detect the presence of cap structures on mRNAs that is also sensitive to mRNA degradation, so uncapped or degraded mRNAs can be detected in a single step. The biosensor is based on a chimeric protein that combines the recognition and transduction roles in a single molecule. The main feature of this sensor is its simplicity, enabling semi-quantitative analyses of capping levels with minimal instrumentation. The biosensor was demonstrated to detect the capping level on several in vitro transcribed mRNAs. Its sensitivity and dynamic range remained constant with RNAs ranging in size from 250 nt to approximately 2700 nt and the biosensor was able to detect variations in the capping level in increments of at least 20%, with a limit of detection of 2.4 pmol. Remarkably, it can also be applied to more complex analytes, such as mRNA vaccines and mRNAs transcribed in vivo. This biosensor is an innovative example of a technology able to detect analytically challenging structures such as mRNA caps. It could find application in a variety of scenarios, from quality analysis of mRNA-based products such as vaccines to optimization of in vitro capping reactions.
Affiliation(s)
- Ignacio Moya-Ramírez
  - Department of Chemical Engineering, Imperial College London, London SW7 2AZ, UK
  - Imperial College Centre for Synthetic Biology, Imperial College London, London SW7 2AZ, UK
- Clement Bouton
  - Department of Infectious Disease, Imperial College London, London W2 1NY, UK
- Cleo Kontoravdi
  - Department of Chemical Engineering, Imperial College London, London SW7 2AZ, UK
- Karen Polizzi
  - Department of Chemical Engineering, Imperial College London, London SW7 2AZ, UK
  - Imperial College Centre for Synthetic Biology, Imperial College London, London SW7 2AZ, UK
|
44
|
Lu Z, Lin Z. The origin and evolution of a distinct mechanism of transcription initiation in yeasts. Genome Res 2020; 31:51-63. [PMID: 33219055 PMCID: PMC7849388 DOI: 10.1101/gr.264325.120] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 04/04/2020] [Accepted: 11/17/2020] [Indexed: 12/13/2022]
Abstract
The molecular process of transcription by RNA Polymerase II is highly conserved among eukaryotes (“classic model”). A distinct way of locating transcription start sites (TSSs) has been identified in the budding yeast Saccharomyces cerevisiae (“scanning model”). Herein, we applied genomic approaches to elucidate the origin of the scanning model and its underlying genetic mechanisms. We first identified TSSs at single-nucleotide resolution for 12 yeast species using the nAnT-iCAGE technique, which significantly improved the annotations of these genomes by providing accurate 5′ boundaries for protein-coding genes. We then inferred the initiation mechanism of each species based on its TSS maps and genome sequences. We discovered that the scanning model likely originated after the split of Yarrowia lipolytica and the other budding yeasts. Species that use the scanning model showed an adenine-rich region immediately upstream of the TSS that might facilitate TSS selection. Both initiation mechanisms share a strong preference for pyrimidine–purine dinucleotides surrounding the TSS. Our results suggest that the purine is required to accurately recruit the first nucleotide, thereby increasing the chances of a messenger RNA being capped during mRNA maturation, which is critical for efficient translation initiation during protein biosynthesis. Based on our findings, we propose a model for TSS selection in the scanning-model species, as well as a model for the stepwise process responsible for the origin and evolution of the scanning model.
Affiliation(s)
- Zhaolian Lu
  - Department of Biology, Saint Louis University, St. Louis, Missouri 63104, USA
- Zhenguo Lin
  - Department of Biology, Saint Louis University, St. Louis, Missouri 63104, USA
|
45
|
Barriuso J, Lamarca A. Clinical and Translational Research Challenges in Neuroendocrine Tumours. Curr Med Chem 2020; 27:4823-4839. [PMID: 32031064 DOI: 10.2174/0929867327666200207120725] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/30/2019] [Revised: 12/04/2019] [Accepted: 01/16/2020] [Indexed: 12/31/2022]
Abstract
Neuroendocrine tumours (NETs) represent a range of neoplasms that may arise from any (neuro)endocrine cell situated in any part of the human body. Like other rare diseases, NETs face several difficulties in relation to research. This review describes some of the main challenges faced by researchers with expertise in rare malignancies, along with proposed solutions. Some of the most common challenges in clinical and translational research are enumerated, covering aspects from clinical, translational and basic research. The main challenges are that NETs are a heterogeneous group of diseases and that clinical and translational research projects have limited sample sizes. Further challenges lie in the disparities between healthcare models for tackling rare diseases. NETs add an extra layer of complexity because they comprise numerous distinct entities. With the revolution in electronic health technologies, prospective real-world data trials are an opportunity for rare cancers. This review explores potential solutions to these challenges that could be useful not only to the NET community but also to researchers of other rare tumours.
Affiliation(s)
- Jorge Barriuso
  - Division of Cancer Sciences, School of Medical Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, United Kingdom
- Angela Lamarca
  - Division of Cancer Sciences, School of Medical Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, United Kingdom
|
46
|
Wragg JW, Roos L, Vucenovic D, Cvetesic N, Lenhard B, Müller F. Embryonic tissue differentiation is characterized by transitions in cell cycle dynamic-associated core promoter regulation. Nucleic Acids Res 2020; 48:8374-8392. [PMID: 32619237 PMCID: PMC7470974 DOI: 10.1093/nar/gkaa563] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/11/2020] [Revised: 06/10/2020] [Accepted: 06/19/2020] [Indexed: 12/30/2022] Open
Abstract
The core-promoter, a stretch of DNA surrounding the transcription start site (TSS), is a major integration-point for regulatory-signals controlling gene-transcription. Cellular differentiation is marked by divergence in transcriptional repertoire and cell-cycling behaviour between cells of different fates. The role promoter-associated gene-regulatory-networks play in development-associated transitions in cell-cycle-dynamics is poorly understood. This study demonstrates in a vertebrate embryo, how core-promoter variations define transcriptional output in cells transitioning from a proliferative to cell-lineage specifying phenotype. Assessment of cell proliferation across zebrafish embryo segmentation, using the FUCCI transgenic cell-cycle-phase marker, revealed a spatial and lineage-specific separation in cell-cycling behaviour. To investigate the role differential promoter usage plays in this process, cap-analysis-of-gene-expression (CAGE) was performed on cells segregated by cycling dynamics. This analysis revealed a dramatic increase in tissue-specific gene expression, concurrent with slowed cycling behaviour. We revealed a distinct sharpening in TSS utilization in genes upregulated in slowly cycling, differentiating tissues, associated with enhanced utilization of the TATA-box, in addition to Sp1 binding-sites. In contrast, genes upregulated in rapidly cycling cells carry broad distribution of TSS utilization, coupled with enrichment for the CCAAT-box. These promoter features appear to correspond to cell-cycle-dynamic rather than tissue/cell-lineage origin. Moreover, we observed genes with cell-cycle-dynamic-associated transitioning in TSS distribution and differential utilization of alternative promoters. These results demonstrate the regulatory role of core-promoters in cell-cycle-dependent transcription regulation, during embryo-development.
Affiliation(s)
- Dunja Vucenovic
- Institute of Clinical Sciences and MRC Clinical Sciences Centre, Faculty of Medicine, Imperial College London, London, W12 0NN, UK
- Nevena Cvetesic
- Institute of Clinical Sciences and MRC Clinical Sciences Centre, Faculty of Medicine, Imperial College London, London, W12 0NN, UK
- Boris Lenhard
- Correspondence may also be addressed to Boris Lenhard. Tel: +44 20 3313 8353;
- Ferenc Müller
- To whom correspondence should be addressed. Tel: +44 121 414 2895;
47
Policastro RA, Raborn RT, Brendel VP, Zentner GE. Simple and efficient profiling of transcription initiation and transcript levels with STRIPE-seq. Genome Res 2020; 30:910-923. [PMID: 32660958 PMCID: PMC7370879 DOI: 10.1101/gr.261545.120] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 01/22/2020] [Accepted: 06/18/2020] [Indexed: 01/07/2023]
Abstract
Accurate mapping of transcription start sites (TSSs) is key for understanding transcriptional regulation. However, current protocols for genome-wide TSS profiling are laborious and/or expensive. We present Survey of TRanscription Initiation at Promoter Elements with high-throughput sequencing (STRIPE-seq), a simple, rapid, and cost-effective protocol for sequencing capped RNA 5' ends from as little as 50 ng total RNA. Including depletion of uncapped RNA and reaction cleanups, a STRIPE-seq library can be constructed in about 5 h. We show application of STRIPE-seq to TSS profiling in yeast and human cells and show that it can also be effectively used for quantification of transcript levels and analysis of differential gene expression. In conjunction with our ready-to-use computational workflows, STRIPE-seq is a straightforward, efficient means by which to probe the landscape of transcriptional initiation.
Affiliation(s)
- Volker P Brendel
- Department of Biology
- Department of Computer Science, Indiana University, Bloomington, Indiana 47405, USA
- Gabriel E Zentner
- Department of Biology
- Indiana University Melvin and Bren Simon Comprehensive Cancer Center, Indianapolis, Indiana 46202, USA
48
Ueno D, Mukuta T, Yamasaki S, Mikami M, Demura T, Matsui T, Sawada K, Katsumoto Y, Okitsu N, Kato K. Different Plant Species Have Common Sequence Features Related to mRNA Degradation Intermediates. PLANT & CELL PHYSIOLOGY 2020; 61:53-63. [PMID: 31501893 DOI: 10.1093/pcp/pcz175] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/26/2019] [Accepted: 09/03/2019] [Indexed: 06/10/2023]
Abstract
mRNA degradation is an important cellular mechanism involved in the control of gene expression. Several genome-wide profiling methods have been developed for detecting mRNA degradation in plants and animals. However, because many of these techniques use poly (A) mRNA for library preparation, degradation intermediates are often only detected near the 3'-ends of transcripts. Previously, we developed the Truncated RNA End Sequencing (TREseq) method using Arabidopsis thaliana, and demonstrated that this method ameliorates 3'-end bias. In analyses using TREseq, we observed G-rich sequences near the 5'-ends of degradation intermediates. However, this finding remained to be confirmed in other plant species. Hence, in this study, we conducted TREseq analyses in Lactuca sativa (lettuce), Oryza sativa (rice) and Rosa hybrida (rose). These species including A. thaliana were selected to encompass a diverse range in the angiosperm phylogeny. The results revealed similar sequence features near the 5'-ends of degradation intermediates, and involvement of translation process in all four species. In addition, homologous genes have similar efficiencies of mRNA degradation in different plants, suggesting that similar mechanisms of mRNA degradation are conserved across plant species. These strong sequence features were not observed in previous degradome analyses among different species in plants.
Affiliation(s)
- Daishin Ueno
- Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara, 630-0192 Japan
- Takafumi Mukuta
- Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara, 630-0192 Japan
- Shotaro Yamasaki
- Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara, 630-0192 Japan
- Maki Mikami
- Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara, 630-0192 Japan
- Taku Demura
- Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara, 630-0192 Japan
- Takeshi Matsui
- Idemitsu Kosan Co., Ltd., Advanced Technology Research Laboratories, 1280 Kami-izumi, Sodegaura, Chiba, 299-0293 Japan
- Kazutoshi Sawada
- Idemitsu Kosan Co., Ltd., Advanced Technology Research Laboratories, 1280 Kami-izumi, Sodegaura, Chiba, 299-0293 Japan
- Yukihisa Katsumoto
- Research Institute, Suntory Global Innovation Center Ltd, 8-1-1 Seikadai, Seika-cho, Soraku-Gun, Kyoto 619-0284 Japan
- Naoko Okitsu
- Research Institute, Suntory Global Innovation Center Ltd, 8-1-1 Seikadai, Seika-cho, Soraku-Gun, Kyoto 619-0284 Japan
- Ko Kato
- Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara, 630-0192 Japan
49
Vahrenkamp JM, Szczotka K, Dodson MK, Jarboe EA, Soisson AP, Gertz J. FFPEcap-seq: a method for sequencing capped RNAs in formalin-fixed paraffin-embedded samples. Genome Res 2019; 29:1826-1835. [PMID: 31649055 PMCID: PMC6836741 DOI: 10.1101/gr.249656.119] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 02/20/2019] [Accepted: 10/03/2019] [Indexed: 12/16/2022]
Abstract
The majority of clinical cancer specimens are preserved as formalin-fixed paraffin-embedded (FFPE) samples. For clinical molecular tests to have wide-reaching impact, they must be applicable to FFPE material. Accurate quantitative measurement of RNA derived from FFPE specimens is challenging because of low yields and high amounts of degradation. Here, we present FFPEcap-seq, a method specifically designed for sequencing capped 5′ ends of RNA derived from FFPE samples. FFPEcap-seq combines enzymatic enrichment of 5′ capped RNAs with template switching to create sequencing libraries. We find that FFPEcap-seq can faithfully capture mRNA expression levels in FFPE specimens while also detecting enhancer RNAs that arise from distal regulatory regions. FFPEcap-seq is a fast and straightforward method for making high-quality 5′ end RNA-seq libraries from FFPE-derived RNA.
Affiliation(s)
- Jeffery M Vahrenkamp
- Department of Oncological Sciences, Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah 84112, USA
- Kathryn Szczotka
- Department of Obstetrics and Gynecology, Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah 84112, USA
- Mark K Dodson
- Department of Obstetrics and Gynecology, Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah 84112, USA
- Elke A Jarboe
- Department of Pathology, Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah 84112, USA
- Andrew P Soisson
- Department of Obstetrics and Gynecology, Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah 84112, USA
- Jason Gertz
- Department of Oncological Sciences, Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah 84112, USA
50
Thodberg M, Thieffry A, Vitting-Seerup K, Andersson R, Sandelin A. CAGEfightR: analysis of 5'-end data using R/Bioconductor. BMC Bioinformatics 2019; 20:487. [PMID: 31585526 PMCID: PMC6778389 DOI: 10.1186/s12859-019-3029-5] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 03/15/2019] [Accepted: 08/15/2019] [Indexed: 12/17/2022] Open
Abstract
BACKGROUND: 5'-end sequencing assays, and Cap Analysis of Gene Expression (CAGE) in particular, have been instrumental in studying transcriptional regulation. 5'-end methods provide genome-wide maps of transcription start sites (TSSs) with base pair resolution. Because active enhancers often feature bidirectional TSSs, such data can also be used to predict enhancer candidates. The current availability of mature and comprehensive computational tools for the analysis of 5'-end data is limited, preventing efficient analysis of new and existing 5'-end data. RESULTS: We present CAGEfightR, a framework for analysis of CAGE and other 5'-end data implemented as an R/Bioconductor-package. CAGEfightR can import data from BigWig files and allows for fast and memory efficient prediction and analysis of TSSs and enhancers. Downstream analyses include quantification, normalization, annotation with transcript and gene models, TSS shape statistics, linking TSSs to enhancers via co-expression, identification of enhancer clusters, and genome-browser style visualization. While built to analyze CAGE data, we demonstrate the utility of CAGEfightR in analyzing nascent RNA 5'-data (PRO-Cap). CAGEfightR is implemented using standard Bioconductor classes, making it easy to learn, use and combine with other Bioconductor packages, for example popular differential expression tools such as limma, DESeq2 and edgeR. CONCLUSIONS: CAGEfightR provides a single, scalable and easy-to-use framework for comprehensive downstream analysis of 5'-end data. CAGEfightR is designed to be interoperable with other Bioconductor packages, thereby unlocking hundreds of mature transcriptomic analysis tools for 5'-end data. CAGEfightR is freely available via Bioconductor: bioconductor.org/packages/CAGEfightR.
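The abstract above lists TSS shape statistics among the downstream analyses. As a rough illustration of what such a statistic measures, the sketch below computes an interquantile width for a single TSS cluster: the number of base pairs spanning the central 10th to 90th percentile of 5'-end signal. It is plain Python and is not CAGEfightR's code or API; the function name `interquantile_width`, the 10%/90% bounds, and the toy count vectors are all assumptions made for this example.

```python
from typing import Sequence


def interquantile_width(
    counts: Sequence[float], lower: float = 0.1, upper: float = 0.9
) -> int:
    """Width in bp of the region holding the central share of 5'-end signal.

    counts holds one value per base pair of a TSS cluster (hypothetical input).
    """
    if not 0.0 <= lower < upper <= 1.0:
        raise ValueError("require 0 <= lower < upper <= 1")
    total = sum(counts)
    if total <= 0:
        raise ValueError("cluster has no signal")

    cumulative = 0.0
    lower_pos = None
    upper_pos = None
    for pos, count in enumerate(counts):
        cumulative += count
        fraction = cumulative / total
        # First position at which the lower quantile is reached.
        if lower_pos is None and fraction >= lower:
            lower_pos = pos
        # First position at which the upper quantile is reached.
        if fraction >= upper:
            upper_pos = pos
            break
    return upper_pos - lower_pos + 1


# Toy clusters (made-up counts): one dominated by a single position,
# one with signal spread over several positions.
sharp = [0, 0, 1, 98, 1, 0, 0]
broad = [5, 10, 20, 15, 20, 15, 10, 5]

print(interquantile_width(sharp))  # 1
print(interquantile_width(broad))  # 6
```

A small width indicates a sharp cluster dominated by one position, while a large width indicates initiation spread across many positions.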
Affiliation(s)
- Malte Thodberg
- Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK2100, Copenhagen N, Denmark
- Biotech Research and Innovation Centre, University of Copenhagen, Ole Maaløes Vej 5, DK2100, Copenhagen N, Denmark
- Axel Thieffry
- Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK2100, Copenhagen N, Denmark
- Biotech Research and Innovation Centre, University of Copenhagen, Ole Maaløes Vej 5, DK2100, Copenhagen N, Denmark
- Kristoffer Vitting-Seerup
- Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK2100, Copenhagen N, Denmark
- Biotech Research and Innovation Centre, University of Copenhagen, Ole Maaløes Vej 5, DK2100, Copenhagen N, Denmark
- Danish Cancer Society, Strandboulevarden 49 DK2100, Copenhagen Ø, Denmark
- Robin Andersson
- Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK2100, Copenhagen N, Denmark
- Albin Sandelin
- Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK2100, Copenhagen N, Denmark
- Biotech Research and Innovation Centre, University of Copenhagen, Ole Maaløes Vej 5, DK2100, Copenhagen N, Denmark