1
Cao X, Sun S, Xing J. A Massive Proteogenomic Screen Identifies Thousands of Novel Peptides From the Human "Dark" Proteome. Mol Cell Proteomics 2024; 23:100719. [PMID: 38242438; PMCID: PMC10867589; DOI: 10.1016/j.mcpro.2024.100719] [Received: 05/02/2023] [Revised: 01/01/2024] [Accepted: 01/16/2024] [Indexed: 01/21/2024]
Abstract
Although human gene annotation has been continuously improved over the past 2 decades, numerous studies have demonstrated the existence of a "dark proteome", consisting of proteins that are critical for biological processes but are not included in widely used gene catalogs. The Genotype-Tissue Expression project generated more than 15,000 RNA-seq datasets from multiple tissues, which were used to model 30 million transcripts in the human genome. To provide a resource of high-confidence novel proteins from the dark proteome, we screened 50,000 mass spectrometry runs from over 900 projects to identify proteins translated from the Genotype-Tissue Expression transcript model with proteomic support. We also integrated 3.8 million common genetic variants from the gnomAD database to improve peptide identification. As a result, we identified 170,529 novel peptides with proteomic evidence, of which 6048 passed the strictest standard we defined and were supported by PepQuery. We provide a user-friendly website (https://ncorf.genes.fun/) for researchers to check the evidence of novel peptides from their studies. The findings will improve our understanding of coding genes and facilitate genomic data interpretation in biomedical research.
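Folding population variants into peptide identification typically means adding variant-containing peptides to the search space. The sketch below is not the authors' pipeline; it assumes a single amino-acid substitution already mapped to 1-based protein coordinates and fully specific trypsin (cleavage after K/R, not before P, no missed cleavages). The function names and peptide length limits are illustrative choices.

```python
import re


def tryptic_digest(protein: str, min_len: int = 7, max_len: int = 30) -> list:
    """Cleave after K or R unless the next residue is P; keep peptides in the length window."""
    # Zero-width split point: right after K/R, provided the next residue is not P.
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return [p for p in pieces if min_len <= len(p) <= max_len]


def apply_substitution(protein: str, pos: int, ref: str, alt: str) -> str:
    """Apply a 1-based single amino-acid substitution after checking the reference residue."""
    if protein[pos - 1] != ref:
        raise ValueError(f"reference mismatch at {pos}: {protein[pos - 1]} != {ref}")
    return protein[: pos - 1] + alt + protein[pos:]


def variant_peptides(protein: str, pos: int, ref: str, alt: str, **digest_kw) -> list:
    """Peptides produced by the variant protein that the reference protein does not produce."""
    reference = set(tryptic_digest(protein, **digest_kw))
    variant = tryptic_digest(apply_substitution(protein, pos, ref, alt), **digest_kw)
    return [p for p in variant if p not in reference]


if __name__ == "__main__":
    protein = "AAAAAAAKGGGGGGGRCCCCCCCK"
    print(variant_peptides(protein, 12, "G", "V"))  # ['GGGVGGGR']
```

Only peptides absent from the reference digest are kept, so a search database built this way adds variant evidence without duplicating canonical peptides.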
Affiliation(s)
- Xiaolong Cao
- Department of Anesthesiology, Zhujiang Hospital, Southern Medical University, Guangzhou, Guangdong, China; Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA; Human Genetic Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Siqi Sun
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA; Human Genetic Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Jinchuan Xing
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA; Human Genetic Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA.
2
Petri AJ, Sahlin K. isONform: reference-free transcriptome reconstruction from Oxford Nanopore data. Bioinformatics 2023; 39:i222-i231. [PMID: 37387174; PMCID: PMC10311309; DOI: 10.1093/bioinformatics/btad264] [Indexed: 07/01/2023]
Abstract
MOTIVATION With advances in long-read transcriptome sequencing, we can now fully sequence transcripts, which greatly improves our ability to study transcription processes. A popular long-read transcriptome sequencing technique is Oxford Nanopore Technologies (ONT), which, through its cost-effective sequencing and high throughput, has the potential to characterize the transcriptome in a cell. However, due to transcript variability and sequencing errors, long cDNA reads need substantial bioinformatic processing to produce a set of isoform predictions from the reads. Several genome- and annotation-based methods exist to produce transcript predictions. However, such methods require high-quality genomes and annotations and are limited by the accuracy of long-read splice aligners. In addition, gene families with high heterogeneity may not be well represented by a reference genome and would benefit from reference-free analysis. Reference-free methods to predict transcripts from ONT, such as RATTLE, exist, but their sensitivity is not comparable to that of reference-based approaches. RESULTS We present isONform, a high-sensitivity algorithm to construct isoforms from ONT cDNA sequencing data. The algorithm is based on iterative bubble popping on gene graphs built from fuzzy seeds from the reads. Using simulated, synthetic, and biological ONT cDNA data, we show that isONform has substantially higher sensitivity than RATTLE, albeit with some loss in precision. On biological data, we show that isONform's predictions are substantially more consistent with the annotation-based method StringTie2 than RATTLE's predictions are. We believe isONform can be used both for isoform construction for organisms without well-annotated genomes and as an orthogonal method to verify predictions of reference-based methods. AVAILABILITY AND IMPLEMENTATION https://github.com/aljpetri/isONform.
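Seed-based methods compare reads through short shared subsequences rather than full alignments. As a generic illustration only, the sketch below computes (w, k)-minimizers, one widely used seed type; the abstract does not say how isONform's "fuzzy seeds" are built, so the choice of minimizers and the parameter values are assumptions.

```python
def minimizers(seq: str, k: int, w: int) -> list:
    """Return (position, k-mer) minimizers: the smallest k-mer in each window of w consecutive k-mers.

    Ties are broken by the leftmost position. A position selected by several
    consecutive windows is reported once.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = []
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # min over (k-mer, absolute position) gives lexicographic order, then leftmost.
        best_kmer, best_pos = min((km, start + j) for j, km in enumerate(window))
        # Selected positions never decrease, so checking the last entry suffices to deduplicate.
        if not picked or picked[-1][0] != best_pos:
            picked.append((best_pos, best_kmer))
    return picked


def shared_seeds(read_a: str, read_b: str, k: int, w: int) -> set:
    """Minimizer k-mers present in both reads: a cheap signal that the reads may overlap."""
    return {km for _, km in minimizers(read_a, k, w)} & {km for _, km in minimizers(read_b, k, w)}


if __name__ == "__main__":
    print(minimizers("ACGTACGT", k=3, w=2))  # [(0, 'ACG'), (1, 'CGT'), (2, 'GTA'), (4, 'ACG')]
```

Lexicographic ordering keeps the example readable; practical tools usually rank k-mers with a hash function to avoid biased seed selection.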
Affiliation(s)
- Alexander J Petri
- Department of Mathematics, Science for Life Laboratory, Stockholm University, Stockholm 106 91, Sweden
- Kristoffer Sahlin
- Department of Mathematics, Science for Life Laboratory, Stockholm University, Stockholm 106 91, Sweden
3
Oreper D, Klaeger S, Jhunjhunwala S, Delamarre L. The peptide woods are lovely, dark and deep: Hunting for novel cancer antigens. Semin Immunol 2023; 67:101758. [PMID: 37027981; DOI: 10.1016/j.smim.2023.101758] [Received: 12/31/2022] [Revised: 03/22/2023] [Accepted: 03/22/2023] [Indexed: 04/08/2023]
Abstract
Harnessing the patient's immune system to control a tumor is a proven avenue for cancer therapy. T-cell therapies as well as therapeutic vaccines, which target specific antigens of interest, are being explored as treatments in conjunction with immune checkpoint blockade. For these therapies, selecting the best-suited antigens is crucial. Most of the focus has thus far been on neoantigens that arise from tumor-specific somatic mutations. Although there is clear evidence that T-cell responses against mutated neoantigens are protective, the large majority of these mutations are not immunogenic. In addition, most somatic mutations are unique to each individual patient, and targeting them requires the development of individualized approaches. Therefore, novel antigen types are needed to broaden the scope of such treatments. We review high-throughput approaches for discovering novel tumor antigens and some of the key challenges associated with their detection, and we discuss considerations when selecting tumor antigens to target in the clinic.
Affiliation(s)
- Daniel Oreper
- Genentech, 1 DNA way, South San Francisco, 94080 CA, USA.
- Susan Klaeger
- Genentech, 1 DNA way, South San Francisco, 94080 CA, USA.
4
Prjibelski AD, Mikheenko A, Joglekar A, Smetanin A, Jarroux J, Lapidus AL, Tilgner HU. Accurate isoform discovery with IsoQuant using long reads. Nat Biotechnol 2023. [PMID: 36593406; DOI: 10.1038/s41587-022-01565-y] [Received: 04/19/2022] [Accepted: 10/13/2022] [Indexed: 01/04/2023]
Abstract
Annotating newly sequenced genomes and determining alternative isoforms from long-read RNA data are complex and incompletely solved problems. Here we present IsoQuant, a computational tool using intron graphs that accurately reconstructs transcripts both with and without reference genome annotation. For novel transcript discovery on Oxford Nanopore data, IsoQuant reduces the false-positive rate fivefold in reference-based mode and 2.5-fold in reference-free mode. IsoQuant also improves performance for Pacific Biosciences data.
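An intron graph is built from the intron chains of spliced alignments. The sketch below is a simplified stand-in, not IsoQuant's implementation: it reads introns from SAM-style CIGAR strings (treating each `N` operation as an intron) and counts how often one intron is followed by another across reads, which gives the edges of a basic intron graph. The input format, tuples of (1-based leftmost position, CIGAR), is an assumption.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def introns_from_cigar(ref_start: int, cigar: str) -> list:
    """Return introns as 1-based inclusive (start, end) reference coordinates."""
    introns = []
    pos = ref_start
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":  # skipped reference region, interpreted here as an intron
            introns.append((pos, pos + length - 1))
        if op in REF_CONSUMING:
            pos += length
    return introns


def intron_graph_edges(reads) -> Counter:
    """Count consecutive intron pairs across reads (edge -> number of supporting reads)."""
    edges = Counter()
    for ref_start, cigar in reads:
        chain = introns_from_cigar(ref_start, cigar)
        for left, right in zip(chain, chain[1:]):
            edges[(left, right)] += 1
    return edges


if __name__ == "__main__":
    print(introns_from_cigar(100, "10M50N20M30N5M"))  # [(110, 159), (180, 209)]
```

Reads with a single intron contribute a vertex but no edge; a fuller model would also record read start and end positions as terminal vertices.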
Affiliation(s)
- Andrey D Prjibelski
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia; Department of Computer Science, University of Helsinki, Helsinki, Finland
- Alla Mikheenko
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
- Anoushka Joglekar
- Tri-Institutional Computational Biology and Medicine, Weill Cornell Medicine, New York, NY, USA; Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA; Center for Neurogenetics, Weill Cornell Medicine, New York, NY, USA
- Julien Jarroux
- Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA; Center for Neurogenetics, Weill Cornell Medicine, New York, NY, USA
- Alla L Lapidus
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
- Hagen U Tilgner
- Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA; Center for Neurogenetics, Weill Cornell Medicine, New York, NY, USA
5
Mikheenko A, Prjibelski AD, Joglekar A, Tilgner HU. Sequencing of individual barcoded cDNAs using Pacific Biosciences and Oxford Nanopore technologies reveals platform-specific error patterns. Genome Res 2022; 32:726-737. [PMID: 35301264; PMCID: PMC8997348; DOI: 10.1101/gr.276405.121] [Received: 11/18/2021] [Accepted: 03/05/2022] [Indexed: 12/04/2022]
Abstract
Long-read transcriptomics requires understanding the error sources inherent to each technology. Current approaches cannot compare methods on an individual RNA molecule. Here, we present a novel platform-comparison method that combines barcoding strategies and long-read sequencing to sequence cDNA copies representing an individual RNA molecule on both Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT). We compare these long-read pairs in terms of sequence content and isoform patterns. Although individual read pairs show high similarity, we find differences in (1) aligned length, (2) transcription start site (TSS), (3) polyadenylation site (poly(A)-site) assignment, and (4) exon–intron structures. Overall, 25% of read pairs disagree on either TSS, poly(A)-site, or splice site. Intron-chain disagreement typically arises from alignment errors of microexons and complicated splice sites. Our single-molecule technology comparison reveals that inconsistencies are often caused by sequencing error–induced inaccurate ONT alignments, especially to downstream GUNNGU donor motifs. However, annotation-disagreeing upstream shifts in NAGNAG acceptors in ONT are often confirmed by PacBio and are thus likely real. In both barcoded and nonbarcoded ONT reads, we find that intron number and proximity of GU/AGs better predict inconsistencies with the annotation than read quality alone. We summarize these findings in an annotation-based algorithm for spliced alignment correction that improves subsequent transcript construction with ONT reads.
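A NAGNAG acceptor contains two AG dinucleotides 3 nt apart, so the intron can plausibly end at either one. The sketch below is a hypothetical helper, not the authors' correction algorithm: it checks whether an annotated acceptor has such a tandem alternative. Coordinates are assumed to be 0-based indices into a plus-strand genomic string.

```python
def nagnag_alternatives(genome: str, intron_end: int) -> list:
    """Return alternative acceptor positions 3 nt up- or downstream that also end an AG.

    intron_end is the 0-based index of the last intron base (the G of the canonical AG).
    """
    if genome[intron_end - 1:intron_end + 1] != "AG":
        raise ValueError("annotated acceptor is not an AG dinucleotide")
    alternatives = []
    for shift in (-3, 3):
        alt = intron_end + shift
        # alt must leave room for the preceding A and stay inside the sequence
        if 1 <= alt < len(genome) and genome[alt - 1:alt + 1] == "AG":
            alternatives.append(alt)
    return alternatives


if __name__ == "__main__":
    genome = "CCCTAGCAGTTT"  # TAGCAG at indices 3-8 is a NAGNAG motif
    print(nagnag_alternatives(genome, 8))  # [5]
```

A result such as `[5]` flags an acceptor where a 3-nt shift yields another valid AG, the situation in which the abstract reports that shifted ONT acceptors are often confirmed by PacBio.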
6
Zhang Q, Shi Q, Shao M. Accurate assembly of multi-end RNA-seq data with Scallop2. Nat Comput Sci 2022; 2:148-152. [PMID: 36713932; PMCID: PMC9879047; DOI: 10.1038/s43588-022-00216-1] [Received: 09/27/2021] [Accepted: 02/16/2022] [Indexed: 02/02/2023]
Abstract
Modern RNA-sequencing protocols can produce multi-end data, where multiple reads originating from the same transcript are attached to the same barcode. The long-range information in the multi-end reads is beneficial in phasing complicated spliced isoforms, but assembly algorithms that leverage such information are lacking. Here we introduce Scallop2, a reference-based assembler optimized for multi-end RNA-seq data. The algorithmic core of Scallop2 consists of three steps: (1) using an algorithm to "bridge" multi-end reads into single-end phasing paths in the context of a splice graph, (2) employing a method to refine erroneous splice graphs by utilizing multi-end reads that fail to bridge, and (3) piping the refined splice graph and the bridged phasing paths into an algorithm that integrates multiple phase-preserving decompositions. Tested on 561 cells in two Smart-seq3 datasets and on 10 Illumina paired-end RNA-seq samples, Scallop2 substantially improves assembly accuracy compared with two popular assemblers, StringTie2 and Scallop.
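Bridging can be pictured as finding a path through the splice graph that connects the vertex where one read ends to the vertex where its mate begins. The sketch below is a simplified stand-in, not Scallop2's algorithm: it enumerates paths in a small DAG and keeps the one whose weakest edge has the highest coverage (a bottleneck criterion chosen here as an assumption). The graph and weight formats are hypothetical.

```python
def bridge(graph: dict, weights: dict, src, dst):
    """Return the src->dst path maximizing the minimum edge weight, or None if dst is unreachable.

    graph: vertex -> list of successors (assumed acyclic)
    weights: (u, v) -> edge coverage
    """
    best_path, best_score = None, float("-inf")
    stack = [(src, [src], float("inf"))]
    while stack:
        node, path, bottleneck = stack.pop()
        if node == dst:
            if bottleneck > best_score:  # strict: the first path found wins ties
                best_path, best_score = path, bottleneck
            continue
        for nxt in graph.get(node, []):
            stack.append((nxt, path + [nxt], min(bottleneck, weights[(node, nxt)])))
    return best_path


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
    weights = {("a", "b"): 10, ("a", "c"): 3, ("b", "d"): 8, ("c", "d"): 20}
    print(bridge(graph, weights, "a", "d"))  # ['a', 'b', 'd']
```

Path enumeration is exponential in the worst case, so a practical version would need dynamic programming over a topological order; the brute-force form only keeps the idea visible.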
Affiliation(s)
- Qimin Zhang
- Department of Computer Science and Engineering, School of Electrical Engineering and Computer Science, The Pennsylvania State University
- Qian Shi
- Department of Computer Science and Engineering, School of Electrical Engineering and Computer Science, The Pennsylvania State University
- Mingfu Shao
- Department of Computer Science and Engineering, School of Electrical Engineering and Computer Science, The Pennsylvania State University
- Huck Institutes of the Life Sciences, The Pennsylvania State University
7
Sashittal P, Zhang C, Peng J, El-Kebir M. Jumper enables discontinuous transcript assembly in coronaviruses. Nat Commun 2021; 12:6728. [PMID: 34795232; PMCID: PMC8602663; DOI: 10.1038/s41467-021-26944-y] [Received: 06/07/2021] [Accepted: 10/20/2021] [Indexed: 11/17/2022]
Abstract
Genes in SARS-CoV-2 and other viruses in the order Nidovirales are expressed by a process of discontinuous transcription, which is distinct from alternative splicing in eukaryotes and is mediated by the viral RNA-dependent RNA polymerase. Here, we introduce the DISCONTINUOUS TRANSCRIPT ASSEMBLY problem of finding transcripts and their abundances given an alignment of paired-end short reads under a maximum likelihood model that accounts for varying transcript lengths. We show, using simulations, that our method, JUMPER, outperforms existing methods for classical transcript assembly. On short-read data of SARS-CoV-1, SARS-CoV-2 and MERS-CoV samples, we find that JUMPER not only identifies canonical transcripts that are part of the reference transcriptome, but also predicts expression of non-canonical transcripts that are supported by subsequent orthogonal analyses. Moreover, application of JUMPER on samples with and without treatment reveals viral drug response at the transcript level. As such, JUMPER enables detailed analyses of Nidovirales transcriptomes under varying conditions.
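Maximum-likelihood abundance estimation that accounts for transcript length is commonly solved with expectation-maximization (EM). The sketch below is a generic EM for reads compatible with sets of transcripts, in the style used by many RNA-seq quantifiers; it does not model JUMPER's discontinuous transcripts, and the input format (one set of compatible transcript IDs per read, plus effective lengths) is an assumption.

```python
def em_abundances(compat, lengths, iters=200):
    """Estimate read fractions (alpha) and molecule fractions (rho) by EM.

    compat: list of sets, the transcripts each read is compatible with.
    lengths: dict transcript -> effective length.
    Model: a read compatible with t arises from t with probability proportional to alpha_t / L_t.
    """
    compat = [c for c in compat if c]  # reads compatible with no transcript carry no information
    ts = sorted(lengths)
    alpha = {t: 1.0 / len(ts) for t in ts}
    n = len(compat)
    for _ in range(iters):
        counts = {t: 0.0 for t in ts}
        for c in compat:
            # E-step: split each read among its compatible transcripts.
            denom = sum(alpha[t] / lengths[t] for t in c)
            for t in c:
                counts[t] += alpha[t] / lengths[t] / denom
        # M-step: new read fractions are expected counts over total reads.
        alpha = {t: counts[t] / n for t in ts}
    # Longer transcripts yield more reads per molecule, so divide by length to get molecule fractions.
    raw = {t: alpha[t] / lengths[t] for t in ts}
    total = sum(raw.values())
    rho = {t: raw[t] / total for t in ts}
    return alpha, rho


if __name__ == "__main__":
    reads = [{"A"}] * 6 + [{"B"}] * 2 + [{"A", "B"}] * 2
    alpha, rho = em_abundances(reads, {"A": 100, "B": 100})
    print(round(alpha["A"], 3))  # 0.75
```

In the example, the two ambiguous reads are split 3:1 in favor of A at convergence, matching the 6:2 ratio of uniquely assigned reads.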
Affiliation(s)
- Palash Sashittal
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Chuanyi Zhang
- Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Jian Peng
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- College of Medicine, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Mohammed El-Kebir
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA.
8
Krappinger JC, Bonstingl L, Pansy K, Sallinger K, Wreglesworth NI, Grinninger L, Deutsch A, El-Heliebi A, Kroneis T, Mcfarlane RJ, Sensen CW, Feichtinger J. Non-coding Natural Antisense Transcripts: Analysis and Application. J Biotechnol 2021; 340:75-101. [PMID: 34371054; DOI: 10.1016/j.jbiotec.2021.08.005] [Received: 01/29/2021] [Revised: 06/30/2021] [Accepted: 08/04/2021] [Indexed: 12/12/2022]
Abstract
Non-coding natural antisense transcripts (ncNATs) are regulatory RNA sequences that are transcribed in the opposite direction to protein-coding or non-coding transcripts. These transcripts are implicated in a broad variety of biological and pathological processes, including tumorigenesis and oncogenic progression. With this complex field still in its infancy, annotations, expression profiling and functional characterisations of ncNATs are far less comprehensive than those for protein-coding genes, pointing to substantial gaps in the analysis and characterisation of these regulatory transcripts. In this review, we discuss ncNATs from an analysis perspective, in particular regarding the use of high-throughput sequencing strategies, such as RNA-sequencing, and summarize the unique challenges of investigating the antisense transcriptome. Finally, we elaborate on their potential as biomarkers and future targets for treatment, focusing on cancer.
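At the annotation level, a natural antisense pair is two transcripts that overlap on opposite strands. The sketch below finds such pairs by brute force over a hypothetical record type with 1-based inclusive coordinates; it ignores whether either transcript is coding, which a real ncNAT screen would also need to check.

```python
from typing import NamedTuple


class Transcript(NamedTuple):
    name: str
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def antisense_pairs(transcripts) -> list:
    """Return (name_a, name_b, overlap_bp) for overlapping transcripts on opposite strands."""
    pairs = []
    for i, a in enumerate(transcripts):
        for b in transcripts[i + 1:]:
            if a.chrom != b.chrom or a.strand == b.strand:
                continue
            overlap = min(a.end, b.end) - max(a.start, b.start) + 1
            if overlap > 0:
                pairs.append((a.name, b.name, overlap))
    return pairs


if __name__ == "__main__":
    txs = [
        Transcript("G1", "chr1", 100, 500, "+"),
        Transcript("NAT1", "chr1", 400, 900, "-"),
        Transcript("G2", "chr1", 450, 600, "+"),
        Transcript("G3", "chr2", 100, 500, "-"),
    ]
    print(antisense_pairs(txs))  # [('G1', 'NAT1', 101), ('NAT1', 'G2', 151)]
```

The pairwise loop is quadratic; for genome-scale annotations, sorting by position or using an interval tree keeps the comparison tractable.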
Affiliation(s)
- Julian C Krappinger
- Division of Cell Biology, Histology and Embryology, Gottfried Schatz Research Center for Cell Signalling, Metabolism and Aging, Medical University of Graz, Neue Stiftingtalstraße 6/II, 8010 Graz, Austria; Christian Doppler Laboratory for innovative Pichia pastoris host and vector systems, Division of Cell Biology, Histology and Embryology, Medical University of Graz, Neue Stiftingtalstraße 6/II, 8010 Graz, Austria
- Lilli Bonstingl
- Division of Cell Biology, Histology and Embryology, Gottfried Schatz Research Center for Cell Signalling, Metabolism and Aging, Medical University of Graz, Neue Stiftingtalstraße 6/II, 8010 Graz, Austria; Center for Biomarker Research in Medicine, Stiftingtalstraße 5, 8010 Graz, Austria
- Katrin Pansy
- Division of Haematology, Medical University of Graz, Stiftingtalstrasse 24, 8010 Graz, Austria
- Katja Sallinger
- Division of Cell Biology, Histology and Embryology, Gottfried Schatz Research Center for Cell Signalling, Metabolism and Aging, Medical University of Graz, Neue Stiftingtalstraße 6/II, 8010 Graz, Austria; Center for Biomarker Research in Medicine, Stiftingtalstraße 5, 8010 Graz, Austria
- Nick I Wreglesworth
- North West Cancer Research Institute, School of Medical Sciences, Bangor University, LL57 2UW Bangor, United Kingdom
- Lukas Grinninger
- Division of Cell Biology, Histology and Embryology, Gottfried Schatz Research Center for Cell Signalling, Metabolism and Aging, Medical University of Graz, Neue Stiftingtalstraße 6/II, 8010 Graz, Austria; Austrian Biotech University of Applied Sciences, Konrad Lorenz-Straße 10, 3430 Tulln an der Donau, Austria
- Alexander Deutsch
- Division of Haematology, Medical University of Graz, Stiftingtalstrasse 24, 8010 Graz, Austria; BioTechMed-Graz, Mozartgasse 12/II, 8010 Graz, Austria
- Amin El-Heliebi
- Division of Cell Biology, Histology and Embryology, Gottfried Schatz Research Center for Cell Signalling, Metabolism and Aging, Medical University of Graz, Neue Stiftingtalstraße 6/II, 8010 Graz, Austria; Center for Biomarker Research in Medicine, Stiftingtalstraße 5, 8010 Graz, Austria
- Thomas Kroneis
- Division of Cell Biology, Histology and Embryology, Gottfried Schatz Research Center for Cell Signalling, Metabolism and Aging, Medical University of Graz, Neue Stiftingtalstraße 6/II, 8010 Graz, Austria; Center for Biomarker Research in Medicine, Stiftingtalstraße 5, 8010 Graz, Austria
- Ramsay J Mcfarlane
- North West Cancer Research Institute, School of Medical Sciences, Bangor University, LL57 2UW Bangor, United Kingdom
- Christoph W Sensen
- BioTechMed-Graz, Mozartgasse 12/II, 8010 Graz, Austria; Institute of Computational Biotechnology, Graz University of Technology, Petersgasse 14/V, 8010 Graz, Austria; HCEMM Kft., Római blvd. 21, 6723 Szeged, Hungary
- Julia Feichtinger
- Division of Cell Biology, Histology and Embryology, Gottfried Schatz Research Center for Cell Signalling, Metabolism and Aging, Medical University of Graz, Neue Stiftingtalstraße 6/II, 8010 Graz, Austria; Christian Doppler Laboratory for innovative Pichia pastoris host and vector systems, Division of Cell Biology, Histology and Embryology, Medical University of Graz, Neue Stiftingtalstraße 6/II, 8010 Graz, Austria; BioTechMed-Graz, Mozartgasse 12/II, 8010 Graz, Austria.
9
Gatter T, Stadler PF. Ryūtō: Improved multi-sample transcript assembly for differential transcript expression analysis and more. Bioinformatics 2021; 37:4307-4313. [PMID: 34255826; DOI: 10.1093/bioinformatics/btab494] [Received: 02/09/2021] [Revised: 06/21/2021] [Accepted: 07/01/2021] [Indexed: 01/12/2023]
Abstract
MOTIVATION Accurate assembly of RNA-seq is a crucial step in many analytic tasks such as gene annotation or expression studies. Despite ongoing research, traditional single-sample assembly has seen no major breakthrough. Multi-sample RNA-seq experiments provide more information than single-sample datasets and thus constitute a promising area of research. Yet this advantage is challenging to exploit because of the large number of accumulating errors. RESULTS We present an extension to Ryūtō enabling the reconstruction of consensus transcriptomes from multiple RNA-seq data sets, incorporating consensus calling at low-level features. We report stable improvements already at 3 replicates. Ryūtō outperforms competing approaches, providing a better and user-adjustable sensitivity-precision trade-off. Ryūtō's unique ability to utilize an (incomplete) reference for multi-sample assemblies greatly increases precision. We demonstrate benefits for differential expression analysis. CONCLUSION Ryūtō consistently improves assembly on replicates of the same tissue independent of filter settings, even when mixing conditions or time series. Consensus voting in Ryūtō is especially effective at high-precision assembly, while Ryūtō's conventional mode can reach higher recall. AVAILABILITY Ryūtō is available at https://github.com/studla/RYUTO. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
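Consensus voting across replicates can be illustrated with a feature-level vote. The sketch below keeps features (here, introns as hypothetical `(chrom, start, end, strand)` tuples) seen in at least a chosen number of samples. It only illustrates the voting idea and is not Ryūtō's algorithm, which the abstract describes only as consensus calling at low-level features.

```python
from collections import Counter


def consensus_features(samples, min_samples: int) -> set:
    """Features observed in at least min_samples samples (one vote per sample)."""
    votes = Counter()
    for features in samples:
        votes.update(set(features))  # deduplicate so read depth does not add extra votes
    return {f for f, n in votes.items() if n >= min_samples}


if __name__ == "__main__":
    s1 = [("chr1", 100, 200, "+"), ("chr1", 300, 400, "+")]
    s2 = [("chr1", 100, 200, "+"), ("chr1", 100, 200, "+")]
    s3 = [("chr1", 100, 200, "+"), ("chr1", 300, 400, "+"), ("chr1", 500, 600, "-")]
    print(sorted(consensus_features([s1, s2, s3], min_samples=2)))
```

Raising `min_samples` trades recall for precision, which mirrors the user-adjustable sensitivity-precision trade-off the abstract describes.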
Affiliation(s)
- Thomas Gatter
- Bioinformatics Group, Department of Computer Science & Interdisciplinary Center for Bioinformatics, Universität Leipzig, D-04107 Leipzig, Germany
- Peter F Stadler
- Bioinformatics Group, Department of Computer Science & Interdisciplinary Center for Bioinformatics, Universität Leipzig, D-04107 Leipzig, Germany
- Discrete Biomath Group, Max Planck Institute for Mathematics in the Sciences, D-04103 Leipzig, Germany
- Institute for Theoretical Chemistry, University of Vienna, A-1090 Wien, Austria
- Santa Fe Institute, Santa Fe, NM 87501, USA
10
Patro R, Salmela L. Algorithms meet sequencing technologies - 10th edition of the RECOMB-Seq workshop. iScience 2021; 24:101956. [PMID: 33437938; PMCID: PMC7788091; DOI: 10.1016/j.isci.2020.101956] [Indexed: 11/24/2022]
Abstract
DNA and RNA sequencing are core technologies in biological and medical research. The high throughput of these technologies and the continual development of new experimental assays and biotechnologies demand the continuous development of methods to analyze the resulting data. The RECOMB Satellite Workshop on Massively Parallel Sequencing brings together leading researchers in computational genomics to discuss emerging frontiers in algorithm development for massively parallel sequencing data. The 10th meeting in this series, RECOMB-Seq 2020, was scheduled to be held in Padua, Italy, but because of the ongoing COVID-19 pandemic it was held virtually instead. The online workshop featured keynote talks by Paola Bonizzoni and Zamin Iqbal, two highlight talks, ten regular talks, and three short talks. Seven of the works presented in the workshop are featured in this edition of iScience, and many of the talks are available online in the RECOMB-Seq 2020 YouTube channel.
Affiliation(s)
- Rob Patro
- Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, MD, USA
- Leena Salmela
- Department of Computer Science and Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, Finland
11
Schaarschmidt S, Fischer A, Lawas LMF, Alam R, Septiningsih EM, Bailey-Serres J, Jagadish SVK, Huettel B, Hincha DK, Zuther E. Utilizing PacBio Iso-Seq for Novel Transcript and Gene Discovery of Abiotic Stress Responses in Oryza sativa L. Int J Mol Sci 2020; 21:8148. [PMID: 33142722; PMCID: PMC7663775; DOI: 10.3390/ijms21218148] [Received: 10/20/2020] [Revised: 10/20/2020] [Accepted: 10/30/2020] [Indexed: 01/05/2023]
Abstract
The wide natural variation present in rice is an important source of genes to facilitate stress tolerance breeding. However, identification of candidate genes from RNA-Seq studies is hampered by the lack of high-quality genome assemblies for the most stress-tolerant cultivars. A more targeted solution is the reconstruction of transcriptomes to provide templates to map RNA-seq reads. Here, we sequenced transcriptomes of ten rice cultivars of three subspecies on the PacBio Sequel platform. RNA was isolated from different organs of plants grown under control and abiotic stress conditions in different environments. Reconstructed de novo reference transcriptomes resulted in 37,500 to 54,600 plant-specific high-quality isoforms per cultivar. Isoforms were collapsed to reduce sequence redundancy and evaluated, e.g., for protein completeness (BUSCO). About 40% of all identified transcripts were novel isoforms compared to the Nipponbare reference transcriptome. For the drought/heat-tolerant aus cultivar N22, 56 differentially expressed genes were identified in developing seeds under combined heat and drought stress in the field. The newly generated rice transcriptomes are useful to identify candidate genes for stress tolerance breeding that are not present in the reference transcriptomes/genomes. In addition, our approach provides a cost-effective alternative to genome sequencing for identification of candidate genes in highly stress-tolerant genotypes.
Affiliation(s)
- Stephanie Schaarschmidt
- Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam, Germany
- Corresponding author
- Axel Fischer
- Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam, Germany
- Lovely Mae F. Lawas
- Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam, Germany
- Department of Biological Sciences, Auburn University, Auburn, AL 36849, USA
- Rejbana Alam
- Center for Plant Cell Biology, Department of Botany and Plant Sciences, University of California Riverside, Riverside, CA 92521, USA
- Endang M. Septiningsih
- Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843, USA
- Julia Bailey-Serres
- Center for Plant Cell Biology, Department of Botany and Plant Sciences, University of California Riverside, Riverside, CA 92521, USA
- S. V. Krishna Jagadish
- International Rice Research Institute, DAPO Box 7777, Metro Manila 1301, Philippines
- Department of Agronomy, Kansas State University, Manhattan, KS 66506, USA
- Bruno Huettel
- Max Planck Genome Centre Cologne, Carl-von-Linné-Weg 10, 50829 Cologne, Germany
- Dirk K. Hincha
- Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam, Germany
- Ellen Zuther
- Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam, Germany
- Corresponding author
12
Grabski DF, Broseus L, Kumari B, Rekosh D, Hammarskjold ML, Ritchie W. Intron retention and its impact on gene expression and protein diversity: A review and a practical guide. Wiley Interdiscip Rev RNA 2020; 12:e1631. [PMID: 33073477; DOI: 10.1002/wrna.1631] [Received: 07/11/2020] [Revised: 09/16/2020] [Accepted: 09/23/2020] [Indexed: 12/12/2022]
Abstract
Intron retention (IR) occurs when a complete and unspliced intron remains in mature mRNA. An increasing body of literature has demonstrated a major role for IR in numerous biological functions, including several that impact human health and disease. Although experimental technologies used to study other forms of mRNA splicing can also be used to investigate IR, a specialized downstream computational analysis is optimal for IR discovery and analysis. Here we provide a review of IR and its biological implications, as well as a practical guide for how to detect and analyze it. Several methods, including long-read third-generation direct RNA sequencing, are described. We have developed an R package, FakIR, to facilitate the execution of the bioinformatic tasks recommended in this review, along with a tutorial on how to adapt them to users' aims. Additionally, we provide guidelines and experimental protocols to validate IR discovery and to evaluate the potential impact of IR on gene expression and protein output. This article is categorized under: RNA Evolution and Genomics > Computational Analyses of RNA; RNA Processing > Splicing Regulation/Alternative Splicing; RNA Methods > RNA Analyses in vitro and In Silico.
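Intron retention is often summarized as a ratio that compares intronic read depth with the number of reads that splice the intron out. The sketch below uses one simplified form of that ratio (median intronic depth divided by median depth plus spliced-read count); it is an assumption for illustration, not FakIR's implementation, and real tools also have to handle mappability and features that overlap the intron.

```python
from statistics import median


def ir_ratio(intron_coverage, spliced_reads: int) -> float:
    """Simplified intron-retention ratio: depth / (depth + spliced reads).

    intron_coverage: per-base read depth across the intron.
    spliced_reads: number of reads whose alignment splices out this intron.
    Returns NaN when there is no evidence either way.
    """
    depth = median(intron_coverage) if intron_coverage else 0.0
    total = depth + spliced_reads
    return depth / total if total > 0 else float("nan")


if __name__ == "__main__":
    print(ir_ratio([10, 10, 12, 8, 10], spliced_reads=30))  # 0.25
```

Using the median rather than the mean keeps a single highly covered repeat or small embedded feature from inflating the intronic depth.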
Affiliation(s)
- David F Grabski
- Department of Molecular Physiology and Biological Physics, University of Virginia School of Medicine, Charlottesville, Virginia, USA; Myles H. Thaler Center for AIDS and Human Retrovirus Research, University of Virginia, Charlottesville, Virginia, USA
- Lucile Broseus
- IGH, Centre National de la Recherche Scientifique, University of Montpellier, Montpellier, France
- Bandana Kumari
- IGH, Centre National de la Recherche Scientifique, University of Montpellier, Montpellier, France
- David Rekosh
- Myles H. Thaler Center for AIDS and Human Retrovirus Research, University of Virginia, Charlottesville, Virginia, USA; Department of Microbiology, Immunology and Cancer Biology, University of Virginia School of Medicine, Charlottesville, Virginia, USA
- Marie-Louise Hammarskjold
- Myles H. Thaler Center for AIDS and Human Retrovirus Research, University of Virginia, Charlottesville, Virginia, USA; Department of Microbiology, Immunology and Cancer Biology, University of Virginia School of Medicine, Charlottesville, Virginia, USA
- William Ritchie
- IGH, Centre National de la Recherche Scientifique, University of Montpellier, Montpellier, France