1
Chen Q, Zhang Y, Rong J, Chen C, Wang S, Wang J, Li Z, Hou Z, Liu D, Tao J, Xu J. MicroRNA expression profile of chicken liver at different times after Histomonas meleagridis infection. Vet Parasitol 2024; 329:110200. [PMID: 38744230 DOI: 10.1016/j.vetpar.2024.110200] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2024] [Revised: 05/05/2024] [Accepted: 05/08/2024] [Indexed: 05/16/2024]
Abstract
Histomonas meleagridis, an anaerobic intercellular parasite, is known to infect gallinaceous birds, particularly turkeys and chickens. The recent resurgence of histomonosis, following the prohibition of drugs used to treat the disease, has caused significant economic losses. Current research on H. meleagridis concentrates primarily on its virulence, gene expression analysis, and the innate immune response of the host. However, there is a lack of research on differentially expressed miRNAs (DEMs) related to liver infection induced by H. meleagridis. In this study, weight gain and pathological changes at various post-infection time points were evaluated in animal experiments to determine the early and peak stages of infection. Next, high-throughput sequencing was used to examine the liver miRNA expression profile at 10 and 15 days post-infection (DPI) in chickens infected with the Chinese JSYZ-F strain of H. meleagridis. A comparison with uninfected controls revealed 120 and 118 DEMs in the liver of infected chickens at 10 DPI and 15 DPI, respectively, with 74 DEMs being shared between the two time points. The DEMs were categorized into three groups based on the time post-infection. The first group (L1) includes 45 miRNAs that were differentially expressed only at 10 DPI and were predicted to target 1646 genes. The second group (L2) includes 43 miRNAs that were differentially expressed only at 15 DPI and were predicted to target 2257 genes. The third group (L3) includes 75 miRNAs that were differentially expressed at both 10 DPI and 15 DPI and were predicted to target 1623 genes. For L1, L2, and L3, there were 89, 87, and 41 significantly enriched Gene Ontology (GO) terms, respectively (p<0.05).
KEGG pathway analysis of the DEM target genes identified 3, 4, and 5 significantly enriched pathways for L1, L2, and L3, respectively (p<0.05). These results suggest that liver miRNA expression changes dynamically as a result of the interplay between H. meleagridis and the host. The expression pattern of L1-class DEMs was more conducive to regulating the development of the inflammatory response, whereas that of L2-class DEMs was more conducive to augmenting the inflammatory response. The observed patterns of inflammation-associated miRNA expression were in line with the liver's inflammatory process following infection. The results of this study provide a basis for a comprehensive analysis of the pathogenic mechanism of H. meleagridis from the perspective of host miRNAs.
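The three-way grouping described in the abstract (10 DPI only, 15 DPI only, both) is a plain set partition. The sketch below illustrates it with made-up miRNA names; `partition_dems` and the toy inputs are assumptions for illustration, not the authors' pipeline. With the group sizes as reported, 45 + 75 = 120 and 43 + 75 = 118, which matches the per-time-point totals.

```python
def partition_dems(dems_10dpi, dems_15dpi):
    """Split two DEM lists into time-specific (L1, L2) and shared (L3) groups."""
    a, b = set(dems_10dpi), set(dems_15dpi)
    return {
        "L1": a - b,  # differentially expressed only at 10 DPI
        "L2": b - a,  # differentially expressed only at 15 DPI
        "L3": a & b,  # differentially expressed at both time points
    }


# Hypothetical miRNA identifiers, used only to exercise the function.
groups = partition_dems(
    ["miR-a", "miR-b", "miR-c"],
    ["miR-b", "miR-c", "miR-d", "miR-e"],
)
sizes = {name: len(members) for name, members in groups.items()}
print(sizes)
```

Each time point's total is the size of its exclusive group plus the shared group, which gives a quick consistency check on reported counts.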
Affiliation(s)
- Qiaoguang Chen
  - College of Veterinary Medicine, Yangzhou University, Yangzhou 225009, China; Jiangsu Co-innovation Center for Prevention and Control of Important Animal Infectious Diseases and Zoonoses, Yangzhou University, Yangzhou 225009, China
- Yuming Zhang
  - College of Veterinary Medicine, Yangzhou University, Yangzhou 225009, China; Jiangsu Co-innovation Center for Prevention and Control of Important Animal Infectious Diseases and Zoonoses, Yangzhou University, Yangzhou 225009, China; Animal Husbandry and Veterinary Station of Daxindian, Penglai District, Yantai 265600, China
- Jie Rong
  - College of Veterinary Medicine, Yangzhou University, Yangzhou 225009, China; Jiangsu Co-innovation Center for Prevention and Control of Important Animal Infectious Diseases and Zoonoses, Yangzhou University, Yangzhou 225009, China
- Chen Chen
  - College of Veterinary Medicine, Yangzhou University, Yangzhou 225009, China; Jiangsu Co-innovation Center for Prevention and Control of Important Animal Infectious Diseases and Zoonoses, Yangzhou University, Yangzhou 225009, China
- Shuang Wang
  - College of Veterinary Medicine, Yangzhou University, Yangzhou 225009, China; Jiangsu Co-innovation Center for Prevention and Control of Important Animal Infectious Diseases and Zoonoses, Yangzhou University, Yangzhou 225009, China
- Jiege Wang
  - College of Veterinary Medicine, Yangzhou University, Yangzhou 225009, China; Jiangsu Co-innovation Center for Prevention and Control of Important Animal Infectious Diseases and Zoonoses, Yangzhou University, Yangzhou 225009, China
- Zaifan Li
  - College of Veterinary Medicine, Yangzhou University, Yangzhou 225009, China; Jiangsu Co-innovation Center for Prevention and Control of Important Animal Infectious Diseases and Zoonoses, Yangzhou University, Yangzhou 225009, China
- Zhaofeng Hou
  - College of Veterinary Medicine, Yangzhou University, Yangzhou 225009, China; Jiangsu Co-innovation Center for Prevention and Control of Important Animal Infectious Diseases and Zoonoses, Yangzhou University, Yangzhou 225009, China
- Dandan Liu
  - College of Veterinary Medicine, Yangzhou University, Yangzhou 225009, China; Jiangsu Co-innovation Center for Prevention and Control of Important Animal Infectious Diseases and Zoonoses, Yangzhou University, Yangzhou 225009, China
- Jianping Tao
  - College of Veterinary Medicine, Yangzhou University, Yangzhou 225009, China; Jiangsu Co-innovation Center for Prevention and Control of Important Animal Infectious Diseases and Zoonoses, Yangzhou University, Yangzhou 225009, China
- Jinjun Xu
  - College of Veterinary Medicine, Yangzhou University, Yangzhou 225009, China; Jiangsu Co-innovation Center for Prevention and Control of Important Animal Infectious Diseases and Zoonoses, Yangzhou University, Yangzhou 225009, China
2
Barra J, Taverna F, Bong F, Ahmed I, Karakach TK. Error modelled gene expression analysis (EMOGEA) provides a superior overview of time course RNA-seq measurements and low count gene expression. Brief Bioinform 2024; 25:bbae233. [PMID: 38770716 PMCID: PMC11106635 DOI: 10.1093/bib/bbae233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 04/03/2024] [Accepted: 04/30/2024] [Indexed: 05/22/2024] Open
Abstract
Temporal RNA-sequencing (RNA-seq) studies of bulk samples provide an opportunity for improved understanding of gene regulation during dynamic phenomena such as development, tumor progression or response to an incremental dose of a pharmacotherapeutic. Moreover, single-cell RNA-seq (scRNA-seq) data implicitly exhibit temporal characteristics because gene expression values recapitulate dynamic processes such as cellular transitions. Unfortunately, temporal RNA-seq data continue to be analyzed by methods that ignore this ordinal structure and yield results that are often difficult to interpret. Here, we present Error Modelled Gene Expression Analysis (EMOGEA), a framework for analyzing RNA-seq data that incorporates measurement uncertainty, while introducing a special formulation for those acquired to monitor dynamic phenomena. This method is specifically suited for RNA-seq studies in which low-count transcripts with small-fold changes lead to significant biological effects. Such transcripts include genes involved in signaling and non-coding RNAs that inherently exhibit low levels of expression. Using simulation studies, we show that this framework down-weights samples that exhibit extreme responses such as batch effects allowing them to be modeled with the rest of the samples and maintain the degrees of freedom originally envisioned for a study. Using temporal experimental data, we demonstrate the framework by extracting a cascade of gene expression waves from a well-designed RNA-seq study of zebrafish embryogenesis and an scRNA-seq study of mouse pre-implantation and provide unique biological insights into the regulation of genes in each wave. For non-ordinal measurements, we show that EMOGEA has a much higher rate of true positive calls and a vanishingly small rate of false negative discoveries compared to common approaches. Finally, we provide two packages in Python and R that are self-contained and easy to use, including test data.
Affiliation(s)
- Jasmine Barra
  - Laboratory of Integrative Multi-Omics Research, Department of Pharmacology, Dalhousie University, 5850 College Street, Halifax, NS, B3H 4R2, Canada
  - Beatrice Hunter Cancer Research Institute, 5743 University Avenue, Suite 98, Halifax, NS, B3H 0A2, Canada
  - Department of Microbiology & Immunology, Dalhousie University, 5850 College Street, Halifax, NS, B3H 4R2, Canada
- Federico Taverna
  - Laboratory of Integrative Multi-Omics Research, Department of Pharmacology, Dalhousie University, 5850 College Street, Halifax, NS, B3H 4R2, Canada
  - Beatrice Hunter Cancer Research Institute, 5743 University Avenue, Suite 98, Halifax, NS, B3H 0A2, Canada
- Fabian Bong
  - Laboratory of Integrative Multi-Omics Research, Department of Pharmacology, Dalhousie University, 5850 College Street, Halifax, NS, B3H 4R2, Canada
  - Beatrice Hunter Cancer Research Institute, 5743 University Avenue, Suite 98, Halifax, NS, B3H 0A2, Canada
- Ibrahim Ahmed
  - Laboratory of Integrative Multi-Omics Research, Department of Pharmacology, Dalhousie University, 5850 College Street, Halifax, NS, B3H 4R2, Canada
  - Beatrice Hunter Cancer Research Institute, 5743 University Avenue, Suite 98, Halifax, NS, B3H 0A2, Canada
- Tobias K Karakach
  - Laboratory of Integrative Multi-Omics Research, Department of Pharmacology, Dalhousie University, 5850 College Street, Halifax, NS, B3H 4R2, Canada
  - Beatrice Hunter Cancer Research Institute, 5743 University Avenue, Suite 98, Halifax, NS, B3H 0A2, Canada
3
Li X, Xu L, Demaree B, Noecker C, Bisanz JE, Weisgerber DW, Modavi C, Turnbaugh PJ, Abate AR. Microbiome single cell atlases generated with a commercial instrument. bioRxiv: the preprint server for biology 2024:2023.08.08.551713. [PMID: 37609281 PMCID: PMC10441329 DOI: 10.1101/2023.08.08.551713] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/24/2023]
Abstract
Single cell sequencing is useful for resolving complex systems into their composite cell types and computationally mining them for unique features that are masked in pooled sequencing. However, while commercial instruments have made single cell analysis widespread for mammalian cells, analogous tools for microbes are limited. Here, we present EASi-seq (Easily Accessible Single microbe sequencing). By adapting the single cell workflow of the commercial Mission Bio Tapestri instrument, this method allows for efficient sequencing of individual microbes' genomes. EASi-seq allows thousands of microbes to be sequenced per run and, as we show, can generate detailed atlases of human and environmental microbiomes. The ability to capture large shotgun genome datasets from thousands of single microbes provides new opportunities in discovering and analyzing species subpopulations. To facilitate this, we develop a companion bioinformatic pipeline that clusters microbes by similarity, improving whole genome assembly, strain identification, taxonomic classification, and gene annotation. In addition, we demonstrate integration of metagenomic contigs with the EASi-seq datasets to reduce capture bias and increase coverage. Overall, EASi-seq enables high quality single cell genomic data for microbiome samples using an accessible workflow that can be run on a commercially available platform.
4
Jousheghani ZZ, Patro R. Oarfish: Enhanced probabilistic modeling leads to improved accuracy in long read transcriptome quantification. bioRxiv: the preprint server for biology 2024:2024.02.28.582591. [PMID: 38464200 PMCID: PMC10925290 DOI: 10.1101/2024.02.28.582591] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
Motivation: Long read sequencing technology is becoming an increasingly indispensable tool in genomic and transcriptomic analysis. In transcriptomics in particular, long reads offer the possibility of sequencing full-length isoforms, which can vastly simplify the identification of novel transcripts and transcript quantification. However, despite this promise, the focus of much long read method development to date has been on transcript identification, with comparatively little attention paid to quantification. Yet, due to differences in the underlying protocols and technologies, lower throughput (i.e. fewer reads sequenced per sample compared to short read technologies), as well as technical artifacts, long read quantification remains a challenge, motivating the continued development and assessment of quantification methods tailored to this increasingly prevalent type of data.
Results: We introduce a new method and software tool for long read transcript quantification called oarfish. Our model incorporates a novel coverage score, which affects the conditional probability of fragment assignment in the underlying probabilistic model. We demonstrate that by accounting for this coverage information, oarfish is able to produce more accurate quantification estimates than existing long read quantification methods, particularly when one considers the primary isoforms present in a particular cell line or tissue type.
Availability and Implementation: Oarfish is implemented in the Rust programming language, and is made available as free and open-source software under the BSD 3-clause license. The source code is available at https://www.github.com/COMBINE-lab/oarfish.
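To make concrete how a per-read, per-transcript score can change fragment assignment, here is a minimal sketch of a generic EM abundance estimator in which each read carries a conditional weight for every transcript it is compatible with. It is written in Python for brevity and is not oarfish's implementation: the function name `em_abundances`, the toy reads, and the 0.9/0.1 weights are all assumptions, and the actual definition of the coverage score is not reproduced here.

```python
def em_abundances(compat, n_transcripts, n_iter=200):
    """Estimate relative abundances by EM.

    compat: one dict per read, mapping transcript index -> conditional
            weight P(read | transcript) (hypothetical values here).
    """
    eta = [1.0 / n_transcripts] * n_transcripts
    for _ in range(n_iter):
        counts = [0.0] * n_transcripts
        for read in compat:
            denom = sum(eta[t] * w for t, w in read.items())
            if denom == 0.0:
                continue  # read cannot be explained by current estimates
            # E-step: split the read across its compatible transcripts.
            for t, w in read.items():
                counts[t] += eta[t] * w / denom
        total = sum(counts)
        if total == 0.0:
            break
        # M-step: renormalise expected counts into abundances.
        eta = [c / total for c in counts]
    return eta


# One read unique to each transcript, plus two ambiguous reads.
uniform = [{0: 1.0}, {1: 1.0}, {0: 1.0, 1: 1.0}, {0: 1.0, 1: 1.0}]
weighted = [{0: 1.0}, {1: 1.0}, {0: 0.9, 1: 0.1}, {0: 0.9, 1: 0.1}]

eta_uniform = em_abundances(uniform, 2)
eta_weighted = em_abundances(weighted, 2)
print(eta_uniform, eta_weighted)
```

With uniform weights the ambiguous reads are split evenly; once the weights favour transcript 0, the same reads shift its estimated abundance upward, which is the kind of effect a coverage-aware conditional probability would have.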
Affiliation(s)
- Zahra Zare Jousheghani
  - Department of Electrical and Computer Engineering, University of Maryland, College Park, 20742, Maryland, USA
- Rob Patro
  - Department of Computer Science, University of Maryland, College Park, 20742, Maryland, USA
5
Abate A, Li X, Xu L, Demaree B, Noecker C, Bisanz J, Weisgerber D, Modavi C, Turnbaugh P. Microbiome single cell atlases generated with a commercial instrument. Research Square 2023:rs.3.rs-3253785. [PMID: 37790580 PMCID: PMC10543498 DOI: 10.21203/rs.3.rs-3253785/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/05/2023]
Abstract
Single cell sequencing is useful for resolving complex systems into their composite cell types and computationally mining them for unique features that are masked in pooled sequencing. However, while commercial instruments have made single cell analysis widespread for mammalian cells, analogous tools for microbes are limited. Here, we present EASi-seq (Easily Accessible Single microbe sequencing). By adapting the single cell workflow of the commercial Mission Bio Tapestri instrument, this method allows for efficient sequencing of individual microbes' genomes. EASi-seq allows thousands of microbes to be sequenced per run and, as we show, can generate detailed atlases of human and environmental microbiomes. The ability to capture large shotgun genome datasets from thousands of single microbes provides new opportunities in discovering and analyzing species subpopulations. To facilitate this, we develop a companion bioinformatic pipeline that clusters microbes by similarity, improving whole genome assembly, strain identification, taxonomic classification, and gene annotation. In addition, we demonstrate integration of metagenomic contigs with the EASi-seq datasets to reduce capture bias and increase coverage. Overall, EASi-seq enables high quality single cell genomic data for microbiome samples using an accessible workflow that can be run on a commercially available platform.
6
Singh NP, Love MI, Patro R. TreeTerminus - creating transcript trees using inferential replicate counts. iScience 2023; 26:106961. [PMID: 37378336 PMCID: PMC10291472 DOI: 10.1016/j.isci.2023.106961] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/29/2023] [Revised: 04/18/2023] [Accepted: 05/22/2023] [Indexed: 06/29/2023] Open
Abstract
A certain degree of uncertainty is always associated with the transcript abundance estimates. The uncertainty may make many downstream analyses, such as differential testing, difficult for certain transcripts. Conversely, gene-level analysis, though less ambiguous, is often too coarse-grained. We introduce TreeTerminus, a data-driven approach for grouping transcripts into a tree structure where leaves represent individual transcripts and internal nodes represent an aggregation of a transcript set. TreeTerminus constructs trees such that, on average, the inferential uncertainty decreases as we ascend the tree topology. The tree provides the flexibility to analyze data at nodes that are at different levels of resolution in the tree and can be tuned depending on the analysis of interest. We evaluated TreeTerminus on two simulated and two experimental datasets and observed an improved performance compared to transcripts (leaves) and other methods under several different metrics.
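A toy illustration of why aggregating transcripts can lower inferential uncertainty: when two transcripts share ambiguous reads, their inferential replicates tend to be anticorrelated, so their sum varies far less than either one alone. The replicate values and the `relative_spread` proxy below are made up for illustration; TreeTerminus's actual grouping criterion and tree construction are not reproduced here.

```python
from statistics import mean, pvariance


def relative_spread(samples):
    """Variance of inferential replicates divided by their mean.

    A simple, hypothetical proxy for inferential uncertainty.
    """
    m = mean(samples)
    return pvariance(samples) / m if m > 0 else 0.0


# Hypothetical inferential replicates (e.g. bootstrap draws) for two
# transcripts that compete for the same ambiguous reads.
t1 = [10, 30, 20, 40, 0]
t2 = [40, 20, 30, 10, 50]

# An internal node aggregates its children by summing their counts.
group = [a + b for a, b in zip(t1, t2)]

print(relative_spread(t1), relative_spread(t2), relative_spread(group))
```

In this contrived case the group total is identical in every replicate, so its spread drops to zero even though both leaves are highly uncertain.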
Affiliation(s)
- Noor Pratap Singh
  - Department of Computer Science, University of Maryland, College Park, MD, USA
- Michael I. Love
  - Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA
  - Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
- Rob Patro
  - Department of Computer Science, University of Maryland, College Park, MD, USA
7
Deshpande D, Chhugani K, Chang Y, Karlsberg A, Loeffler C, Zhang J, Muszyńska A, Munteanu V, Yang H, Rotman J, Tao L, Balliu B, Tseng E, Eskin E, Zhao F, Mohammadi P, Łabaj PP, Mangul S. RNA-seq data science: From raw data to effective interpretation. Front Genet 2023; 14:997383. [PMID: 36999049 PMCID: PMC10043755 DOI: 10.3389/fgene.2023.997383] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 07/18/2022] [Accepted: 02/24/2023] [Indexed: 03/14/2023] Open
Abstract
RNA sequencing (RNA-seq) has become an exemplary technology in modern biology and clinical science. Its immense popularity is due in large part to the continuous efforts of the bioinformatics community to develop accurate and scalable computational tools to analyze the enormous amounts of transcriptomic data that it produces. RNA-seq analysis enables genes and their corresponding transcripts to be probed for a variety of purposes, such as detecting novel exons or whole transcripts, assessing expression of genes and alternative transcripts, and studying alternative splicing structure. It can be a challenge, however, to obtain meaningful biological signals from raw RNA-seq data because of the enormous scale of the data as well as the inherent limitations of different sequencing technologies, such as amplification bias or biases of library preparation. The need to overcome these technical challenges has pushed the rapid development of novel computational tools, which have evolved and diversified in accordance with technological advancements, leading to the current myriad of RNA-seq tools. These tools, combined with the diverse computational skill sets of biomedical researchers, help to unlock the full potential of RNA-seq. The purpose of this review is to explain basic concepts in the computational analysis of RNA-seq data and define discipline-specific jargon.
Affiliation(s)
- Dhrithi Deshpande
  - Department of Pharmacology and Pharmaceutical Sciences, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, Los Angeles, CA, United States
- Karishma Chhugani
  - Department of Pharmacology and Pharmaceutical Sciences, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, Los Angeles, CA, United States
- Yutong Chang
  - Department of Pharmacology and Pharmaceutical Sciences, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, Los Angeles, CA, United States
- Aaron Karlsberg
  - Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, Los Angeles, CA, United States
- Caitlin Loeffler
  - Department of Computer Science, University of California, Los Angeles, CA, United States
- Jinyang Zhang
  - Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, China
- Agata Muszyńska
  - Małopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland
  - Institute of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Gliwice, Poland
- Viorel Munteanu
  - Department of Computers, Informatics and Microelectronics, Technical University of Moldova, Chisinau, Moldova
- Harry Yang
  - Department of Microbiology, Immunology and Molecular Genetics, University of California Los Angeles, Los Angeles, CA, United States
- Jeremy Rotman
  - Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, Los Angeles, CA, United States
- Laura Tao
  - Department of Computational Medicine, David Geffen School of Medicine at UCLA, CHS, Los Angeles, CA, United States
- Brunilda Balliu
  - Department of Computational Medicine, David Geffen School of Medicine at UCLA, CHS, Los Angeles, CA, United States
- Eleazar Eskin
  - Department of Computer Science, University of California, Los Angeles, CA, United States
  - Department of Computational Medicine, David Geffen School of Medicine at UCLA, CHS, Los Angeles, CA, United States
  - Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, United States
- Fangqing Zhao
  - Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, China
  - Key Laboratory of Systems Biology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, China
- Pejman Mohammadi
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, United States
- Paweł P. Łabaj
  - Małopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland
  - Department of Biotechnology, Boku University Vienna, Vienna, Austria
- Serghei Mangul
  - Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, Los Angeles, CA, United States
  - Department of Quantitative and Computational Biology, USC Dornsife College of Letters, Arts and Sciences, Los Angeles, CA, United States
  - *Correspondence: Serghei Mangul,
8
Hu Y, Gouru A, Wang K. DELongSeq for efficient detection of differential isoform expression from long-read RNA-seq data. NAR Genom Bioinform 2023; 5:lqad019. [PMID: 36879902 PMCID: PMC9985341 DOI: 10.1093/nargab/lqad019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2022] [Revised: 01/12/2023] [Accepted: 02/16/2023] [Indexed: 03/07/2023] Open
Abstract
Conventional gene expression quantification approaches, such as microarrays or quantitative PCR, have similar variations of estimates for all genes. However, next-generation short-read or long-read sequencing uses read counts to estimate expression levels with much wider dynamic ranges. In addition to the accuracy of estimated isoform expression, efficiency, which measures the degree of estimation uncertainty, is also an important factor for downstream analysis. Instead of relying on read counts, we present DELongSeq, which employs the information matrix of the EM algorithm to quantify the uncertainty of isoform expression estimates and thereby improve estimation efficiency. DELongSeq uses a random-effect regression model for the analysis of differentially expressed (DE) isoforms, in which within-study variation represents variable precision in isoform expression estimation and between-study variation represents variation in isoform expression levels across samples. More importantly, DELongSeq allows 1 case versus 1 control comparison of differential expression, which has specific application scenarios in precision medicine (such as before versus after treatment, or tumor versus stromal tissues). Through extensive simulations and analysis of several RNA-Seq datasets, we show that the uncertainty quantification approach is computationally reliable and can improve the power of DE analysis of isoforms or genes. In summary, DELongSeq allows for efficient detection of differential isoform/gene expression from long-read RNA-Seq data.
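The reason per-estimate uncertainty makes a 1 case versus 1 control comparison possible can be sketched with a simple Wald-type test: if each expression estimate comes with its own variance, the difference can be tested without biological replicates. This is only an illustration under that assumption, not DELongSeq's random-effect model; the function names and the numbers are hypothetical.

```python
import math


def wald_z(est_case, var_case, est_ctrl, var_ctrl):
    """z-statistic for the difference of two estimates with known variances."""
    return (est_case - est_ctrl) / math.sqrt(var_case + var_ctrl)


def two_sided_p(z):
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical log-scale isoform expression estimates and their variances
# (e.g. as could be derived from an inverse information matrix).
z = wald_z(est_case=5.0, var_case=0.5, est_ctrl=3.0, var_ctrl=0.5)
p = two_sided_p(z)
print(z, p)
```

The same difference in estimates yields a weaker statistic when the variances are larger, which is how imprecise isoform estimates get down-weighted in such a comparison.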
Affiliation(s)
- Yu Hu
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Anagha Gouru
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Kai Wang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
9
Silicone Breast Implant Surface Texture Impacts Gene Expression in Periprosthetic Fibrous Capsules. Plast Reconstr Surg 2023; 151:85-95. [PMID: 36205692 DOI: 10.1097/prs.0000000000009800] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/25/2023]
Abstract
BACKGROUND: Silicone breast implants with smooth outer shells are associated with higher rates of capsular contracture, whereas textured implants have been linked to the development of breast implant-associated anaplastic large cell lymphoma. By assessing the gene expression profile of fibrous capsules formed in response to smooth and textured implants, insight into the development of breast implant-associated abnormalities can be gained.
METHODS: Miniature smooth or textured silicone implants were surgically inserted into female rats (n = 10), and the surrounding capsules were harvested at postoperative week 6. RNA sequencing and quantitative polymerase chain reaction were performed to identify genes differentially expressed between smooth and textured capsules. For clinical correlation, the expression of candidate genes was assayed in implant capsules harvested from human patients with and without capsular contracture.
RESULTS: Of 18,555 differentially expressed transcripts identified, three candidate genes were selected: matrix metalloproteinase-3 (MMP3), troponin-T3 (TNNT3), and neuregulin-1 (NRG1). In textured capsules, relative gene expression and immunostaining of MMP3 and TNNT3 were up-regulated, whereas NRG1 was down-regulated compared to smooth capsules [mean relative fold change, 8.79 (P = 0.0059), 4.81 (P = 0.0056), and 0.40 (P < 0.0001), respectively]. Immunostaining of human specimens with capsular contracture revealed similar gene expression patterns to those of animal-derived smooth capsules.
CONCLUSIONS: An expression pattern of low MMP3/low TNNT3/high NRG1 is specifically associated with smooth implant capsules and human implant capsules with capsular contracture. The authors' clinically relevant breast implant rat model provides a strong foundation to further explore the molecular genetics of implant texture and its effect on breast implant-associated abnormalities.
CLINICAL RELEVANCE STATEMENT: The authors have demonstrated that there are distinct gene expression profiles in response to smooth versus textured breast implants. Since surface texture may be linked to implant-related pathology, further molecular analysis of periprosthetic capsules may yield strategies to mitigate implant-related complications.
10
Chakraborty S, Hossain A, Cao T, Gnanagobal H, Segovia C, Hill S, Monk J, Porter J, Boyce D, Hall JR, Bindea G, Kumar S, Santander J. Multi-Organ Transcriptome Response of Lumpfish (Cyclopterus lumpus) to Aeromonas salmonicida Subspecies salmonicida Systemic Infection. Microorganisms 2022; 10:2113. [PMID: 36363710 PMCID: PMC9692985 DOI: 10.3390/microorganisms10112113] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/30/2022] [Revised: 10/17/2022] [Accepted: 10/21/2022] [Indexed: 09/10/2023] Open
Abstract
Lumpfish are used as cleaner fish for the biocontrol of sea lice infestations in Atlantic salmon farms. Aeromonas salmonicida, a Gram-negative facultative intracellular pathogen, is the causative agent of furunculosis in several fish species, including lumpfish. In this study, lumpfish were intraperitoneally injected with different doses of A. salmonicida to calculate the LD50. Samples of blood, head-kidney, spleen, and liver were collected at different time points to determine the infection kinetics. We determined that the A. salmonicida LD50 is 10^2 CFU per dose. We found that the lumpfish head-kidney is the primary target organ of A. salmonicida. Triplicate biological samples were collected from head-kidney, spleen, and liver pre-infection and at 3 and 10 days post-infection for RNA-sequencing. The reference genome-guided transcriptome assembly resulted in 6246 differentially expressed genes. The de novo assembly resulted in 403,204 transcripts, which added 1307 novel genes not identified by the reference genome-guided transcriptome. Differential gene expression and gene ontology enrichment analyses suggested that A. salmonicida induces lethal infection in lumpfish by uncontrolled and detrimental blood coagulation, complement activation, inflammation, DNA damage, suppression of the adaptive immune system, and prevention of cytoskeleton formation.
Affiliation(s)
- Setu Chakraborty
- Marine Microbial Pathogenesis and Vaccinology Laboratory, Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
| | - Ahmed Hossain
- Marine Microbial Pathogenesis and Vaccinology Laboratory, Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
| | - Trung Cao
- Marine Microbial Pathogenesis and Vaccinology Laboratory, Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
| | - Hajarooba Gnanagobal
- Marine Microbial Pathogenesis and Vaccinology Laboratory, Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
| | - Cristopher Segovia
- Marine Microbial Pathogenesis and Vaccinology Laboratory, Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
| | - Stephen Hill
- Cold-Ocean Deep-Sea Research Facility, Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
| | - Jennifer Monk
- Dr. Joe Brown Aquatic Research Building, Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
| | - Jillian Porter
- Dr. Joe Brown Aquatic Research Building, Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
| | - Danny Boyce
- Dr. Joe Brown Aquatic Research Building, Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
| | - Jennifer R. Hall
- Aquatic Research Cluster, CREAIT Network, Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
| | - Gabriela Bindea
- INSERM, Laboratory of Integrative Cancer Immunology, 75006 Paris, France
- Equipe Labellisée Ligue Contre Le Cancer, 75013 Paris, France
- Centre de Recherche des Cordeliers, Sorbonne Université, Université de Paris, 75006 Paris, France
| | - Surendra Kumar
- Marine Microbial Pathogenesis and Vaccinology Laboratory, Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
- Ocean Frontier Institute, Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
| | - Javier Santander
- Marine Microbial Pathogenesis and Vaccinology Laboratory, Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
| |
|
11
|
Fan J, Chan S, Patro R. Perplexity: evaluating transcript abundance estimation in the absence of ground truth. Algorithms Mol Biol 2022; 17:6. [PMID: 35331283 PMCID: PMC8951746 DOI: 10.1186/s13015-022-00214-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/16/2021] [Accepted: 03/01/2022] [Indexed: 11/20/2022] Open
Abstract
Background There has been rapid development of probabilistic models and inference methods for transcript abundance estimation from RNA-seq data. These models aim to accurately estimate transcript-level abundances, to account for different biases in the measurement process, and even to assess uncertainty in resulting estimates that can be propagated to subsequent analyses. The assumed accuracy of the estimates inferred by such methods underpins the gene expression-based analyses routinely carried out in the lab. Although hyperparameter selection is known to affect the distributions of inferred abundances (e.g. producing smooth versus sparse estimates), strategies for performing model selection in experimental data have been addressed informally at best. Results We derive perplexity for evaluating abundance estimates on fragment sets directly. We adapt perplexity from the analogous metric used to evaluate language and topic models and extend the metric to carefully account for corner cases unique to RNA-seq. In experimental data, estimates with the best perplexity also best correlate with qPCR measurements. In simulated data, perplexity is well behaved and concordant with genome-wide measurements against ground truth and differential expression analysis. Furthermore, we demonstrate theoretically and experimentally that perplexity can be computed for arbitrary transcript abundance estimation models. Conclusions Alongside the derivation and implementation of perplexity for transcript abundance estimation, our study is the first to make possible model selection for transcript abundance estimation on experimental data in the absence of ground truth.
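As a rough illustration of the metric described in this abstract, the sketch below computes perplexity as the exponentiated negative mean log-likelihood of held-out fragments under a set of abundance estimates. The function name `perplexity`, the dictionary-based inputs, and the toy values are assumptions made for illustration only. The paper's handling of held-out fragments that receive zero probability is not reproduced here, so this sketch simply raises an error in that case.

```python
import math


def perplexity(theta, heldout_fragments):
    """Perplexity of held-out fragments under abundance estimates.

    theta: dict mapping transcript name -> estimated probability that a
        fragment originates from that transcript (values sum to 1).
    heldout_fragments: list of dicts, one per fragment, mapping each
        compatible transcript -> P(fragment | transcript).
    """
    total_log_lik = 0.0
    for compat in heldout_fragments:
        # Marginal likelihood of the fragment under the mixture model.
        lik = sum(theta.get(t, 0.0) * p for t, p in compat.items())
        if lik <= 0.0:
            # A sparse estimate can make a held-out fragment "impossible";
            # a real implementation needs a smoothing strategy here.
            raise ValueError("held-out fragment has zero likelihood")
        total_log_lik += math.log(lik)
    return math.exp(-total_log_lik / len(heldout_fragments))


# Toy held-out set: one fragment from each of two transcripts.
heldout = [{"t1": 1.0}, {"t2": 1.0}]
smooth_estimate = {"t1": 0.5, "t2": 0.5}
sparse_estimate = {"t1": 1.0, "t2": 0.0}

print(perplexity(smooth_estimate, heldout))
```

Lower perplexity indicates that the estimates explain the held-out fragments better. Calling `perplexity(sparse_estimate, heldout)` raises `ValueError` in this sketch, which is the kind of corner case the abstract says the metric must account for.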
|
12
|
Zheng H, Ma C, Kingsford C. Deriving Ranges of Optimal Estimated Transcript Expression due to Nonidentifiability. J Comput Biol 2022; 29:121-139. [PMID: 35041494 PMCID: PMC8892959 DOI: 10.1089/cmb.2021.0444] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
Current expression quantification methods suffer from a fundamental but undercharacterized type of error: the most likely estimates for transcript abundances are not unique. This means multiple estimates of transcript abundances generate the observed RNA-seq reads with equal likelihood, and the underlying true expression cannot be determined. This is called nonidentifiability in probabilistic modeling. It is further exacerbated by incomplete reference transcriptomes where reads may be sequenced from unannotated transcripts. Graph quantification is a generalization of transcript quantification, accounting for the reference incompleteness by allowing exponentially many unannotated transcripts to express reads. We propose methods to calculate a "confidence range of expression" for each transcript, representing its possible abundance across equally optimal estimates for both quantification models. This range informs both whether a transcript has potential estimation error due to nonidentifiability and the extent of the error. Applying our methods to the Human Body Map data, we observe that 35%-50% of transcripts potentially suffer from inaccurate quantification caused by nonidentifiability. When comparing the expression between isoforms in one sample, we find that the degree of inaccuracy of 20%-47% of transcripts can be so large that the ranking of expression between the transcript and other isoforms from the same gene cannot be determined. When comparing the expression of a transcript between two groups of RNA-seq samples in differential expression analysis, we observe that the majority of detected differentially expressed transcripts are reliable with a few exceptions after considering the ranges of the optimal expression estimates.
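A small numerical example may make nonidentifiability concrete. In the sketch below, the conditional probabilities for a hypothetical transcript `t3` are exactly the average of those for `t1` and `t2`, so several different abundance vectors yield the same likelihood. The transcript names, fragment classes, counts, and probabilities are all invented for illustration and are not taken from the paper.

```python
import math

# P(fragment class | transcript) for three fragment classes.
# t3's column is the average of t1's and t2's, which makes the
# model nonidentifiable.
COND = {
    "t1": [0.6, 0.4, 0.0],
    "t2": [0.0, 0.4, 0.6],
    "t3": [0.3, 0.4, 0.3],
}
# Observed number of fragments in each class.
COUNTS = [30, 40, 30]


def log_likelihood(theta):
    """Log-likelihood of the observed counts given abundances theta."""
    ll = 0.0
    for j, n in enumerate(COUNTS):
        p = sum(theta[t] * COND[t][j] for t in COND)
        ll += n * math.log(p)
    return ll


candidates = [
    {"t1": 0.5, "t2": 0.5, "t3": 0.0},
    {"t1": 0.25, "t2": 0.25, "t3": 0.5},
    {"t1": 0.0, "t2": 0.0, "t3": 1.0},
]
for theta in candidates:
    print(theta, round(log_likelihood(theta), 6))
```

All three candidates explain the data equally well, so the abundance of `t3` could lie anywhere from 0 to 1 in this toy case. That interval is the kind of quantity the abstract calls a range of optimal expression.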
Affiliation(s)
- Hongyu Zheng
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
| | - Cong Ma
- Computer Science Department, Princeton University, Princeton, New Jersey, USA
| | - Carl Kingsford
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
| |
|
13
|
Mäklin T, Kallonen T, David S, Boinett CJ, Pascoe B, Méric G, Aanensen DM, Feil EJ, Baker S, Parkhill J, Sheppard SK, Corander J, Honkela A. High-resolution sweep metagenomics using fast probabilistic inference. Wellcome Open Res 2021; 5:14. [PMID: 34746439 PMCID: PMC8543175 DOI: 10.12688/wellcomeopenres.15639.2] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 09/30/2021] [Indexed: 01/13/2023] Open
Abstract
Determining the composition of bacterial communities beyond the level of a genus or species is challenging because of the considerable overlap between genomes representing close relatives. Here, we present the mSWEEP pipeline for identifying and estimating the relative sequence abundances of bacterial lineages from plate sweeps of enrichment cultures. mSWEEP leverages biologically grouped sequence assembly databases, applies probabilistic modelling, and provides controls for false positive results. Using sequencing data from major pathogens, we demonstrate significant improvements in lineage quantification and detection accuracy. Our pipeline facilitates investigating cultures comprising mixtures of bacteria, and opens up a new field of plate sweep metagenomics.
Affiliation(s)
- Tommi Mäklin
- Helsinki Institute for Information Technology HIIT, Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
| | - Teemu Kallonen
- Department of Biostatistics, University of Oslo, Oslo, Norway
- Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK
| | - Sophia David
- Centre for Genomic Pathogen Surveillance, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK
| | - Christine J. Boinett
- Hospital for Tropical Diseases, Wellcome Trust Major Overseas Programme, Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam
- Centre for Tropical Medicine and Global Health, Nuffield Department of Clinical Medicine, University of Oxford, Oxford, UK
| | - Ben Pascoe
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, UK
| | - Guillaume Méric
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, UK
| | - David M. Aanensen
- Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK
- Department of Infectious Disease Epidemiology, Imperial College London, London, UK
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Medicine, University of Oxford, Oxford, UK
| | - Edward J. Feil
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, UK
| | - Stephen Baker
- Hospital for Tropical Diseases, Wellcome Trust Major Overseas Programme, Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam
- Centre for Tropical Medicine and Global Health, Nuffield Department of Clinical Medicine, University of Oxford, Oxford, UK
| | - Julian Parkhill
- Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK
| | - Samuel K. Sheppard
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, UK
| | - Jukka Corander
- Helsinki Institute for Information Technology HIIT, Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
- Department of Biostatistics, University of Oslo, Oslo, Norway
- Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK
| | - Antti Honkela
- Helsinki Institute for Information Technology HIIT, Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
- Department of Public Health, University of Helsinki, Helsinki, Finland
- Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland
| |
|
14
|
Statistical Modeling of High Dimensional Counts. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2021; 2284:97-134. [PMID: 33835440 DOI: 10.1007/978-1-0716-1307-8_7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Abstract
Statistical modeling of count data from RNA sequencing (RNA-seq) experiments is important for proper interpretation of results. Here I will describe how count data can be modeled using count distributions, or alternatively analyzed using nonparametric methods. I will focus on basic routines for performing data input, scaling/normalization, visualization, and statistical testing to determine sets of features where the counts reflect differences in gene expression across samples. Finally, I discuss limitations and possible extensions to the models presented here.
|
15
|
Cheng C, Liu L, Bao Y, Yi J, Quan W, Xue Y, Sun L, Zhang Y. SUVA: splicing site usage variation analysis from RNA-seq data reveals highly conserved complex splicing biomarkers in liver cancer. RNA Biol 2021; 18:157-171. [PMID: 34152934 DOI: 10.1080/15476286.2021.1940037] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022] Open
Abstract
Most of the current alternative splicing (AS) analysis tools are powerless to analyse complex splicing. To address this, we developed SUVA (Splice sites Usage Variation Analysis) that decomposes complex splicing events into five types of splice junction pairs. By analysing real and simulated data, SUVA showed higher sensitivity and accuracy in detecting AS events than the compared methods. Notably, SUVA detected extensive complex AS events and screened out 69 highly conserved and dominant AS events associated with cancer. The cancer-associated complex AS events in FN1 and the co-regulated RNA-binding proteins were significantly correlated with patient survival.
Affiliation(s)
- Chao Cheng
- ABLife BioBigData Institute, Wuhan, Hubei China
- Center for Genome Analysis, ABLife Inc., Wuhan, Hubei China
| | - Lei Liu
- National Engineering Laboratory for Druggable Gene and Protein Screening, Northeast Normal University, Changchun China
| | - Yongli Bao
- National Engineering Laboratory for Druggable Gene and Protein Screening, Northeast Normal University, Changchun China
| | - Jingwen Yi
- National Engineering Laboratory for Druggable Gene and Protein Screening, Northeast Normal University, Changchun China
| | - Weili Quan
- ABLife BioBigData Institute, Wuhan, Hubei China
| | - Yaqiang Xue
- ABLife BioBigData Institute, Wuhan, Hubei China
| | - Luguo Sun
- National Engineering Laboratory for Druggable Gene and Protein Screening, Northeast Normal University, Changchun China
| | - Yi Zhang
- ABLife BioBigData Institute, Wuhan, Hubei China
- Center for Genome Analysis, ABLife Inc., Wuhan, Hubei China
| |
|
16
|
Jones DC, Ruzzo WL. Polee: RNA-Seq analysis using approximate likelihood. NAR Genom Bioinform 2021; 3:lqab046. [PMID: 34056596 PMCID: PMC8152449 DOI: 10.1093/nargab/lqab046] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2021] [Revised: 04/11/2021] [Accepted: 05/11/2021] [Indexed: 12/20/2022] Open
Abstract
The analysis of mRNA transcript abundance with RNA-Seq is a central tool in molecular biology research, but often analyses fail to account for the uncertainty in these estimates, which can be significant, especially when trying to disentangle isoforms or duplicated genes. Preserving uncertainty necessitates a full probabilistic model of all the sequencing reads, which quickly becomes intractable, as experiments can consist of billions of reads. To overcome these limitations, we propose a new method of approximating the likelihood function of a sparse mixture model, using a technique we call the Pólya tree transformation. We demonstrate that substituting this approximation for the real thing achieves most of the benefits with a fraction of the computational costs, leading to more accurate detection of differential transcript expression and transcript coexpression.
Affiliation(s)
- Daniel C Jones
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Box 352350, Seattle, WA 98195-2350, USA
| | - Walter L Ruzzo
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Box 352350, Seattle, WA 98195-2350, USA
- Department of Genome Sciences, University of Washington, Box 355065, Seattle, WA 98195-5065, USA
- Fred Hutchinson Cancer Research Center, 1100 Fairview Ave. N., P.O. Box 19024, Seattle, WA 98109, USA
| |
|
17
|
Liu Y, Qu HQ, Chang X, Tian L, Qu J, Glessner J, Sleiman PMA, Hakonarson H. Machine Learning Reduced Gene/Non-Coding RNA Features That Classify Schizophrenia Patients Accurately and Highlight Insightful Gene Clusters. Int J Mol Sci 2021; 22:3364. [PMID: 33805976 PMCID: PMC8037538 DOI: 10.3390/ijms22073364] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/03/2021] [Revised: 03/20/2021] [Accepted: 03/23/2021] [Indexed: 12/28/2022] Open
Abstract
RNA-seq has been a powerful method to detect the differentially expressed genes/long non-coding RNAs (lncRNAs) in schizophrenia (SCZ) patients; however, due to overfitting problems, differentially expressed targets (DETs) cannot be used properly as biomarkers. This study used machine learning to reduce gene/non-coding RNA features. Dorsolateral prefrontal cortex (DLPFC) RNA-seq data from 254 individuals were obtained from the CommonMind consortium. The average predictive accuracy for SCZ patients was 67% based on coding genes, and 96% based on lncRNAs. Machine learning is a powerful approach for reducing functional biomarkers in SCZ patients. The lncRNAs capture the characteristics of SCZ tissue more accurately than mRNA as the former regulate every level of gene expression, not limited to mRNA levels.
Affiliation(s)
- Yichuan Liu
- Center for Applied Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA; (Y.L.); (H.-Q.Q.); (X.C.); (L.T.); (J.Q.); (J.G.); (P.M.A.S.)
| | - Hui-Qi Qu
- Center for Applied Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA; (Y.L.); (H.-Q.Q.); (X.C.); (L.T.); (J.Q.); (J.G.); (P.M.A.S.)
| | - Xiao Chang
- Center for Applied Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA; (Y.L.); (H.-Q.Q.); (X.C.); (L.T.); (J.Q.); (J.G.); (P.M.A.S.)
| | - Lifeng Tian
- Center for Applied Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA; (Y.L.); (H.-Q.Q.); (X.C.); (L.T.); (J.Q.); (J.G.); (P.M.A.S.)
| | - Jingchun Qu
- Center for Applied Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA; (Y.L.); (H.-Q.Q.); (X.C.); (L.T.); (J.Q.); (J.G.); (P.M.A.S.)
| | - Joseph Glessner
- Center for Applied Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA; (Y.L.); (H.-Q.Q.); (X.C.); (L.T.); (J.Q.); (J.G.); (P.M.A.S.)
| | - Patrick M. A. Sleiman
- Center for Applied Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA; (Y.L.); (H.-Q.Q.); (X.C.); (L.T.); (J.Q.); (J.G.); (P.M.A.S.)
- Division of Human Genetics, Department of Pediatrics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Hakon Hakonarson
- Center for Applied Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA; (Y.L.); (H.-Q.Q.); (X.C.); (L.T.); (J.Q.); (J.G.); (P.M.A.S.)
- Division of Human Genetics, Department of Pediatrics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Human Genetics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| |
|
18
|
Bayesian Inference of Gene Expression. Bioinformatics 2021. [DOI: 10.36255/exonpublications.bioinformatics.2021.ch5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] Open
|
19
|
Sarkar H, Srivastava A, Bravo HC, Love MI, Patro R. Terminus enables the discovery of data-driven, robust transcript groups from RNA-seq data. Bioinformatics 2021; 36:i102-i110. [PMID: 32657377 PMCID: PMC7355257 DOI: 10.1093/bioinformatics/btaa448] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Advances in sequencing technology, inference algorithms and differential testing methodology have enabled transcript-level analysis of RNA-seq data. Yet, the inherent inferential uncertainty in transcript-level abundance estimation, even among the most accurate approaches, means that robust transcript-level analysis often remains a challenge. Conversely, gene-level analysis remains a common and robust approach for understanding RNA-seq data, but it coarsens the resulting analysis to the level of genes, even if the data strongly support specific transcript-level effects. RESULTS We introduce a new data-driven approach for grouping together transcripts in an experiment based on their inferential uncertainty. Transcripts that share large numbers of ambiguously-mapping fragments with other transcripts, in complex patterns, often cannot have their abundances confidently estimated. Yet, the total transcriptional output of that group of transcripts will have greatly reduced inferential uncertainty, thus allowing more robust and confident downstream analysis. Our approach, implemented in the tool terminus, groups together transcripts in a data-driven manner allowing transcript-level analysis where it can be confidently supported, and deriving transcriptional groups where the inferential uncertainty is too high to support a transcript-level result. AVAILABILITY AND IMPLEMENTATION Terminus is implemented in Rust, and is freely available and open source. It can be obtained from https://github.com/COMBINE-lab/Terminus. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
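The claim that a group's total output can be far less uncertain than its members can be checked with a toy calculation. The sketch below compares the coefficient of variation of two transcripts across hypothetical posterior samples with that of their sum. The sample values and the helper `coefficient_of_variation` are assumptions for illustration; this is not the grouping criterion implemented in terminus.

```python
import statistics


def coefficient_of_variation(samples):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(samples) / statistics.mean(samples)


# Hypothetical posterior samples of read counts for two transcripts that
# share most of their fragments: the split is unstable, the total is not.
t1_samples = [10, 40, 25, 5, 45]
t2_samples = [40, 10, 25, 45, 5]
group_samples = [a + b for a, b in zip(t1_samples, t2_samples)]

cv_t1 = coefficient_of_variation(t1_samples)
cv_t2 = coefficient_of_variation(t2_samples)
cv_group = coefficient_of_variation(group_samples)

print(f"t1: {cv_t1:.3f}  t2: {cv_t2:.3f}  group: {cv_group:.3f}")
```

Because the two transcripts trade fragments between samples, their individual estimates vary widely while the summed estimate stays constant in this toy case. That contrast is what motivates analysing such transcripts as a group.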
Affiliation(s)
- Hirak Sarkar
- Department of Computer Science, University of Maryland, College Park, MD 20742, USA
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
| | - Avi Srivastava
- Department of Computer Science, Stony Brook University, Stony Brook, NY 11794, USA
| | - Héctor Corrada Bravo
- Department of Computer Science, University of Maryland, College Park, MD 20742, USA
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
| | - Michael I Love
- Department of Biostatistics, University of North Carolina-Chapel Hill, Chapel Hill, NC 27516, USA
- Department of Genetics, University of North Carolina-Chapel Hill, Chapel Hill, NC 27514, USA
| | - Rob Patro
- Department of Computer Science, University of Maryland, College Park, MD 20742, USA
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
| |
|
20
|
Simoneau J, Gosselin R, Scott MS. Factorial study of the RNA-seq computational workflow identifies biases as technical gene signatures. NAR Genom Bioinform 2021; 2:lqaa043. [PMID: 33575596 PMCID: PMC7671328 DOI: 10.1093/nargab/lqaa043] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/30/2020] [Revised: 05/15/2020] [Accepted: 06/05/2020] [Indexed: 12/12/2022] Open
Abstract
RNA-seq is a modular experimental and computational approach aimed at identifying and quantifying RNA molecules. The modularity of the RNA-seq technology enables adaptation of the protocol to develop new ways to explore RNA biology, but this modularity also brings forth the importance of methodological thoroughness. Liberty of approach comes with the responsibility of choices, and such choices must be informed. Here, we present an approach that identifies gene group-specific quantification biases in current RNA-seq software and references by processing datasets using diverse RNA-seq computational pipelines, and by decomposing these expression datasets with an independent component analysis matrix factorization method. By exploring the RNA-seq pipeline using this systemic approach, we identify genome annotations as a design choice that affects quantification results to the same extent as does the choice of aligners and quantifiers. We also show that the different choices in RNA-seq methodology are not independent, identifying interactions between genome annotations and quantification software. Genes were mainly affected by differences in their sequence, by overlapping genes and genes with similar sequence. Our approach offers an explanation for the observed biases by identifying the common features used differently by the software and references, therefore providing leads for the betterment of RNA-seq methodology.
Affiliation(s)
- Joël Simoneau
- Department of Biochemistry and Functional Genomics, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, Québec, J1K 2R1, Canada
| | - Ryan Gosselin
- Department of Chemical & Biotechnological Engineering, Faculty of Engineering, Université de Sherbrooke, Sherbrooke, Québec, J1K 2R1, Canada
| | - Michelle S Scott
- Department of Biochemistry and Functional Genomics, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, Québec, J1K 2R1, Canada
| |
|
21
|
Huson KM, Atcheson E, Oliver NAM, Best P, Barley JP, Hanna REB, McNeilly TN, Fang Y, Haldenby S, Paterson S, Robinson MW. Transcriptome and Secretome Analysis of Intra-Mammalian Life-Stages of Calicophoron daubneyi Reveals Adaptation to a Unique Host Environment. Mol Cell Proteomics 2021; 20:100055. [PMID: 33581320 PMCID: PMC7973311 DOI: 10.1074/mcp.ra120.002175] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022] Open
Abstract
Paramphistomosis, caused by the rumen fluke, Calicophoron daubneyi, is a parasitic infection of ruminant livestock, which has seen a rapid rise in prevalence throughout Western Europe in recent years. After ingestion of metacercariae (parasite cysts) by the mammalian host, newly excysted juveniles (NEJs) emerge and invade the duodenal submucosa, which causes significant pathology in heavy infections. The immature flukes then migrate upward, along the gastrointestinal tract, and enter the rumen where they mature and begin to produce eggs. Despite their emergence, and sporadic outbreaks of acute disease, we know little about the molecular mechanisms used by C. daubneyi to establish infection, acquire nutrients, and avoid the host immune response. Here, transcriptome analysis of four intramammalian life-cycle stages, integrated with secretome analysis of the NEJ and adult parasites (responsible for acute and chronic diseases, respectively), revealed how the expression and secretion of selected families of virulence factors and immunomodulators are regulated in accordance with fluke development and migration. Our data show that while a family of cathepsins B with varying S2 subsite residues (indicating distinct substrate specificities) is differentially secreted by NEJs and adult flukes, cathepsins L and F are secreted in low abundance by NEJs only. We found that C. daubneyi has an expanded family of aspartic peptidases, which is upregulated in adult worms, although they are under-represented in the secretome. The most abundant proteins in adult fluke secretions were helminth defense molecules that likely establish an immune environment permissive to fluke survival and/or neutralize pathogen-associated molecular patterns such as bacterial lipopolysaccharide in the microbiome-rich rumen. The distinct collection of molecules secreted by C. daubneyi allowed the development of the first coproantigen-based ELISA for paramphistomosis which, importantly, did not recognize antigens from other helminths commonly found as coinfections with rumen fluke.
Affiliation(s)
- Kathryn M Huson
- School of Biological Sciences, Queen's University Belfast, Belfast, Northern Ireland
| | - Erwan Atcheson
- School of Biological Sciences, Queen's University Belfast, Belfast, Northern Ireland
| | - Nicola A M Oliver
- School of Biological Sciences, Queen's University Belfast, Belfast, Northern Ireland
| | - Philip Best
- School of Biological Sciences, Queen's University Belfast, Belfast, Northern Ireland
| | - Jason P Barley
- Veterinary Sciences Division, Agri-Food and Biosciences Institute, Belfast, Northern Ireland
| | - Robert E B Hanna
- Veterinary Sciences Division, Agri-Food and Biosciences Institute, Belfast, Northern Ireland
| | - Tom N McNeilly
- Disease Control Department, Moredun Research Institute, Edinburgh, Scotland
| | - Yongxiang Fang
- Centre for Genomic Research, University of Liverpool, Liverpool, England
| | - Sam Haldenby
- Centre for Genomic Research, University of Liverpool, Liverpool, England
| | - Steve Paterson
- Centre for Genomic Research, University of Liverpool, Liverpool, England
| | - Mark W Robinson
- School of Biological Sciences, Queen's University Belfast, Belfast, Northern Ireland.
| |
|
22
|
Patro R, Salmela L. Algorithms meet sequencing technologies - 10th edition of the RECOMB-Seq workshop. iScience 2021; 24:101956. [PMID: 33437938 PMCID: PMC7788091 DOI: 10.1016/j.isci.2020.101956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open
Abstract
DNA and RNA sequencing is a core technology in biological and medical research. The high throughput of these technologies and the consistent development of new experimental assays and biotechnologies demand the continuous development of methods to analyze the resulting data. The RECOMB Satellite Workshop on Massively Parallel Sequencing brings together leading researchers in computational genomics to discuss emerging frontiers in algorithm development for massively parallel sequencing data. The 10th meeting in this series, RECOMB-Seq 2020, was scheduled to be held in Padua, Italy, but due to the ongoing COVID-19 pandemic, the meeting was held virtually instead. The online workshop featured keynote talks by Paola Bonizzoni and Zamin Iqbal, two highlight talks, ten regular talks, and three short talks. Seven of the works presented in the workshop are featured in this edition of iScience, and many of the talks are available online on the RECOMB-Seq 2020 YouTube channel.
Affiliation(s)
- Rob Patro
- Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, MD, USA
| | - Leena Salmela
- Department of Computer Science and Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, Finland
| |
|
23
|
Srivastava A, Malik L, Sarkar H, Zakeri M, Almodaresi F, Soneson C, Love MI, Kingsford C, Patro R. Alignment and mapping methodology influence transcript abundance estimation. Genome Biol 2020; 21:239. [PMID: 32894187 PMCID: PMC7487471 DOI: 10.1186/s13059-020-02151-8] [Citation(s) in RCA: 71] [Impact Index Per Article: 17.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2019] [Accepted: 08/19/2020] [Indexed: 01/23/2023] Open
Abstract
Background The accuracy of transcript quantification using RNA-seq data depends on many factors, such as the choice of alignment or mapping method and the quantification model being adopted. While the choice of quantification model has been shown to be important, considerably less attention has been given to comparing the effect of various read alignment approaches on quantification accuracy. Results We investigate the influence of mapping and alignment on the accuracy of transcript quantification in both simulated and experimental data, as well as the effect on subsequent differential expression analysis. We observe that, even when the quantification model itself is held fixed, the effect of choosing a different alignment methodology, or aligning reads using different parameters, on quantification estimates can sometimes be large and can affect downstream differential expression analyses as well. These effects can go unnoticed when assessment is focused too heavily on simulated data, where the alignment task is often simpler than in experimentally acquired samples. We also introduce a new alignment methodology, called selective alignment, to overcome the shortcomings of lightweight approaches without incurring the computational cost of traditional alignment. Conclusion We observe that, on experimental datasets, the performance of lightweight mapping and alignment-based approaches varies significantly, and highlight some of the underlying factors. We show this variation both in terms of quantification and downstream differential expression analysis. In all comparisons, we also show the improved performance of our proposed selective alignment method and suggest best practices for performing RNA-seq quantification.
Affiliation(s)
- Avi Srivastava
  - Department of Computer Science, Stony Brook University, Stony Brook, USA
- Laraib Malik
  - Department of Computer Science, Stony Brook University, Stony Brook, USA
- Hirak Sarkar
  - Department of Computer Science, University of Maryland, College Park, USA
- Mohsen Zakeri
  - Department of Computer Science, University of Maryland, College Park, USA
- Fatemeh Almodaresi
  - Department of Computer Science, University of Maryland, College Park, USA
- Charlotte Soneson
  - Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Michael I Love
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, USA
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, USA
- Carl Kingsford
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA
- Rob Patro
  - Department of Computer Science, University of Maryland, College Park, USA
|
24
|
Anyansi C, Straub TJ, Manson AL, Earl AM, Abeel T. Computational Methods for Strain-Level Microbial Detection in Colony and Metagenome Sequencing Data. Front Microbiol 2020; 11:1925. [PMID: 33013732 PMCID: PMC7507117 DOI: 10.3389/fmicb.2020.01925] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Received: 02/27/2020] [Accepted: 07/22/2020] [Indexed: 01/17/2023] Open
Abstract
Metagenomic sequencing is a powerful tool for examining the diversity and complexity of microbial communities. Most widely used tools for taxonomic profiling of metagenomic sequence data allow for a species-level overview of the composition. However, individual strains within a species can differ greatly in key genotypic and phenotypic characteristics, such as drug resistance, virulence and growth rate. Therefore, the ability to resolve microbial communities down to the level of individual strains within a species is critical to interpreting metagenomic data for clinical and environmental applications, where identifying a particular strain, or tracking a particular strain across a set of samples, can aid in clinical diagnosis and treatment, or in characterizing as yet unstudied strains across novel environmental locations. Recently published approaches have begun to tackle the problem of resolving strains within a particular species in metagenomic samples. In this review, we present an overview of these new algorithms and their uses, including methods based on assembly reconstruction and methods operating with or without a reference database. While existing metagenomic analysis methods show reasonable performance at the species and higher taxonomic levels, identifying closely related strains within a species presents a bigger challenge, due to the diversity of databases, genetic relatedness, and goals when conducting these analyses. Selection of which metagenomic tool to employ for a specific application should be performed on a case-by-case basis, as these tools have strengths and weaknesses that affect their performance on specific tasks. A comprehensive benchmark across different use case scenarios is vital to validate performance of these tools on microbial samples. Because strain-level metagenomic analysis is still in its infancy, development of more fine-grained, high-resolution algorithms will continue to be in demand for the future.
Affiliation(s)
- Christine Anyansi
  - Delft Bioinformatics Lab, Delft University of Technology, Delft, Netherlands
  - Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, United States
- Timothy J. Straub
  - Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, United States
  - Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA, United States
- Abigail L. Manson
  - Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, United States
- Ashlee M. Earl
  - Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, United States
- Thomas Abeel
  - Delft Bioinformatics Lab, Delft University of Technology, Delft, Netherlands
  - Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, United States
|
25
|
Doskey CM, Fader KA, Nault R, Lydic T, Matthews J, Potter D, Sharratt B, Williams K, Zacharewski T. 2,3,7,8-Tetrachlorodibenzo-p-dioxin (TCDD) alters hepatic polyunsaturated fatty acid metabolism and eicosanoid biosynthesis in female Sprague-Dawley rats. Toxicol Appl Pharmacol 2020; 398:115034. [PMID: 32387183 PMCID: PMC7294678 DOI: 10.1016/j.taap.2020.115034] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 03/19/2020] [Revised: 04/30/2020] [Accepted: 05/04/2020] [Indexed: 12/16/2022]
Abstract
2,3,7,8-Tetrachlorodibenzo-p-dioxin (TCDD) is a potent aryl hydrocarbon receptor (AhR) agonist that elicits a broad spectrum of dose-dependent hepatic effects including lipid accumulation, inflammation, and fibrosis. To determine the role of inflammatory lipid mediators in TCDD-mediated hepatotoxicity, eicosanoid metabolism was investigated. Female Sprague-Dawley (SD) rats were orally gavaged with sesame oil vehicle or 0.01-10 μg/kg TCDD every 4 days for 28 days. Hepatic RNA-Seq data was integrated with untargeted metabolomics of liver, serum, and urine, revealing dose-dependent changes in linoleic acid (LA) and arachidonic acid (AA) metabolism. TCDD also elicited dose-dependent differential gene expression associated with the cyclooxygenase, lipoxygenase, and cytochrome P450 epoxidation/hydroxylation pathways with corresponding changes in ω-6 (e.g. AA and LA) and ω-3 polyunsaturated fatty acids (PUFAs), as well as associated eicosanoid metabolites. Overall, TCDD increased the ratio of ω-6 to ω-3 PUFAs. Phospholipase A2 (Pla2g12a) was induced consistent with increased AA metabolism, while AA utilization by induced lipoxygenases Alox5 and Alox15 increased leukotrienes (LTs). More specifically, TCDD increased pro-inflammatory eicosanoids including leukotriene LTB4, and LTB3, known to recruit neutrophils to damaged tissue. Dose-response modeling suggests the cytochrome P450 hydroxylase/epoxygenase and lipoxygenase pathways are more sensitive to TCDD than the cyclooxygenase pathway. Hepatic AhR ChIP-Seq analysis found little enrichment within the regulatory regions of differentially expressed genes (DEGs) involved in eicosanoid biosynthesis, suggesting TCDD-elicited dysregulation of eicosanoid metabolism is a downstream effect of AhR activation. Overall, these results suggest alterations in eicosanoid metabolism may play a key role in TCDD-elicited hepatotoxicity associated with the progression of steatosis to steatohepatitis.
Affiliation(s)
- Claire M Doskey
  - Department of Biochemistry & Molecular Biology, Institute for Integrative Toxicology, Michigan State University, East Lansing, MI 48824, United States
- Kelly A Fader
  - Department of Biochemistry & Molecular Biology, Institute for Integrative Toxicology, Michigan State University, East Lansing, MI 48824, United States
- Rance Nault
  - Department of Biochemistry & Molecular Biology, Institute for Integrative Toxicology, Michigan State University, East Lansing, MI 48824, United States
- Todd Lydic
  - Department of Physiology, Michigan State University, East Lansing, MI 48824, United States
- Jason Matthews
  - Department of Nutrition, Institute of Basic Medical Sciences, University of Oslo, Oslo 0316, Norway
- Dave Potter
  - Wellington Laboratories Inc., Guelph, Ontario NIG 3M5, Canada
- Bonnie Sharratt
  - Wellington Laboratories Inc., Guelph, Ontario NIG 3M5, Canada
- Kurt Williams
  - Department of Pathobiology and Diagnostic Investigation, Michigan State, East Lansing, MI 48824, United States
- Tim Zacharewski
  - Department of Biochemistry & Molecular Biology, Institute for Integrative Toxicology, Michigan State University, East Lansing, MI 48824, United States
|
26
|
Mäklin T, Kallonen T, David S, Boinett CJ, Pascoe B, Méric G, Aanensen DM, Feil EJ, Baker S, Parkhill J, Sheppard SK, Corander J, Honkela A. High-resolution sweep metagenomics using fast probabilistic inference. Wellcome Open Res 2020; 5:14. [PMID: 34746439 PMCID: PMC8543175 DOI: 10.12688/wellcomeopenres.15639.1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Accepted: 01/02/2020] [Indexed: 12/29/2022] Open
Abstract
Determining the composition of bacterial communities beyond the level of a genus or species is challenging because of the considerable overlap between genomes representing close relatives. Here, we present the mSWEEP pipeline for identifying and estimating the relative sequence abundances of bacterial lineages from plate sweeps of enrichment cultures. mSWEEP leverages biologically grouped sequence assembly databases, applying probabilistic modelling, and provides controls for false positive results. Using sequencing data from major pathogens, we demonstrate significant improvements in lineage quantification and detection accuracy. Our pipeline facilitates investigating cultures comprising mixtures of bacteria, and opens up a new field of plate sweep metagenomics.
Affiliation(s)
- Tommi Mäklin
  - Helsinki Institute for Information Technology HIIT, Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
- Teemu Kallonen
  - Department of Biostatistics, University of Oslo, Oslo, Norway
  - Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK
- Sophia David
  - Centre for Genomic Pathogen Surveillance, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK
- Christine J. Boinett
  - Hospital for Tropical Diseases, Wellcome Trust Major Overseas Programme, Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam
  - Centre for Tropical Medicine and Global Health, Nuffield Department of Clinical Medicine, University of Oxford, Oxford, UK
- Ben Pascoe
  - The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, UK
- Guillaume Méric
  - The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, UK
- David M. Aanensen
  - Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK
  - Department of Infectious Disease Epidemiology, Imperial College London, London, UK
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Edward J. Feil
  - The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, UK
- Stephen Baker
  - Hospital for Tropical Diseases, Wellcome Trust Major Overseas Programme, Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam
  - Centre for Tropical Medicine and Global Health, Nuffield Department of Clinical Medicine, University of Oxford, Oxford, UK
- Julian Parkhill
  - Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK
- Samuel K. Sheppard
  - The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, UK
- Jukka Corander
  - Helsinki Institute for Information Technology HIIT, Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
  - Department of Biostatistics, University of Oslo, Oslo, Norway
  - Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK
- Antti Honkela
  - Helsinki Institute for Information Technology HIIT, Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
  - Department of Public Health, University of Helsinki, Helsinki, Finland
  - Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland
|
27
|
Transcriptome Analysis Reveals the Molecular Mechanisms Underlying Adenosine Biosynthesis in Anamorph Strain of Caterpillar Fungus. BioMed Research International 2020; 2019:1864168. [PMID: 31915684 PMCID: PMC6935459 DOI: 10.1155/2019/1864168] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/25/2019] [Accepted: 07/28/2019] [Indexed: 01/19/2023]
Abstract
Caterpillar fungus is a well-known fungal Chinese medicine. To reveal molecular changes during early and late stages of adenosine biosynthesis, transcriptome analysis was performed with the anamorph strain of caterpillar fungus. A total of 2,764 differentially expressed genes (DEGs) were identified (p ≤ 0.05, |log2 Ratio| ≥ 1), of which 1,737 were up-regulated and 1,027 were down-regulated. Gene expression profiling at 4–10 d revealed a distinct shift in expression of the purine metabolism pathway. Differential expression of 17 selected DEGs involved in purine metabolism (map00230) was validated by qPCR, and the expression trends were consistent with the RNA-Seq results. Subsequently, the predicted adenosine biosynthesis pathway, combined with qPCR and RNA-Seq gene expression data, indicated that the increased adenosine accumulation is a result of down-regulation of the ndk, ADK, and APRT genes combined with up-regulation of the AK gene. This study will be valuable for understanding the molecular mechanisms of adenosine biosynthesis in caterpillar fungus.
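The DEG criterion quoted in this abstract (p ≤ 0.05 and |log2 Ratio| ≥ 1) can be illustrated with a minimal sketch. The gene names, p-values, and expression ratios below are hypothetical and are not taken from the study; the function name `is_deg` is likewise an illustrative assumption, not part of any pipeline the authors describe.

```python
import math

def is_deg(p_value, fold_ratio, p_cut=0.05, lfc_cut=1.0):
    """Classify one gene as 'up', 'down', or None (not a DEG).

    A gene passes when p <= p_cut and |log2(fold_ratio)| >= lfc_cut,
    mirroring the thresholds quoted in the abstract.
    """
    lfc = math.log2(fold_ratio)
    if p_value > p_cut or abs(lfc) < lfc_cut:
        return None
    return "up" if lfc > 0 else "down"

# Hypothetical genes: (name, p-value, expression ratio treated/control).
genes = [
    ("geneA", 0.01, 4.0),   # log2 ratio = 2, significant
    ("geneB", 0.03, 0.25),  # log2 ratio = -2, significant
    ("geneC", 0.20, 8.0),   # large change, but p too high
    ("geneD", 0.04, 1.5),   # significant, but change too small
]
calls = {name: is_deg(p, ratio) for name, p, ratio in genes}
print(calls)
```

Both conditions must hold at once, which is why geneC and geneD are excluded despite passing one threshold each.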
|
28
|
Ma C, Kingsford C. Detecting, Categorizing, and Correcting Coverage Anomalies of RNA-Seq Quantification. Cell Syst 2019; 9:589-599.e7. [PMID: 31786209 DOI: 10.1016/j.cels.2019.10.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 02/06/2019] [Revised: 07/09/2019] [Accepted: 10/17/2019] [Indexed: 11/13/2022]
Abstract
Because of incomplete reference transcriptomes, incomplete sequencing bias models, or other modeling defects, algorithms to infer isoform expression from RNA sequencing (RNA-seq) sometimes do not accurately model expression. We present a computational method to detect instances where a quantification algorithm could not completely explain the input reads. Our approach identifies regions where the read coverage significantly deviates from expectation. We call these regions "expression anomalies." We further present a method to attribute their cause to either the incompleteness of the reference transcriptome or algorithmic mistakes. We detect anomalies for 30 GEUVADIS and 16 Human Body Map samples. By correcting anomalies when possible, we reduce the number of falsely predicted instances of differential expression. Anomalies that cannot be corrected are suspected to indicate the existence of isoforms unannotated by the reference. We detected 88 common anomalies of this type and find that they tend to have a lower-than-expected coverage toward their 3' ends.
Affiliation(s)
- Cong Ma
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213, USA
- Carl Kingsford
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213, USA
|
29
|
Zhu A, Srivastava A, Ibrahim JG, Patro R, Love MI. Nonparametric expression analysis using inferential replicate counts. Nucleic Acids Res 2019; 47:e105. [PMID: 31372651 PMCID: PMC6765120 DOI: 10.1093/nar/gkz622] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 05/01/2019] [Revised: 06/11/2019] [Accepted: 07/11/2019] [Indexed: 11/13/2022] Open
Abstract
A primary challenge in the analysis of RNA-seq data is to identify differentially expressed genes or transcripts while controlling for technical biases. Ideally, a statistical testing procedure should incorporate the inherent uncertainty of the abundance estimates arising from the quantification step. Most popular methods for RNA-seq differential expression analysis fit a parametric model to the counts for each gene or transcript, and a subset of methods can incorporate uncertainty. Previous work has shown that nonparametric models for RNA-seq differential expression may have better control of the false discovery rate, and adapt well to new data types without requiring reformulation of a parametric model. Existing nonparametric models do not take into account inferential uncertainty, leading to an inflated false discovery rate, in particular at the transcript level. We propose a nonparametric model for differential expression analysis using inferential replicate counts, extending the existing SAMseq method to account for inferential uncertainty. We compare our method, Swish, with popular differential expression analysis methods. Swish has improved control of the false discovery rate, in particular for transcripts with high inferential uncertainty. We apply Swish to a single-cell RNA-seq dataset, assessing differential expression between sub-populations of cells, and compare its performance to the Wilcoxon test.
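One way to picture a rank-based test that takes inferential replicates into account is to compute a Wilcoxon-type statistic on each replicate and average the results. The sketch below does this with the Mann-Whitney U statistic (equivalent to the Wilcoxon rank-sum statistic up to a constant). It is an illustration under assumptions: the abstract does not spell out the procedure, the counts are invented, and the function names are hypothetical. Assessing significance would additionally need a null distribution (for example from permutations), which is not shown.

```python
def mann_whitney_u(x, y):
    """Mann-Whitney U for sample x versus y: the number of (x_i, y_j)
    pairs with x_i > y_j, counting ties as one half."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

def mean_u_over_replicates(replicates, group):
    """Average the U statistic over inferential replicates.

    replicates: list of count vectors for ONE transcript, one vector per
                inferential replicate (hypothetical data layout).
    group:      0/1 condition label for each sample.
    """
    stats = []
    for counts in replicates:
        x = [c for c, g in zip(counts, group) if g == 1]
        y = [c for c, g in zip(counts, group) if g == 0]
        stats.append(mann_whitney_u(x, y))
    return sum(stats) / len(stats)

# Six samples (three per condition), three made-up inferential replicates.
group = [0, 0, 0, 1, 1, 1]
replicates = [
    [10, 12, 11, 20, 22, 19],
    [11, 13, 10, 18, 21, 12],
    [9, 12, 14, 21, 20, 13],
]
mean_u = mean_u_over_replicates(replicates, group)
print(mean_u)
```

When the replicates disagree about the ordering of samples, the averaged statistic moves away from its extreme value, which is how quantification uncertainty can weaken the evidence for a transcript.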
Affiliation(s)
- Anqi Zhu
  - Department of Biostatistics, University of North Carolina-Chapel Hill, 135 Dauer Drive, Chapel Hill, NC 27599, USA
- Avi Srivastava
  - Department of Computer Science, Stony Brook University, Computer Science Building, Engineering Dr, Stony Brook, NY 11794, USA
- Joseph G Ibrahim
  - Department of Biostatistics, University of North Carolina-Chapel Hill, 135 Dauer Drive, Chapel Hill, NC 27599, USA
- Rob Patro
  - Department of Computer Science, Stony Brook University, Computer Science Building, Engineering Dr, Stony Brook, NY 11794, USA
- Michael I Love
  - Department of Biostatistics, University of North Carolina-Chapel Hill, 135 Dauer Drive, Chapel Hill, NC 27599, USA
  - Department of Genetics, University of North Carolina-Chapel Hill, 120 Mason Farm Rd, Chapel Hill, NC 27514, USA
|
30
|
Bendall ML, de Mulder M, Iñiguez LP, Lecanda-Sánchez A, Pérez-Losada M, Ostrowski MA, Jones RB, Mulder LCF, Reyes-Terán G, Crandall KA, Ormsby CE, Nixon DF. Telescope: Characterization of the retrotranscriptome by accurate estimation of transposable element expression. PLoS Comput Biol 2019; 15:e1006453. [PMID: 31568525 PMCID: PMC6786656 DOI: 10.1371/journal.pcbi.1006453] [Citation(s) in RCA: 74] [Impact Index Per Article: 14.8] [Received: 08/17/2018] [Revised: 10/10/2019] [Accepted: 09/17/2019] [Indexed: 12/20/2022] Open
Abstract
Characterization of Human Endogenous Retrovirus (HERV) expression within the transcriptomic landscape using RNA-seq is complicated by uncertainty in fragment assignment because of sequence similarity. We present Telescope, a computational software tool that provides accurate estimation of transposable element expression (retrotranscriptome) resolved to specific genomic locations. Telescope directly addresses uncertainty in fragment assignment by reassigning ambiguously mapped fragments to the most probable source transcript as determined within a Bayesian statistical model. We demonstrate the utility of our approach through single locus analysis of HERV expression in 13 ENCODE cell types. When examined at this resolution, we find that the magnitude and breadth of the retrotranscriptome can be vastly different among cell types. Furthermore, our approach is robust to differences in sequencing technology and demonstrates that the retrotranscriptome has potential to be used for cell type identification. We compared our tool with other approaches for quantifying transposable element (TE) expression, and found that Telescope has the greatest resolution, as it estimates expression at specific TE insertions rather than at the TE subfamily level. Telescope performs highly accurate quantification of the retrotranscriptomic landscape in RNA-seq experiments, revealing a differential complexity in the transposable element biology of complex systems not previously observed. Telescope is available at https://github.com/mlbendall/telescope.
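The idea of reassigning ambiguously mapped fragments to their most probable source can be sketched with a generic expectation-maximization (EM) loop over relative abundances. This is a simplified stand-in and not Telescope's actual Bayesian model: the function name `em_reassign`, the loci `A` and `B`, and the read sets are hypothetical, and alignment scores and priors are ignored.

```python
def em_reassign(reads, transcripts, n_iter=50):
    """Estimate relative abundances and reassign ambiguous reads.

    reads:       list of tuples, each holding the candidate source
                 transcripts that one fragment aligns to.
    transcripts: list of all transcript (or locus) names.
    Returns (theta, assignments): theta maps each transcript to its
    estimated abundance; assignments gives one source per read.
    """
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(n_iter):
        # E-step: split each read across its candidates in proportion
        # to the current abundance estimates.
        counts = {t: 0.0 for t in transcripts}
        for candidates in reads:
            total = sum(theta[t] for t in candidates)
            for t in candidates:
                counts[t] += theta[t] / total
        # M-step: renormalize expected counts into abundances.
        n = sum(counts.values())
        theta = {t: c / n for t, c in counts.items()}
    # Final reassignment: each read goes to its most abundant candidate.
    assignments = [max(cands, key=lambda t: theta[t]) for cands in reads]
    return theta, assignments

# Hypothetical input: three reads unique to A, one unique to B,
# and two reads that align equally well to both.
reads = [("A",)] * 3 + [("B",)] + [("A", "B")] * 2
theta, assignments = em_reassign(reads, ["A", "B"])
print(theta, assignments)
```

Because the uniquely mapped reads favor A, the two ambiguous reads are drawn toward A as the iterations proceed, and the abundance estimate for A settles near 0.75.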
Affiliation(s)
- Matthew L. Bendall
  - Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, D.C., United States of America
  - Division of Infectious Diseases, Department of Medicine, Weill Cornell Medicine, New York, N.Y., United States of America
- Miguel de Mulder
  - Division of Infectious Diseases, Department of Medicine, Weill Cornell Medicine, New York, N.Y., United States of America
- Luis Pedro Iñiguez
  - Division of Infectious Diseases, Department of Medicine, Weill Cornell Medicine, New York, N.Y., United States of America
  - Center for Research in Infectious Diseases (CIENI), Instituto Nacional de Enfermedades Respiratorias, Mexico City, Mexico
- Aarón Lecanda-Sánchez
  - Center for Research in Infectious Diseases (CIENI), Instituto Nacional de Enfermedades Respiratorias, Mexico City, Mexico
- Marcos Pérez-Losada
  - Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, D.C., United States of America
  - Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, George Washington University, Washington, D.C., United States of America
  - CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Campus Agrário de Vairão, Vairão, Portugal
- Mario A. Ostrowski
  - Department of Immunology, University of Toronto, Toronto, Ontario, Canada
  - Keenan Research Centre for Biomedical Science of St. Michael's Hospital, Toronto, Ontario, Canada
- R. Brad Jones
  - Division of Infectious Diseases, Department of Medicine, Weill Cornell Medicine, New York, N.Y., United States of America
- Lubbertus C. F. Mulder
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
  - The Global Health and Emerging Pathogens Institute, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Gustavo Reyes-Terán
  - Center for Research in Infectious Diseases (CIENI), Instituto Nacional de Enfermedades Respiratorias, Mexico City, Mexico
- Keith A. Crandall
  - Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, D.C., United States of America
  - Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, George Washington University, Washington, D.C., United States of America
- Christopher E. Ormsby
  - Center for Research in Infectious Diseases (CIENI), Instituto Nacional de Enfermedades Respiratorias, Mexico City, Mexico
- Douglas F. Nixon
  - Division of Infectious Diseases, Department of Medicine, Weill Cornell Medicine, New York, N.Y., United States of America
|
31
|
Madroñero J, Corredor Rozo ZL, Escobar Pérez JA, Velandia Romero ML. Next generation sequencing and proteomics in plant virology: how is Colombia doing? Acta Biológica Colombiana 2019. [DOI: 10.15446/abc.v24n3.79486] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/26/2022] Open
Abstract
Crop production and trade are two of the most economically important activities in Colombia, and viral diseases have a strong negative impact on the agricultural sector. Therefore, the detection, diagnosis, control, and management of viral diseases are crucial. Currently, Next-Generation Sequencing (NGS) and ‘omic’ technologies are essential tools for the discovery of novel viruses and for studying virus-plant interactions. This knowledge allows the development of new viral diagnostic methods and the discovery of key components of infectious processes, which could be used to generate plants resistant to viral infections. Globally, crop sciences are advancing in this direction. In this review, advancements in ‘omic’ technologies and their different applications in plant virology in Colombia are discussed. In addition, bioinformatics pipelines and resources for omics data analyses are presented. Due to their decreasing prices, NGS technologies are becoming an affordable and promising means to explore many phytopathologies affecting a wide variety of Colombian crops so as to improve their trade potential.
|
32
|
Systematic characterization of BAF mutations provides insights into intracomplex synthetic lethalities in human cancers. Nat Genet 2019; 51:1399-1410. [PMID: 31427792 PMCID: PMC6952272 DOI: 10.1038/s41588-019-0477-9] [Citation(s) in RCA: 87] [Impact Index Per Article: 17.4] [Received: 06/01/2018] [Accepted: 06/04/2019] [Indexed: 12/26/2022]
Abstract
Aberrations in genes coding for subunits of the BAF chromatin remodeling complexes are highly abundant in human cancers. Currently, it is not understood how these loss-of-function mutations contribute to cancer development and how they can be targeted therapeutically. The cancer-type-specific occurrence patterns of certain subunit mutations suggest subunit-specific effects on BAF complex function, possibly by the formation of aberrant residual complexes. Here, we systematically characterize the effects of individual subunit loss on complex composition, chromatin accessibility and gene expression in a panel of knock-out cell lines deficient for 22 BAF subunits. We observe strong, specific and sometimes discordant alterations dependent on the targeted subunit and show that these explain intra-complex co-dependencies, including the synthetic lethal interactions SMARCA4-ARID2, SMARCA4-ACTB and SMARCC1-SMARCC2. These data provide insights into the role of different BAF subcomplexes in genome-wide chromatin organization and suggest approaches to therapeutically target BAF mutant cancers.
|
33
|
Van den Berge K, Hembach KM, Soneson C, Tiberi S, Clement L, Love MI, Patro R, Robinson MD. RNA Sequencing Data: Hitchhiker's Guide to Expression Analysis. Annu Rev Biomed Data Sci 2019. [DOI: 10.1146/annurev-biodatasci-072018-021255] [Citation(s) in RCA: 71] [Impact Index Per Article: 14.2] [Indexed: 11/09/2022]
Abstract
Gene expression is the fundamental level at which the results of various genetic and regulatory programs are observable. The measurement of transcriptome-wide gene expression has convincingly switched from microarrays to sequencing in a matter of years. RNA sequencing (RNA-seq) provides a quantitative and open system for profiling transcriptional outcomes on a large scale and therefore facilitates a large diversity of applications, including basic science studies, but also agricultural or clinical situations. In the past 10 years or so, much has been learned about the characteristics of the RNA-seq data sets, as well as the performance of the myriad of methods developed. In this review, we give an overview of the developments in RNA-seq data analysis, including experimental design, with an explicit focus on the quantification of gene expression and statistical approaches for differential expression. We also highlight emerging data types, such as single-cell RNA-seq and gene expression profiling using long-read technologies.
Affiliation(s)
- Koen Van den Berge
  - Bioinformatics Institute Ghent and Department of Applied Mathematics, Computer Science and Statistics, Ghent University, 9000 Ghent, Belgium
- Katharina M. Hembach
  - Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
- Charlotte Soneson
  - Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
- Simone Tiberi
  - Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
- Lieven Clement
  - Bioinformatics Institute Ghent and Department of Applied Mathematics, Computer Science and Statistics, Ghent University, 9000 Ghent, Belgium
- Michael I. Love
  - Department of Biostatistics and Department of Genetics, University of North Carolina, Chapel Hill, North Carolina 27514, USA
- Rob Patro
  - Department of Computer Science, Stony Brook University, Stony Brook, New York 11794, USA
- Mark D. Robinson
  - Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
|
34
|
Warner AD, Gevirtzman L, Hillier LW, Ewing B, Waterston RH. The C. elegans embryonic transcriptome with tissue, time, and alternative splicing resolution. Genome Res 2019; 29:1036-1045. [PMID: 31123079 PMCID: PMC6581053 DOI: 10.1101/gr.243394.118] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 08/26/2018] [Accepted: 03/11/2019] [Indexed: 12/22/2022]
Abstract
We have used RNA-seq in Caenorhabditis elegans to produce transcription profiles for seven specific embryonic cell populations from gastrulation to the onset of terminal differentiation. The expression data for these seven cell populations, covering major cell lineages and tissues in the worm, reveal the complex and dynamic changes in gene expression, both spatially and temporally. Also, within genes, start sites and exon usage can be highly differential, producing transcripts that are specific to developmental periods or cell lineages. We have also found evidence of novel exons and introns, as well as differential usage of SL1 and SL2 splice leaders. By combining this data set with the modERN ChIP-seq resource, we are able to support and predict gene regulatory relationships. The detailed information on differences and similarities between gene expression in cell lineages and tissues should be of great value to the community and provides a framework for the investigation of expression in individual cells.
Affiliation(s)
- Adam D Warner
  - Department of Genome Sciences, School of Medicine, University of Washington, Seattle, Washington 98195, USA
- Louis Gevirtzman
  - Department of Genome Sciences, School of Medicine, University of Washington, Seattle, Washington 98195, USA
- LaDeana W Hillier
  - Department of Genome Sciences, School of Medicine, University of Washington, Seattle, Washington 98195, USA
- Brent Ewing
  - Department of Genome Sciences, School of Medicine, University of Washington, Seattle, Washington 98195, USA
- Robert H Waterston
  - Department of Genome Sciences, School of Medicine, University of Washington, Seattle, Washington 98195, USA
|
35
|
Yang W, Rosenstiel P, Schulenburg H. aFold - using polynomial uncertainty modelling for differential gene expression estimation from RNA sequencing data. BMC Genomics 2019; 20:364. [PMID: 31077153 PMCID: PMC6509820 DOI: 10.1186/s12864-019-5686-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 11/28/2018] [Accepted: 04/11/2019] [Indexed: 12/12/2022] Open
Abstract
Background: Data normalization and identification of significant differential expression represent crucial steps in RNA-Seq analysis. Many available tools rely on assumptions that are often not met by real data, including the common assumption of symmetrical distribution of up- and down-regulated genes, the presence of only a few differentially expressed genes and/or few outliers. Moreover, the cut-off for selecting significantly differentially expressed genes for further downstream analysis often depends on arbitrary choices. Results: We here introduce a new tool for estimating differential expression in noisy real-life data. It employs a novel normalization procedure (qtotal), which takes account of the overall distribution of read counts for data standardization, enhancing reliable identification of differential gene expression, especially in case of asymmetrical distributions of up- and down-regulated genes. The tool then introduces a polynomial algorithm (aFold) to model the uncertainty of read counts across treatments and genes. We extensively benchmark aFold on a variety of simulated and validated real-life data sets (e.g. ABRF, SEQC and MAQC-II) and show a higher ability to correctly identify differentially expressed genes under most tested conditions. aFold infers fold change values that are comparable across experiments, thereby facilitating data clustering, visualization, and other downstream applications. Conclusions: We here present a new transcriptomics analysis tool that includes both a data normalization method and a differential expression analysis approach. The new tool is shown to enhance reliable identification of significant differential expression across distinct data distributions. It outcompetes alternative procedures in case of asymmetrical distributions of up- versus down-regulated genes and also the presence of outliers, all common to real data sets.
Electronic supplementary material: The online version of this article (10.1186/s12864-019-5686-1) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Wentao Yang
- Evolutionary Ecology and Genetics, Zoological Institute, CAU Kiel, Am Botanischen Garten 9, 24118, Kiel, Germany.
| | - Philip Rosenstiel
- Institute for Clinical Molecular Biology, CAU Kiel, Am Botanischen Garten 11, 24118, Kiel, Germany
| | - Hinrich Schulenburg
- Evolutionary Ecology and Genetics, Zoological Institute, CAU Kiel, Am Botanischen Garten 9, 24118, Kiel, Germany; Max Planck Institute for Evolutionary Biology, August-Thienemann-Str. 2, 24306 Ploen, Kiel, Germany.
| |
|
36
|
Kuosmanen A, Norri T, Mäkinen V. Evaluating approaches to find exon chains based on long reads. Brief Bioinform 2019; 19:404-414. [PMID: 28069635 PMCID: PMC5952954 DOI: 10.1093/bib/bbw137] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/19/2016] [Indexed: 11/25/2022]
Abstract
Transcript prediction can be modeled as a graph problem where exons are modeled as nodes and reads spanning two or more exons are modeled as exon chains. Pacific Biosciences third-generation sequencing technology produces significantly longer reads than earlier second-generation sequencing technologies, which gives valuable information about longer exon chains in a graph. However, with the high error rates of third-generation sequencing, aligning long reads correctly around the splice sites is a challenging task. Incorrect alignments lead to spurious nodes and arcs in the graph, which in turn lead to incorrect transcript predictions. We survey several approaches to find the exon chains corresponding to long reads in a splicing graph, and experimentally study the performance of these methods using simulated data to allow for sensitivity/precision analysis. Our experiments show that short reads from second-generation sequencing can be used to significantly improve exon chain correctness either by error-correcting the long reads before splicing graph creation, or by using them to create a splicing graph on which the long-read alignments are then projected. We also study the memory and time consumption of various modules, and show that accurate exon chains lead to significantly increased transcript prediction accuracy. Availability: The simulated data and in-house scripts used for this article are available at http://www.cs.helsinki.fi/group/gsa/exon-chains/exon-chains-bib.tar.bz2.
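The graph formulation in this abstract can be illustrated with a minimal Python sketch. It assumes each read has already been reduced to an ordered tuple of exon identifiers (its exon chain). The function names, the toy chains, and the `min_support` threshold are hypothetical and are not taken from the article.

```python
from collections import defaultdict


def build_splicing_graph(exon_chains):
    """Build a splicing graph from exon chains.

    Nodes are exons; an arc (a, b) is added whenever exon b directly
    follows exon a in some chain. Each arc stores its read support.
    """
    nodes = set()
    arcs = defaultdict(int)
    for chain in exon_chains:
        nodes.update(chain)
        for a, b in zip(chain, chain[1:]):
            arcs[(a, b)] += 1
    return nodes, dict(arcs)


def filter_arcs(arcs, min_support=2):
    """Drop weakly supported arcs, which may stem from misaligned reads."""
    return {arc: n for arc, n in arcs.items() if n >= min_support}


# Hypothetical toy input: five reads over four exons.
chains = [
    ("e1", "e2", "e3"),
    ("e1", "e2", "e3"),
    ("e1", "e3"),
    ("e2", "e3", "e4"),
    ("e2", "e4"),
]
nodes, arcs = build_splicing_graph(chains)
supported = filter_arcs(arcs, min_support=2)
```

A support threshold is only one simple way to suppress spurious arcs; the abstract itself reports that error-correcting long reads with short reads, or projecting long-read alignments onto a short-read splicing graph, improves exon chain correctness.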
Affiliation(s)
- Anna Kuosmanen
- Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland
| | - Tuukka Norri
- Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland
| | - Veli Mäkinen
- Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland
| |
|
37
|
Soneson C, Love MI, Patro R, Hussain S, Malhotra D, Robinson MD. A junction coverage compatibility score to quantify the reliability of transcript abundance estimates and annotation catalogs. Life Sci Alliance 2019; 2:2/1/e201800175. [PMID: 30655364 PMCID: PMC6337739 DOI: 10.26508/lsa.201800175] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 08/24/2018] [Revised: 01/07/2019] [Accepted: 01/08/2019] [Indexed: 02/01/2023]
Abstract
Comparison of observed exon–exon junction counts to those predicted from estimated transcript abundances can identify genes with misannotated or misquantified transcripts. Most methods for statistical analysis of RNA-seq data take a matrix of abundance estimates for some type of genomic features as their input, and consequently the quality of any obtained results is directly dependent on the quality of these abundances. Here, we present the junction coverage compatibility score, which provides a way to evaluate the reliability of transcript-level abundance estimates and the accuracy of transcript annotation catalogs. It works by comparing the observed number of reads spanning each annotated splice junction in a genomic region to the predicted number of junction-spanning reads, inferred from the estimated transcript abundances and the genomic coordinates of the corresponding annotated transcripts. We show that although most genes show good agreement between the observed and predicted junction coverages, there is a small set of genes that do not. Genes with poor agreement are found regardless of the method used to estimate transcript abundances, and the corresponding transcript abundances should be treated with care in any downstream analyses.
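The comparison described here (observed junction-spanning reads versus those implied by transcript abundances) can be sketched in a few lines of Python. This is a simplified illustration and not the published score: it assumes each transcript contributes its estimated abundance to every junction it contains, rescales predictions to the observed total, and reports a normalized absolute discrepancy between 0 and 1. All names and numbers are hypothetical.

```python
def predicted_junction_coverage(transcript_junctions, abundances):
    """Sum the abundance of every transcript that contains each junction."""
    predicted = {}
    for tx, junctions in transcript_junctions.items():
        for j in junctions:
            predicted[j] = predicted.get(j, 0.0) + abundances.get(tx, 0.0)
    return predicted


def junction_discrepancy(observed, predicted):
    """Normalized disagreement between observed and predicted coverage.

    Returns 0.0 for perfect agreement and 1.0 for complete disagreement,
    or None when either side has no junction-spanning reads at all.
    """
    junctions = set(observed) | set(predicted)
    obs_total = sum(observed.get(j, 0.0) for j in junctions)
    pred_total = sum(predicted.get(j, 0.0) for j in junctions)
    if obs_total == 0 or pred_total == 0:
        return None
    scale = obs_total / pred_total  # put predictions on the observed scale
    diff = sum(
        abs(scale * predicted.get(j, 0.0) - observed.get(j, 0.0))
        for j in junctions
    )
    return diff / (2 * obs_total)


# Hypothetical gene with two annotated transcripts sharing junction j1.
tx_junctions = {"txA": ["j1", "j2"], "txB": ["j1", "j3"]}
abundances = {"txA": 30.0, "txB": 10.0}
predicted = predicted_junction_coverage(tx_junctions, abundances)

agree = junction_discrepancy({"j1": 40, "j2": 30, "j3": 10}, predicted)
disagree = junction_discrepancy({"j1": 40, "j2": 0, "j3": 40}, predicted)
```

In the second call the reads support `j3` rather than `j2`, so the estimated abundances of `txA` and `txB` would deserve a closer look.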
Affiliation(s)
- Charlotte Soneson
- Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
| | - Michael I Love
- Department of Biostatistics, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA; Department of Genetics, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
| | - Rob Patro
- Department of Computer Science, Stony Brook University, NY, USA
| | - Shobbir Hussain
- Department of Biology and Biochemistry, University of Bath, Bath, UK
| | - Dheeraj Malhotra
- F. Hoffmann-La Roche Ltd, Pharma Research and Early Development, Neuroscience, Ophthalmology and Rare Diseases, Roche Innovation Center Basel, Basel, Switzerland
| | - Mark D Robinson
- Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
| |
|
38
|
Chabbert CD, Eberhart T, Guccini I, Krek W, Kovacs WJ. Correction of gene model annotations improves isoform abundance estimates: the example of ketohexokinase (Khk). F1000Res 2018; 7:1956. [PMID: 31001414 PMCID: PMC6464065 DOI: 10.12688/f1000research.17082.2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Accepted: 03/20/2019] [Indexed: 12/13/2022]
Abstract
Next generation sequencing protocols such as RNA-seq have made the genome-wide characterization of the transcriptome a crucial part of many research projects in biology. Analyses of the resulting data provide key information on gene expression and in certain cases on exon or isoform usage. The emergence of transcript quantification software such as Salmon has enabled researchers to efficiently estimate isoform and gene expressions across the genome while tremendously reducing the necessary computational power. Although overall gene expression estimations were shown to be accurate, isoform expression quantifications appear to be a more challenging task. Low expression levels and uneven or insufficient coverage were reported as potential explanations for inconsistent estimates. Here, through the example of the ketohexokinase (Khk) gene in mouse, we demonstrate that the use of an incorrect gene annotation can also result in erroneous isoform quantification results. Manual correction of the input Khk gene model provided a much more accurate estimation of relative Khk isoform expression when compared to quantitative PCR (qPCR) measurements. In particular, removal of an unexpressed retained intron and a proper adjustment of the 5’ and 3’ untranslated regions both had a strong impact on the correction of erroneous estimates. Finally, we observed a better concordance in isoform quantification between datasets and sequencing strategies when relying on the newly generated Khk annotations. These results highlight the importance of accurate gene models and annotations for correct isoform quantification and reassert the need for orthogonal methods of estimation of isoform expression to confirm important findings.
Affiliation(s)
| | - Tanja Eberhart
- Institute of Molecular Health Sciences, ETH Zurich, Zurich, 8093, Switzerland
| | - Ilaria Guccini
- Institute of Molecular Health Sciences, ETH Zurich, Zurich, 8093, Switzerland
| | - Wilhelm Krek
- Institute of Molecular Health Sciences, ETH Zurich, Zurich, 8093, Switzerland
| | - Werner J Kovacs
- Institute of Molecular Health Sciences, ETH Zurich, Zurich, 8093, Switzerland
| |
|
39
|
Topa H, Honkela A. GPrank: an R package for detecting dynamic elements from genome-wide time series. BMC Bioinformatics 2018; 19:367. [PMID: 30286713 PMCID: PMC6172792 DOI: 10.1186/s12859-018-2370-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 06/19/2018] [Accepted: 09/11/2018] [Indexed: 01/30/2023]
Abstract
Background: Genome-wide high-throughput sequencing (HTS) time series experiments are a powerful tool for monitoring various genomic elements over time. They can be used to monitor, for example, gene or transcript expression with RNA sequencing (RNA-seq), DNA methylation levels with bisulfite sequencing (BS-seq), or abundances of genetic variants in populations with pooled sequencing (Pool-seq). However, because of high experimental costs, the time series data sets often consist of a very limited number of time points with very few or no biological replicates, posing challenges in the data analysis. Results: Here we present the GPrank R package for modelling genome-wide time series by incorporating variance information obtained during pre-processing of the HTS data using probabilistic quantification methods or from a beta-binomial model using sequencing depth. GPrank is well-suited for analysing both short and irregularly sampled time series. It is based on modelling each time series by two Gaussian process (GP) models, namely, time-dependent and time-independent GP models, and comparing the evidence provided by data under two models by computing their Bayes factor (BF). Genomic elements are then ranked by their BFs, and temporally most dynamic elements can be identified. Conclusions: Incorporating the variance information helps GPrank avoid false positives without compromising computational efficiency. Fitted models can be easily further explored in a browser. Detection and visualisation of temporally most active dynamic elements in the genome can provide a good starting point for further downstream analyses for increasing our understanding of the studied processes.
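The ranking idea, comparing a time-dependent and a time-independent Gaussian process by their Bayes factor, can be illustrated with a small pure-Python sketch. It is not the GPrank implementation: it assumes a squared-exponential kernel with fixed, hand-picked hyperparameters (`signal_var`, `length_scale`), whereas a real analysis would typically optimize them, and it treats the per-point variances as known. All names and values are hypothetical.

```python
import math


def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def gp_log_evidence(t, y, noise_var, signal_var=0.0, length_scale=1.0):
    """Log marginal likelihood of a zero-mean GP for a centred time series.

    noise_var holds one known variance per time point. With signal_var=0
    the model reduces to the time-independent (noise-only) case.
    """
    n = len(t)
    mean = sum(y) / n
    yc = [v - mean for v in y]
    K = [
        [
            signal_var * math.exp(-0.5 * ((t[i] - t[j]) / length_scale) ** 2)
            + (noise_var[i] if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]
    L = cholesky(K)
    # Forward substitution solves L z = yc, so sum(z^2) = yc^T K^-1 yc.
    z = []
    for i in range(n):
        z.append((yc[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    quad = sum(v * v for v in z)
    half_log_det = sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * quad - half_log_det - 0.5 * n * math.log(2 * math.pi)


def log_bayes_factor(t, y, noise_var, signal_var=4.0, length_scale=2.0):
    """log BF of the time-dependent model against the time-independent one."""
    dependent = gp_log_evidence(t, y, noise_var, signal_var, length_scale)
    independent = gp_log_evidence(t, y, noise_var)
    return dependent - independent


# Hypothetical series: one with a clear trend, one that is flat within noise.
t = [0, 1, 2, 3, 4, 5]
noise = [0.05] * 6
bf_trend = log_bayes_factor(t, [0, 1, 2, 3, 4, 5], noise)
bf_flat = log_bayes_factor(t, [2.5, 2.6, 2.4, 2.5, 2.55, 2.45], noise)
```

Sorting elements by their log Bayes factor would place the trending series ahead of the flat one, which mirrors the ranking step the abstract describes.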
Affiliation(s)
- Hande Topa
- Institute for Molecular Medicine Finland FIMM, University of Helsinki, Helsinki, 00014, Finland; Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Espoo, 00076, Finland.
| | - Antti Honkela
- Helsinki Institute for Information Technology HIIT, Department of Mathematics and Statistics, University of Helsinki, Helsinki, 00014, Finland; Department of Public Health, University of Helsinki, Helsinki, 00014, Finland
| |
|
40
|
Event Analysis: Using Transcript Events To Improve Estimates of Abundance in RNA-seq Data. G3-GENES GENOMES GENETICS 2018; 8:2923-2940. [PMID: 30021829 PMCID: PMC6118309 DOI: 10.1534/g3.118.200373] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 02/07/2023]
Abstract
Alternative splicing leverages genomic content by allowing the synthesis of multiple transcripts and, by implication, protein isoforms, from a single gene. However, estimating the abundance of transcripts produced in a given tissue from short sequencing reads is difficult and can result in both the construction of transcripts that do not exist, and the failure to identify true transcripts. An alternative approach is to catalog the events that make up isoforms (splice junctions and exons). We present here the Event Analysis (EA) approach, where we project transcripts onto the genome and identify overlapping/unique regions and junctions. In addition, all possible logical junctions are assembled into a catalog. Transcripts are filtered before quantitation based on simple measures: the proportion of the events detected, and the coverage. We find that mapping to a junction catalog is more efficient at detecting novel junctions than mapping in a splice aware manner. We identify 99.8% of true transcripts while iReckon identifies 82% of the true transcripts and creates more transcripts not included in the simulation than were initially used in the simulation. Using PacBio Iso-seq data from a mouse neural progenitor cell model, EA detects 60% of the novel junctions that are combinations of existing exons while only 43% are detected by STAR. EA further detects ∼5,000 annotated junctions missed by STAR. Filtering transcripts based on the proportion of the transcript detected and the number of reads on average supporting that transcript captures 95% of the PacBio transcriptome. Filtering the reference transcriptome before quantitation results in a more stable estimate of isoform abundance, with improved correlation between replicates. This was particularly evident when EA was applied to an RNA-seq study of type 1 diabetes (T1D), where the coefficient of variation among subjects (n = 81) in the transcript abundance estimates was substantially reduced compared to the estimation using the full reference. EA focuses on individual transcriptional events. These events can be quantitated and analyzed directly or used to identify the probable set of expressed transcripts. Simple rules based on detected events and coverage used in filtering result in a dramatic improvement in isoform estimation without the use of ancillary data (e.g., ChIP, long reads) that may not be available for many studies.
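The filtering rule mentioned in this abstract (keep a transcript only if enough of its events are detected and its coverage is adequate) might look roughly like the Python sketch below. The thresholds, function name, and toy data are hypothetical; the abstract does not state the cut-offs used.

```python
def filter_transcripts(transcript_events, event_coverage,
                       min_prop_detected=0.75, min_mean_coverage=1.0):
    """Keep transcripts whose events are mostly detected and well covered.

    transcript_events maps a transcript to its events (exonic regions and
    junctions); event_coverage maps an event to its average read coverage.
    An event counts as detected when its coverage is above zero.
    """
    kept = []
    for tx, events in transcript_events.items():
        if not events:
            continue
        cov = [event_coverage.get(e, 0.0) for e in events]
        prop_detected = sum(c > 0 for c in cov) / len(cov)
        mean_cov = sum(cov) / len(cov)
        if prop_detected >= min_prop_detected and mean_cov >= min_mean_coverage:
            kept.append(tx)
    return kept


# Hypothetical gene: txA is fully supported, txB lacks its exon and junction.
transcript_events = {
    "txA": ["exon1", "exon2", "junction_1_2"],
    "txB": ["exon1", "exon3", "junction_1_3"],
}
event_coverage = {"exon1": 10.0, "exon2": 8.0, "junction_1_2": 5.0}
kept = filter_transcripts(transcript_events, event_coverage)
```

Quantifying against only the retained transcripts is the step the abstract credits with more stable isoform estimates.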
|
41
|
The DNA methylation landscape of glioblastoma disease progression shows extensive heterogeneity in time and space. Nat Med 2018; 24:1611-1624. [PMID: 30150718 DOI: 10.1038/s41591-018-0156-x] [Citation(s) in RCA: 179] [Impact Index Per Article: 29.8] [Received: 06/24/2017] [Accepted: 07/12/2018] [Indexed: 12/12/2022]
Abstract
Glioblastoma is characterized by widespread genetic and transcriptional heterogeneity, yet little is known about the role of the epigenome in glioblastoma disease progression. Here, we present genome-scale maps of DNA methylation in matched primary and recurring glioblastoma tumors, using data from a highly annotated clinical cohort that was selected through a national patient registry. We demonstrate the feasibility of DNA methylation mapping in a large set of routinely collected FFPE samples, and we validate bisulfite sequencing as a multipurpose assay that allowed us to infer a range of different genetic, epigenetic, and transcriptional characteristics of the profiled tumor samples. On the basis of these data, we identified subtle differences between primary and recurring tumors, links between DNA methylation and the tumor microenvironment, and an association of epigenetic tumor heterogeneity with patient survival. In summary, this study establishes an open resource for dissecting DNA methylation heterogeneity in a genetically diverse and heterogeneous cancer, and it demonstrates the feasibility of integrating epigenomics, radiology, and digital pathology for a national cohort, thereby leveraging existing samples and data collected as part of routine clinical practice.
|
42
|
Seo GY, Shui JW, Takahashi D, Song C, Wang Q, Kim K, Mikulski Z, Chandra S, Giles DA, Zahner S, Kim PH, Cheroutre H, Colonna M, Kronenberg M. LIGHT-HVEM Signaling in Innate Lymphoid Cell Subsets Protects Against Enteric Bacterial Infection. Cell Host Microbe 2018; 24:249-260.e4. [PMID: 30092201 PMCID: PMC6132068 DOI: 10.1016/j.chom.2018.07.008] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Received: 02/08/2018] [Revised: 05/19/2018] [Accepted: 07/16/2018] [Indexed: 01/25/2023]
Abstract
Innate lymphoid cells (ILCs) are important regulators of early infection at mucosal barriers. ILCs are divided into three groups based on expression profiles, and are activated by cytokines and neuropeptides. Yet, it remains unknown if ILCs integrate other signals in providing protection. We show that signaling through herpes virus entry mediator (HVEM), a member of the tumor necrosis factor (TNF) receptor superfamily, in ILC3 is important for host defense against oral infection with the bacterial pathogen Yersinia enterocolitica. HVEM stimulates protective interferon-γ (IFN-γ) secretion from ILCs, and mice with HVEM-deficient ILC3 exhibit reduced IFN-γ production, higher bacterial burdens and increased mortality. In addition, IFN-γ production is critical as adoptive transfer of wild-type but not IFN-γ-deficient ILC3 can restore protection to mice lacking ILCs. We identify the TNF superfamily member, LIGHT, as the ligand inducing HVEM signals in ILCs. Thus HVEM signaling mediated by LIGHT plays a critical role in regulating ILC3-derived IFN-γ production for protection following infection.
MESH Headings
- Adoptive Transfer
- Adult
- Animals
- Cytokines/metabolism
- Disease Models, Animal
- Enterobacteriaceae Infections/pathology
- Enterobacteriaceae Infections/prevention & control
- Homeodomain Proteins/genetics
- Homeodomain Proteins/metabolism
- Host-Pathogen Interactions/immunology
- Host-Pathogen Interactions/physiology
- Humans
- Interferon-gamma/metabolism
- Lymphocytes/immunology
- Lymphocytes/metabolism
- Male
- Mice
- Mice, Inbred C57BL
- Mice, Knockout
- Neuropeptides/metabolism
- Protein Transport
- Receptors, CCR6/genetics
- Receptors, CCR6/metabolism
- Receptors, Tumor Necrosis Factor/metabolism
- Receptors, Tumor Necrosis Factor, Member 14/immunology
- Receptors, Tumor Necrosis Factor, Member 14/metabolism
- Signal Transduction
- Spleen/microbiology
- Spleen/pathology
- Tumor Necrosis Factor Ligand Superfamily Member 14/metabolism
- Yersinia enterocolitica/pathogenicity
Affiliation(s)
- Goo-Young Seo
- Division of Developmental Immunology, La Jolla Institute for Allergy and Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA; Department of Molecular Bioscience, School of Biomedical Science and Institute of Bioscience and Biotechnology, Kangwon National University, Chuncheon 24341, Republic of Korea
| | - Jr-Wen Shui
- Division of Developmental Immunology, La Jolla Institute for Allergy and Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
| | - Daisuke Takahashi
- Division of Developmental Immunology, La Jolla Institute for Allergy and Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
| | - Christina Song
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO 63110, USA
| | - Qingyang Wang
- Division of Developmental Immunology, La Jolla Institute for Allergy and Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
| | - Kenneth Kim
- Division of Inflammation Biology, La Jolla Institute for Allergy and Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
| | - Zbigniew Mikulski
- Microscopy and Histology Core, La Jolla Institute for Allergy and Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
| | - Shilpi Chandra
- Division of Developmental Immunology, La Jolla Institute for Allergy and Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
| | - Daniel A Giles
- Division of Developmental Immunology, La Jolla Institute for Allergy and Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
| | - Sonja Zahner
- Division of Developmental Immunology, La Jolla Institute for Allergy and Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
| | - Pyeung-Hyeun Kim
- Department of Molecular Bioscience, School of Biomedical Science and Institute of Bioscience and Biotechnology, Kangwon National University, Chuncheon 24341, Republic of Korea
| | - Hilde Cheroutre
- Division of Developmental Immunology, La Jolla Institute for Allergy and Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
| | - Marco Colonna
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO 63110, USA
| | - Mitchell Kronenberg
- Division of Developmental Immunology, La Jolla Institute for Allergy and Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA; Division of Biology, University of California San Diego, La Jolla, CA 92037, USA.
| |
|
43
|
Papastamoulis P, Rattray M. Bayesian estimation of differential transcript usage from RNA-seq data. Stat Appl Genet Mol Biol 2018; 16:367-386. [PMID: 29091583 DOI: 10.1515/sagmb-2017-0005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/15/2022]
Abstract
Next generation sequencing allows the identification of genes consisting of differentially expressed transcripts, a term which usually refers to changes in the overall expression level. A specific type of differential expression is differential transcript usage (DTU), which targets changes in the relative within-gene expression of a transcript. The contribution of this paper is to: (a) extend cjBitSeq, a previously introduced Bayesian model originally designed for identifying changes in overall expression levels, to the DTU context and (b) propose a Bayesian version of DRIMSeq, a frequentist model for inferring DTU. cjBitSeq is a read-based model and performs fully Bayesian inference by MCMC sampling on the space of latent states of each transcript per gene. BayesDRIMSeq is a count-based model and estimates the Bayes factor of a DTU model against a null model using Laplace's approximation. The proposed models are benchmarked against the existing ones using a recent independent simulation study as well as a real RNA-seq dataset. Our results suggest that the Bayesian methods exhibit similar performance to DRIMSeq in terms of precision/recall but offer better calibration of the False Discovery Rate.
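The distinction drawn here between overall expression and transcript usage can be made concrete with a short Python example. It only computes within-gene proportions from hypothetical counts and performs no statistical test; the names and numbers are illustrative and do not come from the paper.

```python
def transcript_usage(counts):
    """Convert per-transcript counts for one gene into within-gene proportions."""
    total = sum(counts.values())
    if total == 0:
        return {tx: 0.0 for tx in counts}
    return {tx: c / total for tx, c in counts.items()}


# Hypothetical gene with two transcripts in two conditions.
condition_a = {"tx1": 80, "tx2": 20}
condition_b = {"tx1": 20, "tx2": 80}

# Gene-level expression is identical in both conditions...
gene_total_a = sum(condition_a.values())
gene_total_b = sum(condition_b.values())

# ...but the relative usage of the two transcripts is reversed.
usage_a = transcript_usage(condition_a)
usage_b = transcript_usage(condition_b)
usage_shift = {tx: usage_b[tx] - usage_a[tx] for tx in usage_a}
```

A gene-level test would see no change in this example, while a DTU method would aim to flag the switch from `tx1` to `tx2`.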
|
44
|
Love MI, Soneson C, Patro R. Swimming downstream: statistical analysis of differential transcript usage following Salmon quantification. F1000Res 2018; 7:952. [PMID: 30356428 PMCID: PMC6178912 DOI: 10.12688/f1000research.15398.1] [Citation(s) in RCA: 62] [Impact Index Per Article: 10.3] [Accepted: 06/22/2018] [Indexed: 12/25/2022]
Abstract
Detection of differential transcript usage (DTU) from RNA-seq data is an important bioinformatic analysis that complements differential gene expression analysis. Here we present a simple workflow using a set of existing R/Bioconductor packages for analysis of DTU. We show how these packages can be used downstream of RNA-seq quantification using the Salmon software package. The entire pipeline is fast, benefiting from inference steps by Salmon to quantify expression at the transcript level. The workflow includes live, runnable code chunks for analysis using DRIMSeq and DEXSeq, as well as for performing two-stage testing of DTU using the stageR package, a statistical framework to screen at the gene level and then confirm which transcripts within the significant genes show evidence of DTU. We evaluate these packages and other related packages on a simulated dataset with parameters estimated from real data.
Affiliation(s)
- Michael I. Love
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27516, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27516, USA
| | - Charlotte Soneson
- Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, Zurich, Switzerland
| | - Rob Patro
- Department of Computer Science, Stony Brook University, Stony Brook, NY, 11794, USA
| |
|
45
|
Baumgartner C, Toifl S, Farlik M, Halbritter F, Scheicher R, Fischer I, Sexl V, Bock C, Baccarini M. An ERK-Dependent Feedback Mechanism Prevents Hematopoietic Stem Cell Exhaustion. Cell Stem Cell 2018; 22:879-892.e6. [PMID: 29804890 PMCID: PMC5988582 DOI: 10.1016/j.stem.2018.05.003] [Citation(s) in RCA: 80] [Impact Index Per Article: 13.3] [Received: 08/03/2017] [Revised: 03/08/2018] [Accepted: 05/04/2018] [Indexed: 11/22/2022]
Abstract
Hematopoietic stem cells (HSCs) sustain hematopoiesis throughout life. HSCs exit dormancy to restore hemostasis in response to stressful events, such as acute blood loss, and must return to a quiescent state to prevent their exhaustion and resulting bone marrow failure. HSC activation is driven in part through the phosphatidylinositol 3-kinase (PI3K)/AKT/mTORC1 signaling pathway, but less is known about the cell-intrinsic pathways that control HSC dormancy. Here, we delineate an ERK-dependent, rate-limiting feedback mechanism that controls HSC fitness and their re-entry into quiescence. We show that the MEK/ERK and PI3K pathways are synchronously activated in HSCs during emergency hematopoiesis and that feedback phosphorylation of MEK1 by activated ERK counterbalances AKT/mTORC1 activation. Genetic or chemical ablation of this feedback loop tilts the balance between HSC dormancy and activation, increasing differentiated cell output and accelerating HSC exhaustion. These results suggest that MEK inhibitors developed for cancer therapy may find additional utility in controlling HSC activation.
Affiliation(s)
- Christian Baumgartner
- Department of Microbiology, Immunobiology and Genetics, Center for Molecular Biology of the University of Vienna, Max F. Perutz Laboratories, Vienna Biocenter (VBC), 1030 Vienna, Austria
| | - Stefanie Toifl
- Department of Microbiology, Immunobiology and Genetics, Center for Molecular Biology of the University of Vienna, Max F. Perutz Laboratories, Vienna Biocenter (VBC), 1030 Vienna, Austria
| | - Matthias Farlik
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
| | - Florian Halbritter
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
| | - Ruth Scheicher
- Department for Biomedical Sciences, Institute of Pharmacology and Toxicology, University of Veterinary Medicine, 1210 Vienna, Austria
| | - Irmgard Fischer
- Department of Microbiology, Immunobiology and Genetics, Center for Molecular Biology of the University of Vienna, Max F. Perutz Laboratories, Vienna Biocenter (VBC), 1030 Vienna, Austria
| | - Veronika Sexl
- Department for Biomedical Sciences, Institute of Pharmacology and Toxicology, University of Veterinary Medicine, 1210 Vienna, Austria
| | - Christoph Bock
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria; Department of Laboratory Medicine, Medical University of Vienna, Vienna, Austria; Saarland Informatics Campus, Max Planck Institute for Informatics, Saarbrücken, Germany
| | - Manuela Baccarini
- Department of Microbiology, Immunobiology and Genetics, Center for Molecular Biology of the University of Vienna, Max F. Perutz Laboratories, Vienna Biocenter (VBC), 1030 Vienna, Austria.
| |
|
46
|
Li M, Xie X, Zhou J, Sheng M, Yin X, Ko EA, Zhou T, Gu W. Quantifying circular RNA expression from RNA-seq data using model-based framework. Bioinformatics 2018; 33:2131-2139. [PMID: 28334396 DOI: 10.1093/bioinformatics/btx129] [Citation(s) in RCA: 51] [Impact Index Per Article: 8.5] [Received: 06/28/2016] [Accepted: 03/07/2017] [Indexed: 11/13/2022]
Abstract
Motivation: Circular RNAs (circRNAs) are a class of non-coding RNAs that are widely expressed in various cell lines and tissues of many organisms. Although the exact function of many circRNAs is largely unknown, the cell type- and tissue-specific circRNA expression has implicated their crucial functions in many biological processes. Hence, the quantification of circRNA expression from high-throughput RNA-seq data is becoming important to ascertain. Although many model-based methods have been developed to quantify linear RNA expression from RNA-seq data, these methods are not applicable to circRNA quantification. Results: Here, we proposed a novel strategy that transforms circular transcripts to pseudo-linear transcripts and estimates the expression values of both circular and linear transcripts using an existing model-based algorithm, Sailfish. The new strategy can accurately estimate transcript expression of both linear and circular transcripts from RNA-seq data. Several factors, such as gene length, amount of expression and the ratio of circular to linear transcripts, had impacts on quantification performance of circular transcripts. In comparison to count-based tools, the new computational framework had superior performance in estimating the amount of circRNA expression from both simulated and real ribosomal RNA-depleted (rRNA-depleted) RNA-seq datasets. On the other hand, the consideration of circular transcripts in expression quantification from rRNA-depleted RNA-seq data showed substantially increased accuracy of linear transcript expression. Our proposed strategy was implemented in a program named Sailfish-cir. Availability and Implementation: Sailfish-cir is freely available at https://github.com/zerodel/Sailfish-cir. Contact: tongz@medicine.nevada.edu or wanjun.gu@gmail.com. Supplementary information: Supplementary data are available at Bioinformatics online.
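The abstract does not say how a circular transcript is turned into a pseudo-linear one. As a purely illustrative assumption (not the documented Sailfish-cir procedure), one way to let a linear quantifier see reads that cross the back-splice junction is to append the first `read_len - 1` bases of the circular sequence to its end. The function name and sequences in this Python sketch are hypothetical.

```python
def pseudo_linear(circ_seq, read_len):
    """Extend a circular sequence so junction-spanning reads map linearly.

    Assumed construction for illustration only: the first read_len - 1
    bases are appended to the end, so every read of length read_len that
    wraps around the back-splice junction occurs as a plain substring.
    """
    if read_len < 2 or not circ_seq:
        return circ_seq
    overhang = min(read_len - 1, len(circ_seq))
    return circ_seq + circ_seq[:overhang]


# Hypothetical 6-base circle and 4-base reads.
circle = "ACGTAC"
linearised = pseudo_linear(circle, read_len=4)

# Reads that wrap around the junction between the last and first base.
wrapping_reads = ["TACA", "ACAC", "CACG"]
all_found = all(read in linearised for read in wrapping_reads)
```

None of the wrapping reads occurs in the original six-base string, which is why a purely linear reference would miss them.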
Affiliation(s)
- Musheng Li
  - State Key Laboratory of Bioelectronics, School of Biological Sciences and Medical Engineering, Southeast University, Nanjing, Jiangsu, China
- Xueying Xie
  - State Key Laboratory of Bioelectronics, School of Biological Sciences and Medical Engineering, Southeast University, Nanjing, Jiangsu, China
- Jing Zhou
  - Research Center for Learning Sciences, Southeast University, Nanjing, Jiangsu, China
- Mengying Sheng
  - State Key Laboratory of Bioelectronics, School of Biological Sciences and Medical Engineering, Southeast University, Nanjing, Jiangsu, China
- Xiaofeng Yin
  - Department of Orthopedics and Trauma, Peking University People's Hospital, Beijing, China
- Eun-A Ko
  - Department of Physiology and Cell Biology, The University of Nevada School of Medicine, Reno, NV, USA
- Tong Zhou
  - Department of Physiology and Cell Biology, The University of Nevada School of Medicine, Reno, NV, USA
- Wanjun Gu
  - State Key Laboratory of Bioelectronics, School of Biological Sciences and Medical Engineering, Southeast University, Nanjing, Jiangsu, China
47
Bayega A, Fahiminiya S, Oikonomopoulos S, Ragoussis J. Current and Future Methods for mRNA Analysis: A Drive Toward Single Molecule Sequencing. Methods Mol Biol 2018; 1783:209-241. [PMID: 29767365 DOI: 10.1007/978-1-4939-7834-2_11] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Indexed: 06/08/2023]
Abstract
The transcriptome encompasses a range of RNA species, including messenger RNA (mRNA) and noncoding RNAs such as rRNA, tRNA, and short and long noncoding RNAs. Due to the major role played by mRNA in development and disease, several methods have been developed to sequence and characterize mRNA, with RNA sequencing (RNA-Seq) emerging as the current method of choice, particularly for large high-throughput studies. Short-read RNA-Seq, which involves sequencing short cDNA fragments and either computationally assembling them to reconstruct the transcriptome or aligning them to a reference, is the most widely used approach. However, due to inherent limitations of this approach in de novo transcriptome assembly and isoform quantification, long-read RNA-Seq approaches, which are also single-molecule sequencing approaches, are increasingly becoming the standard for these tasks. In this chapter, we review the technical aspects of current RNA-Seq methods, both short- and long-read, and the data analysis methods available. We discuss recent advances in single-cell RNA-Seq and direct RNA-Seq approaches, which may come to dominate the future of RNA-Seq.
Affiliation(s)
- Anthony Bayega
  - McGill University and Genome Quebec Innovation Centre, Department of Human Genetics, McGill University, Montréal, QC, Canada
- Spyros Oikonomopoulos
  - McGill University and Genome Quebec Innovation Centre, Department of Human Genetics, McGill University, Montréal, QC, Canada
- Jiannis Ragoussis
  - McGill University and Genome Quebec Innovation Centre, Department of Human Genetics, McGill University, Montréal, QC, Canada
  - Department of Bioengineering, McGill University, Montréal, QC, Canada
  - Cancer and Mutagen Unit, Department of Biochemistry, Center of Innovation in Personalized Medicine, King Fahd Center for Medical Research, King Abdulaziz University, Jeddah, Saudi Arabia
48
Bronowski C, Mustafa K, Goodhead I, James CE, Nelson C, Lucaci A, Wigley P, Humphrey TJ, Williams NJ, Winstanley C. Campylobacter jejuni transcriptome changes during loss of culturability in water. PLoS One 2017; 12:e0188936. [PMID: 29190673 PMCID: PMC5708674 DOI: 10.1371/journal.pone.0188936] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 06/20/2017] [Accepted: 11/15/2017] [Indexed: 12/21/2022]
Abstract
Background: Water serves as a potential reservoir for Campylobacter, the leading cause of bacterial gastroenteritis in humans. However, little is understood about the mechanisms underlying variations in survival characteristics between different strains of C. jejuni in natural environments, including water. Results: We identified three Campylobacter jejuni strains that exhibited variability in their ability to retain culturability after suspension in tap water at two different temperatures (4°C and 25°C). Of the three strains, C. jejuni M1 exhibited the most rapid loss of culturability whilst retaining viability. Using RNAseq transcriptomics, we characterised C. jejuni M1 gene expression in response to suspension in water by analyzing bacterial suspensions recovered immediately after introduction into water (Time 0), and from two sampling time/temperature combinations where considerable loss of culturability was evident, namely (i) after 24 h at 25°C, and (ii) after 72 h at 4°C. Transcript data were compared with a culture-grown control. Some gene expression characteristics were shared amongst the three populations recovered from water, with more genes being up-regulated than down-regulated. Many of the up-regulated genes were identified in the Time 0 sample, whereas the majority of down-regulated genes occurred in the 25°C (24 h) sample. Conclusions: Variations in expression were found amongst genes associated with oxygen tolerance, starvation and osmotic stress. However, we also found up-regulation of flagellar assembly genes, accompanied by down-regulation of genes involved in chemotaxis. Our data also suggested a switch from secretion via the sec system to secretion via the tat system, and that the quorum sensing gene luxS may be implicated in the survival of strain M1 in water. Variations in gene expression also occurred in accessory genome regions. Our data suggest that despite the loss of culturability, C. jejuni M1 remains viable and adapts via specific changes in gene expression.
Affiliation(s)
- Christina Bronowski
  - Institute of Infection and Global Health, University of Liverpool, Liverpool, United Kingdom
- Kasem Mustafa
  - Institute of Infection and Global Health, University of Liverpool, Liverpool, United Kingdom
- Ian Goodhead
  - School of Environment and Life Sciences, University of Salford, Salford, United Kingdom
- Chloe E. James
  - School of Environment and Life Sciences, University of Salford, Salford, United Kingdom
- Charlotte Nelson
  - Institute of Integrative Biology, University of Liverpool, Liverpool, United Kingdom
- Anita Lucaci
  - Institute of Integrative Biology, University of Liverpool, Liverpool, United Kingdom
- Paul Wigley
  - Institute of Infection and Global Health, University of Liverpool, Liverpool, United Kingdom
- Tom J. Humphrey
  - Medical Microbiology and Infectious Diseases, School of Medicine, Swansea University, Swansea, United Kingdom
- Nicola J. Williams
  - Institute of Infection and Global Health, University of Liverpool, Liverpool, United Kingdom
- Craig Winstanley
  - Institute of Infection and Global Health, University of Liverpool, Liverpool, United Kingdom
  - * E-mail:
49
Peng H, Yang Y, Zhe S, Wang J, Gribskov M, Qi Y. DEIsoM: a hierarchical Bayesian model for identifying differentially expressed isoforms using biological replicates. Bioinformatics 2017; 33:3018-3027. [PMID: 28595376 PMCID: PMC5870796 DOI: 10.1093/bioinformatics/btx357] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2016] [Accepted: 06/02/2017] [Indexed: 11/18/2022]
Abstract
Motivation: High-throughput mRNA sequencing (RNA-Seq) is a powerful tool for quantifying gene expression. Identification of transcript isoforms that are differentially expressed in different conditions, such as in patients and healthy subjects, can provide insights into the molecular basis of diseases. Current transcript quantification approaches, however, do not take advantage of the shared information in the biological replicates, potentially decreasing sensitivity and accuracy. Results: We present a novel hierarchical Bayesian model called Differentially Expressed Isoform detection from Multiple biological replicates (DEIsoM) for identifying differentially expressed (DE) isoforms from multiple biological replicates representing two conditions, e.g. multiple samples from healthy and diseased subjects. DEIsoM first estimates isoform expression within each condition by (1) capturing common patterns from sample replicates while allowing individual differences, and (2) modeling the uncertainty introduced by ambiguous read mapping in each replicate. Specifically, we introduce a Dirichlet prior distribution to capture the common expression pattern of replicates from the same condition, and treat the isoform expression of individual replicates as samples from this distribution. Ambiguous read mapping is modeled as a multinomial distribution, and ambiguous reads are assigned to the most probable isoform in each replicate. Additionally, DEIsoM couples an efficient variational inference and a post-analysis method to improve the accuracy and speed of identification of DE isoforms over alternative methods. Application of DEIsoM to a hepatocellular carcinoma (HCC) dataset identifies biologically relevant DE isoforms. The relevance of these genes/isoforms to HCC is supported by principal component analysis (PCA), read coverage visualization, and the biological literature. Availability and implementation: The software is available at https://github.com/hao-peng/DEIsoM. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yifan Yang
  - Department of Computer Science, Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, USA
- Jian Wang
  - Eli Lilly and Company, Indianapolis, IN 46285, USA
- Michael Gribskov
  - Department of Computer Science, Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, USA
- Yuan Qi
  - Department of Computer Science, Department of Statistics, Purdue University, West Lafayette, IN 47907, USA
50
Dowsey AW. The need for statistical contributions to bioinformatics at scale, with illustration to mass spectrometry. STAT MODEL 2017. [DOI: 10.1177/1471082x17708519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
In their article, Morris and Baladandayuthapani clearly evidence the influence of statisticians in recent methodological advances throughout the bioinformatics pipeline and advocate for the expansion of this role. The latest acquisition platforms, such as next generation sequencing (genomics/transcriptomics) and hyphenated mass spectrometry (proteomics/metabolomics), output raw datasets in the order of gigabytes; it is not unusual to acquire a terabyte or more of data per study. The increasing computational burden this brings is a further impediment to the use of statistically rigorous methodology in the pre-processing stages of the bioinformatics pipeline. In this discussion I describe the mass spectrometry pipeline and use it as an example to show that beneath this challenge lies a two-fold opportunity: (a) biological complexity and dynamic range are still well beyond what is captured by current processing methodology; hence, potential biomarkers and mechanistic insights are consistently missed; (b) statistical science could play a larger role in optimizing the acquisition process itself. Data rates will continue to increase as routine clinical omics analysis moves to large-scale facilities with systematic, standardized protocols. Key inferential gains will be achieved by borrowing strength across the sum total of all analyzed studies, a task best underpinned by appropriate statistical modelling.
Affiliation(s)
- Andrew W Dowsey
  - School of Social & Community Medicine and School of Veterinary Sciences, Faculty of Health Sciences, University of Bristol, United Kingdom