1
Kuang N, Ma Q, Zheng X, Meng X, Zhai Z, Li Q, Pan J. GeTeSEPdb: A comprehensive database and online tool for the identification and analysis of gene profiles with temporal-specific expression patterns. Comput Struct Biotechnol J 2024; 23:2488-2496. [PMID: 38939556 PMCID: PMC11208770 DOI: 10.1016/j.csbj.2024.06.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Revised: 05/29/2024] [Accepted: 06/04/2024] [Indexed: 06/29/2024]
Abstract
Gene expression is dynamic and varies across the stages of biological processes. The identification of gene profiles with temporal-specific expression patterns can provide valuable insights into ongoing biological processes, such as the cell cycle, cell development, circadian rhythms, or responses to external stimuli such as drug treatments or viral infections. However, no database currently defines, identifies or archives gene profiles with temporal-specific expression patterns. Here, using a high-throughput regression analysis approach, eight linear and nonlinear parametric models were fitted to gene expression profiles from time-series experiments to identify eight types of gene profiles with temporal-specific expression patterns. We curated 2684 time-series transcriptome datasets and identified 2,644,370 gene profiles exhibiting temporal-specific expression patterns. The results were stored in the database GeTeSEPdb (gene profiles with temporal-specific expression patterns database, http://www.inbirg.com/GeTeSEPdb/). Moreover, we implemented an online tool to identify gene profiles with temporal-specific expression patterns from user-submitted data. In summary, GeTeSEPdb is a comprehensive web service that can be used to identify and analyse gene profiles with temporal-specific expression patterns. This approach facilitates the exploration of transcriptional changes and temporal patterns of responses. We firmly believe that GeTeSEPdb will become a valuable resource for biologists and bioinformaticians.
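As an illustration of the regression idea described in this abstract, the following minimal sketch fits two candidate parametric models to a time-series profile and labels the profile by whichever model fits best. It is not the GeTeSEPdb implementation: the abstract does not list the eight models, so the two polynomial models, the adjusted R^2 selection criterion, the function names, and the toy data are all assumptions chosen for illustration.

```python
import numpy as np


def fit_polynomial(t, y, degree):
    """Least-squares polynomial fit; returns (coefficients, adjusted R^2)."""
    coeffs = np.polyfit(t, y, degree)
    residuals = y - np.polyval(coeffs, t)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y - np.mean(y)) ** 2))
    n, p = len(y), degree  # p = number of predictors, intercept excluded
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return coeffs, adj_r2


def classify_profile(t, y):
    """Label a profile by the candidate model with the highest adjusted R^2."""
    # Hypothetical candidate set: only two of many possible temporal shapes.
    candidates = {"linear": 1, "quadratic": 2}
    scores = {name: fit_polynomial(t, y, deg)[1] for name, deg in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy profiles (made-up values): one steadily rising, one peaking mid-course.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
noise = np.array([0.1, -0.1, 0.05, -0.05, 0.1, -0.1])
rising = 2.0 * t + 1.0 + noise
peaked = -(t - 2.5) ** 2 + 6.0 + noise

for name, profile in [("rising", rising), ("peaked", peaked)]:
    label, scores = classify_profile(t, profile)
    print(name, "->", label, scores)
```

Adjusted R^2 is used here only because it penalizes the extra parameter of the quadratic model; an information criterion or a significance threshold on the fit would be equally plausible choices.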
Affiliation(s)
- Ni Kuang
- Basic Medicine Research and Innovation Center for Novel Target and Therapeutic Intervention, Ministry of Education, Institute of Life Sciences, Chongqing Medical University, Chongqing 400016, China
- Qinfeng Ma
- Basic Medicine Research and Innovation Center for Novel Target and Therapeutic Intervention, Ministry of Education, Institute of Life Sciences, Chongqing Medical University, Chongqing 400016, China
- Xiao Zheng
- Basic Medicine Research and Innovation Center for Novel Target and Therapeutic Intervention, Ministry of Education, Institute of Life Sciences, Chongqing Medical University, Chongqing 400016, China
- Xuehang Meng
- Basic Medicine Research and Innovation Center for Novel Target and Therapeutic Intervention, Ministry of Education, Institute of Life Sciences, Chongqing Medical University, Chongqing 400016, China
- Zhaoyu Zhai
- Basic Medicine Research and Innovation Center for Novel Target and Therapeutic Intervention, Ministry of Education, Institute of Life Sciences, Chongqing Medical University, Chongqing 400016, China
- Qiang Li
- Basic Medicine Research and Innovation Center for Novel Target and Therapeutic Intervention, Ministry of Education, Institute of Life Sciences, Chongqing Medical University, Chongqing 400016, China
- Jianbo Pan
- Basic Medicine Research and Innovation Center for Novel Target and Therapeutic Intervention, Ministry of Education, Institute of Life Sciences, Chongqing Medical University, Chongqing 400016, China
- Precision Medicine Center, The Second Affiliated Hospital of Chongqing Medical University, Chongqing 400010, China

2
Bohn T, Balbuena E, Ulus H, Iddir M, Wang G, Crook N, Eroglu A. Carotenoids in Health as Studied by Omics-Related Endpoints. Adv Nutr 2023; 14:1538-1578. [PMID: 37678712 PMCID: PMC10721521 DOI: 10.1016/j.advnut.2023.09.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/17/2023] [Revised: 08/25/2023] [Accepted: 09/01/2023] [Indexed: 09/09/2023]
Abstract
Carotenoids have been associated with risk reduction for several chronic diseases, including the association of their dietary intake/circulating levels with reduced incidence of obesity, type 2 diabetes, certain types of cancer, and even lower total mortality. In addition to some carotenoids constituting vitamin A precursors, they are implicated in potential antioxidant effects and pathways related to inflammation and oxidative stress, including transcription factors such as nuclear factor κB and nuclear factor erythroid 2-related factor 2. Carotenoids and metabolites may also interact with nuclear receptors, mainly retinoic acid receptor/retinoid X receptor and peroxisome proliferator-activated receptors, which play a role in the immune system and cellular differentiation. Therefore, a large number of downstream targets are likely influenced by carotenoids, including but not limited to genes and proteins implicated in oxidative stress and inflammation, antioxidation, and cellular differentiation processes. Furthermore, recent studies also propose an association between carotenoid intake and gut microbiota. While all these endpoints could be individually assessed, a more complete/integrative way to determine a multitude of health-related aspects of carotenoids includes (multi)omics-related techniques, especially transcriptomics, proteomics, lipidomics, and metabolomics, as well as metagenomics, measured in a variety of biospecimens including plasma, urine, stool, white blood cells, or other tissue cellular extracts. In this review, we highlight the use of omics technologies to assess health-related effects of carotenoids in mammalian organisms and models.
Affiliation(s)
- Torsten Bohn
- Nutrition and Health Research Group, Department of Precision Health, Luxembourg Institute of Health, Strassen, Luxembourg.
- Emilio Balbuena
- Department of Molecular and Structural Biochemistry, College of Agriculture and Life Sciences, North Carolina State University, Raleigh, NC, United States; Plants for Human Health Institute, North Carolina Research Campus, North Carolina State University, Kannapolis, NC, United States
- Hande Ulus
- Plants for Human Health Institute, North Carolina Research Campus, North Carolina State University, Kannapolis, NC, United States
- Mohammed Iddir
- Nutrition and Health Research Group, Department of Precision Health, Luxembourg Institute of Health, Strassen, Luxembourg
- Genan Wang
- Department of Chemical and Biomolecular Engineering, College of Engineering, North Carolina State University, Raleigh, NC, United States
- Nathan Crook
- Department of Chemical and Biomolecular Engineering, College of Engineering, North Carolina State University, Raleigh, NC, United States
- Abdulkerim Eroglu
- Department of Molecular and Structural Biochemistry, College of Agriculture and Life Sciences, North Carolina State University, Raleigh, NC, United States; Plants for Human Health Institute, North Carolina Research Campus, North Carolina State University, Kannapolis, NC, United States.

3
Lee DJ, Kim HY, Lee SJ, Jung HS. Spatiotemporal Changes in Transcriptome of Odontogenic and Non-odontogenic Regions in the Dental Arch of Mus musculus. Front Cell Dev Biol 2021; 9:723326. [PMID: 34722506 PMCID: PMC8551760 DOI: 10.3389/fcell.2021.723326] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/10/2021] [Accepted: 09/16/2021] [Indexed: 11/13/2022]
Abstract
Over the past 40 years, studies on tooth regeneration have been conducted. These studies comprised two main flows: some focused on epithelial-mesenchymal interaction in the odontogenic region, whereas others focused on creating a supernumerary tooth in the non-odontogenic region. Recently, the scope of the research has moved from conventional gene modification and molecular therapy to genome and transcriptome sequencing analyses. However, these sequencing data have been produced only in the odontogenic region. We provide RNA-Seq data of not only the odontogenic region but also the non-odontogenic region, which loses tooth-forming capacity during development and remains a rudiment. Sequencing data were collected from mouse embryos at three different stages of tooth development. These data will expand our understanding of tooth development and will help in designing developmental and regenerative studies from a new perspective.
Affiliation(s)
- Dong-Joon Lee
- Division in Anatomy and Developmental Biology, Department of Oral Biology, Taste Research Center, Oral Science Research Center, BK21 FOUR Project, Yonsei University College of Dentistry, Seoul, South Korea
- Hyun-Yi Kim
- Division in Anatomy and Developmental Biology, Department of Oral Biology, Taste Research Center, Oral Science Research Center, BK21 FOUR Project, Yonsei University College of Dentistry, Seoul, South Korea
- NGeneS Inc., Ansan-si, South Korea
- Seung-Jun Lee
- Division in Anatomy and Developmental Biology, Department of Oral Biology, Taste Research Center, Oral Science Research Center, BK21 FOUR Project, Yonsei University College of Dentistry, Seoul, South Korea
- Han-Sung Jung
- Division in Anatomy and Developmental Biology, Department of Oral Biology, Taste Research Center, Oral Science Research Center, BK21 FOUR Project, Yonsei University College of Dentistry, Seoul, South Korea

4
Habowski AN, Habowski TJ, Waterman ML. GECO: gene expression clustering optimization app for non-linear data visualization of patterns. BMC Bioinformatics 2021; 22:29. [PMID: 33494695 PMCID: PMC7831185 DOI: 10.1186/s12859-020-03951-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/02/2020] [Accepted: 12/28/2020] [Indexed: 01/23/2023]
Abstract
Background: Due to continued advances in sequencing technology, the limitation in understanding biological systems through an “-omics” lens is no longer the generation of data, but the ability to analyze it. Importantly, much of this rich -omics data is publicly available, waiting to be further investigated. Although many code-based pipelines exist, there is a lack of user-friendly and accessible applications that enable rapid analysis or visualization of data.
Results: GECO (Gene Expression Clustering Optimization; http://www.theGECOapp.com) is a minimalistic GUI app that utilizes non-linear reduction techniques to rapidly visualize expression trends in many types of biological data matrices (such as bulk RNA-seq or proteomics). The required input is a data matrix with samples and any type of expression level of genes/protein/other with a unique ID. The output is an interactive t-SNE or UMAP analysis that clusters genes (or proteins/other unique IDs) based on their expression patterns across the multiple samples, enabling visualization of expression trends. Customizable settings for dimensionality reduction and data normalization, along with visualization parameters including coloring and filters, ensure adaptability to a variety of user-uploaded data.
Conclusion: This local and cloud-hosted web browser app enables investigation of any -omic data matrix in a rapid and code-independent manner. With the continued growth of available -omic data, the ability to quickly evaluate a dataset, including specific genes of interest, is more important than ever. GECO is intended to supplement traditional statistical analysis methods and is particularly useful when visualizing clusters of genes with similar trajectories across many samples (e.g., multiple cell types, time course, dose response). Users will be empowered to investigate -omic data with a new lens of visualization and analysis that has the potential to uncover genes of interest, cohorts of co-regulated gene programs, and previously undetected patterns of expression.
Affiliation(s)
- A N Habowski
- Department of Microbiology and Molecular Genetics, University of California Irvine, Irvine, CA, 92697, USA.
- T J Habowski
- Department of Microbiology and Molecular Genetics, University of California Irvine, Irvine, CA, 92697, USA
- M L Waterman
- Department of Microbiology and Molecular Genetics, University of California Irvine, Irvine, CA, 92697, USA

5
Cao M, Zhou W, Breidt FJ, Peers G. Large scale maximum average power multiple inference on time-course count data with application to RNA-seq analysis. Biometrics 2019; 76:9-22. [DOI: 10.1111/biom.13144] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 10/31/2018] [Accepted: 08/28/2019] [Indexed: 11/30/2022]
Affiliation(s)
- Meng Cao
- Department of Statistics Colorado State University Fort Collins Colorado
- Wen Zhou
- Department of Statistics Colorado State University Fort Collins Colorado
- F. Jay Breidt
- Department of Statistics Colorado State University Fort Collins Colorado
- Graham Peers
- Department of Biology Colorado State University Fort Collins Colorado

6
Abstract
Identification of differentially expressed genes has been a high priority task of downstream analyses to further advances in biomedical research. Investigators have been faced with an array of issues in dealing with more complicated experiments and metadata, including batch effects, normalization, temporal dynamics (temporally differential expression), and isoform diversity (isoform-level quantification and differential splicing events). To date, there are currently no standard approaches to precisely and efficiently analyze these moderate or large-scale experimental designs, especially with combined metadata. In this report, we propose comprehensive analytical pipelines to precisely characterize temporal dynamics in differential expression of genes and other genomic features, i.e., the variability of transcripts, isoforms and exons, by controlling batch effects and other nuisance factors that could have significant confounding effects on the main effects of interest in comparative models and may result in misleading interpretations.

7
Willis SD, Hossian AKMN, Evans N, Hickman MJ. Measuring mRNA Levels Over Time During the Yeast S. cerevisiae Hypoxic Response. J Vis Exp 2017:56226. [PMID: 28829420 PMCID: PMC5614221 DOI: 10.3791/56226] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/31/2022]
Abstract
Complex changes in gene expression typically mediate a large portion of a cellular response. Each gene may change expression with unique kinetics as the gene is regulated by the particular timing of one of many stimuli, signaling pathways or secondary effects. In order to capture the entire gene expression response to hypoxia in the yeast S. cerevisiae, RNA-seq analysis was used to monitor the mRNA levels of all genes at specific times after exposure to hypoxia. Hypoxia was established by growing cells in ~100% N2 gas. Importantly, unlike other hypoxic studies, ergosterol and unsaturated fatty acids were not added to the media because these metabolites affect gene expression. Time points were chosen in the range of 0 - 4 h after hypoxia because that period captures the major changes in gene expression. At each time point, mid-log hypoxic cells were quickly filtered and frozen, limiting exposure to O2 and concomitant changes in gene expression. Total RNA was extracted from cells and used to enrich for mRNA, which was then converted to cDNA. From this cDNA, multiplex libraries were created and eight or more samples were sequenced in one lane of a next-generation sequencer. A post-sequencing pipeline is described, which includes quality base trimming, read mapping and determining the number of reads per gene. DESeq2 within the R statistical environment was used to identify genes that change significantly at any one of the hypoxic time points. Analysis of three biological replicates revealed high reproducibility, genes of differing kinetics and a large number of expected O2-regulated genes. These methods can be used to study how the cells of various organisms respond to hypoxia over time and adapted to study gene expression during other cellular responses.
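To make the "reads per gene at each time point" step concrete, the sketch below normalizes per-gene read counts to counts per million and reports log2 fold changes relative to the first time point. This is a deliberately simplified stand-in and not the DESeq2 analysis the abstract describes: it estimates no dispersion and performs no significance test. The gene names, counts, pseudocount, and function names are hypothetical.

```python
import math


def counts_per_million(counts):
    """Scale raw per-gene read counts by library size (counts per million)."""
    total = sum(counts.values())
    return {gene: 1e6 * c / total for gene, c in counts.items()}


def log2_fold_changes(samples, pseudocount=1.0):
    """Return, per gene, the log2 fold change at each time point vs. the first.

    `samples` is a list of {gene: raw_count} dicts ordered by time; the first
    entry is treated as the baseline (for example, 0 h before hypoxia).
    """
    cpm = [counts_per_million(s) for s in samples]
    baseline = cpm[0]
    return {
        gene: [
            math.log2((c[gene] + pseudocount) / (baseline[gene] + pseudocount))
            for c in cpm
        ]
        for gene in baseline
    }


# Hypothetical read counts for two genes at three time points.
time_course = [
    {"GENE_A": 100, "GENE_B": 900},   # baseline
    {"GENE_A": 400, "GENE_B": 600},   # later time point
    {"GENE_A": 1600, "GENE_B": 400},  # different sequencing depth
]

lfc = log2_fold_changes(time_course)
for gene, values in lfc.items():
    print(gene, [round(v, 2) for v in values])
```

The pseudocount avoids taking the logarithm of zero for unexpressed genes; with replicates, a model-based method such as the one named in the abstract would be needed to judge significance.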
Affiliation(s)
- Stephen D Willis
- Department of Molecular Biology, Rowan School of Osteopathic Medicine
- Nathan Evans
- Department of Biological Sciences, Rowan University

8
Nascimento M, Silva FFE, Sáfadi T, Nascimento ACC, Ferreira TEM, Barroso LMA, Ferreira Azevedo C, Guimarães SEF, Serão NVL. Independent Component Analysis (ICA) based-clustering of temporal RNA-seq data. PLoS One 2017; 12:e0181195. [PMID: 28715507 PMCID: PMC5513449 DOI: 10.1371/journal.pone.0181195] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 01/31/2017] [Accepted: 06/27/2017] [Indexed: 11/19/2022]
Abstract
Gene expression time series (GETS) analysis aims to characterize sets of genes according to their longitudinal patterns of expression. Due to the large number of genes evaluated in GETS analysis, a useful strategy to summarize biological functional processes and regulatory mechanisms is through clustering of genes that present similar expression patterns over time. Traditional cluster methods usually ignore the challenges in GETS, such as the lack of data normality and the small number of temporal observations. Independent Component Analysis (ICA) is a statistical procedure that uses a transformation to convert raw time series data into sets of values of independent variables, which can be used for cluster analysis to identify sets of genes with similar temporal expression patterns. ICA allows clustering small series of distribution-free data while accounting for the dependence between subsequent time-points. Using simulated and real temporal RNA-seq data sets (the real data comprising four libraries of two pig breeds at 21, 40, 70 and 90 days of gestation), we present a methodology (ICAclust) that jointly considers independent component analysis (ICA) and a hierarchical method for clustering GETS. We compare ICAclust results with those obtained for K-means clustering. ICAclust presented, on average, an absolute gain of 5.15% over the best K-means scenario. Considering the worst scenario for K-means, the gain was 84.85% when compared with the best ICAclust result. For the real data set, genes were grouped into six distinct clusters with 89, 51, 153, 67, 40, and 58 genes, respectively. In general, the six clusters presented very distinct expression patterns. Overall, the proposed two-step clustering method (ICAclust) performed well compared to K-means, a traditional method used for cluster analysis of temporal gene expression data. In ICAclust, genes with similar expression patterns over time were clustered together.
Affiliation(s)
- Moysés Nascimento
- Department of Statistics, Federal University of Viçosa, Viçosa, Minas Gerais, Brazil
- Thelma Sáfadi
- Department of Exact Sciences, Federal University of Lavras, Lavras, Minas Gerais, Brazil

9
Oh S, Song S. Differential Gene Expression (DEX) and Alternative Splicing Events (ASE) for Temporal Dynamic Processes Using HMMs and Hierarchical Bayesian Modeling Approaches. Methods Mol Biol 2017; 1552:165-176. [PMID: 28224498 DOI: 10.1007/978-1-4939-6753-7_12] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/06/2023]
Abstract
In gene expression profiling, the data analysis pipeline comprises four major downstream tasks, i.e., (1) identification of differential expression; (2) clustering of co-expression patterns; (3) classification of subtypes of samples; and (4) detection of genetic regulatory networks, all of which are performed after preprocessing procedures such as normalization. More specifically, temporal dynamic gene expression data have an inherent feature: two neighboring time points (the previous and current states) are highly correlated with each other, in contrast to static expression data, in which samples are assumed to be independent. In this chapter, we demonstrate how HMMs and hierarchical Bayesian modeling methods capture the horizontal time dependency structures in time series expression profiles by focusing on the identification of differential expression. In addition, the differentially expressed genes and transcript variant isoforms detected over time in these core prerequisite steps can be further applied to the detection of genetic regulatory networks, to comprehensively uncover dynamic repertoires from a systems biology perspective within a coupled framework.
Affiliation(s)
- Sunghee Oh
- Department of Computer Science and Statistics, Jeju National University, Jeju City, 690-756, South Korea.
- Seongho Song
- Department of Mathematical Science, University of Cincinnati, Cincinnati, OH, 45221-0025, USA

10
Sun X, Dalpiaz D, Wu D, Liu JS, Zhong W, Ma P. Statistical inference for time course RNA-Seq data using a negative binomial mixed-effect model. BMC Bioinformatics 2016; 17:324. [PMID: 27565575 PMCID: PMC5002174 DOI: 10.1186/s12859-016-1180-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 03/03/2016] [Accepted: 08/11/2016] [Indexed: 02/05/2023]
Abstract
Background: Accurate identification of differentially expressed (DE) genes in time course RNA-Seq data is crucial for understanding the dynamics of the transcriptional regulatory network. However, most of the available methods treat gene expression at different time points as replicates and test the significance of the mean expression difference between treatments or conditions irrespective of time. They thus fail to identify many DE genes with different profiles across time. In this article, we propose a negative binomial mixed-effect model (NBMM) to identify DE genes in time course RNA-Seq data. In the NBMM, mean gene expression is characterized by a fixed effect, and time dependency is described by random effects. The NBMM is very flexible and can be fitted to both unreplicated and replicated time course RNA-Seq data via a penalized likelihood method. By comparing gene expression profiles over time, we further classify the DE genes into two subtypes to enhance the understanding of expression dynamics. A significance test for detecting DE genes is derived using a Kullback-Leibler distance ratio. Additionally, a significance test for gene sets is developed using a gene set score.
Results: Simulation analysis shows that the NBMM outperforms currently available methods for detecting DE genes and gene sets. Moreover, our real data analysis of fruit fly developmental time course RNA-Seq data demonstrates that the NBMM identifies biologically relevant genes which are well justified by gene ontology analysis.
Conclusions: The proposed method is powerful and efficient for detecting biologically relevant DE genes and gene sets in time course RNA-Seq data.
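The negative binomial distribution at the core of such count models can be sketched in a few lines. The example below uses the common mean/dispersion parameterization, in which the variance is mu + phi * mu^2, and evaluates the log-likelihood of a handful of counts. It covers only the count distribution: the random effects, penalized likelihood fitting, and Kullback-Leibler test of the NBMM described in the abstract are not reproduced, and the function names and toy counts are assumptions.

```python
import math


def nb_log_pmf(k, mu, phi):
    """Log-probability of count k under a negative binomial distribution.

    Parameterized by mean `mu` and dispersion `phi`, so that the variance is
    mu + phi * mu**2 (phi -> 0 approaches the Poisson distribution).
    """
    r = 1.0 / phi  # "size" parameter
    return (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(r / (r + mu))
        + k * math.log(mu / (r + mu))
    )


def nb_log_likelihood(counts, mu, phi):
    """Sum of log-probabilities for independent counts sharing mu and phi."""
    return sum(nb_log_pmf(k, mu, phi) for k in counts)


# Toy read counts for one gene across replicates (made-up values).
counts = [8, 12, 9, 11]
phi = 0.5

for mu in (5.0, 10.0, 30.0):
    print(f"mu={mu:5.1f}  log-likelihood={nb_log_likelihood(counts, mu, phi):.3f}")
```

With the dispersion held fixed, the log-likelihood is maximized when mu equals the sample mean of the counts, which is why the middle candidate scores highest in this toy run.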
Affiliation(s)
- Xiaoxiao Sun
- Department of Statistics, University of Georgia, 101 Cedar Street, Athens, 30602, USA
- David Dalpiaz
- Department of Statistics, University of Illinois at Urbana-Champaign, 725 South Wright Street, Champaign, 61820, USA
- Di Wu
- Department of Statistics, Harvard University, One Oxford Street, Cambridge, 02138, USA
- Jun S Liu
- Department of Statistics, Harvard University, One Oxford Street, Cambridge, 02138, USA
- Wenxuan Zhong
- Department of Statistics, University of Georgia, 101 Cedar Street, Athens, 30602, USA
- Ping Ma
- Department of Statistics, University of Georgia, 101 Cedar Street, Athens, 30602, USA.

11
Heera R, Sivachandran P, Chinni SV, Mason J, Croft L, Ravichandran M, Yin LS. Efficient extraction of small and large RNAs in bacteria for excellent total RNA sequencing and comprehensive transcriptome analysis. BMC Res Notes 2015; 8:754. [PMID: 26645211 PMCID: PMC4673735 DOI: 10.1186/s13104-015-1726-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 04/16/2015] [Accepted: 11/20/2015] [Indexed: 11/13/2022]
Abstract
Background: Next-generation transcriptome sequencing (RNA-Seq) has become the standard practice for studying gene splicing, mutations and changes in gene expression to obtain valuable, accurate biological conclusions. However, obtaining good sequencing coverage and depth to study these is impeded by the difficulties of obtaining high quality total RNA with minimal genomic DNA contamination. With this in mind, we evaluated the performance of Phenol-free total RNA purification kit (Amresco) in comparison with TRI Reagent (MRC) and RNeasy Mini (Qiagen) for the extraction of total RNA of Pseudomonas aeruginosa which was grown in glucose-supplemented (control) and polyethylene-supplemented (growth-limiting condition) minimal medium. All three extraction methods were coupled with an in-house DNase I treatment before the yield, integrity and size distribution of the purified RNA were assessed. RNA samples extracted with the best extraction kit were then sequenced using the Illumina HiSeq 2000 platform.
Results: TRI Reagent gave the lowest yield enriched with small RNAs (sRNAs), while RNeasy gave moderate yield of good quality RNA with trace amounts of sRNAs. The Phenol-free kit, on the other hand, gave the highest yield and the best quality RNA (RIN value of 9.85 ± 0.3) with good amounts of sRNAs. Subsequent bioinformatic analysis of the sequencing data revealed that 5435 coding genes, 452 sRNAs and 7 potential novel intergenic sRNAs were detected, indicating excellent sequencing coverage across RNA size ranges. In addition, detection of low abundance transcripts and consistency of their expression profiles across replicates from the same conditions demonstrated the reproducibility of the RNA extraction technique.
Conclusions: Amresco’s Phenol-free Total RNA purification kit coupled with DNase I treatment yielded the highest quality RNAs containing good ratios of high and low molecular weight transcripts with minimal genomic DNA. These RNA extracts gave excellent non-biased sequencing coverage useful for comprehensive total transcriptome sequencing and analysis. Furthermore, our findings would be useful for those interested in studying both coding and non-coding RNAs from precious bacterial samples cultivated in growth-limiting condition, in a single sequencing run.
Affiliation(s)
- Rajandas Heera
- Department of Biotechnology, Faculty of Applied Sciences, AIMST University, Semeling, 08100, Bedong, Kedah, Malaysia; Unit of Biochemistry, Faculty of Medicine, AIMST University, Semeling, 08100, Bedong, Kedah, Malaysia.
- Parimannan Sivachandran
- Department of Biotechnology, Faculty of Applied Sciences, AIMST University, Semeling, 08100, Bedong, Kedah, Malaysia.
- Suresh V Chinni
- Department of Biotechnology, Faculty of Applied Sciences, AIMST University, Semeling, 08100, Bedong, Kedah, Malaysia.
- Joanne Mason
- Malaysian Genomics Resource Centre, 27-9, Level 9 Boulevard Signature Offices, 59200, Mid Valley City, Malaysia; Oxford Biomedical Research Centre, Old Road Headington Oxford, Oxfordshire, OX3 7LE, UK.
- Larry Croft
- Malaysian Genomics Resource Centre, 27-9, Level 9 Boulevard Signature Offices, 59200, Mid Valley City, Malaysia.
- Manickam Ravichandran
- Department of Biotechnology, Faculty of Applied Sciences, AIMST University, Semeling, 08100, Bedong, Kedah, Malaysia.
- Lee Su Yin
- Department of Biotechnology, Faculty of Applied Sciences, AIMST University, Semeling, 08100, Bedong, Kedah, Malaysia.

12
Spies D, Ciaudo C. Dynamics in Transcriptomics: Advancements in RNA-seq Time Course and Downstream Analysis. Comput Struct Biotechnol J 2015; 13:469-77. [PMID: 26430493 PMCID: PMC4564389 DOI: 10.1016/j.csbj.2015.08.004] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.0] [Received: 06/01/2015] [Revised: 08/05/2015] [Accepted: 08/07/2015] [Indexed: 12/17/2022]
Abstract
Analysis of gene expression has contributed to a plethora of biological and medical research studies. Microarrays have been intensively used for the profiling of gene expression during diverse developmental processes, treatments and diseases. New massively parallel sequencing methods, often named as RNA-sequencing (RNA-seq) are extensively improving our understanding of gene regulation and signaling networks. Computational methods developed originally for microarrays analysis can now be optimized and applied to genome-wide studies in order to have access to a better comprehension of the whole transcriptome. This review addresses current challenges on RNA-seq analysis and specifically focuses on new bioinformatics tools developed for time series experiments. Furthermore, possible improvements in analysis, data integration as well as future applications of differential expression analysis are discussed.
Affiliation(s)
- Daniel Spies
- Swiss Federal Institute of Technology Zurich, Department of Biology, Institute of Molecular Health Sciences, Zurich, Otto-Stern Weg 7, 8093 Zurich, Switzerland
- Life Science Zurich Graduate School, Molecular Life Science Program, University of Zurich, Institute of Molecular Life Sciences, Winterthurerstrasse 190, 8057 Zurich, Switzerland
- Constance Ciaudo
- Swiss Federal Institute of Technology Zurich, Department of Biology, Institute of Molecular Health Sciences, Zurich, Otto-Stern Weg 7, 8093 Zurich, Switzerland

13
Zolhavarieh S, Aghabozorgi S, Teh YW. A review of subsequence time series clustering. ScientificWorldJournal 2014; 2014:312521. [PMID: 25140332 PMCID: PMC4130317 DOI: 10.1155/2014/312521] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 02/23/2014] [Revised: 06/05/2014] [Accepted: 06/23/2014] [Indexed: 11/30/2022]
Abstract
Clustering of subsequence time series remains an open issue in time series clustering. Subsequence time series clustering is used in different fields, such as e-commerce, outlier detection, speech recognition, biological systems, DNA recognition, and text mining. One of the useful fields in the domain of subsequence time series clustering is pattern recognition. To improve this field, a sequence of time series data is used. This paper reviews some definitions and backgrounds related to subsequence time series clustering. The categorization of the literature reviews is divided into three groups: preproof, interproof, and postproof period. Moreover, various state-of-the-art approaches in performing subsequence time series clustering are discussed under each of the following categories. The strengths and weaknesses of the employed methods are evaluated as potential issues for future studies.
Affiliation(s)
- Seyedjamal Zolhavarieh
  - Department of Information Systems, Faculty of Computer Science and Information Technology, University of Malaya (UM), 50603 Kuala Lumpur, Malaysia
- Saeed Aghabozorgi
  - Department of Information Systems, Faculty of Computer Science and Information Technology, University of Malaya (UM), 50603 Kuala Lumpur, Malaysia
- Ying Wah Teh
  - Department of Information Systems, Faculty of Computer Science and Information Technology, University of Malaya (UM), 50603 Kuala Lumpur, Malaysia
|
14
|
Stratification of gene coexpression patterns and GO function mining for a RNA-Seq data series. Biomed Res Int 2014; 2014:969768. [PMID: 24955372 PMCID: PMC4052503 DOI: 10.1155/2014/969768] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 02/16/2014] [Revised: 04/05/2014] [Accepted: 04/06/2014] [Indexed: 11/17/2022]
Abstract
RNA-Seq is emerging as an increasingly important tool in biological research, and it provides the most direct evidence of the relationship between the physiological state and molecular changes in cells. A large amount of RNA-Seq data across diverse experimental conditions has been generated and deposited in public databases. However, most existing approaches for coexpression analysis focus on mining coexpression patterns in the transcriptome, thereby ignoring the magnitude of gene differences within one pattern. Furthermore, the functional relationships of genes within one pattern, and notably among patterns, were not always recognized. In this study, we developed an integrated strategy to identify differential coexpression patterns of genes and probed the functional mechanisms of the modules. Two real datasets were used to validate the method and allow comparisons with other methods. One of the datasets was selected to illustrate the flow of a typical analysis. In summary, we present an approach to robustly detect coexpression patterns in transcriptomes and to stratify patterns according to their relative differences. Furthermore, a global relationship between patterns and biological functions was constructed. In addition, a freely accessible web toolkit “coexpression pattern mining and GO functional analysis” (COGO) was developed.
|
15
|
Oh S, Song S, Dasgupta N, Grabowski G. The analytical landscape of static and temporal dynamics in transcriptome data. Front Genet 2014; 5:35. [PMID: 24600473 PMCID: PMC3929947 DOI: 10.3389/fgene.2014.00035] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 09/11/2013] [Accepted: 01/30/2014] [Indexed: 12/16/2022] Open
Abstract
Interpreting gene expression profiles often involves statistical analysis of large numbers of differentially expressed genes, isoforms, and alternative splicing events in either static or dynamic settings. Reduced sequencing costs have made dense time-series analysis of gene expression using RNA-seq feasible; however, statistical methods for temporal RNA-seq data are poorly developed. Here we review current methods for identifying temporal changes in gene expression using RNA-seq, which are limited to static pairwise comparisons of time points and which fail to account for temporal dependencies in gene expression patterns. We also review the few recently developed methods specific to temporal dynamic RNA-seq data. RNA-seq-specific temporal dynamic methods are under continuous development, yet the field is still in its infancy. We also fully cover microarray-specific temporal methods and transcriptome studies based on early digital technologies (e.g., SAGE) that came between traditional microarrays and RNA-seq.
Affiliation(s)
- Sunghee Oh
  - Division of Human Genetics, Department of Pediatrics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Seongho Song
  - Department of Mathematical Sciences, McMicken College of Arts and Sciences, University of Cincinnati, Cincinnati, OH, USA
- Nupur Dasgupta
  - Division of Human Genetics, Department of Pediatrics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Gregory Grabowski
  - Division of Human Genetics, Department of Pediatrics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
|