1
Dong M, He Y, Jiang Y, Zou F. Joint gene network construction by single-cell RNA sequencing data. Biometrics 2023; 79:915-925. [PMID: 35184277 PMCID: PMC10548400 DOI: 10.1111/biom.13645] [Received: 05/04/2021] [Revised: 11/30/2021] [Accepted: 02/07/2022]
Abstract
In contrast to differential gene expression analysis at the single-gene level, gene regulatory network (GRN) analysis depicts complex transcriptomic interactions among genes, offering a better understanding of the genetic architectures underlying human diseases and traits. Recent advances in single-cell RNA sequencing (scRNA-seq) allow GRNs to be constructed at a much finer resolution than bulk RNA-seq and microarray data. However, scRNA-seq data are inherently sparse, which hinders the direct application of the popular Gaussian graphical models (GGMs). Furthermore, most existing approaches for constructing GRNs with scRNA-seq data consider gene networks under only one condition. To better understand GRNs across different but related conditions at single-cell resolution, we propose to construct Joint Gene Networks with scRNA-seq data (JGNsc) under the GGM framework. To facilitate the use of GGMs, JGNsc first applies a hybrid imputation procedure that combines a Bayesian zero-inflated Poisson model with an iterative low-rank matrix completion step to efficiently impute zero-inflated counts resulting from technical artifacts. JGNsc then transforms the imputed data via a nonparanormal transformation, on which joint GGMs are constructed. We demonstrate JGNsc and assess its performance using synthetic data. Applying JGNsc to two cancer clinical studies, of medulloblastoma and glioblastoma, yields novel insights in addition to confirming well-known biological results.
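The nonparanormal step maps each gene's values onto a Gaussian scale before joint GGMs are fitted. As a rough illustration only, the sketch below shows a generic rank-based normal-scores transform: each value is replaced by Φ⁻¹(r / (n + 1)), where r is its rank and tied values share their average rank. The function names `average_ranks` and `normal_scores` are hypothetical, and the truncation and imputation details JGNsc actually uses are not reproduced here.

```python
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def normal_scores(values):
    """Map one gene's values to standard normal quantiles Phi^-1(r / (n + 1))."""
    n = len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf(r / (n + 1)) for r in average_ranks(values)]


if __name__ == "__main__":
    # Hypothetical counts for one gene across six cells, with tied zeros.
    gene = [0, 0, 3, 1, 7, 0]
    print([round(z, 3) for z in normal_scores(gene)])
```

All tied zeros collapse onto a single quantile, so a gene dominated by dropout zeros carries little information after the transform; this is one plausible reason to impute before transforming.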
Affiliation(s)
- Meichen Dong
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Yiping He
- Department of Pathology, School of Medicine, Duke University, Durham, North Carolina, USA
- Yuchao Jiang
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Department of Genetics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Fei Zou
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Department of Genetics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
2
Karaaslanli A, Saha S, Maiti T, Aviyente S. Kernelized multiview signed graph learning for single-cell RNA sequencing data. BMC Bioinformatics 2023; 24:127. [PMID: 37016281 PMCID: PMC10071725 DOI: 10.1186/s12859-023-05250-y] [Received: 12/24/2022] [Accepted: 03/22/2023]
Abstract
BACKGROUND Characterizing the topology of gene regulatory networks (GRNs) is a fundamental problem in systems biology. The advent of single-cell technologies has made it possible to construct GRNs at finer resolutions than bulk and microarray datasets. However, cellular heterogeneity and the sparsity of single-cell datasets invalidate the regular Gaussian assumptions commonly used for constructing GRNs. Additionally, most GRN reconstruction approaches estimate a single network for the entire dataset, which can lose information when single-cell datasets are generated from multiple treatment conditions or disease states. RESULTS To better characterize single-cell GRNs under different but related conditions, we propose the joint estimation of multiple networks using multiple signed graph learning (scMSGL). The proposed method builds on recently developed graph learning based on graph signal processing (GSP), in which GRNs and gene expressions are modeled as signed graphs and graph signals, respectively. scMSGL learns multiple GRNs by optimizing the total variation of gene expressions with respect to the GRNs, while regularization toward a learned signed consensus graph keeps the learned GRNs similar to each other. We further kernelize scMSGL, with the kernel selected to suit the structure of single-cell data. CONCLUSIONS scMSGL outperforms existing state-of-the-art methods in GRN recovery on simulated datasets. Furthermore, scMSGL successfully identifies well-established regulators in a mouse embryonic stem cell differentiation study and a cancer clinical study of medulloblastoma.
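A common building block for measuring how smoothly a graph signal x varies over a graph is the Laplacian quadratic form xᵀLx with L = D − W, which equals the sum over edges of w_ij (x_i − x_j)². The sketch below is an illustration under stated assumptions, not scMSGL's objective: it assumes a symmetric signed weight matrix with zero diagonal, splits it into nonnegative positive and negative parts (W = W_pos − W_neg), and evaluates the quadratic form on each part. How scMSGL combines the two parts, kernelizes them, and regularizes toward a consensus graph is described in the paper and not reproduced here. All function names are hypothetical.

```python
def laplacian(weights):
    """Combinatorial Laplacian L = D - W for a symmetric weight matrix with zero diagonal."""
    n = len(weights)
    return [
        [(sum(weights[i]) if i == j else 0.0) - weights[i][j] for j in range(n)]
        for i in range(n)
    ]


def quadratic_form(matrix, x):
    """Compute x^T M x."""
    n = len(x)
    return sum(x[i] * matrix[i][j] * x[j] for i in range(n) for j in range(n))


def split_signed(weights):
    """Split signed weights into nonnegative parts so that W = W_pos - W_neg."""
    pos = [[max(w, 0.0) for w in row] for row in weights]
    neg = [[max(-w, 0.0) for w in row] for row in weights]
    return pos, neg


def edge_variation(weights, x):
    """Sum over edges i < j of w_ij * (x_i - x_j)^2; equals x^T L x for L = D - W."""
    n = len(x)
    return sum(
        weights[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(i + 1, n)
    )


if __name__ == "__main__":
    # Hypothetical 3-gene network: positive edge 0-1, negative edge 0-2.
    W = [[0, 2, -1], [2, 0, 0], [-1, 0, 0]]
    x = [1.0, 3.0, -1.0]  # one cell's expression values for the three genes
    W_pos, W_neg = split_signed(W)
    print(quadratic_form(laplacian(W_pos), x), edge_variation(W_pos, x))
    print(quadratic_form(laplacian(W_neg), x), edge_variation(W_neg, x))
```

Signed formulations differ in how they treat the negative part (for example, whether large differences across negative edges are rewarded rather than penalized), so this split is only a starting point.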
Affiliation(s)
- Abdullah Karaaslanli
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI, USA
- Satabdi Saha
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Tapabrata Maiti
- Department of Statistics and Probability, Michigan State University, East Lansing, MI, USA
- Selin Aviyente
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI, USA
3
Gao F, Li C, Smith SM, Peinado N, Kohbodi G, Tran E, Loh YHE, Li W, Borok Z, Minoo P. Decoding the IGF1 signaling gene regulatory network behind alveologenesis from a mouse model of bronchopulmonary dysplasia. eLife 2022; 11:e77522. [PMID: 36214448 PMCID: PMC9581530 DOI: 10.7554/elife.77522] [Received: 02/02/2022] [Accepted: 10/07/2022]
Abstract
Lung development is precisely controlled by underlying gene regulatory networks (GRN). Disruption of genes in the network can interrupt normal development and cause diseases such as bronchopulmonary dysplasia (BPD) - a chronic lung disease in preterm infants with morbid and sometimes lethal consequences characterized by lung immaturity and reduced alveolarization. Here, we generated a transgenic mouse exhibiting a moderate severity BPD phenotype by blocking IGF1 signaling in secondary crest myofibroblasts (SCMF) at the onset of alveologenesis. Using approaches mirroring the construction of the model GRN in sea urchin's development, we constructed the IGF1 signaling network underlying alveologenesis using this mouse model that phenocopies BPD. The constructed GRN, consisting of 43 genes, provides a bird's eye view of how the genes downstream of IGF1 are regulatorily connected. The GRN also reveals a mechanistic interpretation of how the effects of IGF1 signaling are transduced within SCMF from its specification genes to its effector genes and then from SCMF to its neighboring alveolar epithelial cells with WNT5A and FGF10 signaling as the bridge. Consistently, blocking WNT5A signaling in mice phenocopies BPD as inferred by the network. A comparative study on human samples suggests that a GRN of similar components and wiring underlies human BPD. Our network view of alveologenesis is transforming our perspective to understand and treat BPD. This new perspective calls for the construction of the full signaling GRN underlying alveologenesis, upon which targeted therapies for this neonatal chronic lung disease can be viably developed.
Affiliation(s)
- Feng Gao
- Division of Neonatology, Department of Pediatrics, University of Southern California, Los Angeles, United States
- Changgong Li
- Division of Neonatology, Department of Pediatrics, University of Southern California, Los Angeles, United States
- Susan M Smith
- Division of Neonatology, Department of Pediatrics, University of Southern California, Los Angeles, United States
- Neil Peinado
- Division of Neonatology, Department of Pediatrics, University of Southern California, Los Angeles, United States
- Golenaz Kohbodi
- Division of Neonatology, Department of Pediatrics, University of Southern California, Los Angeles, United States
- Evelyn Tran
- Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, United States
- Department of Biochemistry and Molecular Medicine, Keck School of Medicine, University of Southern California, Los Angeles, United States
- Yong-Hwee Eddie Loh
- Norris Medical Library, University of Southern California, Los Angeles, United States
- Wei Li
- Department of Nephrology, Jiangsu Provincial Hospital of Traditional Chinese Medicine, Nanjing, China
- Zea Borok
- Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, University of California, San Diego, San Diego, United States
- Parviz Minoo
- Division of Neonatology, Department of Pediatrics, University of Southern California, Los Angeles, United States
- Hastings Center for Pulmonary Research, Keck School of Medicine, University of Southern California, Los Angeles, United States
4
Mégret L, Mendoza C, Arrieta Lobo M, Brouillet E, Nguyen TTY, Bouaziz O, Chambaz A, Néri C. Precision machine learning to understand micro-RNA regulation in neurodegenerative diseases. Front Mol Neurosci 2022; 15:914830. [PMID: 36157078 PMCID: PMC9500540 DOI: 10.3389/fnmol.2022.914830] [Received: 04/07/2022] [Accepted: 08/19/2022]
Abstract
Micro-RNAs (miRNAs) are short (∼21 nt) non-coding RNAs that regulate gene expression through the degradation or translational repression of mRNAs. Accumulating evidence points to a role of miRNA regulation in the pathogenesis of a wide range of neurodegenerative diseases (NDs), such as Alzheimer’s disease, Parkinson’s disease, amyotrophic lateral sclerosis and Huntington disease (HD). Several systems-level studies have explored the role of miRNA regulation in NDs, but such studies remain challenging. Part of the problem may be the lack of sufficiently rich or homogeneous data, such as time series or cell-type-specific data obtained in model systems or human biosamples, to account for context dependency. Part of the problem may also be the methodological difficulty of accurately modeling miRNA and mRNA data at the system level. Here, we critically review the main families of machine learning methods used to analyze expression data, highlighting the added value of shape-analysis concepts for precisely modeling high-dimensional miRNA and mRNA data, such as those obtained in studies of the HD process, and elaborating on the potential of these concepts and methods for modeling complex omics data.
Affiliation(s)
- Lucile Mégret
- Sorbonne Université, Centre National de la Recherche Scientifique UMR 8256, Paris, France
- *Correspondence: Lucile Mégret
- Cloé Mendoza
- Sorbonne Université, Centre National de la Recherche Scientifique UMR 8256, Paris, France
- Maialen Arrieta Lobo
- Sorbonne Université, Centre National de la Recherche Scientifique UMR 8256, Paris, France
- Emmanuel Brouillet
- Sorbonne Université, Centre National de la Recherche Scientifique UMR 8256, Paris, France
- Thi-Thanh-Yen Nguyen
- Université Paris Cité, MAP5 (Centre National de la Recherche Scientifique UMR 8145), Paris, France
- Olivier Bouaziz
- Université Paris Cité, MAP5 (Centre National de la Recherche Scientifique UMR 8145), Paris, France
- Antoine Chambaz
- Université Paris Cité, MAP5 (Centre National de la Recherche Scientifique UMR 8145), Paris, France
- Christian Néri
- Sorbonne Université, Centre National de la Recherche Scientifique UMR 8256, Paris, France
5
Zhang A, Fang J, Hu W, Calhoun VD, Wang YP. A Latent Gaussian Copula Model for Mixed Data Analysis in Brain Imaging Genetics. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:1350-1360. [PMID: 31689199 PMCID: PMC7756188 DOI: 10.1109/tcbb.2019.2950904]
Abstract
Recent advances in imaging genetics make it possible to combine different types of data, including medical images such as functional magnetic resonance imaging (fMRI) and genetic data such as single nucleotide polymorphisms (SNPs), for comprehensive diagnosis of mental disorders. Understanding the complex interactions among these heterogeneous data may give rise to a new perspective, but at the same time demands statistical models for their integration. Various graphical models have been proposed for studying interaction or association networks with continuous, binary, and count data, as well as mixtures of them. However, limited effort has been devoted to the multinomial case, for instance, SNP data. Our goal is therefore to fill this void by developing a graphical model for integrating fMRI image and SNP data, which can provide a deeper understanding of the unknown neurogenetic mechanism. In this article, we propose a latent Gaussian copula model for mixed data containing multinomial components. We assume that each discrete variable is obtained by discretizing a latent (unobserved) continuous variable, and we then construct a semi-rank-based estimator of the graph structure. The simulation results demonstrate that the proposed latent correlation detects graph structure more steadily and accurately than several existing methods. When applied to real schizophrenia data consisting of SNP arrays and fMRI images collected by the Mind Clinical Imaging Consortium (MCIC), the proposed method reveals a set of distinct SNP-brain associations, which are verified to be biologically significant. The proposed model is statistically promising for handling mixed types of data, including multinomial components, and may find widespread applications. To promote reproducible research, the R code is available at https://github.com/Aiying0512/LGCM.
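Rank-based estimators of a latent Gaussian correlation rely on a "bridge" between a rank statistic and the latent correlation. For two continuous variables under a Gaussian copula, a well-known bridge is Σ_jk = sin(π/2 · τ_jk), where τ is Kendall's tau. The sketch below implements only that continuous case, assuming no ties (it uses tau-a, which does not adjust for ties); the multinomial/discrete bridge used in the paper's semi-rank-based estimator is different and is not reproduced. Function names are hypothetical.

```python
import math


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / (n choose 2), computed in O(n^2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length >= 2")
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant, 0 tie
    return s / (n * (n - 1) / 2)


def latent_correlation_continuous(x, y):
    """Gaussian-copula latent correlation for two continuous variables: sin(pi/2 * tau)."""
    return math.sin(math.pi / 2 * kendall_tau_a(x, y))


if __name__ == "__main__":
    x = [0.2, 1.5, 0.9, 3.1, 2.2]
    y = [math.exp(v) for v in x]  # a monotone transform of x
    print(latent_correlation_continuous(x, y))
```

Because Kendall's tau depends only on ranks, the estimate is unchanged by monotone transformations of either margin, which is what makes this family of estimators attractive for non-Gaussian data. Heavily tied discrete data (such as SNP genotypes) break the simple sine bridge, which is why a separate treatment of discrete components is needed.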
6
Zhang R, Ren Z, Celedón JC, Chen W. Inference of large modified Poisson-type graphical models: Application to RNA-seq data in childhood atopic asthma studies. Ann Appl Stat 2021. [DOI: 10.1214/20-aoas1413]
Affiliation(s)
- Rong Zhang
- Department of Statistics, University of Pittsburgh
- Zhao Ren
- Department of Statistics, University of Pittsburgh
- Juan C. Celedón
- Department of Pediatrics, UPMC Children’s Hospital of Pittsburgh, University of Pittsburgh
- Wei Chen
- Department of Pediatrics, UPMC Children’s Hospital of Pittsburgh, University of Pittsburgh
7
Jia B, Liang F. Fast hybrid Bayesian integrative learning of multiple gene regulatory networks for type 1 diabetes. Biostatistics 2021; 22:233-249. [PMID: 33838043 PMCID: PMC8035990 DOI: 10.1093/biostatistics/kxz027] [Received: 09/26/2018] [Revised: 06/01/2019] [Accepted: 06/23/2019]
Abstract
Motivated by the study of the molecular mechanism underlying type 1 diabetes with gene expression data collected from both patients and healthy controls at multiple time points, we propose a hybrid Bayesian method for jointly estimating multiple dependent Gaussian graphical models with data observed under distinct conditions, which avoids inversion of high-dimensional covariance matrices and thus can be executed very fast. We prove the consistency of the proposed method under mild conditions. The numerical results indicate the superiority of the proposed method over existing ones in both estimation accuracy and computational efficiency. Extension of the proposed method to joint estimation of multiple mixed graphical models is straightforward.
Affiliation(s)
- Bochao Jia
- Eli Lilly and Company, Lilly Corporate Center, Indianapolis, IN, USA
- Faming Liang
- Department of Statistics, Purdue University, West Lafayette, IN, USA
8
Mercatelli D, Scalambra L, Triboli L, Ray F, Giorgi FM. Gene regulatory network inference resources: A practical overview. Biochim Biophys Acta Gene Regul Mech 2019; 1863:194430. [PMID: 31678629 DOI: 10.1016/j.bbagrm.2019.194430] [Received: 05/31/2019] [Revised: 09/06/2019] [Accepted: 09/09/2019]
Abstract
Transcriptional regulation is a fundamental molecular mechanism involved in almost every aspect of life, from homeostasis to development, from metabolism to behavior, from reaction to stimuli to disease progression. In recent years, the concept of Gene Regulatory Networks (GRNs) has grown popular as an effective applied biology approach for describing the complex and highly dynamic set of transcriptional interactions, due to its easy-to-interpret features. Since cataloguing, predicting and understanding every GRN connection in all species and cellular contexts remains a great challenge for biology, researchers have developed numerous tools and methods to infer regulatory processes. In this review, we catalogue these methods in six major areas, based on the dominant underlying information leveraged to infer GRNs: Coexpression, Sequence Motifs, Chromatin Immunoprecipitation (ChIP), Orthology, Literature and Protein-Protein Interaction (PPI) specifically focused on transcriptional complexes. The methods described here cover a wide range of user-friendliness: from web tools that require no prior computational expertise to command line programs and algorithms for large scale GRN inferences. Each method for GRN inference described herein effectively illustrates a type of transcriptional relationship, with many methods being complementary to others. While a truly holistic approach for inferring and displaying GRNs remains one of the greatest challenges in the field of systems biology, we believe that the integration of multiple methods described herein provides an effective means with which experimental and computational biologists alike may obtain the most complete pictures of transcriptional relationships. This article is part of a Special Issue entitled: Transcriptional Profiles and Regulatory Gene Networks edited by Dr. Federico Manuel Giorgi and Dr. Shaun Mahony.
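Coexpression is the first of the six inference categories listed above. The simplest form of coexpression-based inference links two genes when the absolute Pearson correlation of their expression profiles reaches a threshold. The sketch below illustrates only that minimal idea; the 0.8 threshold, the dictionary input format, and the function names are assumptions for illustration, and the tools catalogued in the review use considerably more elaborate statistics (for example, mutual information or conditional dependence).

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # constant profile: correlation undefined, treated as no edge
    return cov / math.sqrt(va * vb)


def coexpression_edges(expr, threshold=0.8):
    """expr maps gene name -> expression values over the same samples.

    Returns (gene1, gene2, r) for every pair with |r| >= threshold.
    """
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


if __name__ == "__main__":
    # Toy profiles for four hypothetical genes over four samples.
    expr = {
        "A": [1, 2, 3, 4],
        "B": [2, 4, 6, 8],
        "C": [4, 3, 2, 1],
        "D": [1, 3, 1, 3],
    }
    for g1, g2, r in coexpression_edges(expr, threshold=0.8):
        print(g1, g2, round(r, 3))
```

An undirected correlation edge says nothing about which gene regulates which, or whether the association is direct, which is one reason the review argues for combining coexpression with motif, ChIP, and other evidence.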
Affiliation(s)
- Daniele Mercatelli
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Laura Scalambra
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Luca Triboli
- Centre for Integrative Biology (CIBIO), University of Trento, Italy
- Forest Ray
- Department of Systems Biology, Columbia University Medical Center, New York, NY, United States
- Federico M Giorgi
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
9
Xu S, Jia B, Liang F. Learning Moral Graphs in Construction of High-Dimensional Bayesian Networks for Mixed Data. Neural Comput 2019; 31:1183-1214. [PMID: 30979349 PMCID: PMC6874850 DOI: 10.1162/neco_a_01190]
Abstract
Bayesian networks have been widely used in many scientific fields for describing the conditional independence relationships for a large set of random variables. This letter proposes a novel algorithm, the so-called p-learning algorithm, for learning moral graphs for high-dimensional Bayesian networks. The moral graph is a Markov network representation of the Bayesian network and also the key to construction of the Bayesian network for constraint-based algorithms. The consistency of the p-learning algorithm is justified under the small-n, large-p scenario. The numerical results indicate that the p-learning algorithm significantly outperforms the existing ones, such as the PC, grow-shrink, incremental association, semi-interleaved hiton, hill-climbing, and max-min hill-climbing. Under the sparsity assumption, the p-learning algorithm has a computational complexity of O(p^2) even in the worst case, while the existing algorithms have a computational complexity of O(p^3) in the worst case.
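By definition, the moral graph of a Bayesian network's DAG is obtained by connecting ("marrying") every pair of parents that share a child and then dropping all edge directions. The p-learning algorithm estimates this graph directly from data; the sketch below only illustrates the definition, going from a known DAG to its moral graph. The parents-dictionary input format and the function name `moralize` are assumptions for illustration.

```python
from itertools import combinations


def moralize(parents):
    """Return the moral graph of a DAG as a set of undirected edges (frozensets).

    parents: dict mapping each node to an iterable of its parent nodes.
    """
    edges = set()
    for child, pas in parents.items():
        pas = list(pas)
        # Keep every parent-child link, dropping its direction.
        for p in pas:
            edges.add(frozenset((p, child)))
        # Marry every pair of parents that share this child.
        for p, q in combinations(pas, 2):
            edges.add(frozenset((p, q)))
    return edges


if __name__ == "__main__":
    # v-structure a -> c <- b, plus c -> d
    dag = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"]}
    for edge in sorted(tuple(sorted(e)) for e in moralize(dag)):
        print(edge)
```

The v-structure a → c ← b gains the undirected edge a–b, whereas a simple chain gains nothing; this extra edge is what lets the moral graph serve as a Markov network representation of the DAG.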
Affiliation(s)
- Suwa Xu
- Department of Biostatistics, University of Florida, Gainesville, FL 32611, U.S.A.
- Bochao Jia
- Lilly Corporate Center, Eli Lilly and Company, Indianapolis, IN 46285, U.S.A.
- Faming Liang
- Department of Statistics, Purdue University, West Lafayette, IN 47906, U.S.A.
10
Transcriptome profiling reveals the anti-diabetic molecular mechanism of Cyclocarya paliurus polysaccharides. J Funct Foods 2019. [DOI: 10.1016/j.jff.2018.12.039]
11
Zhang R, Ren Z, Chen W. SILGGM: An extensive R package for efficient statistical inference in large-scale gene networks. PLoS Comput Biol 2018; 14:e1006369. [PMID: 30102702 PMCID: PMC6107288 DOI: 10.1371/journal.pcbi.1006369] [Received: 01/11/2018] [Revised: 08/23/2018] [Accepted: 07/17/2018]
Abstract
Gene co-expression network analysis is extremely useful for interpreting complex biological processes. Recent droplet-based single-cell technology can routinely generate much larger gene expression datasets, with thousands of samples and tens of thousands of genes. To analyze such large-scale gene-gene networks, remarkable progress has been made in rigorous statistical inference for high-dimensional Gaussian graphical models (GGMs). These approaches provide a formal confidence interval or p-value, rather than only a point estimate, for the conditional dependence of a gene pair, and are therefore more desirable for identifying reliable gene networks. To promote their widespread use, we introduce an extensive and efficient R package named SILGGM (Statistical Inference of Large-scale Gaussian Graphical Model) that includes four main approaches to statistical inference for high-dimensional GGMs. Unlike existing tools, SILGGM provides statistically efficient inference both for individual gene pairs and for the whole set of gene pairs. It implements a novel and consistent false discovery rate (FDR) procedure in all four methodologies. Its user-friendly design produces outputs compatible with multiple platforms for interactive network visualization. Furthermore, simulation comparisons show that SILGGM can accelerate the existing MATLAB implementation by several orders of magnitude and further improve on the speed of the already very efficient R package FastGGM. Results on simulated data confirm the validity of all the approaches in SILGGM even in very large-scale settings, with up to ten thousand variables or genes. We have also applied our package to a novel single-cell RNA-seq data set of pan T cells. The results show that the approaches in SILGGM significantly outperform conventional ones in a biological sense. The package is freely available via CRAN at https://cran.r-project.org/package=SILGGM.
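Testing every gene pair in a large network produces many p-values, which is why SILGGM pairs its inference with FDR control. SILGGM's own FDR procedure is described in the paper and is not reproduced here; for orientation, the sketch below shows the classical Benjamini-Hochberg step-up procedure, which controls FDR for independent (or positively dependent) p-values. The function name is hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank
    # Step-up: reject all hypotheses with rank <= k_max.
    return sorted(order[:k_max])


if __name__ == "__main__":
    # Hypothetical p-values for five gene pairs.
    pvals = [0.01, 0.045, 0.02, 0.005, 0.20]
    print(benjamini_hochberg(pvals, alpha=0.05))
```

The step-up rule can reject a hypothesis whose own threshold check fails, as long as some larger p-value passes its threshold; for example, with p-values [0.04, 0.04] both are rejected at α = 0.05.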
Affiliation(s)
- Rong Zhang
- Department of Statistics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Zhao Ren
- Department of Statistics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Wei Chen
- Division of Pulmonary Medicine; Department of Pediatrics, Children’s Hospital of Pittsburgh of UPMC, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Department of Biostatistics, University of Pittsburgh Graduate School of Public Health, Pittsburgh, Pennsylvania, United States of America
12
Ryu D, Bilgili D, Ergönül Ö, Liang F, Ebrahimi N. A Bayesian Generalized Linear Model for Crimean–Congo Hemorrhagic Fever Incidents. J Agric Biol Environ Stat 2017. [DOI: 10.1007/s13253-017-0310-9]
13
Mohorianu I, Bretman A, Smith DT, Fowler EK, Dalmay T, Chapman T. Comparison of alternative approaches for analysing multi-level RNA-seq data. PLoS One 2017; 12:e0182694. [PMID: 28792517 PMCID: PMC5549751 DOI: 10.1371/journal.pone.0182694] [Received: 05/17/2017] [Accepted: 07/21/2017]
Abstract
RNA sequencing (RNA-seq) is widely used for RNA quantification in the environmental, biological and medical sciences. It enables the description of genome-wide patterns of expression and the identification of regulatory interactions and networks. The aim of RNA-seq data analyses is to achieve rigorous quantification of genes/transcripts to allow a reliable prediction of differential expression (DE), despite variation in levels of noise and inherent biases in sequencing data. This can be especially challenging for datasets in which gene expression differences are subtle, as in the behavioural transcriptomics test dataset from D. melanogaster that we used here. We investigated the power of existing approaches for quality checking mRNA-seq data and explored additional, quantitative quality checks. To accommodate nested, multi-level experimental designs, we incorporated sample layout into our analyses. We employed a normalization based on subsampling without replacement and an identification of DE that accounted for the hierarchy and amplitude of effect sizes within samples, then evaluated the resulting differential expression calls in comparison to existing approaches. In a final step to test for broader applicability, we applied our approaches to a published set of H. sapiens mRNA-seq samples. The dataset-tailored methods improved sample comparability and delivered a robust prediction of subtle gene expression changes. The proposed approaches have the potential to improve key steps in the analysis of RNA-seq data by incorporating the structure and characteristics of biological experiments.
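Normalization by subsampling without replacement brings every library to a common read depth by randomly discarding reads. The exact scheme in the paper (target depth, number of repetitions, and how the sample hierarchy is used) is not reproduced here; as a loosely related illustration, the sketch below shows plain single-draw rarefaction of one sample's gene counts. The function name `rarefy`, its signature, and the example counts are hypothetical.

```python
import bisect
import itertools
import random


def rarefy(counts, depth, seed=None):
    """Subsample reads without replacement so that the sample's counts sum to `depth`.

    counts: nonnegative integer read counts per gene for one sample.
    """
    total = sum(counts)
    if not 0 <= depth <= total:
        raise ValueError("depth must be between 0 and the library size")
    rng = random.Random(seed)
    # cum[g] is the number of reads belonging to genes 0..g.
    cum = list(itertools.accumulate(counts))
    out = [0] * len(counts)
    # random.sample draws distinct read positions, i.e. sampling without replacement.
    for pos in rng.sample(range(total), depth):
        out[bisect.bisect_right(cum, pos)] += 1  # map read position to its gene
    return out


if __name__ == "__main__":
    samples = {"s1": [120, 30, 0, 50], "s2": [60, 15, 5, 20]}
    depth = min(sum(c) for c in samples.values())  # smallest library size
    for name, c in samples.items():
        print(name, rarefy(c, depth, seed=42))
```

A single draw adds sampling noise, especially for low-count genes; repeating the subsampling and summarizing across draws is a common way to reduce that noise.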
Affiliation(s)
- Irina Mohorianu
- School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, United Kingdom
- School of Computing Sciences, University of East Anglia, Norwich Research Park, Norwich, United Kingdom
- Amanda Bretman
- School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, United Kingdom
- School of Biology, University of Leeds, Leeds, LS2 9JT, United Kingdom
- Damian T. Smith
- School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, United Kingdom
- Emily K. Fowler
- School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, United Kingdom
- Tamas Dalmay
- School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, United Kingdom
- Tracey Chapman
- School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, United Kingdom