1
Shen L, Huang H, Li J, Chen W, Yao Y, Hu J, Zhou J, Huang F, Ni C. Exploration of prognosis and immunometabolism landscapes in ER+ breast cancer based on a novel lipid metabolism-related signature. Front Immunol 2023; 14:1199465. [PMID: 37469520 PMCID: PMC10352658 DOI: 10.3389/fimmu.2023.1199465] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/03/2023] [Accepted: 06/19/2023] [Indexed: 07/21/2023]
Abstract
Introduction Lipid metabolic reprogramming is gaining attention as a hallmark of cancers. Mounting evidence indicates that the malignant behavior of breast cancer (BC) is closely related to lipid metabolism. Here, we focus on the estrogen receptor-positive (ER+) subtype, the most common subgroup of BC, to explore immunometabolism landscapes and prognostic significance according to lipid metabolism-related genes (LMRGs). Methods Samples from The Cancer Genome Atlas (TCGA) database were used as the training cohort, and samples from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC), Gene Expression Omnibus (GEO) datasets and our own cohort were used for external validation. The survival-related LMRG molecular pattern and signature were constructed by unsupervised consensus clustering and least absolute shrinkage and selection operator (LASSO) analysis. A lipid metabolism-related clinicopathologic nomogram was established. Gene enrichment and pathway analysis were performed to explore the underlying mechanism. Immune landscapes, immunotherapy and chemotherapy response were further explored. Moreover, the relationship between gene expression and clinicopathological features was assessed by immunohistochemistry. Results Two LMRG molecular patterns were identified and associated with distinct prognoses and immune cell infiltration. Next, a prognostic signature based on nine survival-related LMRGs was established and validated. The signature was confirmed to be an independent prognostic factor, and an optimal nomogram incorporating the signature, age and T stage was established (AUC of 5-year overall survival: 0.778). Pathway enrichment analysis revealed differences in immune activities, lipid biosynthesis and drug metabolism between the low- and high-risk score groups. Further exploration verified different immune microenvironment profiles, immune checkpoint expression, and sensitivity to immunotherapy and chemotherapy between the two groups. Finally, arachidonate 15-lipoxygenase (ALOX15) was selected as the most prominent differentially expressed gene between the two groups. Its expression was positively related to larger tumor size, more advanced tumor stage and vascular invasion in our cohort (n = 149). Discussion This is the first lipid metabolism-based signature with value for prognosis prediction and immunotherapy or chemotherapy guidance for ER+ BC.
Affiliation(s)
- Lesang Shen
- Department of Breast Surgery, Second Affiliated Hospital, Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of Tumor Microenvironment and Immune Therapy of Zhejiang Province, Second Affiliated Hospital, Zhejiang University, Hangzhou, Zhejiang, China
- Cancer Center, Zhejiang University, Hangzhou, China
- Huanhuan Huang
- Department of Breast Surgery, Second Affiliated Hospital, Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of Tumor Microenvironment and Immune Therapy of Zhejiang Province, Second Affiliated Hospital, Zhejiang University, Hangzhou, Zhejiang, China
- Cancer Center, Zhejiang University, Hangzhou, China
- Jiaxin Li
- Department of Breast Surgery, Second Affiliated Hospital, Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of Tumor Microenvironment and Immune Therapy of Zhejiang Province, Second Affiliated Hospital, Zhejiang University, Hangzhou, Zhejiang, China
- Cancer Center, Zhejiang University, Hangzhou, China
- Wuzhen Chen
- Department of Breast Surgery, Second Affiliated Hospital, Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of Tumor Microenvironment and Immune Therapy of Zhejiang Province, Second Affiliated Hospital, Zhejiang University, Hangzhou, Zhejiang, China
- Cancer Center, Zhejiang University, Hangzhou, China
- Yao Yao
- Department of Breast Surgery, Second Affiliated Hospital, Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of Tumor Microenvironment and Immune Therapy of Zhejiang Province, Second Affiliated Hospital, Zhejiang University, Hangzhou, Zhejiang, China
- Cancer Center, Zhejiang University, Hangzhou, China
- Jianming Hu
- Department of Breast Surgery, Second Affiliated Hospital, Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of Tumor Microenvironment and Immune Therapy of Zhejiang Province, Second Affiliated Hospital, Zhejiang University, Hangzhou, Zhejiang, China
- Cancer Center, Zhejiang University, Hangzhou, China
- Jun Zhou
- Department of Breast Surgery, Affiliated Hangzhou First People’s Hospital, Zhejiang University, Hangzhou, Zhejiang, China
- Fengbo Huang
- Department of Pathology, Second Affiliated Hospital, Zhejiang University, Hangzhou, Zhejiang, China
- Chao Ni
- Department of Breast Surgery, Second Affiliated Hospital, Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of Tumor Microenvironment and Immune Therapy of Zhejiang Province, Second Affiliated Hospital, Zhejiang University, Hangzhou, Zhejiang, China
- Cancer Center, Zhejiang University, Hangzhou, China
2
Lee K, Yu D, Hyung D, Cho SY, Park C. ASpediaFI: Functional Interaction Analysis of Alternative Splicing Events. GENOMICS, PROTEOMICS & BIOINFORMATICS 2022; 20:466-482. [PMID: 35085775 PMCID: PMC9801047 DOI: 10.1016/j.gpb.2021.10.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2020] [Revised: 10/15/2021] [Accepted: 11/01/2021] [Indexed: 01/26/2023]
Abstract
Alternative splicing (AS) regulates biological processes governing phenotypes and diseases. Differential AS (DAS) gene test methods have been developed to investigate important exonic expression from high-throughput datasets. However, the DAS events extracted using statistical tests are insufficient to delineate relevant biological processes. In this study, we developed a novel application, Alternative Splicing Encyclopedia: Functional Interaction (ASpediaFI), to systemically identify DAS events and co-regulated genes and pathways. ASpediaFI establishes a heterogeneous interaction network of genes and their feature nodes (i.e., AS events and pathways) connected by co-expression or pathway gene set knowledge. Next, ASpediaFI explores the interaction network using the random walk with restart algorithm and interrogates the proximity from a query gene set. Finally, ASpediaFI extracts significant AS events, genes, and pathways. To evaluate the performance of our method, we simulated RNA sequencing (RNA-seq) datasets to consider various conditions of sequencing depth and sample size. The performance was compared with that of other methods. Additionally, we analyzed three public datasets of cancer patients or cell lines to evaluate how well ASpediaFI detects biologically relevant candidates. ASpediaFI exhibits strong performance in both simulated and public datasets. Our integrative approach reveals that DAS events that recognize a global co-expression network and relevant pathways determine the functional importance of spliced genes in the subnetwork. ASpediaFI is publicly available at https://bioconductor.org/packages/ASpediaFI.
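The random walk with restart step described in this abstract can be illustrated with a minimal Python sketch. This is not ASpediaFI's implementation: the toy network, the node names (`G1`, `AS1`, `PW1`), the restart probability of 0.7, and the function name `random_walk_with_restart` are all assumptions made for illustration. The sketch only shows the general idea of scoring nodes of a mixed gene/AS-event/pathway network by their proximity to a query gene set.

```python
def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for every node.

    adj     : dict mapping node -> {neighbor: non-negative edge weight};
              every neighbor must itself be a key of adj
    seeds   : query nodes; the walker restarts uniformly among them
    restart : probability of jumping back to a seed at each step (assumed value)
    """
    nodes = sorted(adj)
    seeds = set(seeds)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    # Total outgoing weight per node, used to normalise transition probabilities.
    out_weight = {n: sum(adj[n].values()) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart term: a fixed share of the probability mass returns to the seeds.
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            if out_weight[u] == 0:
                # Node without edges: send its walking mass back to the seeds.
                for n in nodes:
                    nxt[n] += (1 - restart) * p[u] * p0[n]
                continue
            for v, w in adj[u].items():
                nxt[v] += (1 - restart) * p[u] * w / out_weight[u]
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical heterogeneous network: genes (G*), one AS event, one pathway.
toy_network = {
    "G1": {"G2": 1.0, "AS1": 1.0},
    "G2": {"G1": 1.0, "PW1": 1.0},
    "G3": {"PW1": 1.0},
    "AS1": {"G1": 1.0},
    "PW1": {"G2": 1.0, "G3": 1.0},
}
scores = random_walk_with_restart(toy_network, seeds=["G1"])
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{node}\t{score:.4f}")
```

Nodes close to the query gene (here the AS event attached to `G1`) receive higher scores than nodes several steps away, which is the sense in which the walk "interrogates proximity" from the query set.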
3
Cheung FKM, Qin J. The Methods and Tools for Molecular Network Construction. SYSTEMS MEDICINE 2021. [DOI: 10.1016/b978-0-12-801238-3.11464-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
4
Yao Z, Zhang J, Zou X. A general index for linear and nonlinear correlations for high dimensional genomic data. BMC Genomics 2020; 21:846. [PMID: 33256599 PMCID: PMC7706065 DOI: 10.1186/s12864-020-07246-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/24/2020] [Accepted: 11/18/2020] [Indexed: 01/23/2023]
Abstract
BACKGROUND With the advance of high-throughput sequencing, large amounts of high-dimensional data are being generated. Detecting dependence/correlation between these datasets is becoming one of the most important issues in multi-dimensional data integration and co-expression network construction. RNA-sequencing data are widely used to construct gene regulatory networks. Such networks could be more accurate when methylation data, copy number aberration data and other types of data are introduced. Consequently, a general index for detecting relationships between high-dimensional data is indispensable. RESULTS We proposed a Kernel-Based RV-coefficient, named KBRV, for testing both linear and nonlinear correlation between two matrices by introducing kernel functions into RV2 (the modified RV-coefficient). Permutation tests and other validation methods were used on simulated data to test the significance and rationality of KBRV. To demonstrate the advantages of KBRV in constructing gene regulatory networks, we applied this index to real datasets (ovarian cancer datasets and exon-level RNA-Seq data in human myeloid differentiation) to illustrate its superiority over vector correlation. CONCLUSIONS We concluded that KBRV is an efficient index for detecting both linear and nonlinear relationships in high-dimensional data. The correlation method for high-dimensional data has possible applications in the construction of gene regulatory networks.
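The idea of "introducing kernel functions" into an RV-type coefficient can be illustrated with a short Python sketch. It does not reproduce the authors' KBRV or the RV2 modification: it uses the classical RV coefficient computed from centred Gram matrices and simply swaps the linear Gram matrix for a Gaussian-kernel one. The function names, the bandwidth `sigma=1.0`, and the toy data are assumptions made for illustration.

```python
import math


def linear_gram(rows):
    """Gram matrix of inner products between samples (one sample per row)."""
    return [[sum(a * b for a, b in zip(r, s)) for s in rows] for r in rows]


def gaussian_gram(rows, sigma=1.0):
    """Gaussian-kernel Gram matrix; sigma is an assumed bandwidth."""
    return [
        [math.exp(-sum((a - b) ** 2 for a, b in zip(r, s)) / (2 * sigma ** 2))
         for s in rows]
        for r in rows
    ]


def center(gram):
    """Double-centre a symmetric Gram matrix (centring in feature space)."""
    n = len(gram)
    row_mean = [sum(row) / n for row in gram]
    total = sum(row_mean) / n
    return [[gram[i][j] - row_mean[i] - row_mean[j] + total for j in range(n)]
            for i in range(n)]


def frobenius_inner(a, b):
    """Sum of element-wise products, i.e. trace(a b) for symmetric matrices."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def rv_coefficient(gram_x, gram_y):
    """Classical RV coefficient computed from two sample-by-sample Gram matrices."""
    a, b = center(gram_x), center(gram_y)
    return frobenius_inner(a, b) / math.sqrt(
        frobenius_inner(a, a) * frobenius_inner(b, b))


# Toy data: y is a purely nonlinear (quadratic) function of x.
x = [[-2.0], [-1.0], [0.0], [1.0], [2.0]]
y = [[v[0] ** 2] for v in x]

print("linear RV:", rv_coefficient(linear_gram(x), linear_gram(y)))
print("kernel RV:", rv_coefficient(gaussian_gram(x), gaussian_gram(y)))
```

On this toy example the linear version is zero because `x` and `x**2` are uncorrelated on a symmetric grid, while the kernel version is clearly positive, which is the kind of nonlinear dependence a kernel-based index is meant to detect.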
Affiliation(s)
- Zhihao Yao
- School of Mathematics and Statistics, Wuhan University, Wuhan, 430072 China
- Hubei Key Laboratory of Computational Science, Wuhan University, Wuhan, 430072 China
- Jing Zhang
- School of Mathematics and Statistics, Wuhan University, Wuhan, 430072 China
- Hubei Key Laboratory of Computational Science, Wuhan University, Wuhan, 430072 China
- Xiufen Zou
- School of Mathematics and Statistics, Wuhan University, Wuhan, 430072 China
- Hubei Key Laboratory of Computational Science, Wuhan University, Wuhan, 430072 China
5
Chowdhury HA, Bhattacharyya DK, Kalita JK. (Differential) Co-Expression Analysis of Gene Expression: A Survey of Best Practices. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:1154-1173. [PMID: 30668502 DOI: 10.1109/tcbb.2019.2893170] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 06/09/2023]
Abstract
Analysis of gene expression data is widely used in transcriptomic studies to understand functions of molecules inside a cell and interactions among molecules. Differential co-expression analysis studies diseases and phenotypic variations by finding modules of genes whose co-expression patterns vary across conditions. We review the best practices in gene expression data analysis in terms of analysis of (differential) co-expression, co-expression network, differential networking, and differential connectivity considering both microarray and RNA-seq data along with comparisons. We highlight hurdles in RNA-seq data analysis using methods developed for microarrays. We include discussion of necessary tools for gene expression analysis throughout the paper. In addition, we shed light on scRNA-seq data analysis by including preprocessing and scRNA-seq in co-expression analysis along with useful tools specific to scRNA-seq. To get insights, biological interpretation and functional profiling is included. Finally, we provide guidelines for the analyst, along with research issues and challenges which should be addressed.
6
Wang D, Zou X, Au KF. A network-based computational framework to predict and differentiate functions for gene isoforms using exon-level expression data. Methods 2020; 189:54-64. [PMID: 32534132 DOI: 10.1016/j.ymeth.2020.06.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/27/2020] [Revised: 05/22/2020] [Accepted: 06/06/2020] [Indexed: 12/23/2022]
Abstract
MOTIVATION Alternative splicing makes significant contributions to functional diversity of transcripts and proteins. Many alternatively spliced gene isoforms have been shown to perform specific biological functions under different contexts. In addition to gene-level expression, the advances of high-throughput sequencing offer a chance to estimate isoform-specific exon expression with a high resolution, which is informative for studying splice variants with network analysis. RESULTS In this study, we propose a novel network-based analysis framework to predict isoform-specific functions from exon-level RNA-Seq data. In particular, based on exon-level expression data, we firstly propose a unified framework, referred to as Iso-Net, to integrate two new mathematical methods (named MINet and RVNet) that infer co-expression networks at different data scenarios. We demonstrate the superior prediction accuracy of Iso-Net over the existing methods for most simulation data, especially in two extreme cases: sample size is very small and exon numbers of two isoforms are quite different. Furthermore, by defining relevant quantitative measures (e.g., Jaccard correlation coefficient) and combining differential co-expression network analysis and GO functional enrichment analysis, a co-expression network analysis framework is developed to predict functions of isoforms and further, to discover their distinct functions within the same gene. We apply Iso-Net to study gene isoforms for several important transcription factors in human myeloid differentiation with the exon-level RNA-Seq data from three different cell lines. AVAILABILITY AND IMPLEMENTATION Iso-Net is open source and freely available from https://github.com/Dingjie-Wang/Iso-Net.
Affiliation(s)
- Dingjie Wang
- Department of Biomedical Informatics, The Ohio State University, OH 43210, USA; School of Mathematics and Statistics, Wuhan University, Wuhan 430072, China; Computational Science Hubei Key Laboratory, Wuhan University, Wuhan 430072, China
- Xiufen Zou
- School of Mathematics and Statistics, Wuhan University, Wuhan 430072, China; Computational Science Hubei Key Laboratory, Wuhan University, Wuhan 430072, China.
- Kin Fai Au
- Department of Biomedical Informatics, The Ohio State University, OH 43210, USA.
7
Pacini C, Koziol MJ. Bioinformatics challenges and perspectives when studying the effect of epigenetic modifications on alternative splicing. Philos Trans R Soc Lond B Biol Sci 2019; 373:rstb.2017.0073. [PMID: 29685977 PMCID: PMC5915717 DOI: 10.1098/rstb.2017.0073] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Accepted: 09/14/2017] [Indexed: 02/07/2023]
Abstract
It is widely known that epigenetic modifications are important in regulating transcription, but several have also been reported in alternative splicing. The regulation of pre-mRNA splicing is important to explain proteomic diversity and the misregulation of splicing has been implicated in many diseases. Here, we give a brief overview of the role of epigenetics in alternative splicing and disease. We then discuss the bioinformatics methods that can be used to model interactions between epigenetic marks and regulators of splicing. These models can be used to identify alternative splicing and epigenetic changes across different phenotypes. This article is part of a discussion meeting issue ‘Frontiers in epigenetic chemical biology’.
Affiliation(s)
- Clare Pacini
- Wellcome Trust Cancer Research UK Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QN, UK; Department of Zoology, University of Cambridge, Downing Street, Cambridge, CB2 3EJ, UK
- Magdalena J Koziol
- Wellcome Trust Cancer Research UK Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QN, UK; Department of Zoology, University of Cambridge, Downing Street, Cambridge, CB2 3EJ, UK
8
Ye W, Long Y, Ji G, Su Y, Ye P, Fu H, Wu X. Cluster analysis of replicated alternative polyadenylation data using canonical correlation analysis. BMC Genomics 2019; 20:75. [PMID: 30669970 PMCID: PMC6343338 DOI: 10.1186/s12864-019-5433-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2018] [Accepted: 01/03/2019] [Indexed: 11/10/2022]
Abstract
BACKGROUND Alternative polyadenylation (APA) has emerged as a pervasive mechanism that contributes to the transcriptome complexity and dynamics of gene regulation. The current tsunami of whole genome poly(A) site data from various conditions generated by 3' end sequencing provides a valuable data source for the study of APA-related gene expression. Cluster analysis is a powerful technique for investigating the association structure among genes, however, conventional gene clustering methods are not suitable for APA-related data as they fail to consider the information of poly(A) sites (e.g., location, abundance, number, etc.) within each gene or measure the association among poly(A) sites between two genes. RESULTS Here we proposed a computational framework, named PASCCA, for clustering genes from replicated or unreplicated poly(A) site data using canonical correlation analysis (CCA). PASCCA incorporates multiple layers of gene expression data from both the poly(A) site level and gene level and takes into account the number of replicates and the variability within each experimental group. Moreover, PASCCA characterizes poly(A) sites in various ways including the abundance and relative usage, which can exploit the advantages of 3' end deep sequencing in quantifying APA sites. Using both real and synthetic poly(A) site data sets, the cluster analysis demonstrates that PASCCA outperforms other widely-used distance measures under five performance metrics including connectivity, the Dunn index, average distance, average distance between means, and the biological homogeneity index. We also used PASCCA to infer APA-specific gene modules from recently published poly(A) site data of rice and discovered some distinct functional gene modules. We have made PASCCA an easy-to-use R package for APA-related gene expression analyses, including the characterization of poly(A) sites, quantification of association between genes, and clustering of genes. 
CONCLUSIONS By providing a better treatment of the noise inherent in repeated measurements and taking into account multiple layers of poly(A) site data, PASCCA could be a general tool for clustering and analyzing APA-specific gene expression data. PASCCA could be used to elucidate the dynamic interplay of genes and their APA sites among various biological conditions from emerging 3' end sequencing data to address the complex biological phenomenon.
Affiliation(s)
- Wenbin Ye
- Department of Automation, Xiamen University, Xiamen, 361005, China; Innovation Center for Cell Biology, Xiamen University, Xiamen, 361005, China
- Yuqi Long
- Department of Automation, Xiamen University, Xiamen, 361005, China; Software Quality Testing Engineering Research Center, China Electronic Product Reliability and Environmental Testing Research Institute, Guangzhou, 510610, China
- Guoli Ji
- Department of Automation, Xiamen University, Xiamen, 361005, China; Innovation Center for Cell Biology, Xiamen University, Xiamen, 361005, China
- Yaru Su
- College of Mathematics and Computer Science, Fuzhou University, Fuzhou, 350116, China
- Pengchao Ye
- Department of Automation, Xiamen University, Xiamen, 361005, China
- Hongjuan Fu
- Department of Automation, Xiamen University, Xiamen, 361005, China
- Xiaohui Wu
- Department of Automation, Xiamen University, Xiamen, 361005, China; Innovation Center for Cell Biology, Xiamen University, Xiamen, 361005, China
9
van Dam S, Võsa U, van der Graaf A, Franke L, de Magalhães JP. Gene co-expression analysis for functional classification and gene-disease predictions. Brief Bioinform 2018; 19:575-592. [PMID: 28077403 PMCID: PMC6054162 DOI: 10.1093/bib/bbw139] [Citation(s) in RCA: 422] [Impact Index Per Article: 70.3] [Received: 09/12/2016] [Revised: 12/01/2016] [Indexed: 01/06/2023]
Abstract
Gene co-expression networks can be used to associate genes of unknown function with biological processes, to prioritize candidate disease genes or to discern transcriptional regulatory programmes. With recent advances in transcriptomics and next-generation sequencing, co-expression networks constructed from RNA sequencing data also enable the inference of functions and disease associations for non-coding genes and splice variants. Although gene co-expression networks typically do not provide information about causality, emerging methods for differential co-expression analysis are enabling the identification of regulatory genes underlying various phenotypes. Here, we introduce and guide researchers through a (differential) co-expression analysis. We provide an overview of methods and tools used to create and analyse co-expression networks constructed from gene expression data, and we explain how these can be used to identify genes with a regulatory role in disease. Furthermore, we discuss the integration of other data types with co-expression networks and offer future perspectives of co-expression analysis.
Affiliation(s)
- Sipko van Dam
- Department of Genetics, UMCG HPC CB50, RB Groningen, Netherlands
- Urmo Võsa
- Department of Genetics, UMCG HPC CB50, RB Groningen, Netherlands
- Lude Franke
- Department of Genetics, UMCG HPC CB50, RB Groningen, Netherlands
10
Yu H, Jiao B, Lu L, Wang P, Chen S, Liang C, Liu W. NetMiner-an ensemble pipeline for building genome-wide and high-quality gene co-expression network using massive-scale RNA-seq samples. PLoS One 2018; 13:e0192613. [PMID: 29425247 PMCID: PMC5806890 DOI: 10.1371/journal.pone.0192613] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 08/21/2017] [Accepted: 01/27/2018] [Indexed: 01/10/2023]
Abstract
Accurately reconstructing gene co-expression network is of great importance for uncovering the genetic architecture underlying complex and various phenotypes. The recent availability of high-throughput RNA-seq sequencing has made genome-wide detecting and quantifying of the novel, rare and low-abundance transcripts practical. However, its potential merits in reconstructing gene co-expression network have still not been well explored. Using massive-scale RNA-seq samples, we have designed an ensemble pipeline, called NetMiner, for building genome-scale and high-quality Gene Co-expression Network (GCN) by integrating three frequently used inference algorithms. We constructed a RNA-seq-based GCN in one species of monocot rice. The quality of network obtained by our method was verified and evaluated by the curated gene functional association data sets, which obviously outperformed each single method. In addition, the powerful capability of network for associating genes with functions and agronomic traits was shown by enrichment analysis and case studies. In particular, we demonstrated the potential value of our proposed method to predict the biological roles of unknown protein-coding genes, long non-coding RNA (lncRNA) genes and circular RNA (circRNA) genes. Our results provided a valuable and highly reliable data source to select key candidate genes for subsequent experimental validation. To facilitate identification of novel genes regulating important biological processes and phenotypes in other plants or animals, we have published the source code of NetMiner, making it freely available at https://github.com/czllab/NetMiner.
Affiliation(s)
- Hua Yu
- Nantong Medical College and School of Pharmacy, Nantong University, Nantong, China
- State Key Laboratory of Plant Genomics, Institute of Genetic and Developmental Biology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- * E-mail: , , (HY); (CL); (WL)
- Bingke Jiao
- State Key Laboratory of Plant Genomics, Institute of Genetic and Developmental Biology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Lu Lu
- Nantong Polytechnic College, Nantong, China
- Pengfei Wang
- Nantong Medical College and School of Pharmacy, Nantong University, Nantong, China
- Shuangcheng Chen
- Nantong Medical College and School of Pharmacy, Nantong University, Nantong, China
- Chengzhi Liang
- State Key Laboratory of Plant Genomics, Institute of Genetic and Developmental Biology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- * E-mail: , , (HY); (CL); (WL)
- Wei Liu
- Nantong Medical College and School of Pharmacy, Nantong University, Nantong, China
- * E-mail: , , (HY); (CL); (WL)
11
Yalamanchili HK, Wan YW, Liu Z. Data Analysis Pipeline for RNA-seq Experiments: From Differential Expression to Cryptic Splicing. Curr Protoc Bioinformatics 2017; 59:11.15.1-11.15.21. [PMID: 28902396 DOI: 10.1002/cpbi.33] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Indexed: 01/26/2023]
Abstract
RNA sequencing (RNA-seq) is a high-throughput technology that provides unique insights into the transcriptome. It has a wide variety of applications in quantifying genes/isoforms and in detecting non-coding RNA, alternative splicing, and splice junctions. It is extremely important to comprehend the entire transcriptome for a thorough understanding of the cellular system. Several RNA-seq analysis pipelines have been proposed to date. However, no single analysis pipeline can capture dynamics of the entire transcriptome. Here, we compile and present a robust and commonly used analytical pipeline covering the entire spectrum of transcriptome analysis, including quality checks, alignment of reads, differential gene/transcript expression analysis, discovery of cryptic splicing events, and visualization. Challenges, critical parameters, and possible downstream functional analysis pipelines associated with each step are highlighted and discussed. This unit provides a comprehensive understanding of state-of-the-art RNA-seq analysis pipeline and a greater understanding of the transcriptome. © 2017 by John Wiley & Sons, Inc.
Affiliation(s)
- Hari Krishna Yalamanchili
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas; Bioinformatics Core, Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston, Texas
- Ying-Wooi Wan
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas; Bioinformatics Core, Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston, Texas
- Zhandong Liu
- Bioinformatics Core, Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston, Texas; Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, Texas; Department of Pediatrics-Neurology, Baylor College of Medicine, Houston, Texas
12
Ji G, Lin Q, Long Y, Ye C, Ye W, Wu X. PAcluster: Clustering polyadenylation site data using canonical correlation analysis. J Bioinform Comput Biol 2017; 15:1750018. [PMID: 28874086 DOI: 10.1142/s0219720017500184] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/13/2022]
Abstract
Alternative polyadenylation (APA) is a pervasive mechanism that contributes to gene regulation. The growing number of sequenced poly(A) sites is placing new demands on the development of computational methods to investigate APA regulation. Cluster analysis is important for identifying groups of co-expressed genes. However, clustering of poly(A) sites has not been extensively studied in APA, and most APA studies have failed to consider the distribution, abundance, and variation of APA sites in each gene. Here we constructed a two-layer model based on canonical correlation analysis (CCA) to explore the underlying biological mechanisms in APA regulation. The first layer quantifies the general correlation of APA sites across various conditions between each pair of genes, and the second layer identifies genes with statistically significant correlation in their APA patterns to infer APA-specific gene clusters. Using hierarchical clustering, we comprehensively compared our method with four other widely used distance measures based on three performance indexes. Results showed that our method significantly enhanced the clustering performance for both synthetic and real poly(A) site data and could generate clusters with more biological meaning. We have implemented the CCA-based method as a publicly available R package called PAcluster, which provides an efficient solution to the clustering of large APA-specific biological datasets.
Affiliation(s)
- Guoli Ji
- Department of Automation, Xiamen University, Xiamen, Fujian, P. R. China
- Qianmin Lin
- Department of Automation, Xiamen University, Xiamen, Fujian, P. R. China
- Yuqi Long
- Department of Automation, Xiamen University, Xiamen, Fujian, P. R. China
- Congting Ye
- College of the Environment and Ecology, Xiamen University, Xiamen, Fujian, P. R. China
- Wenbin Ye
- Department of Automation, Xiamen University, Xiamen, Fujian, P. R. China
- Xiaohui Wu
- Department of Automation, Xiamen University, Xiamen, Fujian, P. R. China
13
Tan Q, Yalamanchili HK, Park J, De Maio A, Lu HC, Wan YW, White JJ, Bondar VV, Sayegh LS, Liu X, Gao Y, Sillitoe RV, Orr HT, Liu Z, Zoghbi HY. Extensive cryptic splicing upon loss of RBM17 and TDP43 in neurodegeneration models. Hum Mol Genet 2017; 25:5083-5093. [PMID: 28007900 DOI: 10.1093/hmg/ddw337] [Citation(s) in RCA: 51] [Impact Index Per Article: 7.3] [Received: 05/26/2016] [Accepted: 09/29/2016] [Indexed: 12/12/2022]
Abstract
Splicing regulation is an important step of post-transcriptional gene regulation. It is a highly dynamic process orchestrated by RNA-binding proteins (RBPs). RBP dysfunction and global splicing dysregulation have been implicated in many human diseases, but the in vivo functions of most RBPs and the splicing outcome upon their loss remain largely unexplored. Here we report that constitutive deletion of Rbm17, which encodes an RBP with a putative role in splicing, causes early embryonic lethality in mice and that its loss in Purkinje neurons leads to rapid degeneration. Transcriptome profiling of Rbm17-deficient and control neurons and subsequent splicing analyses using CrypSplice, a new computational method that we developed, revealed that more than half of RBM17-dependent splicing changes are cryptic. Importantly, RBM17 represses cryptic splicing of genes that likely contribute to motor coordination and cell survival. This finding prompted us to re-analyze published datasets from a recent report on TDP-43, an RBP implicated in amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD), as it was demonstrated that TDP-43 represses cryptic exon splicing to promote cell survival. We uncovered a large number of TDP-43-dependent splicing defects that were not previously discovered, revealing that TDP-43 extensively regulates cryptic splicing. Moreover, we found a significant overlap in genes that undergo both RBM17- and TDP-43-dependent cryptic splicing repression, many of which are associated with survival. We propose that repression of cryptic splicing by RBPs is critical for neuronal health and survival. CrypSplice is available at www.liuzlab.org/CrypSplice.
Affiliation(s)
- Qiumin Tan
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA; Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston, Texas 77030, USA
- Hari Krishna Yalamanchili
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA; Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston, Texas 77030, USA
- Jeehye Park
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA; Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston, Texas 77030, USA
- Antonia De Maio
- Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston, Texas 77030, USA; Program in Developmental Biology
- Hsiang-Chih Lu
- Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston, Texas 77030, USA; Program in Developmental Biology
- Ying-Wooi Wan
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA; Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston, Texas 77030, USA
- Joshua J White
- Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston, Texas 77030, USA; Department of Neuroscience; Department of Pathology and Immunology, Baylor College of Medicine, Houston, Texas 77030, USA
- Vitaliy V Bondar
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA; Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston, Texas 77030, USA
- Layal S Sayegh
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA; Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston, Texas 77030, USA
- Xiuyun Liu
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA; Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston, Texas 77030, USA
- Yan Gao
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA; Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston, Texas 77030, USA
- Roy V Sillitoe
- Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston, Texas 77030, USA; Program in Developmental Biology; Department of Neuroscience; Department of Pathology and Immunology, Baylor College of Medicine, Houston, Texas 77030, USA
- Harry T Orr
- Institute for Translational Neuroscience; Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, Minnesota 55455, USA
- Zhandong Liu
- Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston, Texas 77030, USA; Department of Pediatrics
- Huda Y Zoghbi
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA; Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston, Texas 77030, USA; Program in Developmental Biology; Department of Neuroscience; Department of Pathology and Immunology, Baylor College of Medicine, Houston, Texas 77030, USA; Howard Hughes Medical Institute, Baylor College of Medicine, Houston, Texas 77030, USA
14
Wang Z, Fang H, Tang NLS, Deng M. VCNet: vector-based gene co-expression network construction and its application to RNA-seq data. Bioinformatics 2017; 33:2173-2181. [DOI: 10.1093/bioinformatics/btx131] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 08/23/2016] [Accepted: 03/07/2017] [Indexed: 11/12/2022]
Affiliation(s)
- Zengmiao Wang
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Huaying Fang
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- LMAM, School of Mathematical Sciences, Peking University, Beijing, China
- Nelson Leung-Sang Tang
- Department of Chemical Pathology and Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Hong Kong, China
- Minghua Deng
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- LMAM, School of Mathematical Sciences, Peking University, Beijing, China
- Center for Statistical Science, Peking University, Beijing, China
15
Li W, Chen J, Yao J. Testing the independence of two random vectors where only one dimension is large. STATISTICS-ABINGDON 2016. [DOI: 10.1080/02331888.2016.1266988] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 02/07/2023]
Affiliation(s)
- Weiming Li
- School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, People's Republic of China
- Jiaqi Chen
- Department of Mathematics, Harbin Institute of Technology, Harbin, People's Republic of China
- Jianfeng Yao
- Department of Statistics and Actuarial Sciences, The University of Hong Kong, Hong Kong, People's Republic of China
16
Abstract
The laboratory mouse is the primary mammalian species used for studying alternative splicing events. Recent studies have generated computational models to predict functions for splice isoforms in the mouse. However, the functional relationship network, describing the probability of splice isoforms participating in the same biological process or pathway, has not yet been studied in the mouse. Here we describe a rich genome-wide resource of mouse networks at the isoform level, which was generated using a unique framework that was originally developed to infer isoform functions. This network was built through integrating heterogeneous genomic and protein data, including RNA-seq, exon array, protein docking and pseudo-amino acid composition. Through simulation and cross-validation studies, we demonstrated the accuracy of the algorithm in predicting isoform-level functional relationships. We showed that this network enables the users to reveal functional differences of the isoforms of the same gene, as illustrated by literature evidence with Anxa6 (annexin a6) as an example. We expect this work will become a useful resource for the mouse genetics community to understand gene functions. The network is publicly available at: http://guanlab.ccmb.med.umich.edu/isoformnetwork.
17
Oh S. How are Bayesian and Non-Parametric Methods Doing a Great Job in RNA-Seq Differential Expression Analysis? : A Review. COMMUNICATIONS FOR STATISTICAL APPLICATIONS AND METHODS 2015. [DOI: 10.5351/csam.2015.22.2.181] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/20/2022]
Affiliation(s)
- Sunghee Oh
- Department of Veterinary Medicine, Jeju National University, Korea