1
Marku M, Pancaldi V. From time-series transcriptomics to gene regulatory networks: A review on inference methods. PLoS Comput Biol 2023; 19:e1011254. [PMID: 37561790 PMCID: PMC10414591 DOI: 10.1371/journal.pcbi.1011254] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 08/12/2023] Open
Abstract
Inference of gene regulatory networks has been an active area of research for around 20 years, leading to the development of sophisticated inference algorithms based on a variety of assumptions and approaches. With the ever increasing demand for more accurate and powerful models, the inference problem remains of broad scientific interest. The abstract representation of biological systems through gene regulatory networks represents a powerful method to study such systems, encoding different amounts and types of information. In this review, we summarize the different types of inference algorithms specifically based on time-series transcriptomics, giving an overview of the main applications of gene regulatory networks in computational biology. This review is intended to give an updated reference of regulatory networks inference tools to biologists and researchers new to the topic and guide them in selecting the appropriate inference method that best fits their questions, aims, and experimental data.
Affiliation(s)
- Malvina Marku
- CRCT, Université de Toulouse, Inserm, CNRS, Université Toulouse III-Paul Sabatier, Centre de Recherches en Cancérologie de Toulouse, Toulouse, France
- Vera Pancaldi
- CRCT, Université de Toulouse, Inserm, CNRS, Université Toulouse III-Paul Sabatier, Centre de Recherches en Cancérologie de Toulouse, Toulouse, France
- Barcelona Supercomputing Center, Barcelona, Spain
2
Franchini M, Pellecchia S, Viscido G, Gambardella G. Single-cell gene set enrichment analysis and transfer learning for functional annotation of scRNA-seq data. NAR Genom Bioinform 2023; 5:lqad024. [PMID: 36879897 PMCID: PMC9985338 DOI: 10.1093/nargab/lqad024] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/19/2022] [Revised: 01/16/2023] [Accepted: 02/20/2023] [Indexed: 03/07/2023] Open
Abstract
Although an essential step, cell functional annotation often proves particularly challenging from single-cell transcriptional data. Several methods have been developed to accomplish this task. However, in most cases, these rely on techniques initially developed for bulk RNA sequencing or simply make use of marker genes identified from cell clustering followed by supervised annotation. To overcome these limitations and automatize the process, we have developed two novel methods, the single-cell gene set enrichment analysis (scGSEA) and the single-cell mapper (scMAP). scGSEA combines latent data representations and gene set enrichment scores to detect coordinated gene activity at single-cell resolution. scMAP uses transfer learning techniques to re-purpose and contextualize new cells into a reference cell atlas. Using both simulated and real datasets, we show that scGSEA effectively recapitulates recurrent patterns of pathways' activity shared by cells from different experimental conditions. At the same time, we show that scMAP can reliably map and contextualize new single-cell profiles on a breast cancer atlas we recently released. Both tools are provided in an effective and straightforward workflow providing a framework to determine cell function and significantly improve annotation and interpretation of scRNA-seq data.
Affiliation(s)
- Melania Franchini
- Telethon Institute of Genetics and Medicine, Pozzuoli 80078 Naples, Italy; Department of Electrical Engineering and Information Technologies, University of Naples Federico II, 80125 Naples, Italy
- Simona Pellecchia
- Telethon Institute of Genetics and Medicine, Pozzuoli 80078 Naples, Italy
- Gaetano Viscido
- Telethon Institute of Genetics and Medicine, Pozzuoli 80078 Naples, Italy
- Gennaro Gambardella
- Telethon Institute of Genetics and Medicine, Pozzuoli 80078 Naples, Italy; Department of Chemical Materials and Industrial Engineering, University of Naples Federico II, 80125 Naples, Italy
3
Schneider N, Reed E, Kamel F, Ferrari E, Soloviev M. Rational Approach to Finding Genes Encoding Molecular Biomarkers: Focus on Breast Cancer. Genes (Basel) 2022; 13:genes13091538. [PMID: 36140706 PMCID: PMC9498645 DOI: 10.3390/genes13091538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2022] [Revised: 08/18/2022] [Accepted: 08/23/2022] [Indexed: 12/04/2022] Open
Abstract
Early detection of cancer facilitates treatment and improves patient survival. We hypothesized that molecular biomarkers of cancer could be rationally predicted based on even partial knowledge of transcriptional regulation, functional pathways and gene co-expression networks. To test our data mining approach, we focused on breast cancer, as one of the best-studied models of this disease. We were particularly interested to check whether such a ‘guilt by association’ approach would lead to pan-cancer markers generally known in the field or whether molecular subtype-specific ‘seed’ markers will yield subtype-specific extended sets of breast cancer markers. The key challenge of this investigation was to utilize a small number of well-characterized, largely intracellular, breast cancer-related proteins to uncover similarly regulated and functionally related genes and proteins with the view to predicting a much-expanded range of disease markers, especially that of extracellular molecular markers, potentially suitable for the early non-invasive detection of the disease. We selected 23 previously characterized proteins specific to three major molecular subtypes of breast cancer and analyzed their established transcription factor networks, their known metabolic and functional pathways and the existing experimentally derived protein co-expression data. Having started with largely intracellular and transmembrane marker ‘seeds’ we predicted the existence of as many as 150 novel biomarker genes to be associated with the selected three major molecular sub-types of breast cancer all coding for extracellularly targeted or secreted proteins and therefore being potentially most suitable for molecular diagnosis of the disease. Of the 150 such predicted protein markers, 114 were predicted to be linked through the combination of regulatory networks to basal breast cancer, 48 to luminal and 7 to Her2-positive breast cancer. 
The reported approach to mining molecular markers is not limited to breast cancer and therefore offers a widely applicable strategy of biomarker mining.
Affiliation(s)
- Nathalie Schneider
- Department of Biological Sciences, Royal Holloway University of London, Egham, Surrey TW20 0EX, UK
- Ellen Reed
- Department of Biological Sciences, Royal Holloway University of London, Egham, Surrey TW20 0EX, UK
- Faddy Kamel
- Department of Biological Sciences, Royal Holloway University of London, Egham, Surrey TW20 0EX, UK
- Enrico Ferrari
- School of Life Sciences, University of Lincoln, Lincoln LN6 7TS, UK
- Mikhail Soloviev
- Department of Biological Sciences, Royal Holloway University of London, Egham, Surrey TW20 0EX, UK
- Correspondence:
4
Yu S, Drton M, Promislow DEL, Shojaie A. CorDiffViz: an R package for visualizing multi-omics differential correlation networks. BMC Bioinformatics 2021; 22:486. [PMID: 34627139 PMCID: PMC8501646 DOI: 10.1186/s12859-021-04383-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2020] [Accepted: 09/20/2021] [Indexed: 11/22/2022] Open
Abstract
BACKGROUND Differential correlation networks are increasingly used to delineate changes in interactions among biomolecules. They characterize differences between omics networks under two different conditions, and can be used to delineate mechanisms of disease initiation and progression. RESULTS We present a new R package, CorDiffViz, that facilitates the estimation and visualization of differential correlation networks using multiple correlation measures and inference methods. The software is implemented in R, HTML and Javascript, and is available at https://github.com/sqyu/CorDiffViz . Visualization has been tested for the Chrome and Firefox web browsers. A demo is available at https://diffcornet.github.io/CorDiffViz/demo.html . CONCLUSIONS Our software offers considerable flexibility by allowing the user to interact with the visualization and choose from different estimation methods and visualizations. It also allows the user to easily toggle between correlation networks for samples under one condition and differential correlations between samples under two conditions. Moreover, the software facilitates integrative analysis of cross-correlation networks between two omics data sets.
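As a rough illustration of what a differential correlation network contains, the sketch below compares Pearson correlation matrices estimated under two conditions and scores each edge with a Fisher z-test. It is written in Python with NumPy and is not the CorDiffViz API (which is an R package); the function names, the simulated data, and the choice of Pearson correlation with a Fisher z statistic are assumptions made for this example only.

```python
import numpy as np

def differential_correlation(x_a, x_b):
    """Correlation matrices for two samples-by-features arrays and their difference."""
    corr_a = np.corrcoef(x_a, rowvar=False)
    corr_b = np.corrcoef(x_b, rowvar=False)
    return corr_a, corr_b, corr_b - corr_a

def fisher_z_statistic(corr_a, corr_b, n_a, n_b):
    """Approximate z-score for the difference of two correlations (Fisher transform)."""
    # Clip so that the diagonal (correlation of 1) does not map to infinity.
    z_a = np.arctanh(np.clip(corr_a, -0.999999, 0.999999))
    z_b = np.arctanh(np.clip(corr_b, -0.999999, 0.999999))
    return (z_b - z_a) / np.sqrt(1.0 / (n_a - 3) + 1.0 / (n_b - 3))

# Hypothetical data: features 0 and 1 are coupled in condition A only.
rng = np.random.default_rng(0)
n = 200
base = rng.normal(size=n)
x_a = np.column_stack([base, base + 0.1 * rng.normal(size=n), rng.normal(size=n)])
x_b = rng.normal(size=(n, 3))

corr_a, corr_b, diff = differential_correlation(x_a, x_b)
z = fisher_z_statistic(corr_a, corr_b, n, n)
```

Edges with a large absolute `z` are candidates for the differential network; any real analysis would also need multiple-testing control, which this sketch omits.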
Affiliation(s)
- Shiqing Yu
- Department of Statistics, University of Washington, NE Stevens Way, Seattle, WA, 98195, USA
- Mathias Drton
- Department of Mathematics, Technical University of Munich, Boltzmannstraße, 85748, Garching bei München, Germany
- Daniel E L Promislow
- Departments of Pathology and Biology, University of Washington, NE Pacific St, Seattle, WA, 98195, USA
- Ali Shojaie
- Department of Biostatistics, University of Washington, NE Pacific St, Seattle, WA, 98195, USA
5
Signorelli M, Cutillo L. On community structure validation in real networks. Comput Stat 2021. [DOI: 10.1007/s00180-021-01156-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
Community structure is a commonly observed feature of real networks. The term refers to the presence in a network of groups of nodes (communities) that feature high internal connectivity but are poorly connected to each other. Whereas the issue of community detection has been addressed in several works, the problem of validating a partition of nodes as a good community structure for a real network has received considerably less attention and remains an open issue. We propose a set of indices for community structure validation of network partitions that are based on a hypothesis testing procedure that assesses the distribution of links between and within communities. Using both simulations and real data, we illustrate how the proposed indices can be employed to compare the adequacy of different partitions of nodes as community structures in a given network, to assess whether two networks share the same or similar community structures, and to evaluate the performance of different network clustering algorithms.
6
Shojaie A. Differential Network Analysis: A Statistical Perspective. Wiley Interdiscip Rev Comput Stat 2021; 13:e1508. [PMID: 37050915 PMCID: PMC10088462 DOI: 10.1002/wics.1508] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 12/09/2019] [Accepted: 03/03/2020] [Indexed: 11/06/2022]
Abstract
Networks effectively capture interactions among components of complex systems, and have thus become a mainstay in many scientific disciplines. Growing evidence, especially from biology, suggests that networks undergo changes over time and in response to external stimuli. In biology and medicine, these changes have been found to be predictive of complex diseases. They have also been used to gain insight into mechanisms of disease initiation and progression. Primarily motivated by biological applications, this article provides a review of recent statistical machine learning methods for inferring networks and identifying changes in their structures.
Affiliation(s)
- Ali Shojaie
- Department of Biostatistics, University of Washington, Seattle WA
7
Wang YXR, Li L, Li JJ, Huang H. Network Modeling in Biology: Statistical Methods for Gene and Brain Networks. Stat Sci 2021; 36:89-108. [PMID: 34305304 DOI: 10.1214/20-sts792] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022]
Abstract
The rise of network data in many different domains has offered researchers new insight into the problem of modeling complex systems and propelled the development of numerous innovative statistical methodologies and computational tools. In this paper, we primarily focus on two types of biological networks, gene networks and brain networks, where statistical network modeling has found both fruitful and challenging applications. Unlike other network examples such as social networks where network edges can be directly observed, both gene and brain networks require careful estimation of edges using covariates as a first step. We provide a discussion on existing statistical and computational methods for edge estimation and subsequent statistical inference problems in these two types of biological networks.
Affiliation(s)
- Y X Rachel Wang
- School of Mathematics and Statistics, University of Sydney, Australia
- Lexin Li
- Department of Biostatistics and Epidemiology, School of Public Health, University of California, Berkeley
- Haiyan Huang
- Department of Statistics, University of California, Berkeley
8
Lim JT, Chen C, Grant AD, Padi M. Generating Ensembles of Gene Regulatory Networks to Assess Robustness of Disease Modules. Front Genet 2021; 11:603264. [PMID: 33519907 PMCID: PMC7841433 DOI: 10.3389/fgene.2020.603264] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/06/2020] [Accepted: 12/23/2020] [Indexed: 12/24/2022] Open
Abstract
The use of biological networks such as protein-protein interaction and transcriptional regulatory networks is becoming an integral part of genomics research. However, these networks are not static, and during phenotypic transitions like disease onset, they can acquire new "communities" (or highly interacting groups) of genes that carry out cellular processes. Disease communities can be detected by maximizing a modularity-based score, but since biological systems and network inference algorithms are inherently noisy, it remains a challenge to determine whether these changes represent real cellular responses or whether they appeared by random chance. Here, we introduce Constrained Random Alteration of Network Edges (CRANE), a method for randomizing networks with fixed node strengths. CRANE can be used to generate a null distribution of gene regulatory networks that can in turn be used to rank the most significant changes in candidate disease communities. Compared to other approaches, such as consensus clustering or commonly used generative models, CRANE emulates biologically realistic networks and recovers simulated disease modules with higher accuracy. When applied to breast and ovarian cancer networks, CRANE improves the identification of cancer-relevant GO terms while reducing the signal from non-specific housekeeping processes.
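To make the phrase "maximizing a modularity-based score" more concrete, the sketch below computes the standard Newman modularity of a given partition of an undirected network. It is not CRANE and may differ from the score used by the authors; the function name and the toy graph of two triangles joined by one edge are assumptions chosen for illustration.

```python
import numpy as np

def modularity(adjacency, labels):
    """Newman modularity Q of a partition of an undirected (possibly weighted) graph."""
    a = np.asarray(adjacency, dtype=float)
    labels = np.asarray(labels)
    strength = a.sum(axis=1)              # node strengths (degrees if unweighted)
    two_m = a.sum()                       # twice the total edge weight
    same = labels[:, None] == labels[None, :]
    expected = np.outer(strength, strength) / two_m
    return float(((a - expected) * same).sum() / two_m)

# Toy graph (hypothetical): two triangles connected by a single bridge edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = np.zeros((6, 6))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

q_split = modularity(adj, [0, 0, 0, 1, 1, 1])   # one community per triangle
q_single = modularity(adj, [0, 0, 0, 0, 0, 0])  # everything in one community
```

A null ensemble of randomized networks, as the abstract describes, would then be used to judge whether an observed score such as `q_split` is larger than expected by chance.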
Affiliation(s)
- James T. Lim
- Department of Molecular and Cellular Biology, The University of Arizona, Tucson, AZ, United States
- Chen Chen
- Department of Epidemiology and Biostatistics, Mel and Enid Zuckerman College of Public Health, The University of Arizona, Tucson, AZ, United States
- Adam D. Grant
- University of Arizona Cancer Center, The University of Arizona, Tucson, AZ, United States
- Megha Padi
- Department of Molecular and Cellular Biology, The University of Arizona, Tucson, AZ, United States
- University of Arizona Cancer Center, The University of Arizona, Tucson, AZ, United States
9
Lin W, Ji J, Zhu Y, Li M, Zhao J, Xue F, Yuan Z. PMINR: Pointwise Mutual Information-Based Network Regression - With Application to Studies of Lung Cancer and Alzheimer's Disease. Front Genet 2020; 11:556259. [PMID: 33193633 PMCID: PMC7594515 DOI: 10.3389/fgene.2020.556259] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/27/2020] [Accepted: 08/12/2020] [Indexed: 11/13/2022] Open
Abstract
Complex diseases are believed to be the consequence of intracellular network(s) involving a range of factors. An improved understanding of a disease-predisposing biological network could lead to better identification of genes and pathways that confer disease risk and therefore inform drug development. The group difference in biological networks, as is often characterized by graphs of nodes and edges, is attributable to effects of these nodes and edges. Here we introduced pointwise mutual information (PMI) as a measure of the connection between a pair of nodes with either a linear relationship or nonlinear dependence. We then proposed a PMI-based network regression (PMINR) model to differentiate patterns of network changes (in node or edge) linking a disease outcome. Through simulation studies with various sample sizes and inter-node correlation structures, we showed that PMINR can accurately identify these changes with higher power than current methods and be robust to the network topology. Finally, we illustrated, with publicly available data on lung cancer and gene methylation data on aging and Alzheimer’s disease, an evaluation of the practical performance of PMINR. We concluded that PMI is able to capture the generic inter-node correlation pattern in biological networks, and PMINR is a powerful and efficient approach for biological network analysis.
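For readers unfamiliar with pointwise mutual information, the sketch below computes the textbook discrete form, PMI(x, y) = log[p(x, y) / (p(x) p(y))], from a table of joint counts. This is only the basic definition: how PMINR estimates PMI for continuous expression or methylation data is not described in the abstract, and the function name and example tables are assumptions.

```python
import numpy as np

def pointwise_mutual_information(joint_counts):
    """PMI for every cell of a contingency table (rows: states of X, columns: states of Y)."""
    joint = np.asarray(joint_counts, dtype=float)
    joint = joint / joint.sum()                  # joint probabilities p(x, y)
    p_x = joint.sum(axis=1, keepdims=True)       # marginal p(x)
    p_y = joint.sum(axis=0, keepdims=True)       # marginal p(y)
    # Cells with zero joint probability give -inf; empty rows or columns give nan.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.log(joint / (p_x * p_y))

# Hypothetical tables: two binarized nodes that tend to agree, and two independent nodes.
pmi_dependent = pointwise_mutual_information([[40, 10], [10, 40]])
pmi_independent = pointwise_mutual_information([[25, 25], [25, 25]])
```

Positive entries mark state pairs that co-occur more often than independence would predict, negative entries mark pairs that co-occur less often, and independence gives zero everywhere.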
Affiliation(s)
- Weiqiang Lin
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
- Jiadong Ji
- Department of Data Science, School of Statistics, Shandong University of Finance and Economics, Jinan, China
- Yuchen Zhu
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
- Mingzhuo Li
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
- Jinghua Zhao
- Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
- Fuzhong Xue
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
- Zhongshang Yuan
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
10
Chowdhury HA, Bhattacharyya DK, Kalita JK. (Differential) Co-Expression Analysis of Gene Expression: A Survey of Best Practices. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:1154-1173. [PMID: 30668502 DOI: 10.1109/tcbb.2019.2893170] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 06/09/2023]
Abstract
Analysis of gene expression data is widely used in transcriptomic studies to understand functions of molecules inside a cell and interactions among molecules. Differential co-expression analysis studies diseases and phenotypic variations by finding modules of genes whose co-expression patterns vary across conditions. We review the best practices in gene expression data analysis in terms of analysis of (differential) co-expression, co-expression network, differential networking, and differential connectivity considering both microarray and RNA-seq data along with comparisons. We highlight hurdles in RNA-seq data analysis using methods developed for microarrays. We include discussion of necessary tools for gene expression analysis throughout the paper. In addition, we shed light on scRNA-seq data analysis by including preprocessing and scRNA-seq in co-expression analysis along with useful tools specific to scRNA-seq. To get insights, biological interpretation and functional profiling is included. Finally, we provide guidelines for the analyst, along with research issues and challenges which should be addressed.
11
Basha O, Argov CM, Artzy R, Zoabi Y, Hekselman I, Alfandari L, Chalifa-Caspi V, Yeger-Lotem E. Differential network analysis of multiple human tissue interactomes highlights tissue-selective processes and genetic disorder genes. Bioinformatics 2020; 36:2821-2828. [DOI: 10.1093/bioinformatics/btaa034] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 04/18/2019] [Revised: 01/07/2020] [Accepted: 01/16/2020] [Indexed: 01/19/2023] Open
Abstract
Motivation
Differential network analysis, designed to highlight network changes between conditions, is an important paradigm in network biology. However, differential network analysis methods have been typically designed to compare between two conditions and were rarely applied to multiple protein interaction networks (interactomes). Importantly, large-scale benchmarks for their evaluation have been lacking.
Results
Here, we present a framework for assessing the ability of differential network analysis of multiple human tissue interactomes to highlight tissue-selective processes and disorders. For this, we created a benchmark of 6499 curated tissue-specific Gene Ontology biological processes. We applied five methods, including four differential network analysis methods, to construct weighted interactomes for 34 tissues. Rigorous assessment of this benchmark revealed that differential analysis methods perform well in revealing tissue-selective processes (AUCs of 0.82–0.9). Next, we applied differential network analysis to illuminate the genes underlying tissue-selective hereditary disorders. For this, we curated a dataset of 1305 tissue-specific hereditary disorders and their manifesting tissues. Focusing on subnetworks containing the top 1% differential interactions in disease-relevant tissue interactomes revealed significant enrichment for disorder-causing genes in 18.6% of the cases, with a significantly high success rate for blood, nerve, muscle and heart diseases.
Summary
Altogether, we offer a framework that includes expansive manually curated datasets of tissue-selective processes and disorders to be used as benchmarks or to illuminate tissue-selective processes and genes. Our results demonstrate that differential analysis of multiple human tissue interactomes is a powerful tool for highlighting processes and genes with tissue-selective functionality and clinical impact.
Availability and implementation
Datasets are available as part of the Supplementary data.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Omer Basha
- Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences
- Chanan M Argov
- Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences
- Raviv Artzy
- Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences
- Yazeed Zoabi
- Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences
- Idan Hekselman
- Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences
- Liad Alfandari
- Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences
- Vered Chalifa-Caspi
- National Institute for Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel
- Esti Yeger-Lotem
- Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences
- National Institute for Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel
12
Zhang Q. Direct estimation of differential networks under high-dimensional nonparanormal graphical models. Can J Stat 2019. [DOI: 10.1002/cjs.11526] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/09/2022]
Affiliation(s)
- Qingyang Zhang
- Department of Mathematical Sciences, University of Arkansas, Fayetteville, AR, U.S.A.
13
Chen H, He Y, Ji J, Shi Y. A Machine Learning Method for Identifying Critical Interactions Between Gene Pairs in Alzheimer's Disease Prediction. Front Neurol 2019; 10:1162. [PMID: 31736866 PMCID: PMC6834789 DOI: 10.3389/fneur.2019.01162] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 08/29/2019] [Accepted: 10/15/2019] [Indexed: 12/26/2022] Open
Abstract
Background: Alzheimer's disease (AD) is the most common type of dementia. Scientists have discovered that the causes of AD may include a combination of genetic, lifestyle, and environmental factors, but the exact cause has not yet been elucidated. Effective strategies to prevent and treat AD therefore remain elusive. The identified genetic causes of AD mainly focus on individual genes, but growing evidence has shown that complex diseases are usually affected by the interaction of genes in a network. Few studies have focused on the interactions and correlations between genes and how they are gradually destroyed or disappear during AD progression. A differential network analysis has been recognized as an essential tool for identifying the underlying pathogenic mechanisms and significant genes for prediction analysis. We therefore aim to conduct a differential network analysis to reveal potential networks involved in the neuropathogenesis of AD and identify genes for AD prediction. Methods: In this paper, we selected 365 samples from the Religious Orders Study and the Rush Memory and Aging Project, including 193 clinically and neuropathologically confirmed AD subjects and 172 no cognitive impairment (NCI) controls. Then, we selected 158 genes belonging to the AD pathway (hsa05010) of the Kyoto Encyclopedia of Genes and Genomes. We employed a machine learning method, namely, joint density-based non-parametric differential interaction network analysis and classification (JDINAC), in the analysis of gene expression data (RNA-seq data). We searched for the differential networks in the RNA-seq data with a pathological diagnosis of AD. Finally, an optimal prediction model was built through cross-validation, which showed good discrimination and calibration for AD prediction. Results: We used JDINAC to derive a gene co-expression network and to explore the relationship between the interaction of gene pairs and AD, and the top 10 differential gene pairs were identified. 
We then compared the prediction performance between JDINAC and individual genes based on prediction methods. JDINAC provides better accuracy of classification than the latest methods, such as random forest and penalized logistic regression. Conclusions: The interaction between gene pairs is related to AD and can provide more insight than the individual genes in AD prediction.
Affiliation(s)
- Hao Chen
- School of Statistics, Shandong University of Finance and Economics, Jinan, China
- Yong He
- School of Statistics, Shandong University of Finance and Economics, Jinan, China
- Jiadong Ji
- School of Statistics, Shandong University of Finance and Economics, Jinan, China
- Yufeng Shi
- School of Statistics, Shandong University of Finance and Economics, Jinan, China
- Institute for Financial Studies and School of Mathematics, Shandong University, Jinan, China
14
De Bastiani MA, Klamt F. Integrated transcriptomics reveals master regulators of lung adenocarcinoma and novel repositioning of drug candidates. Cancer Med 2019; 8:6717-6729. [PMID: 31503425 PMCID: PMC6825976 DOI: 10.1002/cam4.2493] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 02/19/2019] [Revised: 07/18/2019] [Accepted: 07/31/2019] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND Lung adenocarcinoma is the major cause of cancer-related deaths in the world. Given this, research on its pathophysiology and therapy remains a key health issue. To assist in this endeavor, recent oncology studies are adopting Systems Biology approaches and bioinformatics to analyze and understand omics data, bringing new insights about this disease and its treatment. METHODS We used reverse engineering of transcriptomic data to reconstruct nontumorous lung reference networks, focusing on transcription factors (TFs) and their inferred target genes, referred to as regulatory units or regulons. Afterwards, we used 13 case-control studies to identify TFs acting as master regulators of the disease and their regulatory units. Furthermore, the inferred activation patterns of regulons were used to evaluate patient survival and search for drug candidates for repositioning. RESULTS The regulatory units under the influence of ATOH8, DACH1, EPAS1, ETV5, FOXA2, FOXM1, HOXA4, SMAD6, and UHRF1 transcription factors were consistently associated with the pathological phenotype, suggesting that they may be master regulators of lung adenocarcinoma. We also observed that the inferred activity of FOXA2, FOXM1, and UHRF1 was significantly associated with risk of death in patients. Finally, we obtained deptropine, promazine, valproic acid, azacyclonol, methotrexate, and ChemBridge ID compound 5109870 as potential candidates to revert the molecular profile leading to decreased survival. CONCLUSION Using an integrated transcriptomics approach, we identified master regulator candidates involved in the development and prognosis of lung adenocarcinoma, as well as potential drugs for repurposing.
Affiliation(s)
- Marco Antônio De Bastiani
- Laboratory of Cellular Biochemistry, Department of Biochemistry, Federal University of Rio Grande do Sul (UFRGS), Porto Alegre, RS, Brazil; National Institute of Science and Technology for Translational Medicine (INCT-TM), Porto Alegre, RS, Brazil
- Fábio Klamt
- Laboratory of Cellular Biochemistry, Department of Biochemistry, Federal University of Rio Grande do Sul (UFRGS), Porto Alegre, RS, Brazil; National Institute of Science and Technology for Translational Medicine (INCT-TM), Porto Alegre, RS, Brazil
15
Tang Z, Yu Z, Wang C. A fast iterative algorithm for high-dimensional differential network. Comput Stat 2019. [DOI: 10.1007/s00180-019-00915-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
16
Gambardella G, di Bernardo D. A Tool for Visualization and Analysis of Single-Cell RNA-Seq Data Based on Text Mining. Front Genet 2019; 10:734. [PMID: 31447887 PMCID: PMC6696874 DOI: 10.3389/fgene.2019.00734] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 03/20/2019] [Accepted: 07/12/2019] [Indexed: 11/28/2022] Open
Abstract
Gene expression in individual cells can now be measured for thousands of cells in a single experiment thanks to innovative sample-preparation and sequencing technologies. State-of-the-art computational pipelines for single-cell RNA-sequencing data, however, still employ computational methods that were developed for traditional bulk RNA-sequencing data, thus not accounting for the peculiarities of single-cell data, such as sparseness and zero-inflated counts. Here, we present a ready-to-use pipeline named gf-icf (gene frequency–inverse cell frequency) for normalization of raw counts, feature selection, and dimensionality reduction of scRNA-seq data for their visualization and subsequent analyses. Our work is based on a data transformation model named term frequency–inverse document frequency (TF-IDF), which has been extensively used in the field of text mining where extremely sparse and zero-inflated data are common. Using benchmark scRNA-seq datasets, we show that the gf-icf pipeline outperforms existing state-of-the-art methods in terms of improved visualization and ability to separate and distinguish different cell types.
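The TF-IDF idea that the abstract borrows from text mining can be sketched on a small genes-by-cells count matrix. This is an illustrative sketch only, not the gf-icf package's implementation: the function name `gficf_like_transform`, the smoothed logarithm, and the per-cell L2 normalization are assumptions chosen for the example.

```python
import numpy as np

def gficf_like_transform(counts):
    """TF-IDF-style weighting of a genes x cells count matrix (hypothetical helper).

    Assumptions for this sketch: smoothed IDF and L2 normalization per cell.
    """
    counts = np.asarray(counts, dtype=float)
    n_genes, n_cells = counts.shape

    # "Gene frequency": each gene's share of the total counts in a cell.
    totals = counts.sum(axis=0, keepdims=True)
    gf = np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

    # "Inverse cell frequency": genes detected in few cells get a larger weight.
    cells_expressing = (counts > 0).sum(axis=1, keepdims=True)
    icf = np.log((n_cells + 1) / (cells_expressing + 1)) + 1.0

    weighted = gf * icf

    # Scale every cell (column) to unit length; all-zero cells stay zero.
    norms = np.linalg.norm(weighted, axis=0, keepdims=True)
    return np.divide(weighted, norms, out=np.zeros_like(weighted), where=norms > 0)

# Toy input: 3 genes x 4 cells, the last cell has no counts at all.
toy = [[5, 0, 0, 0],
       [1, 1, 1, 0],
       [0, 4, 0, 0]]
weights = gficf_like_transform(toy)
```

In this toy input, the gene detected in a single cell is weighted above the gene detected in three cells, which is the behaviour that makes TF-IDF attractive for sparse, zero-inflated data.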
Affiliation(s)
- Gennaro Gambardella
- University of Naples Federico II, Department of Chemical Materials and Industrial Engineering, Naples, Italy.,Telethon Institute of Genetics and Medicine, Naples, Italy
| | - Diego di Bernardo
- University of Naples Federico II, Department of Chemical Materials and Industrial Engineering, Naples, Italy.,Telethon Institute of Genetics and Medicine, Naples, Italy
|
17
|
Erola P, Bonnet E, Michoel T. Learning Differential Module Networks Across Multiple Experimental Conditions. Methods Mol Biol 2019; 1883:303-321. [PMID: 30547406 DOI: 10.1007/978-1-4939-8882-2_13] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
Module network inference is a statistical method to reconstruct gene regulatory networks, which uses probabilistic graphical models to learn modules of coregulated genes and their upstream regulatory programs from genome-wide gene expression and other omics data. Here, we review the basic theory of module network inference, present protocols for common gene regulatory network reconstruction scenarios based on the Lemon-Tree software, and show, using human gene expression data, how the software can also be applied to learn differential module networks across multiple experimental conditions.
Affiliation(s)
- Pau Erola
- Division of Genetics and Genomics, Roslin Institute, University of Edinburgh, Midlothian, Scotland, UK
| | - Eric Bonnet
- Centre National de Recherche en Génomique Humaine, Institut de Biologie François Jacob, Direction de la Recherche Fondamentale, CEA, Evry, France
| | - Tom Michoel
- Division of Genetics and Genomics, The Roslin Institute, University of Edinburgh, Midlothian, Scotland, UK.
- Current Address: Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway.
|
18
|
Ji J, He D, Feng Y, He Y, Xue F, Xie L. JDINAC: joint density-based non-parametric differential interaction network analysis and classification using high-dimensional sparse omics data. Bioinformatics 2018; 33:3080-3087. [PMID: 28582486 DOI: 10.1093/bioinformatics/btx360] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2016] [Accepted: 06/01/2017] [Indexed: 12/26/2022] Open
Abstract
Motivation A complex disease is usually driven by a number of genes interwoven into networks, rather than a single gene product. Network comparison or differential network analysis has become an important means of revealing the underlying mechanism of pathogenesis and identifying clinical biomarkers for disease classification. Most studies, however, are limited to network correlations that mainly capture the linear relationship among genes, or rely on the assumption of a parametric probability distribution of gene measurements. These assumptions are restrictive in real applications. Results We propose a new Joint density based non-parametric Differential Interaction Network Analysis and Classification (JDINAC) method to identify differential interaction patterns of network activation between two groups. At the same time, JDINAC uses the network biomarkers to build a classification model. The novelty of JDINAC lies in its potential to capture non-linear relations between molecular interactions using high-dimensional sparse data as well as to adjust for confounding factors, without assuming a parametric probability distribution of gene measurements. Simulation studies demonstrate that JDINAC provides more accurate differential network estimation and lower classification error than that achieved by other state-of-the-art methods. We apply JDINAC to a Breast Invasive Carcinoma dataset, which includes 114 patients who have both tumor and matched normal samples. The hub genes and differential interaction patterns identified were consistent with existing experimental studies. Furthermore, JDINAC discriminated between tumor and normal samples with high accuracy by virtue of the identified biomarkers. JDINAC provides a general framework for feature selection and classification using high-dimensional sparse omics data. Availability and implementation R scripts available at https://github.com/jijiadong/JDINAC. Contact lxie@iscb.org. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jiadong Ji
- Department of Mathematical Statistics, School of Statistics, Shandong University of Finance and Economics, Jinan 250014, China
| | - Di He
- Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, NY 10016, USA
| | - Yang Feng
- Department of Statistics, Columbia University, New York, NY 10027, USA
| | - Yong He
- Department of Mathematical Statistics, School of Statistics, Shandong University of Finance and Economics, Jinan 250014, China
| | - Fuzhong Xue
- Department of Biostatistics, School of Public Health, Shandong University, Jinan 250012, China
| | - Lei Xie
- Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, NY 10016, USA.,Department of Computer Science, Hunter College, The City University of New York, NY 10065, USA
|
19
|
Jung S, Hartmann A, Del Sol A. RefBool: a reference-based algorithm for discretizing gene expression data. Bioinformatics 2018; 33:1953-1962. [PMID: 28334101 DOI: 10.1093/bioinformatics/btx111] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/21/2016] [Accepted: 02/21/2017] [Indexed: 12/26/2022] Open
Abstract
Motivation The identification of genes or molecular regulatory mechanisms implicated in biological processes often requires the discretization, and in particular booleanization, of gene expression measurements. However, currently used methods mostly classify each measurement into an active or inactive state regardless of its statistical support, possibly leading to downstream analysis conclusions based on spurious booleanization results. Results In order to overcome the lack of certainty inherent in current methodologies and to improve the process of discretization, we introduce RefBool, a reference-based algorithm for discretizing gene expression data. Instead of requiring each measurement to be classified as active or inactive, RefBool allows for the classification of a third state that can be interpreted as an intermediate expression of genes. Furthermore, each measurement is associated with a p- and q-value indicating the significance of each classification. Validation of RefBool on a neuroepithelial differentiation study and subsequent qualitative and quantitative comparison against 10 currently used methods supports its advantages and shows clear improvements of resulting clusterings. Availability and Implementation The software is available as MATLAB files in the Supplementary Information and as an online repository (https://github.com/saschajung/RefBool). Contact antonio.delsol@uni.lu. Supplementary information Supplementary data are available at Bioinformatics online.
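The general idea of a three-state call with statistical support can be sketched as follows. This is a simplified illustration and not the RefBool algorithm: the helper name `three_state_call`, the use of add-one empirical p-values against a reference sample, and the `alpha` cutoff are all assumptions made for the example, and q-value correction is omitted.

```python
import numpy as np

def three_state_call(value, reference, alpha=0.05):
    """Classify one measurement against a reference sample (hypothetical helper).

    Returns (state, p_value), where state is "active", "inactive",
    or "intermediate" when neither tail is significant.
    """
    reference = np.asarray(reference, dtype=float)
    n = reference.size

    # Add-one empirical p-values for the upper and lower tails.
    p_high = (np.sum(reference >= value) + 1) / (n + 1)
    p_low = (np.sum(reference <= value) + 1) / (n + 1)

    if p_high < alpha:
        return "active", p_high
    if p_low < alpha:
        return "inactive", p_low
    return "intermediate", min(p_high, p_low)

# Toy reference: 99 expression values, 1..99 (assumed background distribution).
reference = np.arange(1, 100)
high_call = three_state_call(1000, reference)
low_call = three_state_call(-5, reference)
mid_call = three_state_call(50, reference)
```

The point of the third state is that a measurement in the middle of the reference distribution is not forced into an active or inactive call that the data cannot support.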
Affiliation(s)
- Sascha Jung
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Belvaux, Luxembourg
| | - Andras Hartmann
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Belvaux, Luxembourg
| | - Antonio Del Sol
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Belvaux, Luxembourg
|
20
|
Padi M, Quackenbush J. Detecting phenotype-driven transitions in regulatory network structure. NPJ Syst Biol Appl 2018; 4:16. [PMID: 29707235 PMCID: PMC5908977 DOI: 10.1038/s41540-018-0052-5] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2017] [Revised: 03/29/2018] [Accepted: 04/02/2018] [Indexed: 12/05/2022] Open
Abstract
Complex traits and diseases like human height or cancer are often not caused by a single mutation or genetic variant, but instead arise from functional changes in the underlying molecular network. Biological networks are known to be highly modular and contain dense “communities” of genes that carry out cellular processes, but these structures change between tissues, during development, and in disease. While many methods exist for inferring networks and analyzing their topologies separately, there is a lack of robust methods for quantifying differences in network structure. Here, we describe ALPACA (ALtered Partitions Across Community Architectures), a method for comparing two genome-scale networks derived from different phenotypic states to identify condition-specific modules. In simulations, ALPACA leads to more nuanced, sensitive, and robust module discovery than currently available network comparison methods. As an application, we use ALPACA to compare transcriptional networks in three contexts: angiogenic and non-angiogenic subtypes of ovarian cancer, human fibroblasts expressing transforming viral oncogenes, and sexual dimorphism in human breast tissue. In each case, ALPACA identifies modules enriched for processes relevant to the phenotype. For example, modules specific to angiogenic ovarian tumors are enriched for genes associated with blood vessel development, and modules found in female breast tissue are enriched for genes involved in estrogen receptor and ERK signaling. The functional relevance of these new modules suggests that not only can ALPACA identify structural changes in complex networks, but also that these changes may be relevant for characterizing biological phenotypes.
Cells are controlled by complex regulatory networks, and disruptions in the structure of these networks can lead to disease. Understanding disease requires that we accurately identify changes in gene regulatory network structure. However, cellular networks have tens of thousands of components with complex connections between them. Megha Padi from the University of Arizona and John Quackenbush from Dana-Farber Cancer Institute developed a new algorithm that is far more effective than previous methods at finding disease-associated modules in regulatory networks. Applying this to ovarian cancer, they found new regulatory processes that may lead to more targeted treatments. In human breast tissue, they found that sex-specific differences were driven by hormone signaling and differentiation pathways. Decoding how network modules promote new functions may help to better model the relationship between genotype and phenotype.
|
21
|
Singh AJ, Ramsey SA, Filtz TM, Kioussi C. Differential gene regulatory networks in development and disease. Cell Mol Life Sci 2018; 75:1013-1025. [PMID: 29018868 PMCID: PMC11105524 DOI: 10.1007/s00018-017-2679-6] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2017] [Revised: 09/19/2017] [Accepted: 10/04/2017] [Indexed: 02/02/2023]
Abstract
Gene regulatory networks, in which differential expression of regulator genes induces differential expression of their target genes, underlie diverse biological processes such as embryonic development, organ formation and disease pathogenesis. An archetypical systems biology approach to mapping these networks involves the combined application of (1) high-throughput sequencing-based transcriptome profiling (RNA-seq) of biopsies under diverse network perturbations and (2) network inference based on gene-gene expression correlation analysis. The comparative analysis of such correlation networks across cell types or states, differential correlation network analysis, can identify specific molecular signatures and functional modules that underlie the state transition or have context-specific function. Here, we review the basic concepts of network biology and correlation network inference, and the prevailing methods for differential analysis of correlation networks. We discuss applications of gene expression network analysis in the context of embryonic development, cancer, and congenital diseases.
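To make the notion of differential correlation concrete, the sketch below compares the correlation of one gene pair between two states using the Fisher z-transformation, which is one common choice and not necessarily the method favoured by the review. The function name `differential_correlation` and the toy data are assumptions for illustration.

```python
import math
import numpy as np

def differential_correlation(x1, y1, x2, y2):
    """Fisher z-test for a change in Pearson correlation between two states.

    (x1, y1): expression of two genes in state 1; (x2, y2): same genes in state 2.
    Returns (r1, r2, z, two_sided_p). Hypothetical helper for illustration.
    """
    r1 = float(np.corrcoef(x1, y1)[0, 1])
    r2 = float(np.corrcoef(x2, y2)[0, 1])
    n1, n2 = len(x1), len(x2)

    # Clip to keep arctanh finite when a correlation is exactly +/-1.
    limit = 1.0 - 1e-12
    z1 = math.atanh(max(-limit, min(limit, r1)))
    z2 = math.atanh(max(-limit, min(limit, r2)))

    # Standard error of the difference of two Fisher-transformed correlations.
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se

    # Two-sided p-value under the standard normal approximation.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return r1, r2, z, p

# Toy data: the pair is tightly co-expressed in state 1 and decoupled in state 2.
x = np.arange(20, dtype=float)
wiggle = np.where(np.arange(20) % 2 == 0, 1.0, -1.0)
r1, r2, z, p = differential_correlation(x, x + 0.5 * wiggle, x, wiggle)
```

Applied to every gene pair, a test of this kind yields the edges of a differential correlation network; in practice the resulting p-values would need multiple-testing correction.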
Affiliation(s)
- Arun J Singh
- Department of Pharmaceutical Sciences, College of Pharmacy, Oregon State University, Corvallis, OR, 97331, USA
| | - Stephen A Ramsey
- Department of Biomedical Sciences, College of Veterinary Medicine, Oregon State University, Corvallis, OR, 97331, USA
- School of Electrical Engineering and Computer Science, College of Engineering, Oregon State University, Corvallis, OR, 97331, USA
| | - Theresa M Filtz
- Department of Pharmaceutical Sciences, College of Pharmacy, Oregon State University, Corvallis, OR, 97331, USA
| | - Chrissa Kioussi
- Department of Pharmaceutical Sciences, College of Pharmacy, Oregon State University, Corvallis, OR, 97331, USA.
|
22
|
Basha O, Shpringer R, Argov CM, Yeger-Lotem E. The DifferentialNet database of differential protein-protein interactions in human tissues. Nucleic Acids Res 2018; 46:D522-D526. [PMID: 29069447 PMCID: PMC5753382 DOI: 10.1093/nar/gkx981] [Citation(s) in RCA: 55] [Impact Index Per Article: 9.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2017] [Revised: 09/19/2017] [Accepted: 10/10/2017] [Indexed: 11/22/2022] Open
Abstract
DifferentialNet is a novel database that provides users with differential interactome analysis of human tissues (http://netbio.bgu.ac.il/diffnet/). Users query DifferentialNet by protein, and retrieve its differential protein-protein interactions (PPIs) per tissue via an interactive graphical interface. To compute differential PPIs, we integrated available data of experimentally detected PPIs with RNA-sequencing profiles of tens of human tissues gathered by the Genotype-Tissue Expression consortium (GTEx) and by the Human Protein Atlas (HPA). We associated each PPI with a score that reflects whether its corresponding genes were expressed similarly across tissues, or were up- or down-regulated in the selected tissue. In this way, users can identify tissue-specific interactions, filter out PPIs that are relatively stable across tissues, and highlight PPIs that show relative changes across tissues. The differential PPIs can be used to identify tissue-specific processes and to decipher tissue-specific phenotypes. Moreover, they unravel processes that are tissue-wide yet tailored to the specific demands of each tissue.
Affiliation(s)
- Omer Basha
- Department of Clinical Biochemistry & Pharmacology, Faculty of Health Sciences
| | - Rotem Shpringer
- Department of Clinical Biochemistry & Pharmacology, Faculty of Health Sciences
| | - Chanan M Argov
- Department of Clinical Biochemistry & Pharmacology, Faculty of Health Sciences
| | - Esti Yeger-Lotem
- Department of Clinical Biochemistry & Pharmacology, Faculty of Health Sciences
- National Institute for Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel
|
23
|
Kim Y, Hao J, Gautam Y, Mersha TB, Kang M. DiffGRN: differential gene regulatory network analysis. Int J Data Min Bioinform 2018; 20:362-379. [PMID: 31114627 DOI: 10.1504/ijdmb.2018.094891] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Abstract
Identification of differential gene regulators with significant changes under disparate conditions is essential to understand complex biological mechanisms in a disease. Differential Network Analysis (DiNA) examines different biological processes based on gene regulatory networks that represent regulatory interactions between genes with a graph model. While most studies in DiNA have used correlation-based inference to construct gene regulatory networks from gene expression data because of its intuitive representation and simple implementation, the approach fails to represent causal effects and multivariate effects between genes. In this paper, we propose an approach named Differential Gene Regulatory Network (DiffGRN) that infers differential gene regulation between two groups. We infer gene regulatory networks of two groups using Random LASSO, and then we identify differential gene regulations by the proposed significance test. The advantages of DiffGRN are to capture multivariate effects of genes that regulate a gene simultaneously, to identify causality of gene regulations, and to discover differential gene regulators between regression-based gene regulatory networks. We assessed DiffGRN by simulation experiments and showed that it outperforms the current state-of-the-art correlation-based method, DINGO. DiffGRN is applied to gene expression data in asthma. The DiNA with asthma data showed a number of gene regulations, such as ADAM12 and RELB, reported in biological literature.
Affiliation(s)
- Youngsoon Kim
- Department of Computer Science, Kennesaw State University, Marietta, GA, USA
| | - Jie Hao
- Analytics and Data Science Institute, Kennesaw State University, Kennesaw, GA, USA
| | - Yadu Gautam
- Department of Pediatrics, University of Cincinnati, Cincinnati, OH, USA
| | - Tesfaye B Mersha
- Department of Pediatrics, University of Cincinnati, Cincinnati, OH, USA
| | - Mingon Kang
- Department of Computer Science, Kennesaw State University, Marietta, GA, USA
|
24
|
Gonzalez-Valbuena EE, Treviño V. Metrics to estimate differential co-expression networks. BioData Min 2017; 10:32. [PMID: 29151892 PMCID: PMC5681815 DOI: 10.1186/s13040-017-0152-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2017] [Accepted: 10/30/2017] [Indexed: 11/24/2022] Open
Abstract
BACKGROUND Detecting the differences in gene expression data is important for understanding the underlying molecular mechanisms. Although the differentially expressed genes are a large component, differences in correlation are becoming an interesting approach to achieving deeper insights. However, diverse metrics have been used to detect differential correlation, making selection and use of a single metric difficult. In addition, available implementations are metric-specific, complicating their use in different contexts. Moreover, because the analyses in the literature have been performed on real data, there are uncertainties regarding the performance of metrics and procedures. RESULTS In this work, we compare four novel and two previously proposed metrics to detect differential correlations. We generated well-controlled datasets into which differences in correlations were carefully introduced by controlled multivariate normal correlation networks and addition of noise. The comparisons were performed on three datasets derived from real tumor data. Our results show that metrics differ in their detection performance and computational time. No single metric was the best in all datasets, but trends show that three metrics are highly correlated and are very good candidates for real data analysis. In contrast, other metrics proposed in the literature seem to show low performance and different detections. Overall, our results suggest that metrics that do not filter correlations perform better. We also show an additional analysis of TCGA breast cancer subtypes. CONCLUSIONS We show a methodology to generate controlled datasets for the objective evaluation of differential correlation pipelines, and compare the performance of several metrics. We implemented in R a package called DifCoNet that can provide easy-to-use functions for differential correlation analyses.
Affiliation(s)
| | - Víctor Treviño
- Cátedra de Bioinformática, Escuela de Medicina, Tecnológico de Monterrey, 64710 Monterrey, Nuevo León Mexico
|
25
|
Steinhoff G, Nesteruk J, Wolfien M, Große J, Ruch U, Vasudevan P, Müller P. Stem cells and heart disease - Brake or accelerator? Adv Drug Deliv Rev 2017; 120:2-24. [PMID: 29054357 DOI: 10.1016/j.addr.2017.10.007] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2017] [Revised: 10/12/2017] [Accepted: 10/13/2017] [Indexed: 12/11/2022]
Abstract
After two decades of intensive research and attempts at clinical translation, stem cell based therapies for cardiac diseases are not getting closer to clinical success. This review tries to unravel the obstacles and focuses on underlying mechanisms as the target for regenerative therapies. At present, the principal outcome in clinical therapy does not reflect experimental evidence. It seems that the scientific obstacle is a lack of integration of knowledge from tissue repair and disease mechanisms. Recent insights from clinical trials delineate mechanisms of stem cell dysfunction and gene defects in repair mechanisms as causes of atherosclerosis and heart disease. These findings require a redirection of current practice of stem cell therapy and a reset using more detailed analysis of stem cell function interfering with disease mechanisms. To accelerate scientific development the authors suggest intensifying unified computational data analysis and shared data knowledge by using open-access data platforms.
Affiliation(s)
- Gustav Steinhoff
- University Medicine Rostock, Department of Cardiac Surgery, Reference and Translation Center for Cardiac Stem Cell Therapy, University Medical Center Rostock, Schillingallee 35, 18055 Rostock, Germany.
| | - Julia Nesteruk
- University Medicine Rostock, Department of Cardiac Surgery, Reference and Translation Center for Cardiac Stem Cell Therapy, University Medical Center Rostock, Schillingallee 35, 18055 Rostock, Germany.
| | - Markus Wolfien
- University Rostock, Institute of Computer Science, Department of Systems Biology and Bioinformatics, Ulmenstraße 69, 18057 Rostock, Germany.
| | - Jana Große
- University Medicine Rostock, Department of Cardiac Surgery, Reference and Translation Center for Cardiac Stem Cell Therapy, University Medical Center Rostock, Schillingallee 35, 18055 Rostock, Germany.
| | - Ulrike Ruch
- University Medicine Rostock, Department of Cardiac Surgery, Reference and Translation Center for Cardiac Stem Cell Therapy, University Medical Center Rostock, Schillingallee 35, 18055 Rostock, Germany.
| | - Praveen Vasudevan
- University Medicine Rostock, Department of Cardiac Surgery, Reference and Translation Center for Cardiac Stem Cell Therapy, University Medical Center Rostock, Schillingallee 35, 18055 Rostock, Germany.
| | - Paula Müller
- University Medicine Rostock, Department of Cardiac Surgery, Reference and Translation Center for Cardiac Stem Cell Therapy, University Medical Center Rostock, Schillingallee 35, 18055 Rostock, Germany.
|
26
|
Ayyildiz D, Gov E, Sinha R, Arga KY. Ovarian Cancer Differential Interactome and Network Entropy Analysis Reveal New Candidate Biomarkers. OMICS 2017; 21:285-294. [PMID: 28375712 DOI: 10.1089/omi.2017.0010] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]
Abstract
Ovarian cancer is one of the most common cancers and has a high mortality rate due to insidious symptoms and lack of robust diagnostics. A hitherto understudied concept in cancer pathogenesis may offer new avenues for innovation in ovarian cancer biomarker development. Cancer cells are characterized by an increase in network entropy, and several studies have exploited this concept to identify disease-associated gene and protein modules. We report in this study the changes in protein-protein interactions (PPIs) in ovarian cancer within a differential network (interactome) analysis framework utilizing the entropy concept and gene expression data. A compendium of six transcriptome datasets that included 140 samples from laser microdissected epithelial cells of ovarian cancer patients and 51 samples from a healthy population was obtained from Gene Expression Omnibus, and the high confidence human protein interactome (31,465 interactions among 10,681 proteins) was used. The uncertainties of the up- or downregulation of PPIs in ovarian cancer were estimated through an entropy formulation utilizing combined expression levels of genes, and the interacting protein pairs with minimum uncertainty were identified. We identified 105 proteins with differential PPI patterns scattered in 11 modules, each indicating significantly affected biological pathways in ovarian cancer such as DNA repair, cell proliferation-related mechanisms, nucleoplasmic translocation of estrogen receptor, extracellular matrix degradation, and inflammatory response. In conclusion, we suggest several PPIs as biomarker candidates for ovarian cancer and discuss their future biological implications as potential molecular targets for pharmaceutical development as well. In addition, network entropy analysis is a concept that deserves greater research attention for diagnostic innovation in oncology and tumor pathogenesis.
Affiliation(s)
- Dilara Ayyildiz
- Department of Bioengineering, Marmara University, Istanbul, Turkey.,Department of Biomedical Sciences and Biotechnology, University of Udine, Udine, Italy
| | - Esra Gov
- Department of Bioengineering, Marmara University, Istanbul, Turkey.,Department of Bioengineering, Adana Science and Technology University, Adana, Turkey
| | - Raghu Sinha
- Department of Biochemistry and Molecular Biology, Penn State College of Medicine, Hershey, Pennsylvania
| | - Kazim Yalcin Arga
- Department of Bioengineering, Marmara University, Istanbul, Turkey
|
27
|
Li Q, Li J, Dai W, Li YX, Li YY. Differential regulation analysis reveals dysfunctional regulatory mechanism involving transcription factors and microRNAs in gastric carcinogenesis. Artif Intell Med 2017; 77:12-22. [PMID: 28545608 DOI: 10.1016/j.artmed.2017.02.006] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/30/2016] [Revised: 02/23/2017] [Accepted: 02/23/2017] [Indexed: 12/12/2022]
Abstract
Gastric cancer (GC) is one of the most common malignancies in the world. Although many genes and microRNAs (miRNAs) have been identified as associated with gastric carcinogenesis, the underlying regulatory mechanisms remain unclear. In order to explore the dysfunctional mechanisms of GC, we developed a novel approach to identify carcinogenesis-relevant regulatory relationships, which is characterized by quantifying the difference of regulatory relationships between stages. Firstly, we applied the strategy of differential coexpression analysis (DCEA) to transcriptomic datasets including paired mRNA and miRNA of gastric samples to identify a set of genes/miRNAs related to gastric cancer progression. Based on these genes/miRNAs, we constructed conditional combinatorial gene regulatory networks (cGRNs) involving both transcription factors (TFs) and miRNAs. Enrichment of known cancer genes/miRNAs and predicted prognostic genes/miRNAs was observed in each cGRN. Then we designed a quantitative method to measure the differential regulation level of every regulatory relationship between normal and cancer, and the known cancer genes/miRNAs proved to be ranked significantly higher. Meanwhile, we defined differentially regulated link (DRL) by combining differential regulation, differential expression and the regulation contribution of the regulator to the target. By integrating survival analysis and DRL identification, three master regulators TCF7L1, TCF4, and MEIS1 were identified and testable hypotheses of dysfunctional mechanisms underlying gastric carcinogenesis related to them were generated. The fine-tuning effects of miRNAs were also observed. We propose that this differential regulation network analysis framework is feasible to gain insights into dysregulated mechanisms underlying tumorigenesis and other phenotypic changes.
Affiliation(s)
- Quanxue Li
- School of Biotechnology, East China University of Science and Technology, Shanghai, China; Shanghai Center for Bioinformation Technology, Shanghai, China
| | - Junyi Li
- Shanghai Center for Bioinformation Technology, Shanghai, China; Key Lab of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Wentao Dai
- Shanghai Center for Bioinformation Technology, Shanghai, China; Shanghai Industrial Technology Institute, Shanghai, China; Shanghai Engineering Research Center of Pharmaceutical Translation, Shanghai, China
| | - Yi-Xue Li
- School of Biotechnology, East China University of Science and Technology, Shanghai, China; Shanghai Center for Bioinformation Technology, Shanghai, China; Key Lab of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China; Shanghai Industrial Technology Institute, Shanghai, China; Shanghai Engineering Research Center of Pharmaceutical Translation, Shanghai, China.
| | - Yuan-Yuan Li
- Shanghai Center for Bioinformation Technology, Shanghai, China; Shanghai Industrial Technology Institute, Shanghai, China; Shanghai Engineering Research Center of Pharmaceutical Translation, Shanghai, China.
|
28
|
Gambardella G, Carissimo A, Chen A, Cutillo L, Nowakowski TJ, di Bernardo D, Blelloch R. The impact of microRNAs on transcriptional heterogeneity and gene co-expression across single embryonic stem cells. Nat Commun 2017; 8:14126. [PMID: 28102192 PMCID: PMC5253645 DOI: 10.1038/ncomms14126] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2016] [Accepted: 12/01/2016] [Indexed: 12/21/2022] Open
Abstract
MicroRNAs act posttranscriptionally to suppress multiple target genes within a cell population. To what extent this multi-target suppression occurs in individual cells and how it impacts transcriptional heterogeneity and gene co-expression remains unknown. Here we used single-cell sequencing combined with introduction of individual microRNAs. miR-294 and let-7c were introduced into otherwise microRNA-deficient Dgcr8 knockout mouse embryonic stem cells. Both microRNAs induce suppression and correlated expression of their respective gene targets. The two microRNAs had opposing effects on transcriptional heterogeneity within the cell population, with let-7c increasing and miR-294 decreasing the heterogeneity between cells. Furthermore, let-7c promotes, whereas miR-294 suppresses, the phasing of cell cycle genes. These results show at the individual cell level how a microRNA simultaneously affects its many targets and how that in turn can influence a population of cells. The findings have important implications for understanding how microRNAs influence the co-expression of genes and pathways, and thus ultimately cell fate. MicroRNAs can posttranscriptionally repress multiple targets in a cell population. Here the authors use single-cell sequencing to investigate the effects of an individual miRNA on transcriptional heterogeneity and gene co-expression.
Affiliation(s)
| | | | - Amy Chen
- The Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, Center for Reproductive Sciences, University of California, San Francisco, San Francisco, California 94143, USA.,Department of Urology, University of California, San Francisco, San Francisco, California 94143, USA
| | - Luisa Cutillo
- Telethon Institute of Genetics and Medicine, Pozzuoli, 80078 Naples, Italy
| | - Tomasz J Nowakowski
- The Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, Center for Reproductive Sciences, University of California, San Francisco, San Francisco, California 94143, USA
| | - Diego di Bernardo
- Telethon Institute of Genetics and Medicine, Pozzuoli, 80078 Naples, Italy.,Department of Chemical, Materials and Industrial Engineering, University of Naples 'Federico II', 80125 Naples, Italy
| | - Robert Blelloch
- The Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, Center for Reproductive Sciences, University of California, San Francisco, San Francisco, California 94143, USA.,Department of Urology, University of California, San Francisco, San Francisco, California 94143, USA
| |
|
29
|
Kaushik A, Ali S, Gupta D. Altered Pathway Analyzer: A gene expression dataset analysis tool for identification and prioritization of differentially regulated and network rewired pathways. Sci Rep 2017; 7:40450. [PMID: 28084397 PMCID: PMC5233954 DOI: 10.1038/srep40450] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 06/12/2016] [Accepted: 12/07/2016] [Indexed: 12/13/2022] Open
Abstract
Gene connection rewiring is an essential feature of gene network dynamics. Apart from its normal functional role, it may also lead to dysregulated functional states by disturbing pathway homeostasis. Very few computational tools measure rewiring within gene co-expression and its corresponding regulatory networks in order to identify and prioritize altered pathways which may or may not be differentially regulated. We have developed Altered Pathway Analyzer (APA), a microarray dataset analysis tool for identification and prioritization of altered pathways, including those which are differentially regulated by TFs, by quantifying rewired sub-network topology. Moreover, APA also helps in re-prioritization of APA shortlisted altered pathways enriched with context-specific genes. We performed APA analysis of simulated datasets and p53 status NCI-60 cell line microarray data to demonstrate potential of APA for identification of several case-specific altered pathways. APA analysis reveals several altered pathways not detected by other tools evaluated by us. APA analysis of unrelated prostate cancer datasets identifies sample-specific as well as conserved altered biological processes, mainly associated with lipid metabolism, cellular differentiation and proliferation. APA is designed as a cross platform tool which may be transparently customized to perform pathway analysis in different gene expression datasets. APA is freely available at http://bioinfo.icgeb.res.in/APA.
Affiliation(s)
- Abhinav Kaushik
- Translational Bioinformatics Group, International Centre for Genetic Engineering and Biotechnology, New Delhi 110067, India
| | - Shakir Ali
- Department of Biochemistry, Jamia Hamdard, Deemed University, New Delhi 110062, India
| | - Dinesh Gupta
- Translational Bioinformatics Group, International Centre for Genetic Engineering and Biotechnology, New Delhi 110067, India
| |
|
30
|
Lee J, Jo K, Lee S, Kang J, Kim S. Prioritizing biological pathways by recognizing context in time-series gene expression data. BMC Bioinformatics 2016; 17:477. [PMID: 28155707 PMCID: PMC5259824 DOI: 10.1186/s12859-016-1335-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/29/2022] Open
Abstract
Background The primary goal of pathway analysis using transcriptome data is to find significantly perturbed pathways. However, pathway analysis is not always successful in identifying pathways that are truly relevant to the context under study. A major reason for this difficulty is that a single gene is involved in multiple pathways. In the KEGG pathway database, there are 146 genes, each of which is involved in more than 20 pathways. Thus activation of even a single gene will result in activation of many pathways. This complex relationship often makes the pathway analysis very difficult. While we need much more powerful pathway analysis methods, a readily available alternative way is to incorporate the literature information. Results In this study, we propose a novel approach for prioritizing pathways by combining results from both pathway analysis tools and literature information. The basic idea is as follows. Whenever there are enough articles that provide evidence on which pathways are relevant to the context, we can be assured that the pathways are indeed related to the context, which is termed as relevance in this paper. However, if there are few or no articles reported, then we should rely on the results from the pathway analysis tools, which is termed as significance in this paper. We realized this concept as an algorithm by introducing Context Score and Impact Score and then combining the two into a single score. Our method ranked truly relevant pathways significantly higher than existing pathway analysis tools in experiments with two data sets. Conclusions Our novel framework was implemented as ContextTRAP by utilizing two existing tools, TRAP and BEST. ContextTRAP will be a useful tool for the pathway based analysis of gene expression data since the user can specify the context of the biological experiment in a set of keywords. The web version of ContextTRAP is available at http://biohealth.snu.ac.kr/software/contextTRAP. 
Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-1335-8) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Jusang Lee
- Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea
| | - Kyuri Jo
- Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea
| | - Sunwon Lee
- Department of Computer Science and Engineering, Korea University, Seoul, Republic of Korea
| | - Jaewoo Kang
- Department of Computer Science and Engineering, Korea University, Seoul, Republic of Korea
| | - Sun Kim
- Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea. .,Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea. .,Bioinformatics Institute, Seoul National University, Seoul, Republic of Korea.
| |
|
31
|
Abstract
Developing improved approaches for diagnosis, treatment, and prevention of diseases is a major goal of biomedical research. Therefore, the discovery of biomarker signatures from high-throughput "omics" data is an active research topic in the field of bioinformatics and systems medicine. A major issue is the low reproducibility and the limited biological interpretability of candidate biomarker signatures identified from high-throughput data. This impedes the use of discovered biomarker signatures into clinical applications. Currently, much focus is placed on developing strategies to improve reproducibility and interpretability. Researchers have fruitfully started to incorporate prior knowledge derived from pathways and molecular networks into the process of biomarker identification. In this chapter, after giving a general introduction to the problem of disease classification and biomarker discovery, we will review two types of network-assisted approaches: (1) approaches inferring activity scores for specific pathways which are subsequently used for classification and (2) approaches identifying subnetworks or modules of molecular networks by differential network analysis which can serve as biomarker signatures.
|
32
|
Pinelli M, Carissimo A, Cutillo L, Lai CH, Mutarelli M, Moretti MN, Singh MV, Karali M, Carrella D, Pizzo M, Russo F, Ferrari S, Ponzin D, Angelini C, Banfi S, di Bernardo D. An atlas of gene expression and gene co-regulation in the human retina. Nucleic Acids Res 2016; 44:5773-84. [PMID: 27235414 PMCID: PMC4937338 DOI: 10.1093/nar/gkw486] [Citation(s) in RCA: 50] [Impact Index Per Article: 6.3] [Received: 02/28/2016] [Revised: 05/19/2016] [Accepted: 05/20/2016] [Indexed: 12/11/2022] Open
Abstract
The human retina is a specialized tissue involved in light stimulus transduction. Despite its unique biology, an accurate reference transcriptome is still missing. Here, we performed gene expression analysis (RNA-seq) of 50 retinal samples from non-visually impaired post-mortem donors. We identified novel transcripts with high confidence (Observed Transcriptome (ObsT)) and quantified the expression level of known transcripts (Reference Transcriptome (RefT)). The ObsT included 77 623 transcripts (23 960 genes) covering 137 Mb (35 Mb new transcribed genome). Most of the transcripts (92%) were multi-exonic: 81% with known isoforms, 16% with new isoforms and 3% belonging to new genes. The RefT included 13 792 genes across 94 521 known transcripts. Mitochondrial genes were among the most highly expressed, accounting for about 10% of the reads. Of all the protein-coding genes in Gencode, 65% are expressed in the retina. We exploited inter-individual variability in gene expression to infer a gene co-expression network and to identify genes specifically expressed in photoreceptor cells. We experimentally validated the photoreceptors localization of three genes in human retina that had not been previously reported. RNA-seq data and the gene co-expression network are available online (http://retina.tigem.it).
Affiliation(s)
- Michele Pinelli
- Telethon Institute of Genetics and Medicine (TIGEM), Via Campi Flegrei 34, 80078 Pozzuoli, Italy
| | - Annamaria Carissimo
- Telethon Institute of Genetics and Medicine (TIGEM), Via Campi Flegrei 34, 80078 Pozzuoli, Italy
| | - Luisa Cutillo
- Telethon Institute of Genetics and Medicine (TIGEM), Via Campi Flegrei 34, 80078 Pozzuoli, Italy Dipartimento Studi Aziendali e Quantitativi (DISAQ), Università degli studi di Napoli 'Parthenope', Via Generale Parisi, 80132 Napoli, Italy
| | - Ching-Hung Lai
- Telethon Institute of Genetics and Medicine (TIGEM), Via Campi Flegrei 34, 80078 Pozzuoli, Italy
| | - Margherita Mutarelli
- Telethon Institute of Genetics and Medicine (TIGEM), Via Campi Flegrei 34, 80078 Pozzuoli, Italy
| | - Maria Nicoletta Moretti
- Telethon Institute of Genetics and Medicine (TIGEM), Via Campi Flegrei 34, 80078 Pozzuoli, Italy
| | - Marwah Veer Singh
- Telethon Institute of Genetics and Medicine (TIGEM), Via Campi Flegrei 34, 80078 Pozzuoli, Italy
| | - Marianthi Karali
- Telethon Institute of Genetics and Medicine (TIGEM), Via Campi Flegrei 34, 80078 Pozzuoli, Italy
| | - Diego Carrella
- Telethon Institute of Genetics and Medicine (TIGEM), Via Campi Flegrei 34, 80078 Pozzuoli, Italy
| | - Mariateresa Pizzo
- Telethon Institute of Genetics and Medicine (TIGEM), Via Campi Flegrei 34, 80078 Pozzuoli, Italy
| | - Francesco Russo
- Istituto per le Applicazioni del Calcolo, Consiglio Nazionale delle Ricerca, Via Pietro Castellino 111, 80131 Napoli, Italy
| | - Stefano Ferrari
- Fondazione Banca degli Occhi del Veneto, Via Paccagnella 11, 30174 Zelarino (Venice), Italy
| | - Diego Ponzin
- Fondazione Banca degli Occhi del Veneto, Via Paccagnella 11, 30174 Zelarino (Venice), Italy
| | - Claudia Angelini
- Istituto per le Applicazioni del Calcolo, Consiglio Nazionale delle Ricerca, Via Pietro Castellino 111, 80131 Napoli, Italy
| | - Sandro Banfi
- Telethon Institute of Genetics and Medicine (TIGEM), Via Campi Flegrei 34, 80078 Pozzuoli, Italy Medical Genetics, Department of Biochemistry, Biophysics and General Pathology, Second University of Naples, via Luigi De Crecchio 7, 80138 Naples (NA), Italy
| | - Diego di Bernardo
- Telethon Institute of Genetics and Medicine (TIGEM), Via Campi Flegrei 34, 80078 Pozzuoli, Italy Dept. Of Chemical, Materials and Industrial Production Engineering, University of Naples 'Federico II', Piazzale Tecchio 80, 80125 Naples, Italy
| |
|
33
|
Börnigen D, Tyekucheva S, Wang X, Rider JR, Lee GS, Mucci LA, Sweeney C, Huttenhower C. Computational Reconstruction of NFκB Pathway Interaction Mechanisms during Prostate Cancer. PLoS Comput Biol 2016; 12:e1004820. [PMID: 27078000 PMCID: PMC4831844 DOI: 10.1371/journal.pcbi.1004820] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 07/15/2015] [Accepted: 02/19/2016] [Indexed: 12/21/2022] Open
Abstract
Molecular research in cancer is one of the largest areas of bioinformatic investigation, but it remains a challenge to understand biomolecular mechanisms in cancer-related pathways from high-throughput genomic data. This includes the Nuclear-factor-kappa-B (NFκB) pathway, which is central to the inflammatory response and cell proliferation in prostate cancer development and progression. Despite close scrutiny and a deep understanding of many of its members’ biomolecular activities, the current list of pathway members and a systems-level understanding of their interactions remains incomplete. Here, we provide the first steps toward computational reconstruction of interaction mechanisms of the NFκB pathway in prostate cancer. We identified novel roles for ATF3, CXCL2, DUSP5, JUNB, NEDD9, SELE, TRIB1, and ZFP36 in this pathway, in addition to new mechanistic interactions between these genes and 10 known NFκB pathway members. A newly predicted interaction between NEDD9 and ZFP36 in particular was validated by co-immunoprecipitation, as was NEDD9's potential biological role in prostate cancer cell growth regulation. We combined 651 gene expression datasets with 1.4M gene product interactions to predict the inclusion of 40 additional genes in the pathway. Molecular mechanisms of interaction among pathway members were inferred using recent advances in Bayesian data integration to simultaneously provide information specific to biological contexts and individual biomolecular activities, resulting in a total of 112 interactions in the fully reconstructed NFκB pathway: 13 (11%) previously known, 29 (26%) supported by existing literature, and 70 (63%) novel. This method is generalizable to other tissue types, cancers, and organisms, and this new information about the NFκB pathway will allow us to further understand prostate cancer and to develop more effective prevention and treatment strategies. 
In molecular research in cancer it remains challenging to uncover biomolecular mechanisms in cancer-related pathways from high-throughput genomic data, including the Nuclear-factor-kappa-B (NFκB) pathway. Despite close scrutiny and a deep understanding of many of the NFκB pathway members’ biomolecular activities, the current list of pathway members and a systems-level understanding of their interactions remains incomplete. In this study, we provide the first steps toward computational reconstruction of interaction mechanisms of the NFκB pathway in prostate cancer. We identified novel roles for 8 genes in this pathway and new mechanistic interactions between these genes and 10 known pathway members. We combined 651 gene expression datasets with 1.4M interactions to predict the inclusion of 40 additional genes in the pathway. Molecular mechanisms of interaction were inferred using recent advances in Bayesian data integration to simultaneously provide information specific to biological contexts and individual biomolecular activities, resulting in 112 interactions in the fully reconstructed NFκB pathway. This method is generalizable, and this new information about the NFκB pathway will allow us to further understand prostate cancer.
Affiliation(s)
- Daniela Börnigen
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University, Boston, Massachusetts, United States of America.,The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
| | - Svitlana Tyekucheva
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
| | - Xiaodong Wang
- Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Jennifer R Rider
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Harvard University, Boston, Massachusetts, United States of America
| | - Gwo-Shu Lee
- Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Lorelei A Mucci
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Harvard University, Boston, Massachusetts, United States of America
| | - Christopher Sweeney
- Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Curtis Huttenhower
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University, Boston, Massachusetts, United States of America.,The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
| |
|
34
|
Differential network analysis reveals the genome-wide landscape of estrogen receptor modulation in hormonal cancers. Sci Rep 2016; 6:23035. [PMID: 26972162 PMCID: PMC4789788 DOI: 10.1038/srep23035] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 09/16/2015] [Accepted: 02/23/2016] [Indexed: 12/14/2022] Open
Abstract
Several mutual information (MI)-based algorithms have been developed to identify dynamic gene-gene and function-function interactions governed by key modulators (genes, proteins, etc.). Because they are computationally intensive, however, these methods rely heavily on prior knowledge and are limited in genome-wide analysis. We present the modulated gene/gene set interaction (MAGIC) analysis to systematically identify genome-wide modulation of interaction networks. Based on a novel statistical test employing conjugate Fisher transformations of correlation coefficients, MAGIC features fast computation and adaptation to variations of clinical cohorts. In simulated datasets, MAGIC achieved greatly improved computational efficiency and overall superior performance compared with the MI-based method. We applied MAGIC to construct the estrogen receptor (ER) modulated gene and gene set (representing biological function) interaction networks in breast cancer. Several novel interaction hubs and functional interactions were discovered. ER+ dependent interaction between TGFβ and NFκB was further shown to be associated with patient survival. The findings were verified in independent datasets. Using MAGIC, we also assessed the essential roles of ER modulation in another hormonal cancer, ovarian cancer. Overall, MAGIC is a systematic framework for comprehensively identifying and constructing the modulated interaction networks in a whole-genome landscape. A MATLAB implementation of MAGIC is available for academic use at https://github.com/chiuyc/MAGIC.
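The abstract above refers to a test built on Fisher transformations of correlation coefficients. As a rough illustration only, the following Python sketch shows the classical Fisher z-test for whether the correlation of a gene pair differs between two sample groups (for example, modulator-high versus modulator-low samples). This is the textbook test, not necessarily the statistic MAGIC itself uses; the function names and the example numbers are hypothetical.

```python
import math


def fisher_z(r: float) -> float:
    """Fisher transformation of a correlation coefficient (inverse hyperbolic tangent)."""
    return math.atanh(r)


def differential_correlation_test(r1: float, n1: int, r2: float, n2: int):
    """Classical two-sided z-test of H0: both groups share the same population correlation.

    r1, r2: Pearson correlations of one gene pair in group 1 and group 2.
    n1, n2: number of samples in each group (each must exceed 3).
    Returns the z statistic and its two-sided p-value.
    """
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each group needs more than 3 samples")
    # Standard error of the difference between two Fisher-transformed correlations.
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (fisher_z(r1) - fisher_z(r2)) / se
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


if __name__ == "__main__":
    # Hypothetical gene pair: strongly correlated in one group, barely in the other.
    z, p = differential_correlation_test(r1=0.8, n1=103, r2=0.1, n2=103)
    print(f"z = {z:.2f}, p = {p:.2e}")
```

A genome-wide screen would repeat such a test over all gene pairs and correct for multiple testing; that step is omitted here.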
|
35
|
Wu MY, Zhang XF, Dai DQ, Ou-Yang L, Zhu Y, Yan H. Regularized logistic regression with network-based pairwise interaction for biomarker identification in breast cancer. BMC Bioinformatics 2016; 17:108. [PMID: 26921029 PMCID: PMC4769543 DOI: 10.1186/s12859-016-0951-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 08/06/2015] [Accepted: 01/28/2016] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND To facilitate advances in personalized medicine, it is important to detect predictive, stable and interpretable biomarkers related with different clinical characteristics. These clinical characteristics may be heterogeneous with respect to underlying interactions between genes. Usually, traditional methods just focus on detection of differentially expressed genes without taking the interactions between genes into account. Moreover, due to the typical low reproducibility of the selected biomarkers, it is difficult to give a clear biological interpretation for a specific disease. Therefore, it is necessary to design a robust biomarker identification method that can predict disease-associated interactions with high reproducibility. RESULTS In this article, we propose a regularized logistic regression model. Different from previous methods which focus on individual genes or modules, our model takes gene pairs, which are connected in a protein-protein interaction network, into account. A line graph is constructed to represent the adjacencies between pairwise interactions. Based on this line graph, we incorporate the degree information in the model via an adaptive elastic net, which makes our model less dependent on the expression data. Experimental results on six publicly available breast cancer datasets show that our method can not only achieve competitive performance in classification, but also retain great stability in variable selection. Therefore, our model is able to identify the diagnostic and prognostic biomarkers in a more robust way. Moreover, most of the biomarkers discovered by our model have been verified in biochemical or biomedical researches. CONCLUSIONS The proposed method shows promise in the diagnosis of disease pathogenesis with different clinical characteristics. These advances lead to more accurate and stable biomarker discovery, which can monitor the functional changes that are perturbed by diseases. 
Based on these predictions, researchers may be able to provide suggestions for new therapeutic approaches.
Affiliation(s)
- Meng-Yun Wu
- School of Statistics and Management, Shanghai University of Finance and Economics, Guoding Road, Shanghai, 200433, China. .,Key Laboratory of Mathematical Economics SUFE, Ministry of Education, Guoding Road, Shanghai, 200433, China.
| | - Xiao-Fei Zhang
- School of Mathematics and Statistics & Hubei Key Laboratory of Mathematical Sciences, Central China Normal University, Luoyu Road, Wuhan, 430079, China.
| | - Dao-Qing Dai
- Intelligent Data Center and Department of Mathematics, Sun Yat-Sen University, Xingang West Road, Guangzhou, 510275, China.
| | - Le Ou-Yang
- College of Information Engineering, Shenzhen University, Nanhai Avenue, Shenzhen, 518060, China.
| | - Yuan Zhu
- School of Automation, China University of Geosciences, Lumo Road, Wuhan, 430074, China.
| | - Hong Yan
- Department of Electronic and Engineering, City University of Hong Kong, Tat Chee Avenue, Hong Kong, 999077, China.
| |
|
36
|
He Y, Shao F, Pi W, Shi C, Chen Y, Gong D, Wang B, Cao Z, Tang K. Largescale Transcriptomics Analysis Suggests Over-Expression of BGH3, MMP9 and PDIA3 in Oral Squamous Cell Carcinoma. PLoS One 2016; 11:e0146530. [PMID: 26745629 PMCID: PMC4706424 DOI: 10.1371/journal.pone.0146530] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 10/26/2015] [Accepted: 12/18/2015] [Indexed: 12/12/2022] Open
Abstract
Oral squamous cell carcinoma (OSCC) has been reported as the most prevalent cancer of the head and neck region, yet early diagnosis remains challenging. Here we conducted a comprehensive bioinformatics study on microarray data from 326 OSCC clinical samples, with 165 normal tissues as controls. The cell interaction pathways of ECM-receptor interaction and focal adhesion were found to be significantly regulated in OSCC samples. Further analysis of topological properties and expression consistency identified three hub genes in the gene interaction network, MMP9, PDIA3 and BGH3, that were consistently overexpressed in OSCC samples. When validated on additional microarray datasets of 41 OSCC samples, the validation rates of overexpressed BGH3, MMP9, and PDIA3 reached 90%, 90% and 84%, respectively. Finally, immunohistochemical assays were performed to test the protein expression of the three genes in newly collected clinical samples: 35 OSCC samples, 20 samples of pre-OSCC stage, and 12 normal oral mucosa specimens. Their protein expression levels were also found to increase progressively from normal mucosa to pre-OSCC stage and further to OSCC (ANOVA p < 0.001), suggesting key roles in OSCC pathogenesis. Based on the above validation, we propose that BGH3, MMP9 and PDIA3 might be further explored as potential biomarkers to aid OSCC diagnosis.
Affiliation(s)
- Yuan He
- Department of Oral Medicine, School of Stomatology, Tongji University, Shanghai, 200092, China
| | - Fangyang Shao
- Department of Oral Medicine, School of Stomatology, Tongji University, Shanghai, 200092, China
| | - Weidong Pi
- School of Life Science and Technology, Tongji University, Shanghai, 200092, China
| | - Cong Shi
- Department of Oral Medicine, School of Stomatology, Tongji University, Shanghai, 200092, China
| | - Yujia Chen
- School of Life Science and Technology, Tongji University, Shanghai, 200092, China
| | - Diping Gong
- Department of Oral Medicine, School of Stomatology, Tongji University, Shanghai, 200092, China
| | - Bingjie Wang
- Department of Oral Medicine, School of Stomatology, Tongji University, Shanghai, 200092, China
| | - Zhiwei Cao
- School of Life Science and Technology, Tongji University, Shanghai, 200092, China
| | - Kailin Tang
- Advanced Institute of Translational Medicine, Tongji University, Shanghai, 200092, China
| |
|
37
|
Gambardella G, Peluso I, Montefusco S, Bansal M, Medina DL, Lawrence N, di Bernardo D. A reverse-engineering approach to dissect post-translational modulators of transcription factor's activity from transcriptional data. BMC Bioinformatics 2015; 16:279. [PMID: 26334955 PMCID: PMC4559297 DOI: 10.1186/s12859-015-0700-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 04/29/2015] [Accepted: 08/11/2015] [Indexed: 11/13/2022] Open
Abstract
Background Transcription factors (TFs) act downstream of the major signalling pathways, functioning as master regulators of cell fate. Their activity is tightly regulated at the transcriptional, post-transcriptional and post-translational levels. Proteins modifying TF activity are not easily identified by experimental high-throughput methods. Results We developed a computational strategy, called Differential Multi-Information (DMI), to infer post-translational modulators of a transcription factor from a compendium of gene expression profiles (GEPs). DMI is built on the hypothesis that the modulator of a TF (i.e. kinase/phosphatases), when expressed in the cell, will cause the TF target genes to be co-expressed. Conversely, when the modulator is not expressed, the TF will be inactive, resulting in a loss of co-regulation across its target genes. DMI detects the occurrence of changes in target gene co-regulation for each candidate modulator, using a measure called Multi-Information. We validated the DMI approach on a compendium of 5,372 GEPs, showing its predictive ability in correctly identifying kinases regulating the activity of 14 different transcription factors. Conclusions DMI can be used in combination with experimental approaches such as high-throughput screening to efficiently improve both pathway and target discovery. An on-line web-tool enabling the user to use DMI to identify post-transcriptional modulators of a transcription factor of interest can be found at http://dmi.tigem.it. Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0700-3) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Gennaro Gambardella
- The Telethon Institute of Genetics and Medicine, Naples, Italy. .,Present Address: Department of Cancer Studies, King's College London, NHH, London, UK.
| | - Ivana Peluso
- The Telethon Institute of Genetics and Medicine, Naples, Italy.
| | | | - Mukesh Bansal
- Columbia Initiative in Systems Biology and Center for Computational Biology and Bioinformatics, Columbia University, New York, NY, USA.
| | - Diego L Medina
- The Telethon Institute of Genetics and Medicine, Naples, Italy.
| | - Neil Lawrence
- Department of Computer Science, University of Sheffield, Sheffield, UK.
| | | |
|
38
|
Cao MS, Liu BY, Dai WT, Zhou WX, Li YX, Li YY. Differential network analysis reveals dysfunctional regulatory networks in gastric carcinogenesis. Am J Cancer Res 2015; 5:2605-2625. [PMID: 26609471 PMCID: PMC4633893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2015] [Accepted: 08/04/2015] [Indexed: 06/05/2023] Open
Abstract
Gastric carcinoma is one of the most common cancers in the world. A large number of differentially expressed genes have been identified as being associated with gastric cancer progression; however, little is known about the underlying regulatory mechanisms. To address this problem, we developed a differential networking approach that combines a nascent methodology, differential coexpression analysis (DCEA), with two novel quantitative methods for differential regulation analysis. We first applied DCEA to a gene expression dataset of gastric normal mucosa, adenoma and carcinoma samples to identify gene interconnection changes during cancer progression, based on which we inferred normal, adenoma, and carcinoma-specific gene regulation networks using a linear regression model. It was observed that cancer genes and drug targets were enriched in each network. To investigate the dynamic changes of gene regulation during carcinogenesis, we then designed two quantitative methods to prioritize differentially regulated genes (DRGs) and gene pairs or links (DRLs) between adjacent stages. It was found that known cancer genes and drug targets are ranked significantly higher. The top 4% normal vs. adenoma DRGs (36 genes) and top 6% adenoma vs. carcinoma DRGs (56 genes) proved to be worthy of further investigation to explore their association with gastric cancer. Out of the 16 DRGs involved in the two top-10 DRG lists of the normal vs. adenoma and adenoma vs. carcinoma comparisons, 15 have been reported to be gastric cancer or cancer related. Based on our inferred differential networking information and known signaling pathways, we generated testable hypotheses on the roles of GATA6, ESRRG and their signaling pathways in gastric carcinogenesis.
Compared with established approaches that build genome-scale GRNs, or sub-networks around differentially expressed genes, the present approach proved to be better at enriching cancer genes and drug targets, and at prioritizing disease-related genes, on the dataset we considered. We propose this extendable differential networking framework as a promising way to gain insights into gene regulatory mechanisms underlying cancer progression and other phenotypic changes.
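The abstract above does not spell out how differentially regulated links (DRLs) are scored, so the following is only a minimal sketch of the general idea, under stated assumptions: each regulator-target link is fitted with a simple per-stage least-squares regression, and a link is scored by the absolute change in slope between two adjacent stages. The function names `slope` and `rank_links`, the slope-difference score, and the toy data are all hypothetical and are not taken from the paper.

```python
def slope(x, y):
    """Least-squares slope of y regressed on x (simple linear regression)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        return 0.0  # constant regulator: no slope can be estimated
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


def rank_links(stage_a, stage_b, links):
    """Rank regulator->target links by |slope change| between two stages.

    stage_a, stage_b: dict mapping gene name -> list of expression values.
    links: iterable of (regulator, target) pairs present in both stages.
    Assumption: slope difference stands in for the paper's unspecified score.
    """
    scores = {}
    for regulator, target in links:
        before = slope(stage_a[regulator], stage_a[target])
        after = slope(stage_b[regulator], stage_b[target])
        scores[(regulator, target)] = abs(after - before)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data (hypothetical): the TF->G2 relationship flips sign between stages.
normal = {"TF": [1, 2, 3, 4], "G1": [2, 4, 6, 8], "G2": [1, 2, 3, 4]}
adenoma = {"TF": [1, 2, 3, 4], "G1": [2, 4, 6, 8], "G2": [4, 3, 2, 1]}
ranked = rank_links(normal, adenoma, [("TF", "G1"), ("TF", "G2")])
```

In this toy case the TF to G2 link ranks first because its fitted slope changes sign between stages, while the TF to G1 link is unchanged. A gene-level score could be obtained in the same spirit by summing the link scores that touch each gene.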
Affiliation(s)
- Mu-Shui Cao
- School of Life Science and Technology, Tongji University, Shanghai 200092, P. R. China
- Shanghai Center for Bioinformation Technology, Shanghai 200235, P. R. China
- Shanghai Industrial Technology Institute, 1278 Keyuan Road, Shanghai 201203, P. R. China
| | - Bing-Ya Liu
- Shanghai Key Laboratory of Gastric Neoplasms, Shanghai Institute of Digestive Surgery, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, P. R. China
| | - Wen-Tao Dai
- Shanghai Center for Bioinformation Technology, Shanghai 200235, P. R. China
- Shanghai Industrial Technology Institute, 1278 Keyuan Road, Shanghai 201203, P. R. China
| | - Wei-Xin Zhou
- Shanghai Center for Bioinformation Technology, Shanghai 200235, P. R. China
- Shanghai Industrial Technology Institute, 1278 Keyuan Road, Shanghai 201203, P. R. China
- Shanghai Engineering Research Center of Pharmaceutical Translation, 1278 Keyuan Road, Shanghai 201203, P. R. China
| | - Yi-Xue Li
- School of Life Science and Technology, Tongji University, Shanghai 200092, P. R. China
- Shanghai Center for Bioinformation Technology, Shanghai 200235, P. R. China
- Shanghai Industrial Technology Institute, 1278 Keyuan Road, Shanghai 201203, P. R. China
- Shanghai Engineering Research Center of Pharmaceutical Translation, 1278 Keyuan Road, Shanghai 201203, P. R. China
| | - Yuan-Yuan Li
- Shanghai Center for Bioinformation Technology, Shanghai 200235, P. R. China
- Shanghai Industrial Technology Institute, 1278 Keyuan Road, Shanghai 201203, P. R. China
- Shanghai Engineering Research Center of Pharmaceutical Translation, 1278 Keyuan Road, Shanghai 201203, P. R. China
| |
|
39
|
Qian J, Zou Y, Wang J, Zhang B, Massion PP. Global gene expression profiling reveals a suppressed immune response pathway associated with 3q amplification in squamous carcinoma of the lung. Genom Data 2015; 5:272-4. [PMID: 26484266 PMCID: PMC4583673 DOI: 10.1016/j.gdata.2015.06.014] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 05/27/2015] [Accepted: 06/01/2015] [Indexed: 11/20/2022]
Abstract
Chromosome 3q26–28 is a critical region of genomic amplification in non-small cell lung cancer (NSCLC), particularly lung squamous cell carcinomas (SCCs). No molecular therapeutic target has shown clinical utility for SCC, in contrast with adenocarcinomas of the lung. To identify novel candidate drivers in this region, we performed both Array Comparative Genomic Hybridization (array CGH, Agilent Human Genome CGH 244A oligo-microarrays) and Gene Expression Microarray (Agilent Human Gene Expression 4 × 44 K microarray) on 24 untreated lung SCC specimens. Using our previously published integrative genomics approach, we identified 12 top amplified driver genes within this region that are highly correlated and overexpressed in lung SCC. We further demonstrated that one of the 12 top amplified drivers, Fragile X mental retardation-related protein 1 (FXR1), is a novel cancer gene in NSCLC and that FXR1 executes its regulatory function by forming a novel complex with two other oncogenes, protein kinase C, iota (PRKCI) and epithelial cell transforming 2 (ECT2), within the same amplicon in lung cancer cells. Here we report that immune response pathways are significantly suppressed in lung SCC and negatively associated with 3q driver gene expression, implying a potential role of 3q drivers in cancer immune-surveillance. In light of the attractive immunotherapy strategy of blocking negative regulators of T cell function in multiple human cancers, including lung SCC, our findings may provide a rationale for targeting 3q drivers in combination with immunotherapies for human tumors harboring the 3q amplicon. The data have been deposited in NCBI's Gene Expression Omnibus and are accessible through GEO Series accession number GSE40089.
Affiliation(s)
- Jun Qian
- Thoracic Program at the Vanderbilt Ingram Cancer Center, Division of Pulmonary and Critical Care Medicine, Department of Medicine, Nashville, TN, USA
| | - Yong Zou
- Thoracic Program at the Vanderbilt Ingram Cancer Center, Division of Pulmonary and Critical Care Medicine, Department of Medicine, Nashville, TN, USA
| | - Jing Wang
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
| | - Bing Zhang
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
| | - Pierre P. Massion
- Thoracic Program at the Vanderbilt Ingram Cancer Center, Division of Pulmonary and Critical Care Medicine, Department of Medicine, Nashville, TN, USA
- Veterans Affairs Medical Center, Nashville, TN, USA
- Corresponding author at: Thoracic Program, Vanderbilt-Ingram Cancer Center, 2220 Pierce Avenue, Preston Research Building 640, Nashville, TN 37232-6838, USA. Tel.: + 1 615 936 2256; fax: + 1 615 936 1790.
| |
|
40
|
Zhang Y, Liu ZL, Song M. ChiNet uncovers rewired transcription subnetworks in tolerant yeast for advanced biofuels conversion. Nucleic Acids Res 2015; 43:4393-407. [PMID: 25897127 PMCID: PMC4482087 DOI: 10.1093/nar/gkv358] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 09/25/2014] [Accepted: 04/06/2015] [Indexed: 12/14/2022] Open
Abstract
Analysis of rewired upstream subnetworks impacting downstream differential gene expression aids the delineation of evolving molecular mechanisms. Cumulative statistics based on conventional differential correlation are limited for subnetwork rewiring analysis since rewiring is not necessarily equivalent to change in correlation coefficients. Here we present a computational method ChiNet to quantify subnetwork rewiring by statistical heterogeneity that enables detection of potential genotype changes causing altered transcription regulation in evolving organisms. Given a differentially expressed downstream gene set, ChiNet backtracks a rewired upstream subnetwork from a super-network including gene interactions known to occur under various molecular contexts. We benchmarked ChiNet for its high accuracy in distinguishing rewired artificial subnetworks, in silico yeast transcription-metabolic subnetworks, and rewired transcription subnetworks for Candida albicans versus Saccharomyces cerevisiae, against two differential-correlation based subnetwork rewiring approaches. Then, using transcriptome data from tolerant S. cerevisiae strain NRRL Y-50049 and a wild-type intolerant strain, ChiNet identified 44 metabolic pathways affected by rewired transcription subnetworks anchored to major adaptively activated transcription factor genes YAP1, RPN4, SFP1 and ROX1, in response to toxic chemical challenges involved in lignocellulose-to-biofuels conversion. These findings support the use of ChiNet in rewiring analysis of subnetworks where differential interaction patterns resulting from divergent nonlinear dynamics abound.
Affiliation(s)
- Yang Zhang
- Department of Computer Science, New Mexico State University, Las Cruces, NM 88003, USA
| | - Z Lewis Liu
- National Center for Agricultural Utilization Research, Agricultural Research Service, U.S. Department of Agriculture, Peoria, IL 61604, USA
| | - Mingzhou Song
- Department of Computer Science, New Mexico State University, Las Cruces, NM 88003, USA
| |
|
41
|
Fibroblast growth factor signalling controls nervous system patterning and pigment cell formation in Ciona intestinalis. Nat Commun 2014; 5:4830. [PMID: 25189217 DOI: 10.1038/ncomms5830] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.1] [Received: 12/02/2013] [Accepted: 07/25/2014] [Indexed: 11/09/2022] Open
Abstract
During the development of the central nervous system (CNS), combinations of transcription factors and signalling molecules orchestrate patterning, specification and differentiation of neural cell types. In vertebrates, three types of melanin-containing pigment cells exert a variety of functional roles, including visual perception. Here we analysed the mechanisms underlying pigment cell specification within the CNS of a simple chordate, the ascidian Ciona intestinalis. Ciona tadpole larvae exhibit a basic chordate body plan characterized by a small number of neural cells. We employed lineage-specific transcription profiling to characterize the expression of genes downstream of fibroblast growth factor (FGF) signalling, which govern pigment cell formation. We demonstrate that FGF signalling sequentially imposes a pigment cell identity at the expense of anterior neural fates. We identify FGF-dependent and pigment cell-specific factors, including the small GTPase Rab32/38, and demonstrate its requirement for the pigmentation of larval sensory organs.
|
42
|
Yalamanchili HK, Li Z, Wang P, Wong MP, Yao J, Wang J. SpliceNet: recovering splicing isoform-specific differential gene networks from RNA-Seq data of normal and diseased samples. Nucleic Acids Res 2014; 42:e121. [PMID: 25034693 PMCID: PMC4150760 DOI: 10.1093/nar/gku577] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Indexed: 12/18/2022] Open
Abstract
Conventionally, overall gene expression levels from microarrays are used to infer gene networks, but it is challenging to account for splicing isoforms. High-throughput RNA sequencing has made splice variant profiling practical. However, its true merit in quantifying splicing isoforms and isoform-specific exon expression has not been well explored for inferring gene networks. This study demonstrates SpliceNet, a method to infer isoform-specific co-expression networks from exon-level RNA-Seq data, using large dimensional trace. It goes beyond differentially expressed genes and infers splicing isoform network changes between normal and diseased samples. It eases the sample size bottleneck; evaluations on simulated data and on lung cancer-specific ERBB2 and MAPK signaling pathways, with varying numbers of samples, show its merit in handling datasets with a high exon to sample size ratio. Inferred network rewiring of the well-established Bcl-x and EGFR centered networks from lung adenocarcinoma expression data is in good agreement with the literature. Gene-level evaluations show that SpliceNet substantially outperforms canonical correlation analysis, a method that is currently applied to exon-level RNA-Seq data. SpliceNet can also be applied to exon array data. SpliceNet is distributed as an R package available at http://www.jjwanglab.org/SpliceNet.
Affiliation(s)
- Hari Krishna Yalamanchili
- Department of Biochemistry, The University of Hong Kong, Hong Kong (SAR), China; Department of Pathology, The University of Hong Kong, Hong Kong (SAR), China
| | - Zhaoyuan Li
- Centre for Genomic Sciences, L.K.S. Faculty of Medicine, The University of Hong Kong, Hong Kong (SAR), China
| | - Panwen Wang
- Department of Biochemistry, The University of Hong Kong, Hong Kong (SAR), China; Department of Pathology, The University of Hong Kong, Hong Kong (SAR), China
| | - Maria P Wong
- Department of Pathology, The University of Hong Kong, Hong Kong (SAR), China; Shenzhen Institute of Research and Innovation, The University of Hong Kong, Shenzhen, China
| | - Jianfeng Yao
- Centre for Genomic Sciences, L.K.S. Faculty of Medicine, The University of Hong Kong, Hong Kong (SAR), China
| | - Junwen Wang
- Department of Biochemistry, The University of Hong Kong, Hong Kong (SAR), China; Department of Pathology, The University of Hong Kong, Hong Kong (SAR), China; Department of Statistics & Actuarial Science, Faculty of Science, The University of Hong Kong, Hong Kong (SAR), China
| |
|
43
|
Network-based inference framework for identifying cancer genes from gene expression data. Biomed Res Int 2013; 2013:401649. [PMID: 24073403 PMCID: PMC3774028 DOI: 10.1155/2013/401649] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 05/16/2013] [Revised: 07/15/2013] [Accepted: 07/17/2013] [Indexed: 12/17/2022]
Abstract
Great efforts have been devoted to reducing the uncertainty of detected cancer genes, as accurate identification of oncogenes is of tremendous significance and helps unravel the biological behavior of tumors. In this paper, we present a differential network-based framework to detect biologically meaningful cancer-related genes. Firstly, a gene regulatory network construction algorithm is proposed, in which a boosting regression based on a likelihood score and an informative prior is employed to improve the accuracy of identification. Secondly, with this algorithm, two gene regulatory networks are constructed from case and control samples independently. Thirdly, by subtracting the two networks, a differential-network model is obtained and then used to rank differentially expressed hub genes for identification of cancer biomarkers. Compared with two existing gene-based methods (t-test and lasso), the method shows a significant improvement in accuracy both on synthetic datasets and on two real breast cancer datasets. Furthermore, six identified genes (TSPYL5, CD55, CCNE2, DCK, BBC3, and MUC1) associated with susceptibility to breast cancer were verified through literature mining, GO analysis, and pathway functional enrichment analysis. Among these genes, TSPYL5 and CCNE2 are already known as prognostic biomarkers in breast cancer, CD55 has been suspected of playing an important role in breast cancer prognosis based on literature evidence, and the other three genes are newly discovered breast cancer biomarkers. More generally, the differential-network schema can be extended to other complex diseases for detection of disease-associated genes.
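The three steps in this abstract (build one network per condition, subtract them, rank hub genes) can be illustrated with a small sketch. This is not the authors' algorithm: absolute Pearson correlation is swapped in for their boosting regression with likelihood score and informative prior, and the hub score (sum of absolute edge-weight changes per gene) is an assumption. The names `pearson`, `network`, and `rank_differential_hubs` and the toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def network(expr):
    """Step 1-2 stand-in: edge weight |r| for every unordered gene pair.

    expr: dict mapping gene name -> list of expression values.
    """
    genes = sorted(expr)
    return {
        (g, h): abs(pearson(expr[g], expr[h]))
        for i, g in enumerate(genes)
        for h in genes[i + 1:]
    }


def rank_differential_hubs(case, control):
    """Step 3: subtract the two networks and rank genes by total edge change.

    Assumes case and control contain the same gene names.
    """
    net_case, net_control = network(case), network(control)
    score = {gene: 0.0 for gene in case}
    for (g, h), weight in net_case.items():
        change = abs(weight - net_control[(g, h)])
        score[g] += change
        score[h] += change
    return sorted(score, key=score.get, reverse=True)


# Toy data (hypothetical): gene B switches partner from A to C in the case samples.
control = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 1, 3, 2]}
case = {"A": [1, 2, 3, 4], "B": [8, 2, 6, 4], "C": [4, 1, 3, 2]}
ranking = rank_differential_hubs(case, control)
```

In the toy data, gene B loses its strong association with A and gains one with C, so it accumulates the largest total edge change and ranks first. Any other network inference routine could replace `network` without changing the subtraction and ranking steps.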
|
44
|
|