1
McKeague IW, Zhang X. Significance testing for canonical correlation analysis in high dimensions. Biometrika 2022; 109:1067-1083. [PMID: 36685139 PMCID: PMC9857302 DOI: 10.1093/biomet/asab059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2023]
Abstract
We consider the problem of testing for the presence of linear relationships between large sets of random variables based on a post-selection inference approach to canonical correlation analysis. The challenge is to adjust for the selection of subsets of variables having linear combinations with maximal sample correlation. To this end, we construct a stabilized one-step estimator of the Euclidean norm of the canonical correlations maximized over subsets of variables of pre-specified cardinality. This estimator is shown to be consistent for its target parameter and asymptotically normal, provided the dimensions of the variables do not grow too quickly with sample size. We also develop a greedy search algorithm to accurately compute the estimator, leading to a computationally tractable omnibus test for the global null hypothesis that there are no linear relationships between any subsets of variables having the pre-specified cardinality. We further develop a confidence interval that takes the variable selection into account.
Affiliation(s)
- Ian W McKeague
  - Department of Biostatistics, Columbia University, Room R639, 722 West 168th Street, New York, NY 10032, USA
- Xin Zhang
  - Department of Statistics, Florida State University, 214 OSB, 117 N. Woodward Ave., Tallahassee, FL 32306, USA
2
Eigenvector-based sparse canonical correlation analysis: Fast computation for estimation of multiple canonical vectors. J Multivariate Anal 2021. [DOI: 10.1016/j.jmva.2021.104781] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/20/2022]
3
Jernigan R, Jia K, Ren Z, Zhou W. Large-scale multiple inference of collective dependence with applications to protein function. Ann Appl Stat 2021; 15:902-924. [DOI: 10.1214/20-aoas1431] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Robert Jernigan
  - Department of Biochemistry, Biophysics, and Molecular Biology, Program of Bioinformatics and Computational Biology, Iowa State University
- Kejue Jia
  - Department of Biochemistry, Biophysics, and Molecular Biology, Program of Bioinformatics and Computational Biology, Iowa State University
- Zhao Ren
  - Department of Statistics, University of Pittsburgh
- Wen Zhou
  - Department of Statistics, Colorado State University
4
Chi C, Ye Y, Chen B, Huang H. Bipartite graph-based approach for clustering of cell lines by gene expression-drug response associations. Bioinformatics 2021; 37:2617-2626. [PMID: 33682877 PMCID: PMC8428606 DOI: 10.1093/bioinformatics/btab143] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/19/2020] [Revised: 02/16/2021] [Accepted: 03/01/2021] [Indexed: 01/29/2023]
Abstract
MOTIVATION In pharmacogenomic studies, the biological context of cell lines influences the predictive ability of drug-response models and the discovery of biomarkers. Thus, similar cell lines are often studied together based on prior knowledge of biological annotations. However, this selection approach is not scalable with the number of annotations, and the relationship between gene-drug association patterns and biological context may not be obvious.
RESULTS We present a procedure to compare cell lines based on their gene-drug association patterns. Starting with a grouping of cell lines from biological annotation, we model gene-drug association patterns for each group as a bipartite graph between genes and drugs. This is accomplished by applying sparse canonical correlation analysis (SCCA) to extract the gene-drug associations, and using the canonical vectors to construct the edge weights. Then, we introduce a nuclear norm-based dissimilarity measure to compare the bipartite graphs. Accompanying our procedure is a permutation test to evaluate the significance of similarity of cell line groups in terms of gene-drug associations. In the pharmacogenomics datasets CTRP2, GDSC2, and CCLE, hierarchical clustering of carcinoma groups based on this dissimilarity measure uniquely reveals clustering patterns driven by carcinoma subtype rather than primary site. Next, we show that the top associated drugs or genes from SCCA can be used to characterize the clustering patterns of haematopoietic and lymphoid malignancies. Finally, we confirm by simulation that when drug responses are linearly dependent on expression, our approach is the only one that can effectively infer the true hierarchy compared to existing approaches.
AVAILABILITY Bipartite graph-based hierarchical clustering is implemented in R and can be obtained from CRAN: https://CRAN.R-project.org/package=hierBipartite. The source code is available at https://github.com/CalvinTChi/hierBipartite.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Calvin Chi
  - Center for Computational Biology, University of California, Berkeley, CA 94720, USA
- Yuting Ye
  - Division of Biostatistics, University of California, Berkeley, CA 94720, USA
- Bin Chen
  - Department of Pediatrics and Human Development, Michigan State University, Grand Rapids, MI 48912, USA
  - Department of Pharmacology and Toxicology, Michigan State University, Grand Rapids, MI 48824, USA
- Haiyan Huang
  - Center for Computational Biology, University of California, Berkeley, CA 94720, USA
  - Department of Statistics, University of California, Berkeley, CA 94720, USA
5
Wang YXR, Li L, Li JJ, Huang H. Network Modeling in Biology: Statistical Methods for Gene and Brain Networks. Stat Sci 2021; 36:89-108. [PMID: 34305304 PMCID: PMC8296984 DOI: 10.1214/20-sts792] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022]
Abstract
The rise of network data in many different domains has offered researchers new insight into the problem of modeling complex systems and propelled the development of numerous innovative statistical methodologies and computational tools. In this paper, we primarily focus on two types of biological networks, gene networks and brain networks, where statistical network modeling has found both fruitful and challenging applications. Unlike other network examples such as social networks where network edges can be directly observed, both gene and brain networks require careful estimation of edges using covariates as a first step. We provide a discussion on existing statistical and computational methods for edge estimation and subsequent statistical inference problems in these two types of biological networks.
Affiliation(s)
- Y X Rachel Wang
  - School of Mathematics and Statistics, University of Sydney, Australia
- Lexin Li
  - Department of Biostatistics and Epidemiology, School of Public Health, University of California, Berkeley
- Haiyan Huang
  - Department of Statistics, University of California, Berkeley
6
Saha S, Bandopadhyay S, Ghosh A. Identifying the degree of genetic interactions using Restricted Boltzmann Machine-A study on colorectal cancer. IET Syst Biol 2020; 15:26-39. [PMID: 33590963 PMCID: PMC8675802 DOI: 10.1049/syb2.12009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2020] [Revised: 09/30/2020] [Accepted: 10/19/2020] [Indexed: 12/02/2022]
Abstract
The phenomenon of two or more genes affecting the expression of each other in various ways in the development of a single character of an organism is known as gene interaction. Gene interaction applies not only to normal human traits but also to diseased samples. Thus, an analysis of gene interaction could help differentiate between normal and diseased samples, or between two or more phases of any diseased samples. In the first stage of this work, we use a restricted Boltzmann machine model to find significant interactions present in normal and/or cancer samples for every gene pair among 20 genes of a colorectal cancer data set (GDS4382), along with the weight/degree of those interactions. We then look for interactions present in adenoma and/or carcinoma samples for the same 20 genes in a colorectal cancer data set (GDS1777). The weight/degree of an interaction represents how strong or weak it is. Finally, we create a gene regulatory network from those interactions, in which the regulatory genes are identified using a naïve Bayes classifier. Experimental results are validated biologically by comparing the interactions with NCBI databases.
Affiliation(s)
- Sujay Saha
  - Department of Computer Science & Engineering, Heritage Institute of Technology, Kolkata, India
- Saikat Bandopadhyay
  - Department of Computer Science & Engineering, Heritage Institute of Technology, Kolkata, India
- Anupam Ghosh
  - Department of Computer Science & Engineering, Netaji Subhash Engineering College, Kolkata, India
7
Kontio JAJ, Rinta-Aho MJ, Sillanpää MJ. Estimating Linear and Nonlinear Gene Coexpression Networks by Semiparametric Neighborhood Selection. Genetics 2020; 215:597-607. [PMID: 32414870 PMCID: PMC7337083 DOI: 10.1534/genetics.120.303186] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 03/13/2020] [Accepted: 05/11/2020] [Indexed: 11/18/2022]
Abstract
Whereas nonlinear relationships between genes are acknowledged, there exist only a few methods for estimating nonlinear gene coexpression networks or gene regulatory networks (GCNs/GRNs), and these share common deficiencies. These methods often consider only pairwise associations between genes, and are, therefore, poorly capable of identifying higher-order regulatory patterns when multiple genes should be considered simultaneously. Another critical issue in current nonlinear GCN/GRN estimation approaches is that they consider linear and nonlinear dependencies at the same time in confounded form nonparametrically. This severely undermines the possibilities for nonlinear associations to be found, since the power of detecting nonlinear dependencies is lower compared to linear dependencies, and the sparsity-inducing procedures might favor linear relationships over nonlinear ones only due to small sample sizes. In this paper, we propose a method to estimate undirected nonlinear GCNs independently from the linear associations between genes based on a novel semiparametric neighborhood selection procedure capable of identifying complex nonlinear associations between genes. Simulation studies using the common DREAM3 and DREAM9 datasets show that the proposed method compares favorably with the current nonlinear GCN/GRN estimation methods.
Affiliation(s)
- Juho A J Kontio
  - Research Unit of Mathematical Sciences, Biocenter Oulu, University of Oulu, 90014, Finland
- Marko J Rinta-Aho
  - Research Unit of Mathematical Sciences, Biocenter Oulu, University of Oulu, 90014, Finland
- Mikko J Sillanpää
  - Research Unit of Mathematical Sciences, Biocenter Oulu, University of Oulu, 90014, Finland
  - Infotech Oulu, University of Oulu, 90014, Finland
8
Shi WJ, Zhuang Y, Russell PH, Hobbs BD, Parker MM, Castaldi PJ, Rudra P, Vestal B, Hersh CP, Saba LM, Kechris K. Unsupervised discovery of phenotype-specific multi-omics networks. Bioinformatics 2020; 35:4336-4343. [PMID: 30957844 DOI: 10.1093/bioinformatics/btz226] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 06/12/2018] [Revised: 02/01/2019] [Accepted: 04/05/2019] [Indexed: 12/15/2022]
Abstract
MOTIVATION Complex diseases often involve a wide spectrum of phenotypic traits. Better understanding of the biological mechanisms relevant to each trait promotes understanding of the etiology of the disease and the potential for targeted and effective treatment plans. There have been many efforts towards omics data integration and network reconstruction, but limited work has examined the incorporation of relevant (quantitative) phenotypic traits.
RESULTS We propose a novel technique, sparse multiple canonical correlation network analysis (SmCCNet), for integrating multiple omics data types along with a quantitative phenotype of interest, and for constructing multi-omics networks that are specific to the phenotype. As a case study, we focus on miRNA-mRNA networks. Through simulations, we demonstrate that SmCCNet has better overall prediction performance compared to popular gene expression network construction and integration approaches under realistic settings. Applying SmCCNet to studies on chronic obstructive pulmonary disease (COPD) and breast cancer, we found enrichment of known relevant pathways (e.g. the Cadherin pathway for COPD and the interferon-gamma signaling pathway for breast cancer) as well as less known omics features that may be important to the diseases. Although those applications focus on miRNA-mRNA co-expression networks, SmCCNet is applicable to a variety of omics and other data types. It can also be easily generalized to incorporate multiple quantitative phenotypes simultaneously. The versatility of SmCCNet suggests great potential of the approach in many areas.
AVAILABILITY AND IMPLEMENTATION The SmCCNet algorithm is written in R, and is freely available on the web at https://cran.r-project.org/web/packages/SmCCNet/index.html.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- W Jenny Shi
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Yonghua Zhuang
  - Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Pamela H Russell
  - Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Brian D Hobbs
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA
  - Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Margaret M Parker
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Peter J Castaldi
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Pratyaydipta Rudra
  - Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
  - Department of Statistics, Oklahoma State University, Stillwater, OK
- Brian Vestal
  - Center for Genes, Environment & Health, National Jewish Health, Denver, CO, USA
- Craig P Hersh
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA
  - Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Laura M Saba
  - Department of Pharmaceutical Sciences, University of Colorado, Aurora, CO, USA
- Katerina Kechris
  - Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
9
Huang N, Elhilali M. Push-pull competition between bottom-up and top-down auditory attention to natural soundscapes. eLife 2020; 9:52984. [PMID: 32196457 PMCID: PMC7083598 DOI: 10.7554/elife.52984] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 10/23/2019] [Accepted: 02/13/2020] [Indexed: 12/17/2022]
Abstract
In everyday social environments, demands on attentional resources dynamically shift to balance our attention to targets of interest while alerting us to important objects in our surrounds. The current study uses electroencephalography to explore how the push-pull interaction between top-down and bottom-up attention manifests itself in dynamic auditory scenes. Using natural soundscapes as distractors while subjects attend to a controlled rhythmic sound sequence, we find that salient events in background scenes significantly suppress phase-locking and gamma responses to the attended sequence, countering enhancement effects observed for attended targets. In line with a hypothesis of limited attentional resources, the modulation of neural activity by bottom-up attention is graded by degree of salience of ambient events. The study also provides insights into the interplay between endogenous and exogenous attention during natural soundscapes, with both forms of attention engaging a common fronto-parietal network at different time lags.
Affiliation(s)
- Nicholas Huang
  - Laboratory for Computational Audio Perception, Department of Electrical Engineering, Johns Hopkins University, Baltimore, United States
- Mounya Elhilali
  - Laboratory for Computational Audio Perception, Department of Electrical Engineering, Johns Hopkins University, Baltimore, United States
10
Zhang J, Sun WW, Li L. Mixed-Effect Time-Varying Network Model and Application in Brain Connectivity Analysis. J Am Stat Assoc 2020; 115:2022-2036. [PMID: 34321703 DOI: 10.1080/01621459.2019.1677242] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 01/10/2023]
Abstract
Time-varying networks are fast emerging in a wide range of scientific and business applications. Most existing dynamic network models are limited to a single-subject and discrete-time setting. In this article, we propose a mixed-effect network model that characterizes the continuous time-varying behavior of the network at the population level, while taking into account both the individual subject variability and the prior module information. We develop a multistep optimization procedure for a constrained likelihood estimation and derive the associated asymptotic properties. We demonstrate the effectiveness of our method through both simulations and an application to a study of brain development in youth. Supplementary materials for this article are available online.
Affiliation(s)
- Jingfei Zhang
  - Department of Management Science, Miami Business School, University of Miami, Miami, FL
- Will Wei Sun
  - Krannert School of Management, Purdue University, West Lafayette, IN
- Lexin Li
  - Department of Biostatistics and Epidemiology, School of Public Health, University of California at Berkeley, Berkeley, CA
11
Funke T, Becker T. Stochastic block models: A comparison of variants and inference methods. PLoS One 2019; 14:e0215296. [PMID: 31013290 PMCID: PMC6478296 DOI: 10.1371/journal.pone.0215296] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 09/28/2018] [Accepted: 03/30/2019] [Indexed: 11/19/2022]
Abstract
Finding communities in complex networks is a challenging task, and one promising approach is the Stochastic Block Model (SBM). However, influences from various fields have led to a diversity of variants and inference methods. Therefore, a comparison of the existing techniques and an independent analysis of their capabilities and weaknesses is needed. As a first step, we review the development of different SBM variants such as the degree-corrected SBM of Karrer and Newman or Peixoto's hierarchical SBM. Besides stating all these variants in a uniform notation, we show the reasons for their development. Knowing the variants, we discuss a variety of approaches to infer the optimal partition, such as the Metropolis-Hastings algorithm. We perform our analysis based on our extension of the Girvan-Newman test and the Lancichinetti-Fortunato-Radicchi benchmark as well as a selection of real-world networks. Using these results, we give some guidance on the challenging task of selecting an inference method and SBM variant. In addition, we give a simple heuristic to determine the number of steps for the Metropolis-Hastings algorithms, which lack a standard stopping criterion. With our comparison, we hope to guide researchers in the field of SBM and highlight the problems of existing techniques to focus future research. Finally, by making our code freely available, we want to promote faster development, integration and exchange of new ideas.
Affiliation(s)
- Thorben Funke
  - Production Systems and Logistic Systems, BIBA - Bremer Institut für Produktion und Logistik GmbH at the University of Bremen, Bremen, Bremen, Germany
  - Faculty of Production Engineering, University of Bremen, Bremen, Bremen, Germany
- Till Becker
  - Production Systems and Logistic Systems, BIBA - Bremer Institut für Produktion und Logistik GmbH at the University of Bremen, Bremen, Bremen, Germany
  - Faculty of Business Studies, University of Applied Sciences Emden/Leer, Emden, Lower Saxony, Germany
12
Mai Q, Zhang X. An iterative penalized least squares approach to sparse canonical correlation analysis. Biometrics 2019; 75:734-744. [PMID: 30714093 DOI: 10.1111/biom.13043] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 10/29/2017] [Accepted: 01/29/2019] [Indexed: 11/30/2022]
Abstract
It is increasingly interesting to model the relationship between two sets of high-dimensional measurements with potentially high correlations. Canonical correlation analysis (CCA) is a classical tool that explores the dependency of two multivariate random variables and extracts canonical pairs of highly correlated linear combinations. Driven by applications in genomics, text mining, and imaging research, among others, many recent studies generalize CCA to high-dimensional settings. However, most of them either rely on strong assumptions on covariance matrices, or do not produce nested solutions. We propose a new sparse CCA (SCCA) method that recasts high-dimensional CCA as an iterative penalized least squares problem. Thanks to the new iterative penalized least squares formulation, our method directly estimates the sparse CCA directions with efficient algorithms. Therefore, in contrast to some existing methods, the new SCCA does not impose any sparsity assumptions on the covariance matrices. The proposed SCCA is also very flexible in the sense that it can be easily combined with properly chosen penalty functions to perform structured variable selection and incorporate prior information. Moreover, our proposal of SCCA produces nested solutions and thus provides great convenience in practice. Theoretical results show that SCCA can consistently estimate the true canonical pairs with an overwhelming probability in ultra-high dimensions. Numerical results also demonstrate the competitive performance of SCCA.
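As a minimal illustration of the classical (non-sparse) CCA that the abstract above takes as its starting point, the following Python sketch computes canonical correlations by whitening the two sample covariance matrices and taking a singular value decomposition. It is not the authors' SCCA method and is only suitable for low-dimensional data; the function name `classical_cca` and the small ridge term `reg` are assumptions introduced here for illustration and numerical stability.

```python
import numpy as np


def classical_cca(X, Y, reg=1e-8):
    """Classical CCA for data matrices X (n x p) and Y (n x q).

    Returns (s, A, B): canonical correlations in descending order, and
    the canonical directions for X and Y as columns of A and B.
    `reg` is a hypothetical ridge term added only for numerical stability.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Sample covariance blocks.
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    # Cholesky factors: Sxx = Lx Lx^T and Syy = Ly Ly^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)

    # Whitened cross-covariance K = Lx^{-1} Sxy Ly^{-T}.
    K = np.linalg.solve(Lx, Sxy)
    K = np.linalg.solve(Ly, K.T).T

    # Singular values of K are the canonical correlations.
    U, s, Vt = np.linalg.svd(K, full_matrices=False)

    # Map singular vectors back to the original coordinates.
    A = np.linalg.solve(Lx.T, U)
    B = np.linalg.solve(Ly.T, Vt.T)
    return s, A, B
```

Under these assumptions, the sample correlation between `X @ A[:, 0]` and `Y @ B[:, 0]` matches the first returned canonical correlation up to the effect of the ridge term.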
Affiliation(s)
- Qing Mai
  - Department of Statistics, Florida State University, Tallahassee, Florida
- Xin Zhang
  - Department of Statistics, Florida State University, Tallahassee, Florida
13
Biclustering by sparse canonical correlation analysis. Quantitative Biology 2018. [DOI: 10.1007/s40484-017-0127-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
14
Gene-Gene Interaction Analysis: Correlation, Relative Entropy and Rough Set Theory Based Approach. Bioinformatics and Biomedical Engineering 2018. [DOI: 10.1007/978-3-319-78759-6_36] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/03/2022]
15
Data Wisdom in Computational Genomics Research. Statistics in Biosciences 2017. [DOI: 10.1007/s12561-016-9173-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
16
Li H, He Y, Hao P, Liu P. Identification of characteristic gene modules of osteosarcoma using bioinformatics analysis indicates the possible molecular pathogenesis. Mol Med Rep 2017; 15:2113-2119. [PMID: 28259906 PMCID: PMC5364958 DOI: 10.3892/mmr.2017.6245] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 12/05/2015] [Accepted: 12/06/2016] [Indexed: 11/06/2022]
Abstract
The aim of the present study was to investigate the possible pathogenesis of osteosarcoma using bioinformatics analysis to examine gene-gene interactions. A total of three datasets associated with osteosarcoma were downloaded from the Gene Expression Omnibus. The differentially expressed genes (DEGs) were identified using the significance analysis of microarrays method and were then submitted to the Human Protein Reference Database to identify the protein-protein interaction (PPI) pairs and to construct a PPI network of the DEGs. Subsequent multilevel community analysis was applied to mine the modules in the network, followed by screening of the differential expression modules using the GlobalAncova package. The genes in the differential expression modules were verified in the validation datasets. The verified genes underwent functional and pathway enrichment analysis. A total of 616 DEGs were selected to construct the PPI network, which included 5,808 osteosarcoma-specific interaction pairs and 8,012 normal-specific pairs. Tumor protein p53 (TP53), mitogen-activated protein kinase 1 (MAPK1) and estrogen receptor 1 (ESR1) were identified as the most important osteosarcoma-associated genes, with the highest levels of topological properties. Neurogenic locus notch homolog protein 3 (NOTCH3) and caspase 1 (CASP1) were identified as the osteosarcoma-specific interaction pairs. Among all 23 mined modules, three were identified as differential expression modules, which were verified in the other two datasets. The genes in these modules were predominantly enriched in the FGFR, MAPK and Notch signaling pathways. Therefore, TP53, MAPK1, ESR1, NOTCH3 and CASP1 may be important in the development of osteosarcoma, and the three differential expression modules provide valuable clues for investigating the pathogenesis of osteosarcoma.
Affiliation(s)
- Hongmin Li
  - Cancer Center, Sichuan Academy of Medical Sciences and Sichuan Provincial People's Hospital, Chengdu, Sichuan 610072, P.R. China
  - Cancer Center, Affiliated Medical School, University of Electronic Science and Technology, Chengdu, Sichuan 610051, P.R. China
- Yangke He
  - Cancer Center, Sichuan Academy of Medical Sciences and Sichuan Provincial People's Hospital, Chengdu, Sichuan 610072, P.R. China
  - Cancer Center, Affiliated Medical School, University of Electronic Science and Technology, Chengdu, Sichuan 610051, P.R. China
- Peng Hao
  - Department of Orthopedics, Sichuan Academy of Medical Sciences and Sichuan Provincial People's Hospital, Chengdu, Sichuan 610072, P.R. China
  - Department of Orthopedics, Affiliated Medical School, University of Electronic Science and Technology, Chengdu, Sichuan 610051, P.R. China
- Pan Liu
  - Department of Orthopedics, Sichuan Academy of Medical Sciences and Sichuan Provincial People's Hospital, Chengdu, Sichuan 610072, P.R. China
  - Department of Orthopedics, Affiliated Medical School, University of Electronic Science and Technology, Chengdu, Sichuan 610051, P.R. China
17
Hu Y, Zhao H, Ai X. Inferring Weighted Directed Association Network from Multivariate Time Series with a Synthetic Method of Partial Symbolic Transfer Entropy Spectrum and Granger Causality. PLoS One 2016; 11:e0166084. [PMID: 27832153 PMCID: PMC5104482 DOI: 10.1371/journal.pone.0166084] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 07/12/2016] [Accepted: 10/21/2016] [Indexed: 11/18/2022]
Abstract
Complex network methodology is very useful for exploring complex systems. However, the relationships among variables in a complex system are usually not clear. Therefore, inferring association networks among variables from their observed data has been a popular research topic. We propose a synthetic method, named small-shuffle partial symbolic transfer entropy spectrum (SSPSTES), for inferring association networks from multivariate time series. The method synthesizes surrogate data, partial symbolic transfer entropy (PSTE) and Granger causality. Proper threshold selection is crucial for common correlation identification methods, and it is not easy for users. The proposed method can not only identify strong correlations without selecting a threshold but also quantify correlation, identify direction and identify temporal relations. The method can be divided into three layers: a data layer, a model layer and a network layer. In the model layer, the method identifies all possible pairwise correlations. In the network layer, we introduce a filter algorithm to remove indirect weak correlations and retain strong correlations. Finally, we build a weighted adjacency matrix, in which the value of each entry represents the correlation level between a pair of variables, and thereby obtain the weighted directed association network. Two numerically simulated data sets, from a linear system and a nonlinear system, illustrate the steps and performance of the proposed approach. The ability of the proposed method is also demonstrated in an application.
Affiliation(s)
- Yanzhu Hu
  - Beijing Key Laboratory of Work Safety Intelligent Monitoring, Beijing University of Posts and Telecommunications, Beijing, 100876, China
- Huiyang Zhao
  - Beijing Key Laboratory of Work Safety Intelligent Monitoring, Beijing University of Posts and Telecommunications, Beijing, 100876, China
  - School of Information Engineering, Xuchang University, Xuchang, 461000, China
- Xinbo Ai
  - Beijing Key Laboratory of Work Safety Intelligent Monitoring, Beijing University of Posts and Telecommunications, Beijing, 100876, China
18
Inferring Weighted Directed Association Networks from Multivariate Time Series with the Small-Shuffle Symbolic Transfer Entropy Spectrum Method. Entropy 2016. [DOI: 10.3390/e18090328] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 01/05/2023]
19
Wang YXR, Jiang K, Feldman LJ, Bickel PJ, Huang H. Inferring gene–gene interactions and functional modules using sparse canonical correlation analysis. Ann Appl Stat 2015. [DOI: 10.1214/14-aoas792] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Indexed: 11/19/2022]
20
Wang YXR, Huang H. Review on statistical methods for gene network reconstruction using expression data. J Theor Biol 2014; 362:53-61. [PMID: 24726980 DOI: 10.1016/j.jtbi.2014.03.040] [Citation(s) in RCA: 97] [Impact Index Per Article: 9.7] [Received: 01/26/2014] [Revised: 03/29/2014] [Accepted: 03/31/2014] [Indexed: 12/16/2022]
Abstract
Network modeling has proven to be a fundamental tool in analyzing the inner workings of a cell. It has revolutionized our understanding of biological processes and made significant contributions to the discovery of disease biomarkers. Much effort has been devoted to reconstructing various types of biochemical networks using functional genomic datasets generated by high-throughput technologies. This paper discusses statistical methods used to reconstruct gene regulatory networks using gene expression data. In particular, we highlight progress made and challenges yet to be met in the problems involved in estimating gene interactions, inferring causality and modeling temporal changes of regulation behaviors. As rapid advances in technologies have made available diverse, large-scale genomic data, we also survey methods of incorporating all these additional data to achieve better, more accurate inference of gene networks.
Affiliation(s)
- Y X Rachel Wang
  - Department of Statistics, University of California, Berkeley, CA 94720, USA
- Haiyan Huang
  - Department of Statistics, University of California, Berkeley, CA 94720, USA