1
Liu Y, Darville T, Zheng X, Li Q. Decomposition of variation of mixed variables by a latent mixed Gaussian copula model. Biometrics 2023; 79:1187-1200. [PMID: 35304917 PMCID: PMC10019899 DOI: 10.1111/biom.13660] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2021] [Accepted: 03/03/2022] [Indexed: 11/27/2022]
Abstract
Many biomedical studies collect data of mixed types of variables from multiple groups of subjects. Some of these studies aim to find the group-specific and the common variation among all these variables. Even though similar problems have been studied by some previous works, their methods mainly rely on the Pearson correlation, which cannot handle mixed data. To address this issue, we propose a latent mixed Gaussian copula (LMGC) model that can quantify the correlations among binary, ordinal, continuous, and truncated variables in a unified framework. We also provide a tool to decompose the variation into the group-specific and the common variation over multiple groups via solving a regularized M-estimation problem. We conduct extensive simulation studies to show the advantage of our proposed method over the Pearson correlation-based methods. We also demonstrate that by jointly solving the M-estimation problem over multiple groups, our method is better than decomposing the variation group by group. We also apply our method to a Chlamydia trachomatis genital tract infection study to demonstrate how it can be used to discover informative biomarkers that differentiate patients.
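To make the contrast with Pearson correlation concrete, the following is a minimal sketch of the standard rank-based estimate of a latent correlation for a pair of continuous variables under a Gaussian copula, r = sin(π τ / 2), where τ is Kendall's tau. It is not the authors' LMGC estimator, which also covers binary, ordinal, and truncated variables; the function names and the toy data are illustrative assumptions.

```python
import math
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a for two equal-length sequences (assumes no ties)."""
    concordant = discordant = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        s = (xi - xj) * (yi - yj)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def latent_correlation(x, y):
    """Bridge from Kendall's tau to the latent Gaussian correlation
    for a continuous-continuous pair: r = sin(pi / 2 * tau)."""
    return math.sin(math.pi / 2 * kendall_tau(x, y))


# Toy data (illustrative only, not from the paper).
x = [0.1, 0.4, 0.35, 0.8, 1.2, 0.9]
y = [1.0, 1.9, 2.1, 2.5, 4.0, 3.1]

r_raw = latent_correlation(x, y)
# Strictly increasing transformations leave the ranks, and hence the
# estimate, unchanged; Pearson correlation does not have this property.
r_transformed = latent_correlation([math.exp(v) for v in x], [v ** 3 for v in y])
```

Because the estimate depends only on ranks, `r_raw` and `r_transformed` agree, which is the property that makes rank-based bridges attractive for non-Gaussian margins.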
Affiliation(s)
- Yutong Liu
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Toni Darville
  - Department of Pediatrics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Xiaojing Zheng
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
  - Department of Pediatrics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Quefeng Li
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
2
Panditrao G, Bhowmick R, Meena C, Sarkar RR. Emerging landscape of molecular interaction networks: Opportunities, challenges and prospects. J Biosci 2022. [PMID: 36210749 PMCID: PMC9018971 DOI: 10.1007/s12038-022-00253-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 11/24/2022]
Abstract
Network biology finds application in interpreting molecular interaction networks and providing insightful inferences using graph theoretical analysis of biological systems. The integration of computational bio-modelling approaches with different hybrid network-based techniques provides additional information about the behaviour of complex systems. With increasing advances in high-throughput technologies in biological research, attempts have been made to incorporate this information into network structures, which has led to a continuous update of network biology approaches over time. The newly minted centrality measures accommodate the details of omics data and regulatory network structure information. The unification of graph network properties with classical mathematical and computational modelling approaches and technologically advanced approaches like machine-learning- and artificial intelligence-based algorithms leverages the potential application of these techniques. These computational advances prove beneficial and serve various applications such as essential gene prediction, identification of drug–disease interaction and gene prioritization. Hence, in this review, we have provided a comprehensive overview of the emerging landscape of molecular interaction networks using graph theoretical approaches. With the aim to provide information on the wide range of applications of network biology approaches in understanding the interaction and regulation of genes, proteins, enzymes and metabolites at different molecular levels, we have reviewed the methods that utilize network topological properties, emerging hybrid network-based approaches and applications that integrate machine learning techniques to analyse molecular interaction networks. Further, we have discussed the applications of these approaches in biomedical research with a note on future prospects.
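As a small illustration of the network topological properties mentioned above, the sketch below computes normalized degree centrality, one of the simplest centrality measures, on a toy undirected graph. The node names and edges are hypothetical and are not taken from the review; the newer omics-aware centrality measures it discusses are not reproduced here.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree centrality for an undirected, simple graph.

    Each node's degree is divided by (n - 1), the maximum possible degree.
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    n = len(neighbours)
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


# Hypothetical interaction edges (illustrative only).
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
centrality = degree_centrality(edges)
hub = max(centrality, key=centrality.get)  # node with the highest centrality
```

In this toy graph node `A` touches three of the four other nodes, so it is ranked as the hub; in applications such as essential gene prediction, rankings of this kind are typically combined with other evidence.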
Affiliation(s)
- Gauri Panditrao
  - Chemical Engineering and Process Development Division, CSIR-National Chemical Laboratory, Pune, 411008 India
- Rupa Bhowmick
  - Chemical Engineering and Process Development Division, CSIR-National Chemical Laboratory, Pune, 411008 India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002 India
- Chandrakala Meena
  - Chemical Engineering and Process Development Division, CSIR-National Chemical Laboratory, Pune, 411008 India
- Ram Rup Sarkar
  - Chemical Engineering and Process Development Division, CSIR-National Chemical Laboratory, Pune, 411008 India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002 India
3
Wu G, Li X, Guo W, Wei Z, Hu T, Shan Y, Gu J. JEBIN: analyzing gene co-expressions across multiple datasets by joint network embedding. Brief Bioinform 2022; 23:6519533. [PMID: 35134135 DOI: 10.1093/bib/bbab603] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2021] [Revised: 12/15/2021] [Accepted: 12/27/2021] [Indexed: 11/13/2022] Open
Abstract
The inference of gene co-expression associations is one of the fundamental tasks in large-scale transcriptomic data analysis. Due to the high dimensionality and high noise of transcriptomic data, it is difficult to infer stable gene co-expression associations from a single dataset. Meta-analysis of multisource data can effectively tackle this problem. We proposed Joint Embedding of multiple BIpartite Networks (JEBIN) to learn a low-dimensional consensus representation for genes by integrating multiple expression datasets. JEBIN infers gene co-expression associations in a nonlinear and global similarity manner and can integrate datasets with different distributions with time complexity linear in the number of genes and the total sample size. The effectiveness and scalability of JEBIN were verified by simulation experiments, and its superiority over commonly used integration methods was demonstrated by three indexes on real biological datasets. JEBIN was then applied to study the gene co-expression patterns of hepatocellular carcinoma (HCC) based on multiple expression datasets of HCC and adjacent normal tissues, and further on the latest HCC single-cell RNA-seq data. Results show that gene co-expressions differ greatly between bulk and single-cell datasets. Finally, many differentially co-expressed ligand-receptor pairs were discovered by comparing HCC with adjacent normal data, providing candidate HCC targets for abnormal cell-cell communications.
Affiliation(s)
- Guiying Wu
  - MOE Key Laboratory of Bioinformatics, BNRIST Bioinformatics Division, Department of Automation, Tsinghua University, Beijing 100084, China
- Xiangyu Li
  - School of Software Engineering, Beijing Jiaotong University, Beijing 100044, China
- Wenbo Guo
  - MOE Key Laboratory of Bioinformatics, BNRIST Bioinformatics Division, Department of Automation, Tsinghua University, Beijing 100084, China
- Zheng Wei
  - MOE Key Laboratory of Bioinformatics, BNRIST Bioinformatics Division, Department of Automation, Tsinghua University, Beijing 100084, China
- Tao Hu
  - MOE Key Laboratory of Bioinformatics, BNRIST Bioinformatics Division, Department of Automation, Tsinghua University, Beijing 100084, China
- Yiran Shan
  - MOE Key Laboratory of Bioinformatics, BNRIST Bioinformatics Division, Department of Automation, Tsinghua University, Beijing 100084, China
- Jin Gu
  - MOE Key Laboratory of Bioinformatics, BNRIST Bioinformatics Division, Department of Automation, Tsinghua University, Beijing 100084, China
4
Choi D, Lee S. SNeCT: Scalable Network Constrained Tucker Decomposition for Multi-Platform Data Profiling. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1785-1796. [PMID: 30908262 DOI: 10.1109/tcbb.2019.2906205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
How do we integratively profile large-scale multi-platform genomic data that are high dimensional and sparse? Furthermore, how can we systematically incorporate prior knowledge, such as the association between genes, in the analysis to find better latent relationships? To solve this problem, we propose a Scalable Network Constrained Tucker decomposition method (SNeCT). SNeCT adopts a parallel stochastic gradient descent approach on the proposed parallelizable network-constrained optimization function. SNeCT decomposition is applied to a tensor constructed from a large-scale multi-platform multi-cohort cancer dataset, PanCan12, constrained on a network built from the PathwayCommons database. The decomposed factor matrices are applied to stratify cancers, to search for the top-k similar patients given a new patient, and to illustrate how the matrices can be used to identify significant genomic patterns in each patient. In the stratification test, the combined twelve-cohort data is clustered to form thirteen subclasses. The similarity of the top-k patients to the query was high for 23 clinical features, including estrogen/progesterone receptor statuses of BRCA patients, with average precision values ranging from 0.72 to 0.86 and from 0.68 to 0.86, respectively. Resources are available at: https://github.com/leesael/SNeCT.
5
Rockne RC, Branciamore S, Qi J, Frankhouser DE, O'Meally D, Hua WK, Cook G, Carnahan E, Zhang L, Marom A, Wu H, Maestrini D, Wu X, Yuan YC, Liu Z, Wang LD, Forman S, Carlesso N, Kuo YH, Marcucci G. State-Transition Analysis of Time-Sequential Gene Expression Identifies Critical Points That Predict Development of Acute Myeloid Leukemia. Cancer Res 2020; 80:3157-3169. [PMID: 32414754 PMCID: PMC7416495 DOI: 10.1158/0008-5472.can-20-0354] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 02/19/2020] [Revised: 04/06/2020] [Accepted: 05/12/2020] [Indexed: 12/13/2022]
Abstract
Temporal dynamics of gene expression inform cellular and molecular perturbations associated with disease development and evolution. Given the complexity of high-dimensional temporal genomic data, an analytic framework guided by a robust theory is needed to interpret time-sequential changes and to predict system dynamics. Here we model temporal dynamics of the transcriptome of peripheral blood mononuclear cells in a two-dimensional state-space representing states of health and leukemia using time-sequential bulk RNA-seq data from a murine model of acute myeloid leukemia (AML). The state-transition model identified critical points that accurately predict AML development and identified stepwise transcriptomic perturbations that drive leukemia progression. The geometry of the transcriptome state-space provided a biological interpretation of gene dynamics, aligned gene signals that are not synchronized in time across mice, and allowed quantification of gene and pathway contributions to leukemia development. Our state-transition model synthesizes information from multiple cell types in the peripheral blood and identifies critical points in the transition from health to leukemia to guide interpretation of changes in the transcriptome as a whole to predict disease progression. SIGNIFICANCE: These findings apply the theory of state transitions to model the initiation and development of acute myeloid leukemia, identifying transcriptomic perturbations that accurately predict time to disease development. See related commentary by Kuijjer, p. 3072. GRAPHICAL ABSTRACT: http://cancerres.aacrjournals.org/content/canres/80/15/3157/F1.large.jpg.
Affiliation(s)
- Russell C Rockne
  - Division of Mathematical Oncology, Department of Computational and Quantitative Medicine, Beckman Research Institute, City of Hope Medical Center, Duarte, California
- Sergio Branciamore
  - Department of Diabetes Complications & Metabolism, Beckman Research Institute, City of Hope Medical Center, Duarte, California
- Jing Qi
  - Department of Hematological Malignancies Translational Science, Hematology & Hematopoietic Cell Transplantation and the Gehr Family Center for Leukemia Research, Beckman Research Institute, City of Hope Medical Center, Duarte, California
- David E Frankhouser
  - Department of Diabetes Complications & Metabolism, Beckman Research Institute, City of Hope Medical Center, Duarte, California
  - Department of Population Sciences, Beckman Research Institute, City of Hope Medical Center, Duarte, California
- Denis O'Meally
  - Center for Gene Therapy, Beckman Research Institute, City of Hope Medical Center, Duarte, California
- Wei-Kai Hua
  - Department of Hematological Malignancies Translational Science, Hematology & Hematopoietic Cell Transplantation and the Gehr Family Center for Leukemia Research, Beckman Research Institute, City of Hope Medical Center, Duarte, California
- Guerry Cook
  - Department of Hematological Malignancies Translational Science, Hematology & Hematopoietic Cell Transplantation and the Gehr Family Center for Leukemia Research, Beckman Research Institute, City of Hope Medical Center, Duarte, California
- Emily Carnahan
  - Department of Hematological Malignancies Translational Science, Hematology & Hematopoietic Cell Transplantation and the Gehr Family Center for Leukemia Research, Beckman Research Institute, City of Hope Medical Center, Duarte, California
- Lianjun Zhang
  - Department of Hematological Malignancies Translational Science, Hematology & Hematopoietic Cell Transplantation and the Gehr Family Center for Leukemia Research, Beckman Research Institute, City of Hope Medical Center, Duarte, California
- Ayelet Marom
  - Department of Hematological Malignancies Translational Science, Hematology & Hematopoietic Cell Transplantation and the Gehr Family Center for Leukemia Research, Beckman Research Institute, City of Hope Medical Center, Duarte, California
- Herman Wu
  - Department of Hematological Malignancies Translational Science, Hematology & Hematopoietic Cell Transplantation and the Gehr Family Center for Leukemia Research, Beckman Research Institute, City of Hope Medical Center, Duarte, California
- Davide Maestrini
  - Division of Mathematical Oncology, Department of Computational and Quantitative Medicine, Beckman Research Institute, City of Hope Medical Center, Duarte, California
- Xiwei Wu
  - Department of Molecular Medicine; Bioinformatics Core, Beckman Research Institute, City of Hope Medical Center, Duarte, California
- Yate-Ching Yuan
  - Department of Molecular Medicine; Bioinformatics Core, Beckman Research Institute, City of Hope Medical Center, Duarte, California
- Zheng Liu
  - Department of Molecular and Cellular Biology; Integrative Genomics Core, Beckman Research Institute, City of Hope Medical Center, Duarte, California
- Leo D Wang
  - Department of Immuno-Oncology, Beckman Research Institute, City of Hope Medical Center, Duarte, California
  - Department of Pediatrics, Beckman Research Institute, City of Hope Medical Center, Duarte, California
- Stephen Forman
  - Department of Hematological Malignancies Translational Science, Hematology & Hematopoietic Cell Transplantation and the Gehr Family Center for Leukemia Research, Beckman Research Institute, City of Hope Medical Center, Duarte, California
- Nadia Carlesso
  - Department of Hematological Malignancies Translational Science, Hematology & Hematopoietic Cell Transplantation and the Gehr Family Center for Leukemia Research, Beckman Research Institute, City of Hope Medical Center, Duarte, California
- Ya-Huei Kuo
  - Department of Hematological Malignancies Translational Science, Hematology & Hematopoietic Cell Transplantation and the Gehr Family Center for Leukemia Research, Beckman Research Institute, City of Hope Medical Center, Duarte, California
- Guido Marcucci
  - Department of Hematological Malignancies Translational Science, Hematology & Hematopoietic Cell Transplantation and the Gehr Family Center for Leukemia Research, Beckman Research Institute, City of Hope Medical Center, Duarte, California
6
Chowdhury HA, Bhattacharyya DK, Kalita JK. (Differential) Co-Expression Analysis of Gene Expression: A Survey of Best Practices. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1154-1173. [PMID: 30668502 DOI: 10.1109/tcbb.2019.2893170] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 06/09/2023]
Abstract
Analysis of gene expression data is widely used in transcriptomic studies to understand the functions of molecules inside a cell and the interactions among them. Differential co-expression analysis studies diseases and phenotypic variations by finding modules of genes whose co-expression patterns vary across conditions. We review the best practices in gene expression data analysis in terms of analysis of (differential) co-expression, co-expression networks, differential networking, and differential connectivity, considering both microarray and RNA-seq data along with comparisons. We highlight hurdles in analyzing RNA-seq data with methods developed for microarrays. We include discussion of necessary tools for gene expression analysis throughout the paper. In addition, we shed light on scRNA-seq data analysis by including preprocessing and scRNA-seq in co-expression analysis, along with useful tools specific to scRNA-seq. To aid insight, biological interpretation and functional profiling are included. Finally, we provide guidelines for the analyst, along with research issues and challenges that should be addressed.
7
Ponnapalli SP, Bradley MW, Devine K, Bowen J, Coppens SE, Leraas KM, Milash BA, Li F, Luo H, Qiu S, Wu K, Yang H, Wittwer CT, Palmer CA, Jensen RL, Gastier-Foster JM, Hanson HA, Barnholtz-Sloan JS, Alter O. Retrospective clinical trial experimentally validates glioblastoma genome-wide pattern of DNA copy-number alterations predictor of survival. APL Bioeng 2020; 4:026106. [PMID: 32478280 PMCID: PMC7229984 DOI: 10.1063/1.5142559] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/16/2019] [Accepted: 04/27/2020] [Indexed: 12/20/2022] Open
Abstract
Modeling of genomic profiles from the Cancer Genome Atlas (TCGA) by using recently developed mathematical frameworks has associated a genome-wide pattern of DNA copy-number alterations with a shorter, roughly one-year, median survival time in glioblastoma (GBM) patients. Here, to experimentally test this relationship, we whole-genome sequenced DNA from tumor samples of patients. We show that the patients represent the U.S. adult GBM population in terms of most normal and disease phenotypes. Intratumor heterogeneity affects ≈11% and profiling technology and reference human genome specifics affect <1% of the classifications of the tumors by the pattern, where experimental batch effects normally reduce the reproducibility, i.e., precision, of classifications based upon from one to a few hundred genomic loci by >30%. With a 2.25-year Kaplan-Meier median survival difference, a 3.5 univariate Cox hazard ratio, and a 0.78 concordance index, i.e., accuracy, the pattern predicts survival better than and independent of age at diagnosis, which has been the best indicator since 1950. The prognostic classification by the pattern may, therefore, help to manage GBM pseudoprogression. The diagnostic classification may help drugs progress to regulatory approval. The therapeutic predictions, of previously unrecognized targets that are correlated with survival, may lead to new drugs. Other methods missed this relationship in the roughly 3B-nucleotide genomes of the small, order of magnitude of 100, patient cohorts, e.g., from TCGA. Previous attempts to associate GBM genotypes with patient phenotypes were unsuccessful. This is a proof of principle that the frameworks are uniquely suitable for discovering clinically actionable genotype-phenotype relationships.
Affiliation(s)
- Sri Priya Ponnapalli
  - Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, Utah 84112, USA
- Karen Devine
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, Ohio 44106, USA
- Jay Bowen
  - The Research Institute at Nationwide Children's Hospital, Columbus, Ohio 43205, USA
- Sara E. Coppens
  - The Research Institute at Nationwide Children's Hospital, Columbus, Ohio 43205, USA
- Kristen M. Leraas
  - The Research Institute at Nationwide Children's Hospital, Columbus, Ohio 43205, USA
- Brett A. Milash
  - Center for High-Performance Computing, University of Utah, Salt Lake City, Utah 84112, USA
- Fuqiang Li
  - Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, Guangdong 518083, China
- Huijuan Luo
  - Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, Guangdong 518083, China
- Shi Qiu
  - BGI-Americas, Cambridge, Massachusetts 02142, USA
- Carl T. Wittwer
  - Department of Pathology, University of Utah, Salt Lake City, Utah 84112, USA
- Jill S. Barnholtz-Sloan
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, Ohio 44106, USA
- Orly Alter
  - Author to whom correspondence should be addressed:
8
Independent vector analysis for common subspace analysis: Application to multi-subject fMRI data yields meaningful subgroups of schizophrenia. Neuroimage 2020; 216:116872. [PMID: 32353485 DOI: 10.1016/j.neuroimage.2020.116872] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 09/11/2019] [Revised: 04/13/2020] [Accepted: 04/21/2020] [Indexed: 11/22/2022] Open
Abstract
The extraction of common and distinct biomedical signatures among different populations allows for a more detailed study of the group-specific as well as distinct information of different populations. A number of subspace analysis algorithms have been developed and successfully applied to data fusion; however, they are limited to joint analysis of only a couple of datasets. Since subspace analysis is also very promising for analysis of multi-subject medical imaging data, we focus on this problem and propose a new method based on independent vector analysis (IVA) for common subspace extraction (IVA-CS) for multi-subject data analysis. IVA-CS leverages the strength of IVA in identifying a complete subspace structure across multiple datasets, along with an efficient solution that uses only second-order statistics. We propose a subset analysis approach within IVA-CS to mitigate estimation issues in IVA due to high dimensionality, both in terms of the number of components estimated and the number of datasets. We introduce a scheme to determine a desirable size for the subset that is high enough to exploit the dependence across datasets and is not affected by the high dimensionality issue. We demonstrate the success of IVA-CS in extracting complex subset structures, apply the method to the analysis of functional magnetic resonance imaging data from 179 subjects, and show that it successfully identifies shared and complementary brain patterns from patients with schizophrenia (SZ) and a healthy control group. Two components with linked resting-state networks are identified as unique to the SZ group, providing evidence of functional dysconnectivity. IVA-CS also identifies subgroups of SZ patients that show significant differences in terms of their brain networks and clinical symptoms.
9
Erola P, Björkegren JLM, Michoel T. Model-based clustering of multi-tissue gene expression data. Bioinformatics 2020; 36:1807-1813. [PMID: 31688915 PMCID: PMC7162352 DOI: 10.1093/bioinformatics/btz805] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 11/17/2018] [Revised: 09/05/2019] [Accepted: 10/31/2019] [Indexed: 02/06/2023] Open
Abstract
MOTIVATION: Recently, it has become feasible to generate large-scale, multi-tissue gene expression data, where expression profiles are obtained from multiple tissues or organs sampled from dozens to hundreds of individuals. When traditional clustering methods are applied to this type of data, important information is lost, because they either require all tissues to be analyzed independently, ignoring dependencies and similarities between tissues, or to merge tissues in a single, monolithic dataset, ignoring individual characteristics of tissues. RESULTS: We developed a Bayesian model-based multi-tissue clustering algorithm, revamp, which can incorporate prior information on physiological tissue similarity, and which results in a set of clusters, each consisting of a core set of genes conserved across tissues as well as differential sets of genes specific to one or more subsets of tissues. Using data from seven vascular and metabolic tissues from over 100 individuals in the STockholm Atherosclerosis Gene Expression (STAGE) study, we demonstrate that multi-tissue clusters inferred by revamp are more enriched for tissue-dependent protein-protein interactions compared to alternative approaches. We further demonstrate that revamp results in easily interpretable multi-tissue gene expression associations to key coronary artery disease processes and clinical phenotypes in the STAGE individuals. AVAILABILITY AND IMPLEMENTATION: Revamp is implemented in the Lemon-Tree software, available at https://github.com/eb00/lemon-tree. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pau Erola
  - Division of Genetics and Genomics, The Roslin Institute, The University of Edinburgh, Midlothian EH25 9RG, UK
  - MRC Integrative Epidemiology Unit, University of Bristol, Bristol BS8 2BN, UK
- Johan L M Björkegren
  - Department of Genetics and Genomic Sciences, Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Integrated Cardio Metabolic Centre (ICMC), Karolinska Institutet, Huddinge 141 57, Sweden
- Tom Michoel
  - Division of Genetics and Genomics, The Roslin Institute, The University of Edinburgh, Midlothian EH25 9RG, UK
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen N-5020, Norway
10
Bradley MW, Aiello KA, Ponnapalli SP, Hanson HA, Alter O. GSVD- and tensor GSVD-uncovered patterns of DNA copy-number alterations predict adenocarcinomas survival in general and in response to platinum. APL Bioeng 2019; 3:036104. [PMID: 31463421 PMCID: PMC6701977 DOI: 10.1063/1.5099268] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 04/08/2019] [Accepted: 08/06/2019] [Indexed: 12/14/2022] Open
Abstract
More than a quarter of lung, uterine, and ovarian adenocarcinoma (LUAD, USEC, and OV) tumors are resistant to platinum drugs. Only recently and only in OV, patterns of copy-number alterations that predict survival in response to platinum were discovered, and only by using the tensor GSVD to compare Agilent microarray platform-matched profiles of patient-matched normal and primary tumor DNA. Here, we use the GSVD to compare whole-genome sequencing (WGS) and Affymetrix microarray profiles of patient-matched normal and primary LUAD, USEC, and OV tumor DNA. First, the GSVD uncovers patterns similar to one Agilent OV pattern, where a loss of most of the chromosome arm 6p combined with a gain of 12p encode for transformation. Like the Agilent OV pattern, the WGS LUAD and Affymetrix LUAD, USEC, and OV patterns are correlated with shorter survival, in general and in response to platinum. Like the tensor GSVD, the GSVD separates these tumor-exclusive genotypes from experimental inconsistencies. Second, by identifying the shorter survival phenotypes among the WGS- and Affymetrix-profiled tumors, the Agilent pattern proves to be a technology-independent predictor of survival, independent also of the best other indicator at diagnosis, i.e., stage. Third, like no other indicator, the pattern predicts the overall survival of OV patients experiencing progression-free survival, in general and in response to platinum. We conclude that comparative spectral decompositions, such as the GSVD and tensor GSVD, underlie a mathematically universal description of the relationships between a primary tumor's genotype and a patient's overall survival phenotype, which other methods miss.
Affiliation(s)
- Sri Priya Ponnapalli
  - Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, Utah 84112, USA
11
Barkas N, Petukhov V, Nikolaeva D, Lozinsky Y, Demharter S, Khodosevich K, Kharchenko PV. Joint analysis of heterogeneous single-cell RNA-seq dataset collections. Nat Methods 2019; 16:695-698. [PMID: 31308548 PMCID: PMC6684315 DOI: 10.1038/s41592-019-0466-z] [Citation(s) in RCA: 157] [Impact Index Per Article: 31.4] [Received: 12/05/2018] [Accepted: 05/24/2019] [Indexed: 01/12/2023]
Abstract
Single-cell RNA sequencing is often applied in study designs that include multiple individuals, conditions or tissues. To identify recurrent cell subpopulations in such heterogeneous collections, we developed Conos, an approach that relies on multiple plausible inter-sample mappings to construct a global graph connecting all measured cells. The graph enables identification of recurrent cell clusters and propagation of information between datasets in multi-sample or atlas-scale collections.
Affiliation(s)
- Nikolas Barkas
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Viktor Petukhov
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Biotech Research and Innovation Centre, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Daria Nikolaeva
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Yaroslav Lozinsky
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Samuel Demharter
  - Biotech Research and Innovation Centre, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Konstantin Khodosevich
  - Biotech Research and Innovation Centre, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Peter V Kharchenko
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Harvard Stem Cell Institute, Cambridge, MA, USA
12
|
Suksiri B, Fukumoto M. An Efficient Framework for Estimating the Direction of Multiple Sound Sources Using Higher-Order Generalized Singular Value Decomposition. SENSORS 2019; 19:s19132977. [PMID: 31284497 PMCID: PMC6651797 DOI: 10.3390/s19132977] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 05/28/2019] [Revised: 07/01/2019] [Accepted: 07/03/2019] [Indexed: 11/20/2022]
Abstract
This paper presents an efficient framework for estimating the direction-of-arrival (DOA) of wideband sound sources. The proposed framework provides an efficient way to construct a wideband cross-correlation matrix from multiple narrowband cross-correlation matrices for all frequency bins. In addition, the proposed framework is inspired by the coherent signal subspace technique, with a further improvement of the linear transformation procedure: the new procedure no longer requires any preliminary DOA estimation, because it exploits unique cross-correlation matrices between the received signal and itself on distinct frequencies, along with the higher-order generalized singular value decomposition of the array of these unique matrices. Wideband DOAs are estimated by employing any subspace-based technique for estimating narrowband DOAs, but using the proposed wideband correlation matrix instead of the narrowband correlation matrix. This implies that the proposed framework enables cutting-edge narrowband subspace methods to estimate the DOAs of wideband sources directly, which reduces computational complexity and facilitates the estimation algorithm. Practical examples are presented to showcase its applicability and effectiveness, and the results show that the fusion methods perform better than others over a range of signal-to-noise ratios with just a few sensors, which makes them suitable for practical use.
Affiliation(s)
- Bandhit Suksiri
  - Department of Engineering, Graduate School of Engineering, Kochi University of Technology, Kami Campus, Kochi 782-0003, Japan
- Masahiro Fukumoto
  - School of Information, Kochi University of Technology, Kami Campus, Kochi 782-0003, Japan
|
13
|
Smolinska A, Engel J, Szymanska E, Buydens L, Blanchet L. General Framing of Low-, Mid-, and High-Level Data Fusion With Examples in the Life Sciences. DATA HANDLING IN SCIENCE AND TECHNOLOGY 2019. [DOI: 10.1016/b978-0-444-63984-4.00003-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 12/13/2022]
|
14
|
Aiello KA, Ponnapalli SP, Alter O. Mathematically universal and biologically consistent astrocytoma genotype encodes for transformation and predicts survival phenotype. APL Bioeng 2018; 2. [PMID: 30397684 PMCID: PMC6215493 DOI: 10.1063/1.5037882] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/25/2022]
Abstract
DNA alterations have been observed in astrocytoma for decades. A copy-number genotype predictive of a survival phenotype was only discovered by using the generalized singular value decomposition (GSVD) formulated as a comparative spectral decomposition. Here, we use the GSVD to compare whole-genome sequencing (WGS) profiles of patient-matched astrocytoma and normal DNA. First, the GSVD uncovers a genome-wide pattern of copy-number alterations, which is bounded by patterns recently uncovered by the GSVDs of microarray-profiled patient-matched glioblastoma (GBM) and, separately, lower-grade astrocytoma and normal genomes. Like the microarray patterns, the WGS pattern is correlated with an approximately one-year median survival time. By filling in gaps in the microarray patterns, the WGS pattern reveals that this biologically consistent genotype encodes for transformation via the Notch together with the Ras and Shh pathways. Second, like the GSVDs of the microarray profiles, the GSVD of the WGS profiles separates the tumor-exclusive pattern from normal copy-number variations and experimental inconsistencies. These include the WGS technology-specific effects of guanine-cytosine content variations across the genomes that are correlated with experimental batches. Third, by identifying the biologically consistent phenotype among the WGS-profiled tumors, the GBM pattern proves to be a technology-independent predictor of survival and response to chemotherapy and radiation, statistically better than the patient's age and tumor's grade, the best other indicators, and MGMT promoter methylation and IDH1 mutation. We conclude that by using the complex structure of the data, comparative spectral decompositions underlie a mathematically universal description of the genotype-phenotype relations in cancer that other methods miss.
Affiliation(s)
- Katherine A Aiello
  - Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, Utah 84112, USA
  - Department of Bioengineering, University of Utah, Salt Lake City, Utah 84112, USA
- Sri Priya Ponnapalli
  - Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, Utah 84112, USA
- Orly Alter
  - Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, Utah 84112, USA
  - Department of Bioengineering, University of Utah, Salt Lake City, Utah 84112, USA
  - Huntsman Cancer Institute and Department of Human Genetics, University of Utah, Salt Lake City, Utah 84112, USA
|
15
|
van Dam S, Võsa U, van der Graaf A, Franke L, de Magalhães JP. Gene co-expression analysis for functional classification and gene-disease predictions. Brief Bioinform 2018; 19:575-592. [PMID: 28077403 PMCID: PMC6054162 DOI: 10.1093/bib/bbw139] [Citation(s) in RCA: 422] [Impact Index Per Article: 70.3] [Received: 09/12/2016] [Revised: 12/01/2016] [Indexed: 01/06/2023]
Abstract
Gene co-expression networks can be used to associate genes of unknown function with biological processes, to prioritize candidate disease genes or to discern transcriptional regulatory programmes. With recent advances in transcriptomics and next-generation sequencing, co-expression networks constructed from RNA sequencing data also enable the inference of functions and disease associations for non-coding genes and splice variants. Although gene co-expression networks typically do not provide information about causality, emerging methods for differential co-expression analysis are enabling the identification of regulatory genes underlying various phenotypes. Here, we introduce and guide researchers through a (differential) co-expression analysis. We provide an overview of methods and tools used to create and analyse co-expression networks constructed from gene expression data, and we explain how these can be used to identify genes with a regulatory role in disease. Furthermore, we discuss the integration of other data types with co-expression networks and offer future perspectives of co-expression analysis.
Affiliation(s)
- Sipko van Dam
  - Department of Genetics, UMCG HPC CB50, RB Groningen, Netherlands
- Urmo Võsa
  - Department of Genetics, UMCG HPC CB50, RB Groningen, Netherlands
- Lude Franke
  - Department of Genetics, UMCG HPC CB50, RB Groningen, Netherlands
|
16
|
Yu W, Zhao S, Wang Y, Zhao BN, Zhao W, Zhou X. Identification of cancer prognosis-associated functional modules using differential co-expression networks. Oncotarget 2017; 8:112928-112941. [PMID: 29348878 PMCID: PMC5762563 DOI: 10.18632/oncotarget.22878] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 07/12/2017] [Accepted: 11/15/2017] [Indexed: 01/23/2023]
Abstract
The rapid accumulation of cancer-related data owing to high-throughput technologies has provided unprecedented choices to understand the progression of cancer and discover functional networks in multiple cancers. Establishment of co-expression networks will help us to discover the systemic properties of carcinogenesis features and regulatory mechanisms of multiple cancers. Here, we proposed a computational workflow to identify differentially co-expressed gene modules across 8 cancer types by using combined gene differential expression analysis methods and a higher-order generalized singular value decomposition. Four co-expression modules were identified; and oncogenes and tumor suppressors were significantly enriched in these modules. Functional enrichment analysis demonstrated the significantly enriched pathways in these modules, including ECM-receptor interaction, focal adhesion and PI3K-Akt signaling pathway. The top-ranked miRNAs (mir-199, mir-29, mir-200) and transcription factors (FOXO4, E2A, NFAT, and MAZ) were identified, which play an important role in deregulating cellular energetics; and regulating angiogenesis and cancer immune system. The clinical significance of the co-expressed gene clusters was assessed by evaluating their predictability of cancer patients’ survival. The predictive power of different clusters and subclusters was demonstrated. Our results will be valuable in cancer-related gene function annotation and for the evaluation of cancer patients’ prognosis.
Affiliation(s)
- Wenshuai Yu
  - Key Laboratory of Embedded System and Service Computing, College of Electronics and Information Engineering, The Ministry of Education, Tongji University, Shanghai, China
- Shengjie Zhao
  - Key Laboratory of Embedded System and Service Computing, College of Electronics and Information Engineering, The Ministry of Education, Tongji University, Shanghai, China
  - College of Software Engineering, Tongji University, Shanghai, China
- Yongcui Wang
  - Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, China
- Weiling Zhao
  - Department of Radiology and Comprehensive Cancer Center, Wake Forest University School of Medicine, Winston Salem, NC, USA
- Xiaobo Zhou
  - College of Electronics and Information Engineering, Tongji University, Shanghai, China
  - Center for Big Data Sciences and Network Security, Tongji University, Shanghai, China
  - Center for Bioinformatics and System Biology, Wake Forest University School of Medicine, Winston Salem, NC, USA
|
17
|
Wu M, Huang J, Ma S. Identifying gene-gene interactions using penalized tensor regression. Stat Med 2017; 37:598-610. [PMID: 29034516 DOI: 10.1002/sim.7523] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 02/01/2017] [Revised: 09/08/2017] [Accepted: 09/12/2017] [Indexed: 12/15/2022]
Abstract
Gene-gene (G×G) interactions have been shown to be critical for the fundamental mechanisms and development of complex diseases beyond main genetic effects. The commonly adopted marginal analysis is limited by considering only a small number of G factors at a time. With the "main effects, interactions" hierarchical constraint, many of the existing joint analysis methods suffer from prohibitively high computational cost. In this study, we propose a new method for identifying important G×G interactions under joint modeling. The proposed method adopts tensor regression to accommodate high data dimensionality and the penalization technique for selection. It naturally accommodates the strong hierarchical structure without imposing additional constraints, making optimization much simpler and faster than in the existing studies. It outperforms multiple alternatives in simulation. The analysis of The Cancer Genome Atlas (TCGA) data on lung cancer and melanoma demonstrates that it can identify markers with important implications and better prediction performance.
Affiliation(s)
- Mengyun Wu
  - School of Statistics and Management, Shanghai University of Finance and Economics, 777 Guoding Road, Shanghai 200433, China
  - Department of Biostatistics, School of Public Health, Yale University, 60 College Street, New Haven, CT 06520, USA
- Jian Huang
  - Department of Applied Mathematics, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
- Shuangge Ma
  - Department of Biostatistics, School of Public Health, Yale University, 60 College Street, New Haven, CT 06520, USA
|
18
|
Taguchi YH. Tensor decomposition-based unsupervised feature extraction applied to matrix products for multi-view data processing. PLoS One 2017; 12:e0183933. [PMID: 28841719 PMCID: PMC5571984 DOI: 10.1371/journal.pone.0183933] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 02/20/2017] [Accepted: 08/04/2017] [Indexed: 01/17/2023]
Abstract
In the current era of big data, the amount of data available is continuously increasing. Both the number and types of samples, or features, are on the rise. The mixing of distinct features often makes interpretation more difficult. However, separate analysis of individual types requires subsequent integration. A tensor is a useful framework to deal with distinct types of features in an integrated manner without mixing them. On the other hand, tensor data is not easy to obtain since it requires the measurements of huge numbers of combinations of distinct features; if there are m kinds of features, each of which has N dimensions, the number of measurements needed is as many as N^m, which is often too large to measure. In this paper, I propose a new method where a tensor is generated from individual features without combinatorial measurements, and the generated tensor was decomposed back to matrices, by which unsupervised feature extraction was performed. In order to demonstrate the usefulness of the proposed strategy, it was applied to synthetic data, as well as three omics datasets. It outperformed other matrix-based methodologies.
Affiliation(s)
- Y-h. Taguchi
  - Department of Physics, Chuo University, 1-13-27 Kasuga, Bunkyo-ku, Tokyo 112-8551, Japan
|
19
|
Luo Y, Wang F, Szolovits P. Tensor factorization toward precision medicine. Brief Bioinform 2017; 18:511-514. [PMID: 26994614 PMCID: PMC6078180 DOI: 10.1093/bib/bbw026] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 12/07/2015] [Revised: 01/08/2016] [Indexed: 11/13/2022]
Abstract
Precision medicine initiatives come amid the rapid growth in quantity and variety of biomedical data, which exceeds the capacity of matrix-oriented data representations and many current analysis algorithms. Tensor factorizations extend the matrix view to multiple modalities and support dimensionality reduction methods that identify latent groups of data for meaningful summarization of both features and instances. In this opinion article, we analyze the modest literature on applying tensor factorization to various biomedical fields including genotyping and phenotyping. Based on the cited work including work of our own, we suggest that tensor applications could serve as an effective tool to enable frequent updating of medical knowledge based on the continually growing scientific and clinical evidence. We encourage extensive experimental studies to tackle challenges including design choice of factorizations, integrating temporality and algorithm scalability.
|
20
|
Integrative clustering of multi-level 'omic data based on non-negative matrix factorization algorithm. PLoS One 2017; 12:e0176278. [PMID: 28459819 PMCID: PMC5411077 DOI: 10.1371/journal.pone.0176278] [Citation(s) in RCA: 78] [Impact Index Per Article: 11.1] [Received: 09/13/2016] [Accepted: 04/07/2017] [Indexed: 11/30/2022]
Abstract
Integrative analyses of high-throughput ‘omic data, such as DNA methylation, DNA copy number alteration, mRNA and protein expression levels, have created unprecedented opportunities to understand the molecular basis of human disease. In particular, integrative analyses have been the cornerstone in the study of cancer to determine molecular subtypes within a given cancer. As malignant tumors with similar morphological characteristics have been shown to exhibit entirely different molecular profiles, there has been significant interest in using multiple ‘omic data for the identification of novel molecular subtypes of disease, which could impact treatment decisions. Therefore, we have developed intNMF, an integrative approach for disease subtype classification based on non-negative matrix factorization. The proposed approach carries out integrative clustering of multiple high dimensional molecular data in a single comprehensive analysis utilizing the information across multiple biological levels assessed on the same individual. As intNMF does not assume any distributional form for the data, it has obvious advantages over other model based clustering methods which require specific distributional assumptions. Application of intNMF is illustrated using both simulated and real data from The Cancer Genome Atlas (TCGA).
|
21
|
|
22
|
Aiello KA, Alter O. Platform-Independent Genome-Wide Pattern of DNA Copy-Number Alterations Predicting Astrocytoma Survival and Response to Treatment Revealed by the GSVD Formulated as a Comparative Spectral Decomposition. PLoS One 2016; 11:e0164546. [PMID: 27798635 PMCID: PMC5087864 DOI: 10.1371/journal.pone.0164546] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 06/06/2016] [Accepted: 09/27/2016] [Indexed: 01/07/2023]
Abstract
We use the generalized singular value decomposition (GSVD), formulated as a comparative spectral decomposition, to model patient-matched grades III and II, i.e., lower-grade astrocytoma (LGA) brain tumor and normal DNA copy-number profiles. A genome-wide tumor-exclusive pattern of DNA copy-number alterations (CNAs) is revealed, encompassed in that previously uncovered in glioblastoma (GBM), i.e., grade IV astrocytoma, where GBM-specific CNAs encode for enhanced opportunities for transformation and proliferation via growth and developmental signaling pathways in GBM relative to LGA. The GSVD separates the LGA pattern from other sources of biological and experimental variation, common to both, or exclusive to one of the tumor and normal datasets. We find, first, and computationally validate, that the LGA pattern is correlated with a patient's survival and response to treatment. Second, the GBM pattern identifies among the LGA patients a subtype, statistically indistinguishable from that among the GBM patients, where the CNA genotype is correlated with an approximately one-year survival phenotype. Third, cross-platform classification of the Affymetrix-measured LGA and GBM profiles by using the Agilent-derived GBM pattern shows that the GBM pattern is a platform-independent predictor of astrocytoma outcome. Statistically, the pattern is a better predictor (corresponding to greater median survival time difference, proportional hazard ratio, and concordance index) than the patient's age and the tumor's grade, which are the best indicators of astrocytoma currently in clinical use, and laboratory tests. The pattern is also statistically independent of these indicators, and, combined with either one, is an even better predictor of astrocytoma outcome. Recurring DNA CNAs have been observed in astrocytoma tumors' genomes for decades, however, copy-number subtypes that are predictive of patients' outcomes were not identified before. 
This is despite the growing number of datasets recording different aspects of the disease, and due to an existing fundamental need for mathematical frameworks that can simultaneously find similarities and dissimilarities across the datasets. This illustrates the ability of comparative spectral decompositions to find what other methods miss.
Affiliation(s)
- Katherine A. Aiello
  - Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, Utah, United States of America
  - Department of Bioengineering, University of Utah, Salt Lake City, Utah, United States of America
- Orly Alter
  - Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, Utah, United States of America
  - Department of Bioengineering, University of Utah, Salt Lake City, Utah, United States of America
  - Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah, United States of America
  - Department of Human Genetics, University of Utah, Salt Lake City, Utah, United States of America
|
23
|
Wang Y, Zhao W, Zhou X. Matrix factorization reveals aging-specific co-expression gene modules in the fat and muscle tissues in nonhuman primates. Sci Rep 2016; 6:34335. [PMID: 27703186 PMCID: PMC5050522 DOI: 10.1038/srep34335] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 03/02/2016] [Accepted: 09/12/2016] [Indexed: 11/29/2022]
Abstract
Accurate identification of coherent transcriptional modules (subnetworks) in adipose and muscle tissues is important for revealing the related mechanisms and co-regulated pathways involved in the development of aging-related diseases. Here, we proposed a systematically computational approach, called ICEGM, to Identify the Co-Expression Gene Modules through a novel mathematical framework of Higher-Order Generalized Singular Value Decomposition (HO-GSVD). ICEGM was applied on the adipose, and heart and skeletal muscle tissues in old and young female African green vervet monkeys. The genes associated with the development of inflammation, cardiovascular and skeletal disorder diseases, and cancer were revealed by the ICEGM. Meanwhile, genes in the ICEGM modules were also enriched in the adipocytes, smooth muscle cells, cardiac myocytes, and immune cells. Comprehensive disease annotation and canonical pathway analysis indicated that immune cells, adipocytes, cardiomyocytes, and smooth muscle cells played a synergistic role in cardiac and physical functions in the aged monkeys by regulation of the biological processes associated with metabolism, inflammation, and atherosclerosis. In conclusion, the ICEGM provides an efficiently systematic framework for decoding the co-expression gene modules in multiple tissues. Analysis of genes in the ICEGM module yielded important insights on the cooperative role of multiple tissues in the development of diseases.
Affiliation(s)
- Yongcui Wang
  - Center for Bioinformatics & Systems Biology, Department of Radiology, Wake Forest School of Medicine, Winston Salem, NC, USA
  - Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, China
- Weiling Zhao
  - Center for Bioinformatics & Systems Biology, Department of Radiology, Wake Forest School of Medicine, Winston Salem, NC, USA
- Xiaobo Zhou
  - Center for Bioinformatics & Systems Biology, Department of Radiology, Wake Forest School of Medicine, Winston Salem, NC, USA
|
24
|
Zhao H, Wang DD, Chen L, Liu X, Yan H. Identifying Multi-Dimensional Co-Clusters in Tensors Based on Hyperplane Detection in Singular Vector Spaces. PLoS One 2016; 11:e0162293. [PMID: 27598575 PMCID: PMC5012624 DOI: 10.1371/journal.pone.0162293] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 02/28/2016] [Accepted: 08/19/2016] [Indexed: 11/18/2022]
Abstract
Co-clustering, often called biclustering for two-dimensional data, has found many applications, such as gene expression data analysis and text mining. Nowadays, a variety of multi-dimensional arrays (tensors) frequently occur in data analysis tasks, and co-clustering techniques play a key role in dealing with such datasets. Co-clusters represent coherent patterns and exhibit important properties along all the modes. Development of robust co-clustering techniques is important for the detection and analysis of these patterns. In this paper, a co-clustering method based on hyperplane detection in singular vector spaces (HDSVS) is proposed. Specifically in this method, higher-order singular value decomposition (HOSVD) transforms a tensor into a core part and a singular vector matrix along each mode, whose row vectors can be clustered by a linear grouping algorithm (LGA). Meanwhile, hyperplanar patterns are extracted and successfully supported the identification of multi-dimensional co-clusters. To validate HDSVS, a number of synthetic and biological tensors were adopted. The synthetic tensors attested a favorable performance of this algorithm on noisy or overlapped data. Experiments with gene expression data and lineage data of embryonic cells further verified the reliability of HDSVS to practical problems. Moreover, the detected co-clusters are well consistent with important genetic pathways and gene ontology annotations. Finally, a series of comparisons between HDSVS and state-of-the-art methods on synthetic tensors and a yeast gene expression tensor were implemented, verifying the robust and stable performance of our method.
Affiliation(s)
- Hongya Zhao
  - Industrial Center, Shenzhen Polytechnic, Shenzhen, China
- Debby D. Wang
  - Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong
  - Caritas Institute of Higher Education, New Territories, Hong Kong
- Long Chen
  - Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong
- Xinyu Liu
  - Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong
- Hong Yan
  - Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong
|
25
|
Meng C, Zeleznik OA, Thallinger GG, Kuster B, Gholami AM, Culhane AC. Dimension reduction techniques for the integrative analysis of multi-omics data. Brief Bioinform 2016; 17:628-41. [PMID: 26969681 PMCID: PMC4945831 DOI: 10.1093/bib/bbv108] [Citation(s) in RCA: 193] [Impact Index Per Article: 24.1] [Received: 06/08/2015] [Revised: 10/26/2015] [Indexed: 01/16/2023]
Abstract
State-of-the-art next-generation sequencing, transcriptomics, proteomics and other high-throughput 'omics' technologies enable the efficient generation of large experimental data sets. These data may yield unprecedented knowledge about molecular pathways in cells and their role in disease. Dimension reduction approaches have been widely used in exploratory analysis of single omics data sets. This review will focus on dimension reduction approaches for simultaneous exploratory analyses of multiple data sets. These methods extract the linear relationships that best explain the correlated structure across data sets, the variability both within and between variables (or observations) and may highlight data issues such as batch effects or outliers. We explore dimension reduction techniques as one of the emerging approaches for data integration, and how these can be applied to increase our understanding of biological systems in normal physiological function and disease.
|
26
|
van der Kloet FM, Sebastián-León P, Conesa A, Smilde AK, Westerhuis JA. Separating common from distinctive variation. BMC Bioinformatics 2016; 17 Suppl 5:195. [PMID: 27294690 PMCID: PMC4905617 DOI: 10.1186/s12859-016-1037-2] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 11/10/2022]
Abstract
Background: Joint and individual variation explained (JIVE), distinct and common simultaneous component analysis (DISCO) and O2-PLS, a two-block (X-Y) latent variable regression method with an integral OSC filter, can all be used for the integrated analysis of multiple data sets and decompose them in three terms: a low(er)-rank approximation capturing common variation across data sets, low(er)-rank approximations for structured variation distinctive for each data set, and residual noise. In this paper these three methods are compared with respect to their mathematical properties and their respective ways of defining common and distinctive variation. Results: The methods are all applied on simulated data and mRNA and miRNA data-sets from GlioBlastoma Multiform (GBM) brain tumors to examine their overlap and differences. When the common variation is abundant, all methods are able to find the correct solution. With real data however, complexities in the data are treated differently by the three methods. Conclusions: All three methods have their own approach to estimate common and distinctive variation with their specific strength and weaknesses. Due to their orthogonality properties and their used algorithms their view on the data is slightly different. By assuming orthogonality between common and distinctive, true natural or biological phenomena that may not be orthogonal at all might be misinterpreted.
Affiliation(s)
- Frans M van der Kloet
  - Biosystems Data Analysis, Swammerdam Institute for Life Sciences, University of Amsterdam, Science Park 904, 1098 XH Amsterdam, The Netherlands
- Ana Conesa
  - Computational Genomics Program, Centro de Investigaciones Príncipe Felipe, Valencia, Spain
- Age K Smilde
  - Biosystems Data Analysis, Swammerdam Institute for Life Sciences, University of Amsterdam, Science Park 904, 1098 XH Amsterdam, The Netherlands
- Johan A Westerhuis
  - Biosystems Data Analysis, Swammerdam Institute for Life Sciences, University of Amsterdam, Science Park 904, 1098 XH Amsterdam, The Netherlands
|
27
|
McManus J, Cheng Z, Vogel C. Next-generation analysis of gene expression regulation--comparing the roles of synthesis and degradation. MOLECULAR BIOSYSTEMS 2015; 11:2680-9. [PMID: 26259698 PMCID: PMC4573910 DOI: 10.1039/c5mb00310e] [Citation(s) in RCA: 66] [Impact Index Per Article: 7.3] [Indexed: 12/14/2022]
Abstract
Technological advances now enable routine measurement of mRNA and protein abundances, and estimates of their rates of synthesis and degradation that inform on their values and the degree of change in response to stimuli. Importantly, more and more data on time-series experiments are emerging, e.g. of cells responding to stress, enabling first insights into a new dimension of gene expression regulation - its dynamics and how it allows for very different response signals across genes. This review discusses recently published methods and datasets, their impact on what we now know about the relationships between concentrations and synthesis rates of mRNAs and proteins in yeast and mammalian cells, their evolution, and new hypotheses on translation regulatory mechanisms generated by approaches that involve ribosome footprinting.
Affiliation(s)
- Joel McManus
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, USA.
|
28
|
Adali T, Levin-Schwartz Y, Calhoun VD. Multi-modal data fusion using source separation: Two effective models based on ICA and IVA and their properties. PROCEEDINGS OF THE IEEE. INSTITUTE OF ELECTRICAL AND ELECTRONICS ENGINEERS 2015; 103:1478-93. [PMID: 26525830 PMCID: PMC4624202 DOI: 10.1109/jproc.2015.2461624] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Indexed: 05/04/2023]
Abstract
Fusion of information from multiple sets of data in order to extract a set of features that are most useful and relevant for the given task is inherent to many problems we deal with today. Since, usually, very little is known about the actual interaction among the datasets, it is highly desirable to minimize the underlying assumptions. This has been the main reason for the growing importance of data-driven methods, and in particular of independent component analysis (ICA) as it provides useful decompositions with a simple generative model and using only the assumption of statistical independence. A recent extension of ICA, independent vector analysis (IVA), generalizes ICA to multiple datasets by exploiting the statistical dependence across the datasets, and hence, as we discuss in this paper, provides an attractive solution to fusion of data from multiple datasets along with ICA. In this paper, we focus on two multivariate solutions for multi-modal data fusion that let multiple modalities fully interact for the estimation of underlying features that jointly report on all modalities. One solution is the Joint ICA model that has found wide application in medical imaging, and the second one is the Transposed IVA model introduced here as a generalization of an approach based on multi-set canonical correlation analysis. In the discussion, we emphasize the role of diversity in the decompositions achieved by these two models, present their properties and implementation details to enable the user to make informed decisions on the selection of a model along with its associated parameters. Discussions are supported by simulation results to help highlight the main issues in the implementation of these methods.
Affiliation(s)
- Tülay Adali
- Department of CSEE, University of Maryland, Baltimore County, Baltimore, MD 21250, USA
- Yuri Levin-Schwartz
- Department of CSEE, University of Maryland, Baltimore County, Baltimore, MD 21250, USA
- Vince D. Calhoun
- University of New Mexico and the Mind Research Network, Albuquerque, NM 87106, USA
|
29
|
Sankaranarayanan P, Schomay TE, Aiello KA, Alter O. Tensor GSVD of patient- and platform-matched tumor and normal DNA copy-number profiles uncovers chromosome arm-wide patterns of tumor-exclusive platform-consistent alterations encoding for cell transformation and predicting ovarian cancer survival. PLoS One 2015; 10:e0121396. [PMID: 25875127 PMCID: PMC4398562 DOI: 10.1371/journal.pone.0121396] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 10/21/2014] [Accepted: 01/31/2015] [Indexed: 11/28/2022]
Abstract
The number of large-scale high-dimensional datasets recording different aspects of a single disease is growing, accompanied by a need for frameworks that can create one coherent model from multiple tensors of matched columns, e.g., patients and platforms, but independent rows, e.g., probes. We define and prove the mathematical properties of a novel tensor generalized singular value decomposition (GSVD), which can simultaneously find the similarities and dissimilarities, i.e., patterns of varying relative significance, between any two such tensors. We demonstrate the tensor GSVD in comparative modeling of patient- and platform-matched but probe-independent ovarian serous cystadenocarcinoma (OV) tumor, mostly high-grade, and normal DNA copy-number profiles, across each chromosome arm, and combination of two arms, separately. The modeling uncovers previously unrecognized patterns of tumor-exclusive platform-consistent co-occurring copy-number alterations (CNAs). We find, first, and validate that each of the patterns across only 7p and Xq, and the combination of 6p+12p, is correlated with a patient’s prognosis, is independent of the tumor’s stage, the best predictor of OV survival to date, and together with stage makes a better predictor than stage alone. Second, these patterns include most known OV-associated CNAs that map to these chromosome arms, as well as several previously unreported, yet frequent focal CNAs. Third, differential mRNA, microRNA, and protein expression consistently map to the DNA CNAs. A coherent picture emerges for each pattern, suggesting roles for the CNAs in OV pathogenesis and personalized therapy. In 6p+12p, deletion of the p21-encoding CDKN1A and p38-encoding MAPK14 and amplification of RAD51AP1 and KRAS encode for human cell transformation, and are correlated with a cell’s immortality, and a patient’s shorter survival time. In 7p, RPA3 deletion and POLD2 amplification are correlated with DNA stability, and a longer survival. 
In Xq, PABPC5 deletion and BCAP31 amplification are correlated with a cellular immune response, and a longer survival.
MESH Headings
- Carcinoma, Ovarian Epithelial
- Cell Transformation, Neoplastic/genetics
- Chromosome Mapping
- Chromosomes/genetics
- Cystadenocarcinoma, Serous/diagnosis
- Cystadenocarcinoma, Serous/genetics
- Cystadenocarcinoma, Serous/pathology
- DNA Copy Number Variations/genetics
- Female
- Gene Expression Regulation, Neoplastic
- Humans
- MicroRNAs/biosynthesis
- Models, Theoretical
- Mutation
- Neoplasm Proteins/biosynthesis
- Neoplasms, Glandular and Epithelial/diagnosis
- Neoplasms, Glandular and Epithelial/genetics
- Neoplasms, Glandular and Epithelial/pathology
- Ovarian Neoplasms/diagnosis
- Ovarian Neoplasms/genetics
- Ovarian Neoplasms/pathology
- Prognosis
- RNA, Messenger/biosynthesis
- RNA, Messenger/genetics
- Survival Analysis
Affiliation(s)
- Preethi Sankaranarayanan
- Scientific Computing and Imaging (SCI) Institute, University of Utah, Salt Lake City, Utah, United States of America
- Department of Bioengineering, University of Utah, Salt Lake City, Utah, United States of America
- Theodore E. Schomay
- Scientific Computing and Imaging (SCI) Institute, University of Utah, Salt Lake City, Utah, United States of America
- Department of Bioengineering, University of Utah, Salt Lake City, Utah, United States of America
- Katherine A. Aiello
- Scientific Computing and Imaging (SCI) Institute, University of Utah, Salt Lake City, Utah, United States of America
- Department of Bioengineering, University of Utah, Salt Lake City, Utah, United States of America
- Orly Alter
- Scientific Computing and Imaging (SCI) Institute, University of Utah, Salt Lake City, Utah, United States of America
- Department of Bioengineering, University of Utah, Salt Lake City, Utah, United States of America
- Department of Human Genetics, University of Utah, Salt Lake City, Utah, United States of America
|
30
|
Etges WJ, Trotter MV, de Oliveira CC, Rajpurohit S, Gibbs AG, Tuljapurkar S. Deciphering life history transcriptomes in different environments. Mol Ecol 2014; 24:151-79. [PMID: 25442828 DOI: 10.1111/mec.13017] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 03/11/2014] [Revised: 10/27/2014] [Accepted: 11/22/2014] [Indexed: 12/25/2022]
Abstract
We compared whole transcriptome variation in six pre-adult stages and seven adult female ages in two populations of cactophilic Drosophila mojavensis reared on two host plants to understand how differences in gene expression influence standing life history variation. We used singular value decomposition (SVD) to identify dominant trajectories of life cycle gene expression variation, performed pairwise comparisons of stage and age differences in gene expression across the life cycle, identified when genes exhibited maximum levels of life cycle gene expression, and assessed population and host cactus effects on gene expression. Life cycle SVD analysis returned four significant components of transcriptional variation, revealing functional enrichment of genes responsible for growth, metabolic function, sensory perception, neural function, translation and ageing. Host cactus effects on female gene expression revealed population- and stage-specific differences, including significant host plant effects on larval metabolism and development, as well as adult neurotransmitter binding and courtship behaviour gene expression levels. In 3- to 6-day-old virgin females, significant upregulation of genes associated with meiosis and oogenesis was accompanied by downregulation of genes associated with somatic maintenance, evidence for a life history trade-off. The transcriptome of D. mojavensis reared in natural environments throughout its life cycle revealed core developmental transitions and genome-wide influences on life history variation in natural populations.
Affiliation(s)
- William J Etges
- Program in Ecology and Evolutionary Biology, Dept. of Biological Sciences, University of Arkansas, Fayetteville, AR, 72701, USA
|
31
|
Acar E, Papalexakis EE, Gürdeniz G, Rasmussen MA, Lawaetz AJ, Nilsson M, Bro R. Structure-revealing data fusion. BMC Bioinformatics 2014; 15:239. [PMID: 25015427 PMCID: PMC4117975 DOI: 10.1186/1471-2105-15-239] [Citation(s) in RCA: 73] [Impact Index Per Article: 7.3] [Received: 12/31/2013] [Accepted: 06/26/2014] [Indexed: 11/10/2022]
Abstract
BACKGROUND Analysis of data from multiple sources has the potential to enhance knowledge discovery by capturing underlying structures, which are, otherwise, difficult to extract. Fusing data from multiple sources has already proved useful in many applications in social network analysis, signal processing and bioinformatics. However, data fusion is challenging since data from multiple sources are often (i) heterogeneous (i.e., in the form of higher-order tensors and matrices), (ii) incomplete, and (iii) have both shared and unshared components. In order to address these challenges, in this paper, we introduce a novel unsupervised data fusion model based on joint factorization of matrices and higher-order tensors. RESULTS While the traditional formulation of coupled matrix and tensor factorizations modeling only shared factors fails to capture the underlying structures in the presence of both shared and unshared factors, the proposed data fusion model has the potential to automatically reveal shared and unshared components through modeling constraints. Using numerical experiments, we demonstrate the effectiveness of the proposed approach in terms of identifying shared and unshared components. Furthermore, we measure a set of mixtures with known chemical composition using both LC-MS (Liquid Chromatography - Mass Spectrometry) and NMR (Nuclear Magnetic Resonance) and demonstrate that the structure-revealing data fusion model can (i) successfully capture the chemicals in the mixtures and extract the relative concentrations of the chemicals accurately, (ii) provide promising results in terms of identifying shared and unshared chemicals, and (iii) reveal the relevant patterns in LC-MS by coupling with the diffusion NMR data. 
CONCLUSIONS We have proposed a structure-revealing data fusion model that can jointly analyze heterogeneous, incomplete data sets with shared and unshared components and demonstrated its promising performance as well as potential limitations on both simulated and real data.
Affiliation(s)
- Evrim Acar
- Department of Food Science, Faculty of Science, University of Copenhagen, Frederiksberg C, Denmark.
|
32
|
Samuels BA, Leonardo ED, Dranovsky A, Williams A, Wong E, Nesbitt AMI, McCurdy RD, Hen R, Alter M. Global state measures of the dentate gyrus gene expression system predict antidepressant-sensitive behaviors. PLoS One 2014; 9:e85136. [PMID: 24465494 PMCID: PMC3894967 DOI: 10.1371/journal.pone.0085136] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 08/21/2013] [Accepted: 11/23/2013] [Indexed: 11/18/2022]
Abstract
BACKGROUND Selective serotonin reuptake inhibitors (SSRIs) such as fluoxetine are the most common form of medication treatment for major depression. However, approximately 50% of depressed patients fail to achieve an effective treatment response. Understanding how gene expression systems respond to treatments may be critical for understanding antidepressant resistance. METHODS We take a novel approach to this problem by demonstrating that the gene expression system of the dentate gyrus responds to fluoxetine (FLX), a commonly used antidepressant medication, in a stereotyped-manner involving changes in the expression levels of thousands of genes. The aggregate behavior of this large-scale systemic response was quantified with principal components analysis (PCA) yielding a single quantitative measure of the global gene expression system state. RESULTS Quantitative measures of system state were highly correlated with variability in levels of antidepressant-sensitive behaviors in a mouse model of depression treated with fluoxetine. Analysis of dorsal and ventral dentate samples in the same mice indicated that system state co-varied across these regions despite their reported functional differences. Aggregate measures of gene expression system state were very robust and remained unchanged when different microarray data processing algorithms were used and even when completely different sets of gene expression levels were used for their calculation. CONCLUSIONS System state measures provide a robust method to quantify and relate global gene expression system state variability to behavior and treatment. State variability also suggests that the diversity of reported changes in gene expression levels in response to treatments such as fluoxetine may represent different perspectives on unified but noisy global gene expression system state level responses. 
Studying regulation of gene expression systems at the state level may be useful in guiding new approaches to augmentation of traditional antidepressant treatments.
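The idea of summarizing a large expression response with "a single quantitative measure" from PCA can be sketched as the score of each sample on the first principal component. The following is a minimal illustration under assumptions of my own (a samples-by-genes matrix, mean-centering per gene, no scaling); the function name is hypothetical and the authors' actual preprocessing is not described here.

```python
import numpy as np

def system_state_scores(expr):
    """First-principal-component score per sample.

    expr: array of shape (n_samples, n_genes).
    Returns (scores, explained), where `explained` is the fraction of
    total variance carried by the first component.
    """
    centered = expr - expr.mean(axis=0)               # center each gene
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    scores = U[:, 0] * s[0]                           # projection on PC1
    explained = s[0] ** 2 / np.sum(s ** 2)
    return scores, explained
```

The sign of a principal component is arbitrary, so any comparison of these scores with a behavioral measure should use the magnitude of the correlation or fix the sign by convention.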
Affiliation(s)
- Benjamin A. Samuels
- Departments of Psychiatry and Neuroscience, Columbia University, New York, New York, United States of America
- E. David Leonardo
- Departments of Psychiatry and Neuroscience, Columbia University, New York, New York, United States of America
- Alex Dranovsky
- Departments of Psychiatry and Neuroscience, Columbia University, New York, New York, United States of America
- Amanda Williams
- AstraZeneca Pharmaceuticals, CNS Discovery, Wilmington, Delaware, United States of America
- Erik Wong
- AstraZeneca Pharmaceuticals, CNS Discovery, Wilmington, Delaware, United States of America
- Addie May I. Nesbitt
- Center for Neurobiology and Behavior, Department of Psychiatry, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Richard D. McCurdy
- Center for Neurobiology and Behavior, Department of Psychiatry, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Rene Hen
- Departments of Psychiatry and Neuroscience, Columbia University, New York, New York, United States of America
- Mark Alter
- Center for Neurobiology and Behavior, Department of Psychiatry, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
|
33
|
Xiao X, Moreno-Moral A, Rotival M, Bottolo L, Petretto E. Multi-tissue analysis of co-expression networks by higher-order generalized singular value decomposition identifies functionally coherent transcriptional modules. PLoS Genet 2014; 10:e1004006. [PMID: 24391511 PMCID: PMC3879165 DOI: 10.1371/journal.pgen.1004006] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Received: 05/09/2013] [Accepted: 10/22/2013] [Indexed: 12/27/2022]
Abstract
Recent high-throughput efforts such as ENCODE have generated a large body of genome-scale transcriptional data in multiple conditions (e.g., cell-types and disease states). Leveraging these data is especially important for network-based approaches to human disease, for instance to identify coherent transcriptional modules (subnetworks) that can inform functional disease mechanisms and pathological pathways. Yet, genome-scale network analysis across conditions is significantly hampered by the paucity of robust and computationally-efficient methods. Building on the Higher-Order Generalized Singular Value Decomposition, we introduce a new algorithmic approach for efficient, parameter-free and reproducible identification of network-modules simultaneously across multiple conditions. Our method can accommodate weighted (and unweighted) networks of any size and can similarly use co-expression or raw gene expression input data, without hinging upon the definition and stability of the correlation used to assess gene co-expression. In simulation studies, we demonstrated distinctive advantages of our method over existing methods, which was able to recover accurately both common and condition-specific network-modules without entailing ad-hoc input parameters as required by other approaches. We applied our method to genome-scale and multi-tissue transcriptomic datasets from rats (microarray-based) and humans (mRNA-sequencing-based) and identified several common and tissue-specific subnetworks with functional significance, which were not detected by other methods. In humans we recapitulated the crosstalk between cell-cycle progression and cell-extracellular matrix interactions processes in ventricular zones during neocortex expansion and further, we uncovered pathways related to development of later cognitive functions in the cortical plate of the developing brain which were previously unappreciated. 
Analyses of seven rat tissues identified a multi-tissue subnetwork of co-expressed heat shock protein (Hsp) and cardiomyopathy genes (Bag3, Cryab, Kras, Emd, Plec), which was significantly replicated using separate failing heart and liver gene expression datasets in humans, thus revealing a conserved functional role for Hsp genes in cardiovascular disease.
Affiliation(s)
- Xiaolin Xiao
- Medical Research Council (MRC) Clinical Sciences Centre, Faculty of Medicine, Imperial College, London, United Kingdom
- Aida Moreno-Moral
- Medical Research Council (MRC) Clinical Sciences Centre, Faculty of Medicine, Imperial College, London, United Kingdom
- Maxime Rotival
- Medical Research Council (MRC) Clinical Sciences Centre, Faculty of Medicine, Imperial College, London, United Kingdom
- Leonardo Bottolo
- Department of Mathematics, Imperial College, London, United Kingdom
- Enrico Petretto
- Medical Research Council (MRC) Clinical Sciences Centre, Faculty of Medicine, Imperial College, London, United Kingdom
|
34
|
Rotival M, Petretto E. Leveraging gene co-expression networks to pinpoint the regulation of complex traits and disease, with a focus on cardiovascular traits. Brief Funct Genomics 2013; 13:66-78. [PMID: 23960099 DOI: 10.1093/bfgp/elt030] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Indexed: 01/08/2023]
Abstract
Over the past decade, the number of genome-scale transcriptional datasets in publicly available databases has climbed to nearly one million, providing an unprecedented opportunity for extensive analyses of gene co-expression networks. In systems-genetic studies of complex diseases researchers increasingly focus on groups of highly interconnected genes within complex transcriptional networks (referred to as clusters, modules or subnetworks) to uncover specific molecular processes that can inform functional disease mechanisms and pathological pathways. Here, we outline the basic paradigms underlying gene co-expression network analysis and critically review the most commonly used computational methods. Finally, we discuss specific applications of network-based approaches to the study of cardiovascular traits, which highlight the power of integrated analyses of networks, genetic and gene-regulation data to elucidate the complex mechanisms underlying cardiovascular disease.
Affiliation(s)
- Maxime Rotival
- MRC-Clinical Sciences Centre, Hammersmith Hospital Campus, Imperial College Centre for Translational and Experimental Medicine (ICTEM Building), Du Cane Road, London, W12 0NN UK. Tel.: + 44-020-8383-1468; Fax: +44-208-383-8577
|
35
|
Abstract
Systems biology approaches are required to advance our understanding of virus–host interactions, how these interactions cause disease and, ultimately, how to improve diagnostics, therapeutics and vaccines. Over the past decade, the field of systems virology has evolved from using first-generation microarrays to the integration of multidimensional data sets. This has resulted in significant findings, including the identification of gene expression signatures that are predictive of viral pathogenesis and vaccine efficacy, insights into how viruses disrupt cellular metabolism, and the mapping of virus–host interactomes. To fulfil its initial promise of revolutionizing our understanding of virus–host interactions, the field of systems virology must move beyond just the listing of molecules that are differentially expressed following viral infection; it must now look to define the relationships between key host molecules and their interactions with viral components. Several key computational challenges must be addressed in order to move into this new phase of systems virology, including consideration of nonlinear relationships such as the dynamics of the system, the integration of multidimensional data sets and the identification of causal relationships. Virologists, computer scientists and mathematicians must combine their skills and expertise in applying systems approaches to untangle the complex question of how viruses kill.
Katze and colleagues provide an overview of the evolution of systems virology and the insights obtained from using such methodologies to study virus–host interactions. Combining systems, mathematical and computational approaches with traditional virology research will offer a better understanding of how viruses cause disease and will help in the development of therapeutics. High-throughput molecular profiling and computational biology are changing the face of virology, providing a new appreciation of the importance of the host in viral pathogenesis and offering unprecedented opportunities for better diagnostics, therapeutics and vaccines. Here, we provide a snapshot of the evolution of systems virology, from global gene expression profiling and signatures of disease outcome, to geometry-based computational methods that promise to yield novel therapeutic targets, personalized medicine and a deeper understanding of how viruses cause disease. To realize these goals, pipettes and Petri dishes need to join forces with the powers of mathematics and computational biology.
|
36
|
Inferring gene regulatory networks by singular value decomposition and gravitation field algorithm. PLoS One 2012; 7:e51141. [PMID: 23226565 PMCID: PMC3514269 DOI: 10.1371/journal.pone.0051141] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 09/16/2012] [Accepted: 10/29/2012] [Indexed: 11/20/2022]
Abstract
Reconstruction of gene regulatory networks (GRNs) is of utmost interest and has become a challenging computational problem in systems biology. However, every existing algorithm for inferring GRNs from gene expression profiles has its own advantages and disadvantages. In particular, the effectiveness and efficiency of previous algorithms are not high enough. In this work, we propose a novel algorithm for inferring GRNs from gene expression data, based on a differential equation model. The algorithm combines two methods. Before reconstructing GRNs, singular value decomposition is used to decompose the gene expression data, determine the solution space of the algorithm, and obtain all candidate solutions for the GRNs. Within this generated family of candidate solutions, a modified gravitation field algorithm is used to optimize the criteria of the differential equation model and search for the best network structure. The proposed algorithm is validated on both a simulated scale-free network and a real benchmark gene regulatory network from a networks database. Both the Bayesian method and the traditional differential equation model were also used to infer GRNs, and their results were compared with those of the proposed algorithm. In addition, a genetic algorithm and simulated annealing were used to evaluate the gravitation field algorithm. The cross-validation results confirmed the effectiveness of our algorithm, which significantly outperforms previous algorithms.
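How an SVD can "determine the solution space" is easiest to see for a linear model, which is assumed here for illustration and is not spelled out in the abstract: dX/dt = J X, with X a genes-by-timepoints matrix and fewer time points than genes. The system is then underdetermined, and every J that fits the data exactly is a particular solution plus a term acting on the directions the data never excite. The names below (`solution_family`, `candidate`, `Y`) are hypothetical.

```python
import numpy as np

def solution_family(X, Xdot, tol=1e-10):
    """Describe all J with J @ X == Xdot (assumed linear model).

    X, Xdot: arrays of shape (n_genes, n_timepoints).
    Returns (J0, N): J0 is the minimum-norm solution, and the columns
    of N span the directions orthogonal to the column space of X.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=True)
    rank = int(np.sum(s > tol * s[0]))
    U_r, s_r, V_r = U[:, :rank], s[:rank], Vt[:rank].T
    J0 = Xdot @ V_r @ np.diag(1.0 / s_r) @ U_r.T   # Xdot times pinv(X)
    N = U[:, rank:]                                # satisfies N.T @ X == 0
    return J0, N

def candidate(J0, N, Y):
    """One member of the family; Y is free, shape (n_genes, N.shape[1])."""
    return J0 + Y @ N.T
```

A search procedure (the gravitation field algorithm in the paper, or any other optimizer) would then choose Y according to an additional criterion such as sparsity; that step is not sketched here.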
|
37
|
McDonald M, Higham DJ, Vass JK. Spectral algorithms for heterogeneous biological networks. Brief Funct Genomics 2012; 11:457-68. [PMID: 23117863 DOI: 10.1093/bfgp/els040] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/13/2022]
Abstract
Spectral methods, which use information relating to eigenvectors, singular vectors and generalized singular vectors, help us to visualize and summarize sets of pairwise interactions. In this work, we motivate and discuss the use of spectral methods by taking a matrix computation view and applying concepts from applied linear algebra. We show that this unified approach is sufficiently flexible to allow multiple sources of network information to be combined. We illustrate the methods on microarray data arising from a large population-based study in human adipose tissue, combined with related information concerning metabolic pathways.
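As a small example of how an eigenvector can summarize a set of pairwise interactions, the sketch below splits a weighted undirected network in two using the sign pattern of the Fiedler vector (the eigenvector of the graph Laplacian for the second-smallest eigenvalue). This is a standard textbook technique chosen for illustration; it is not claimed to be the specific algorithm of the paper, and the function names are hypothetical.

```python
import numpy as np

def fiedler_vector(W):
    """Eigenvector of L = D - W for the second-smallest eigenvalue.

    W: symmetric non-negative weight matrix of a connected graph.
    """
    L = np.diag(W.sum(axis=1)) - W
    eigenvalues, eigenvectors = np.linalg.eigh(L)  # ascending eigenvalues
    return eigenvectors[:, 1]

def spectral_bipartition(W):
    """Boolean group label per node from the sign of the Fiedler vector."""
    return fiedler_vector(W) > 0
```

Sorting nodes by their Fiedler-vector entries, rather than thresholding at zero, gives a one-dimensional ordering that can be used to reorder a heat map of the interaction matrix.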
Affiliation(s)
- Martin McDonald
- Department of Mathematics and Statistics, University of Strathclyde, Glasgow G1 1XH, UK
|
38
|
Acar E, Gurdeniz G, Rasmussen MA, Rago D, Dragsted LO, Bro R. Coupled Matrix Factorization with Sparse Factors to Identify Potential Biomarkers in Metabolomics. International Journal of Knowledge Discovery in Bioinformatics 2012. [DOI: 10.4018/jkdb.2012070102] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 11/09/2022]
Abstract
Metabolomics focuses on the detection of chemical substances in biological fluids such as urine and blood using a number of analytical techniques including Nuclear Magnetic Resonance (NMR) spectroscopy and Liquid Chromatography-Mass Spectrometry (LC-MS). Among the major challenges in analysis of metabolomics data are (i) joint analysis of data from multiple platforms, and (ii) capturing easily interpretable underlying patterns, which could be further utilized for biomarker discovery. In order to address these challenges, the authors formulate joint analysis of data from multiple platforms as a coupled matrix factorization problem with sparsity penalties on the factor matrices. They developed an all-at-once optimization algorithm, called CMF-SPOPT (Coupled Matrix Factorization with SParse OPTimization), which is a gradient-based optimization approach solving for all factor matrices simultaneously. Using numerical experiments on simulated data, the authors demonstrate that CMF-SPOPT can capture the underlying sparse patterns in data. Furthermore, on a real data set of blood samples collected from a group of rats, the authors use the proposed approach to jointly analyze metabolomics data sets and identify potential biomarkers for apple intake. Advantages and limitations of the proposed approach are also discussed using illustrative examples on metabolomics data sets.
Affiliation(s)
- Evrim Acar
- Department of Food Science, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
- Gozde Gurdeniz
- Department of Human Nutrition, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
- Morten A. Rasmussen
- Department of Food Science, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
- Daniela Rago
- Department of Human Nutrition, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
- Lars O. Dragsted
- Department of Human Nutrition, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
- Rasmus Bro
- Department of Food Science, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
|
39
|
Van Deun K, Van Mechelen I, Thorrez L, Schouteden M, De Moor B, van der Werf MJ, De Lathauwer L, Smilde AK, Kiers HAL. DISCO-SCA and properly applied GSVD as swinging methods to find common and distinctive processes. PLoS One 2012; 7:e37840. [PMID: 22693578 PMCID: PMC3365060 DOI: 10.1371/journal.pone.0037840] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Received: 04/27/2011] [Accepted: 04/29/2012] [Indexed: 11/18/2022]
Abstract
Background: In systems biology it is common to obtain for the same set of biological entities information from multiple sources. Examples include expression data for the same set of orthologous genes screened in different organisms and data on the same set of culture samples obtained with different high-throughput techniques. A major challenge is to find the important biological processes underlying the data and to disentangle therein processes common to all data sources and processes distinctive for a specific source. Recently, two promising simultaneous data integration methods have been proposed to attain this goal, namely generalized singular value decomposition (GSVD) and simultaneous component analysis with rotation to common and distinctive components (DISCO-SCA).
Results: Both theoretical analyses and applications to biologically relevant data show that: (1) straightforward applications of GSVD yield unsatisfactory results, (2) DISCO-SCA performs well, (3) provided proper pre-processing and algorithmic adaptations, GSVD reaches a performance level similar to that of DISCO-SCA, and (4) DISCO-SCA is directly generalizable to more than two data sources. The biological relevance of DISCO-SCA is illustrated with two applications. First, in a setting of comparative genomics, it is shown that DISCO-SCA recovers a common theme of cell cycle progression and a yeast-specific response to pheromones. The biological annotation was obtained by applying Gene Set Enrichment Analysis in an appropriate way. Second, in an application of DISCO-SCA to metabolomics data for Escherichia coli obtained with two different chemical analysis platforms, it is illustrated that the metabolites involved in some of the biological processes underlying the data are detected by one of the two platforms only; therefore, platforms for microbial metabolomics should be tailored to the biological question.
Conclusions: Both DISCO-SCA and properly applied GSVD are promising integrative methods for finding common and distinctive processes in multisource data. Open source code for both methods is provided.
Affiliation(s)
- Katrijn Van Deun
  - Department of Psychology, Katholieke Universiteit Leuven, Leuven, Belgium
|
40
|
Lee CH, Alpert BO, Sankaranarayanan P, Alter O. GSVD comparison of patient-matched normal and tumor aCGH profiles reveals global copy-number alterations predicting glioblastoma multiforme survival. PLoS One 2012; 7:e30098. [PMID: 22291905 PMCID: PMC3264559 DOI: 10.1371/journal.pone.0030098] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Received: 09/06/2011] [Accepted: 12/09/2011] [Indexed: 11/18/2022]
Abstract
Despite recent large-scale profiling efforts, the best prognostic predictor of glioblastoma multiforme (GBM) remains the patient's age at diagnosis. We describe a global pattern of tumor-exclusive co-occurring copy-number alterations (CNAs) that is correlated, and possibly coordinated, with GBM patients' survival and response to chemotherapy. The pattern is revealed by GSVD comparison of patient-matched but probe-independent GBM and normal aCGH datasets from The Cancer Genome Atlas (TCGA). We find that, first, the GSVD, formulated as a framework for comparatively modeling two composite datasets, removes from the pattern copy-number variations (CNVs) that occur in the normal human genome (e.g., female-specific X chromosome amplification) and experimental variations (e.g., in tissue batch, genomic center, hybridization date and scanner), without a priori knowledge of these variations. Second, the pattern includes most known GBM-associated changes in chromosome numbers and focal CNAs, as well as several previously unreported CNAs in >3% of the patients. These include the biochemically putative drug target, cell cycle-regulated serine/threonine kinase-encoding TLK2, the cyclin E1-encoding CCNE1, and the Rb-binding histone demethylase-encoding KDM5A. Third, the pattern provides a better prognostic predictor than the chromosome numbers or any one focal CNA that it identifies, suggesting that the GBM survival phenotype is an outcome of its global genotype. The pattern is independent of age, and combined with age, makes a better predictor than age alone. GSVD comparison of matched profiles of a larger set of TCGA patients, inclusive of the initial set, confirms the global pattern. GSVD classification of the GBM profiles of an independent set of patients validates the prognostic contribution of the pattern.
Affiliation(s)
- Cheng H. Lee
  - Scientific Computing and Imaging (SCI) Institute, University of Utah, Salt Lake City, Utah, United States of America
- Benjamin O. Alpert
  - Scientific Computing and Imaging (SCI) Institute, University of Utah, Salt Lake City, Utah, United States of America
  - Department of Bioengineering, University of Utah, Salt Lake City, Utah, United States of America
- Preethi Sankaranarayanan
  - Scientific Computing and Imaging (SCI) Institute, University of Utah, Salt Lake City, Utah, United States of America
  - Department of Bioengineering, University of Utah, Salt Lake City, Utah, United States of America
- Orly Alter
  - Scientific Computing and Imaging (SCI) Institute, University of Utah, Salt Lake City, Utah, United States of America
  - Department of Bioengineering, University of Utah, Salt Lake City, Utah, United States of America
  - Department of Human Genetics, University of Utah, Salt Lake City, Utah, United States of America
|