1
Tan ZC, Meyer AS. The structure is the message: Preserving experimental context through tensor decomposition. Cell Syst 2024; 15:679-693. [PMID: 39173584 PMCID: PMC11366223 DOI: 10.1016/j.cels.2024.07.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2024] [Revised: 06/25/2024] [Accepted: 07/22/2024] [Indexed: 08/24/2024]
Abstract
Recent biological studies have been revolutionized in scale and granularity by multiplex and high-throughput assays. Profiling cell responses across several experimental parameters, such as perturbations, time, and genetic contexts, leads to richer and more generalizable findings. However, these multidimensional datasets necessitate a reevaluation of the conventional methods for their representation and analysis. Traditionally, experimental parameters are merged to flatten the data into a two-dimensional matrix, sacrificing crucial experiment context reflected by the structure. As Marshall McLuhan famously stated, "the medium is the message." In this work, we propose that the experiment structure is the medium in which subsequent analysis is performed, and the optimal choice of data representation must reflect the experiment structure. We review how tensor-structured analyses and decompositions can preserve this information. We contend that tensor methods are poised to become integral to the biomedical data sciences toolkit.
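As a rough illustration of the flattening this abstract describes (a minimal sketch, not code from the paper; the dimensions, variable names, and random data are assumptions chosen for the example), the NumPy snippet below merges two experimental parameters into a single matrix axis and compares how many numbers a rank-one pattern needs in each representation.

```python
import numpy as np

# Hypothetical experiment: 4 perturbations x 3 time points x 5 cell lines.
n_perturb, n_time, n_lines = 4, 3, 5
rng = np.random.default_rng(0)
tensor = rng.normal(size=(n_perturb, n_time, n_lines))

# Conventional flattening: time and cell line are merged into one column axis.
flat = tensor.reshape(n_perturb, n_time * n_lines)

# A flattened column index only maps back to (time, cell line) if the
# original shape is tracked separately.
col = 7
t, c = divmod(col, n_lines)
assert flat[0, col] == tensor[0, t, c]

# A rank-one three-way pattern: one vector per experimental parameter.
a = rng.normal(size=n_perturb)
b = rng.normal(size=n_time)
d = rng.normal(size=n_lines)
rank1 = np.einsum("i,j,k->ijk", a, b, d)

# Numbers needed to describe one rank-one pattern in each representation.
n_tensor_params = n_perturb + n_time + n_lines   # one vector per mode
n_matrix_params = n_perturb + n_time * n_lines   # row vector + merged column vector
print(n_tensor_params, n_matrix_params)
```

In this toy setting the tensor-structured pattern keeps time and cell line as separate factors, whereas the flattened matrix pattern has a single vector over all time and cell line combinations.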
Affiliation(s)
- Zhixin Cyrillus Tan
  - Bioinformatics Interdepartmental Program, University of California, Los Angeles (UCLA), Los Angeles, CA, USA.
- Aaron S Meyer
  - Bioinformatics Interdepartmental Program, University of California, Los Angeles (UCLA), Los Angeles, CA, USA; Department of Bioengineering, UCLA, Los Angeles, CA, USA; Jonsson Comprehensive Cancer Center, UCLA, Los Angeles, CA, USA; Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, UCLA, Los Angeles, CA, USA.
2
Newman E, Horesh L, Avron H, Kilmer ME. Stable tensor neural networks for efficient deep learning. Front Big Data 2024; 7:1363978. [PMID: 38873283 PMCID: PMC11170703 DOI: 10.3389/fdata.2024.1363978] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2023] [Accepted: 04/29/2024] [Indexed: 06/15/2024]
Abstract
Learning from complex, multidimensional data has become central to computational mathematics, and among the most successful high-dimensional function approximators are deep neural networks (DNNs). Training DNNs is posed as an optimization problem to learn network weights or parameters that well-approximate a mapping from input to target data. Multiway data or tensors arise naturally in myriad ways in deep learning, in particular as input data and as high-dimensional weights and features extracted by the network, with the latter often being a bottleneck in terms of speed and memory. In this work, we leverage tensor representations and processing to efficiently parameterize DNNs when learning from high-dimensional data. We propose tensor neural networks (t-NNs), a natural extension of traditional fully-connected networks, that can be trained efficiently in a reduced, yet more powerful parameter space. Our t-NNs are built upon matrix-mimetic tensor-tensor products, which retain algebraic properties of matrix multiplication while capturing high-dimensional correlations. Mimeticity enables t-NNs to inherit desirable properties of modern DNN architectures. We exemplify this by extending recent work on stable neural networks, which interpret DNNs as discretizations of differential equations, to our multidimensional framework. We provide empirical evidence of the parametric advantages of t-NNs on dimensionality reduction using autoencoders and classification using fully-connected and stable variants on benchmark imaging datasets MNIST and CIFAR-10.
Affiliation(s)
- Elizabeth Newman
  - Department of Mathematics, Emory University, Atlanta, GA, United States
- Lior Horesh
  - Mathematics and Theoretical Computer Science, IBM TJ Watson Research Center, Yorktown, NY, United States
- Haim Avron
  - Department of Applied Mathematics, Tel Aviv University, Tel Aviv-Yafo, Israel
- Misha E. Kilmer
  - Department of Mathematics, Tufts University, Medford, MA, United States
3
Fujita S, Karasawa Y, Hironaka KI, Taguchi YH, Kuroda S. Features extracted using tensor decomposition reflect the biological features of the temporal patterns of human blood multimodal metabolome. PLoS One 2023; 18:e0281594. [PMID: 36791130 PMCID: PMC9931158 DOI: 10.1371/journal.pone.0281594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2022] [Accepted: 01/27/2023] [Indexed: 02/16/2023]
Abstract
High-throughput omics technologies have enabled the profiling of entire biological systems. For the biological interpretation of such omics data, two analyses, hypothesis- and data-driven analyses including tensor decomposition, have been used. Both analyses have their own advantages and disadvantages and are mutually complementary; however, a direct comparison of these two analyses for omics data has been poorly examined. We applied tensor decomposition (TD) to a dataset representing changes in the concentrations of 562 blood molecules at 14 time points in 20 healthy human subjects after ingestion of 75 g oral glucose. We characterized each molecule by individual dependence (constant or variable) and time dependence (later peak or early peak). Three of the four features extracted by TD were characterized by our previous hypothesis-driven study, indicating that TD can extract some of the same features obtained by hypothesis-driven analysis in a non-biased manner. In contrast to the years taken for our previous hypothesis-driven analysis, the data-driven analysis in this study took days, indicating that TD can extract biological features in a non-biased manner without the time-consuming process of hypothesis generation.
Affiliation(s)
- Suguru Fujita
  - Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, Japan
- Yasuaki Karasawa
  - Department of Neurosurgery, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Ken-ichi Hironaka
  - Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, Japan
- Y.-h. Taguchi
  - Department of Physics, Chuo University, Tokyo, Japan
- Shinya Kuroda
  - Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, Japan
  - * E-mail:
4
Armingol E, Baghdassarian HM, Martino C, Perez-Lopez A, Aamodt C, Knight R, Lewis NE. Context-aware deconvolution of cell-cell communication with Tensor-cell2cell. Nat Commun 2022; 13:3665. [PMID: 35760817 PMCID: PMC9237099 DOI: 10.1038/s41467-022-31369-2] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 09/29/2021] [Accepted: 06/14/2022] [Indexed: 12/23/2022]
Abstract
Cell interactions determine phenotypes, and intercellular communication is shaped by cellular contexts such as disease state, organismal life stage, and tissue microenvironment. Single-cell technologies measure the molecules mediating cell-cell communication, and emerging computational tools can exploit these data to decipher intercellular communication. However, current methods either disregard cellular context or rely on simple pairwise comparisons between samples, thus limiting the ability to decipher complex cell-cell communication across multiple time points, levels of disease severity, or spatial contexts. Here we present Tensor-cell2cell, an unsupervised method using tensor decomposition, which deciphers context-driven intercellular communication by simultaneously accounting for multiple stages, states, or locations of the cells. To do so, Tensor-cell2cell uncovers context-driven patterns of communication associated with different phenotypic states and determined by unique combinations of cell types and ligand-receptor pairs. As such, Tensor-cell2cell robustly improves upon and extends the analytical capabilities of existing tools. We show Tensor-cell2cell can identify multiple modules associated with distinct communication processes (e.g., participating cell-cell and ligand-receptor pairs) linked to severities of Coronavirus Disease 2019 and to Autism Spectrum Disorder. Thus, we introduce an effective and easy-to-use strategy for understanding complex communication patterns across diverse conditions.
Affiliation(s)
- Erick Armingol
  - Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA, 92093, USA
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA, 92093, USA
- Hratch M Baghdassarian
  - Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA, 92093, USA
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA, 92093, USA
- Cameron Martino
  - Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA, 92093, USA
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA, 92093, USA
  - Center for Microbiome Innovation, University of California San Diego, La Jolla, CA, 92093, USA
- Araceli Perez-Lopez
  - Biomedicine Research Unit, Facultad de Estudios Superiores Iztacala, Universidad Nacional Autónoma de México, Tlalnepantla, México, 54090, México
- Caitlin Aamodt
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA, 92093, USA
- Rob Knight
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA, 92093, USA
  - Center for Microbiome Innovation, University of California San Diego, La Jolla, CA, 92093, USA
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, 92093, USA
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA, 92093, USA
- Nathan E Lewis
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA, 92093, USA.
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA, 92093, USA.
5
Muñoz-Lasso DC, Mollá B, Sáenz-Gamboa JJ, Insuasty E, de la Iglesia-Vaya M, Pook MA, Pallardó FV, Palau F, Gonzalez-Cabo P. Frataxin Deficit Leads to Reduced Dynamics of Growth Cones in Dorsal Root Ganglia Neurons of Friedreich’s Ataxia YG8sR Model: A Multilinear Algebra Approach. Front Mol Neurosci 2022; 15:912780. [PMID: 35769335 PMCID: PMC9236133 DOI: 10.3389/fnmol.2022.912780] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/04/2022] [Accepted: 05/17/2022] [Indexed: 11/14/2022]
Abstract
Computational techniques for analyzing biological images offer a great potential to enhance our knowledge of the biological processes underlying disorders of the nervous system. Friedreich’s Ataxia (FRDA) is a rare progressive neurodegenerative inherited disorder caused by the low expression of frataxin, which is a small mitochondrial protein. In FRDA cells, the lack of frataxin promotes primarily mitochondrial dysfunction, an alteration of calcium (Ca2+) homeostasis and the destabilization of the actin cytoskeleton in the neurites and growth cones of sensory neurons. In this paper, a computational multilinear algebra approach was used to analyze the dynamics of the growth cone and its function in control and FRDA neurons. Computational approach, which includes principal component analysis and a multilinear algebra method, is used to quantify the dynamics of the growth cone (GC) morphology of sensory neurons from the dorsal root ganglia (DRG) of the YG8sR humanized murine model for FRDA. It was confirmed that the dynamics and patterns of turning were aberrant in the FRDA growth cones. In addition, our data suggest that other cellular processes dependent on functional GCs such as axonal regeneration might also be affected. Semiautomated computational approaches are presented to quantify differences in GC behaviors in neurodegenerative disease. In summary, the deficiency of frataxin has an adverse effect on the formation and, most importantly, the growth cones’ function in adult DRG neurons. As a result, frataxin deficient DRG neurons might lose the intrinsic capability to grow and regenerate axons properly due to the dysfunctional GCs they build.
Affiliation(s)
- Diana C. Muñoz-Lasso
  - Chemical Biology Group, Department of Biomedical Engineering, Eindhoven University of Technology (TU/e), Eindhoven, Netherlands
- Belén Mollá
  - Department of Genetics, Faculty of Biological Sciences, University of Valencia, Valencia, Spain
  - CIBER de Enfermedades Raras (CIBERER), Valencia, Spain
- Jhon J. Sáenz-Gamboa
  - Brain Connectivity Laboratory, Joint Unit FISABIO & Prince Felipe Research Centre (CIPF), Valencia, Spain
  - Regional Ministry of Health in Valencia, Hospital Sagunto (CEIB-CSUSP), Valencia, Spain
  - CIBER de Salud Mental (CIBERSAM), Valencia, Spain
- Maria de la Iglesia-Vaya
  - Brain Connectivity Laboratory, Joint Unit FISABIO & Prince Felipe Research Centre (CIPF), Valencia, Spain
  - Regional Ministry of Health in Valencia, Hospital Sagunto (CEIB-CSUSP), Valencia, Spain
  - CIBER de Salud Mental (CIBERSAM), Valencia, Spain
- Mark A. Pook
  - Biosciences, Brunel University London, Uxbridge, United Kingdom
- Federico V. Pallardó
  - CIBER de Enfermedades Raras (CIBERER), Valencia, Spain
  - Department of Physiology, Faculty of Medicine and Dentistry, University of Valencia, Valencia, Spain
  - Biomedical Research Institute INCLIVA, Valencia, Spain
- Francesc Palau
  - CIBER de Enfermedades Raras (CIBERER), Valencia, Spain
  - Department of Genetic and Molecular Medicine IPER, Institut de Recerca Sant Joan de Déu, Hospital Sant Joan de Déu, Barcelona, Spain
  - Division of Pediatrics, University of Barcelona School of Medicine and Health Sciences, Barcelona, Spain
- Pilar Gonzalez-Cabo
  - CIBER de Enfermedades Raras (CIBERER), Valencia, Spain
  - Department of Physiology, Faculty of Medicine and Dentistry, University of Valencia, Valencia, Spain
  - Biomedical Research Institute INCLIVA, Valencia, Spain
  - *Correspondence: Pilar Gonzalez-Cabo,
6
Zhang JZ, Xu W, Hu P. Tightly Integrated Multiomics-based Deep Tensor Survival Model for Time-to-Event Prediction. Bioinformatics 2022; 38:3259-3266. [PMID: 35445698 DOI: 10.1093/bioinformatics/btac286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2021] [Revised: 03/12/2022] [Accepted: 04/18/2022] [Indexed: 11/12/2022]
Abstract
MOTIVATION: Multiomics cancer profiles provide essential signals for predicting cancer survival. It is challenging to reveal the complex patterns from multiple types of data and link them to survival outcomes. We aim to develop a new deep learning-based algorithm to integrate three types of high-dimensional omics data measured on the same individuals to improve cancer survival outcome prediction. RESULTS: We built a three-dimension tensor to integrate multi-omics cancer data and factorized it into two-dimension matrices of latent factors, which were fed into neural networks-based survival networks. The new algorithm and other multi-omics-based algorithms, as well as individual genomic-based survival analysis algorithms, were applied to the breast cancer data and the colon and rectal cancer data from The Cancer Genome Atlas (TCGA) program. We evaluated the goodness-of-fit using the concordance index (C-index) and Integrated Brier Score (IBS). We demonstrated that the proposed tight integration framework has better survival prediction performance than the models using individual genomic data and other conventional data integration methods. AVAILABILITY: https://github.com/jasperzyzhang/DeepTensorSurvival. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
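For readers unfamiliar with the concordance index mentioned in this abstract, the following is a minimal sketch of Harrell's C-index in plain Python. It is not taken from the authors' repository; the function name, the tie handling (half credit for tied risk scores), and the toy data are assumptions made for illustration.

```python
def concordance_index(times, events, risk):
    """Harrell's C: share of comparable pairs that the risk score orders correctly.

    times  -- observed follow-up times
    events -- 1 if the event was observed, 0 if the subject was censored
    risk   -- predicted risk scores (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i has an observed event
            # strictly before subject j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # assumed convention: ties get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): the third subject is censored at time 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
perfect = concordance_index(times, events, [0.9, 0.7, 0.4, 0.1])
reversed_ = concordance_index(times, events, [0.1, 0.4, 0.7, 0.9])
constant = concordance_index(times, events, [0.5, 0.5, 0.5, 0.5])
print(perfect, reversed_, constant)
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 corresponds to uninformative scores, and 0.0 to a fully reversed ranking.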
Affiliation(s)
- Jasper Zhongyuan Zhang
  - Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario M5T 3M7, Canada
- Wei Xu
  - Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario M5T 3M7, Canada
  - Biostatistics Department, Princess Margaret Cancer Centre, Toronto, Ontario M5G 2M9, Canada
- Pingzhao Hu
  - Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario M5T 3M7, Canada
  - Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, Manitoba, R3E 0J9, Canada
  - CancerCare Manitoba Research Institute, CancerCare Manitoba, Winnipeg, Manitoba, R3E 0V9, Canada
  - Department of Computer Science, University of Manitoba, Winnipeg, Manitoba, R3T 2N2, Canada
7
Integration of Omics and Phenotypic Data for Precision Medicine. Methods in Molecular Biology (Clifton, N.J.) 2022; 2486:19-35. [PMID: 35437716 DOI: 10.1007/978-1-0716-2265-0_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Abstract
Over the past two decades, biomedical research has been moving toward a big-data-driven approach. The underlying causes of this transition include the ability to gather genetic or molecular profiles of humans faster, the increasing adoption of electronic health record (EHR) systems, and the growing interest in linking omics and phenotypic data for analysis. The integration of an individual's biology data (e.g., genomics, proteomics, metabolomics) and health-care data has created unprecedented opportunities for precision medicine, that is, a medical model that uses a patient's unique information, mainly genetic, to prevent, diagnose, or treat disease. This chapter reviewed the research opportunities and applications of integrating omics and phenotypic data for precision medicine, such as understanding the relationship between genotype and phenotype, disease subtyping, and diagnosis or prediction of adverse outcomes. We reviewed the recent advanced methods, particularly the machine learning and deep learning-based approaches used for harnessing and harmonizing the multiomics and phenotypic data to address these applications. We finally discussed the challenges and future directions.
8
Degree of Freedom of Gene Expression in Saccharomyces cerevisiae. Microbiol Spectr 2022; 10:e0083821. [PMID: 35230153 PMCID: PMC9045123 DOI: 10.1128/spectrum.00838-21] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Abstract
The complexity of genome-wide gene expression has not yet been adequately addressed due to a lack of comprehensive statistical analyses. In the present study, we introduce degree of freedom (DOF) as a summary statistic for evaluating gene expression complexity. Because DOF can be interpreted by a state-space representation, application of the DOF is highly useful for understanding gene activities. We used over 11,000 gene expression data sets to reveal that the DOF of gene expression in Saccharomyces cerevisiae is not greater than 450. We further demonstrated that various degrees of freedom of gene expression can be interpreted by different sequence motifs within promoter regions and Gene Ontology (GO) terms. The well-known TATA box is the most significant one among the identified motifs, while the GO term "ribosome genesis" is an associated biological process. On the basis of transcriptional freedom, our findings suggest that the regulation of gene expression can be modeled using only a few state variables. IMPORTANCE Yeast works like a well-organized factory. Each of its components works in its own way, while affecting the activities of others. The order of all activities is largely governed by the regulation of gene expression. In recent decades, biologists have recognized many regulations for yeast genes. However, it is not known how closely the regulation links each gene together to make all components of the cell work as a whole. In other words, biologists are very interested in how many independent control factors are needed to operate an artificial "cell" that works the same as a real one. In this work, we suggested that only 450 control factors were sufficient to represent the regulation of all 5800 yeast genes.
9
Tan ZC, Murphy MC, Alpay HS, Taylor SD, Meyer AS. Tensor-structured decomposition improves systems serology analysis. Mol Syst Biol 2021; 17:e10243. [PMID: 34487431 PMCID: PMC8420856 DOI: 10.15252/msb.202110243] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/22/2021] [Revised: 08/12/2021] [Accepted: 08/16/2021] [Indexed: 01/04/2023]
Abstract
Systems serology provides a broad view of humoral immunity by profiling both the antigen-binding and Fc properties of antibodies. These studies contain structured biophysical profiling across disease-relevant antigen targets, alongside additional measurements made for single antigens or in an antigen-generic manner. Identifying patterns in these measurements helps guide vaccine and therapeutic antibody development, improve our understanding of diseases, and discover conserved regulatory mechanisms. Here, we report that coupled matrix-tensor factorization (CMTF) can reduce these data into consistent patterns by recognizing the intrinsic structure of these data. We use measurements from two previous studies of HIV- and SARS-CoV-2-infected subjects as examples. CMTF outperforms standard methods like principal components analysis in the extent of data reduction while maintaining equivalent prediction of immune functional responses and disease status. Under CMTF, model interpretation improves through effective data reduction, separation of the Fc and antigen-binding effects, and recognition of consistent patterns across individual measurements. Data reduction also helps make prediction models more replicable. Therefore, we propose that CMTF is an effective general strategy for data exploration in systems serology.
Affiliation(s)
- Zhixin Cyrillus Tan
  - Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, USA
- Madeleine C Murphy
  - Computational and Systems Biology, University of California, Los Angeles, Los Angeles, CA, USA
- Hakan S Alpay
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
- Scott D Taylor
  - Department of Bioengineering, University of California, Los Angeles, Los Angeles, CA, USA
- Aaron S Meyer
  - Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, USA
  - Department of Bioengineering, University of California, Los Angeles, Los Angeles, CA, USA
  - Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA, USA
  - Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, University of California, Los Angeles, Los Angeles, CA, USA
10
Chang SM, Yang M, Lu W, Huang YJ, Huang Y, Hung H, Miecznikowski JC, Lu TP, Tzeng JY. Gene-Set Integrative Analysis of Multi-Omics Data Using Tensor-based Association Test. Bioinformatics 2021; 37:2259-2265. [PMID: 33674827 PMCID: PMC8388036 DOI: 10.1093/bioinformatics/btab125] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/15/2020] [Revised: 12/30/2020] [Accepted: 02/24/2021] [Indexed: 11/12/2022]
Abstract
MOTIVATION: Facilitated by technological advances and the decrease in costs, it is feasible to gather subject data from several omics platforms. Each platform assesses different molecular events, and the challenge lies in efficiently analyzing these data to discover novel disease genes or mechanisms. A common strategy is to regress the outcomes on all omics variables in a gene set. However, this approach suffers from problems associated with high-dimensional inference. RESULTS: We introduce a tensor-based framework for variable-wise inference in multi-omics analysis. By accounting for the matrix structure of an individual's multi-omics data, the proposed tensor methods incorporate the relationship among omics effects, reduce the number of parameters, and boost the modeling efficiency. We derive the variable-specific tensor test and enhance computational efficiency of tensor modeling. Using simulations and data applications on the Cancer Cell Line Encyclopedia (CCLE), we demonstrate our method performs favorably over baseline methods and will be useful for gaining biological insights in multi-omics analysis. AVAILABILITY AND IMPLEMENTATION: R function and instruction are available from the authors' website: https://www4.stat.ncsu.edu/~jytzeng/Software/TR.omics/TRinstruction.pdf. SUPPLEMENTARY INFORMATION: Supplementary materials are available at Bioinformatics online.
Affiliation(s)
- Sheng-Mao Chang
  - Department of Statistics, National Cheng Kung University, Tainan, Taiwan
- Meng Yang
  - Department of Statistics, North Carolina State University, Raleigh NC, 27695, USA
- Wenbin Lu
  - Department of Statistics, North Carolina State University, Raleigh NC, 27695, USA
- Yu-Jyun Huang
  - Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
- Yueyang Huang
  - Bioinformatics Research Center, North Carolina State University, Raleigh NC, 27695, USA
- Hung Hung
  - Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
- Tzu-Pin Lu
  - Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
- Jung-Ying Tzeng
  - Department of Statistics, National Cheng Kung University, Tainan, Taiwan
  - Department of Statistics, North Carolina State University, Raleigh NC, 27695, USA
  - Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
  - Bioinformatics Research Center, North Carolina State University, Raleigh NC, 27695, USA
11
Choi D, Lee S. SNeCT: Scalable Network Constrained Tucker Decomposition for Multi-Platform Data Profiling. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1785-1796. [PMID: 30908262 DOI: 10.1109/tcbb.2019.2906205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
How do we integratively profile large-scale multi-platform genomic data that are high dimensional and sparse? Furthermore, how can we incorporate prior knowledge, such as the association between genes, in the analysis systematically to find better latent relationships? To solve this problem, we propose a Scalable Network Constrained Tucker decomposition method (SNeCT). SNeCT adopts a parallel stochastic gradient descent approach on the proposed parallelizable network constrained optimization function. SNeCT decomposition is applied to a tensor constructed from large-scale multi-platform multi-cohort cancer data, PanCan12, constrained on a network built from the PathwayCommons database. The decomposed factor matrices are applied to stratify cancers, to search for top-k similar patients given a new patient, and to illustrate how the matrices can be used to identify significant genomic patterns in each patient. In the stratification test, the combined twelve-cohort data is clustered to form thirteen subclasses. The similarity of the top-k patients to the query was high for 23 clinical features, including estrogen/progesterone receptor statuses of BRCA patients, with average precision values ranging from 0.72 to 0.86 and from 0.68 to 0.86, respectively. We also illustrate how the factor matrices can be used for identifying significant patterns for each patient. Resources are available at: https://github.com/leesael/SNeCT.
12
Kravchenko-Balasha N. Translating Cancer Molecular Variability into Personalized Information Using Bulk and Single Cell Approaches. Proteomics 2020; 20:e1900227. [PMID: 32072740 DOI: 10.1002/pmic.201900227] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/22/2019] [Revised: 01/13/2020] [Indexed: 12/17/2022]
Abstract
Cancer research is striving toward new frontiers of assigning the correct personalized drug(s) to a given patient. However, extensive tumor heterogeneity poses a major obstacle. Tumors of the same type often respond differently to therapy, due to patient-specific molecular aberrations and/or untargeted tumor subpopulations. It is frequently not possible to determine a priori which patients will respond to a certain therapy or how an efficient patient-specific combined therapy should be designed. Large-scale datasets have been growing at an accelerated pace and various technologies and analytical tools for single cell and bulk level analyses are being developed to extract significant individualized signals from such heterogeneous data. However, personalized therapies that dramatically alter the course of the disease remain scarce, and most tumors still respond poorly to medical care. In this review, the basic concepts of bulk and single cell approaches are discussed, as well as their emerging role in individualized designs of drug therapies, including the advantages and limitations of their applications in personalized medicine.
Affiliation(s)
- Nataly Kravchenko-Balasha
  - Department for Bio-Medical Research, Faculty of Dental Medicine, Hebrew University of Jerusalem, Jerusalem, 91120, Israel
13
Zhang L, Zhang S. Learning common and specific patterns from data of multiple interrelated biological scenarios with matrix factorization. Nucleic Acids Res 2019; 47:6606-6617. [PMID: 31175825 PMCID: PMC6649783 DOI: 10.1093/nar/gkz488] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 03/10/2019] [Revised: 05/11/2019] [Accepted: 05/22/2019] [Indexed: 11/18/2022]
Abstract
High-throughput biological technologies (e.g. ChIP-seq, RNA-seq and single-cell RNA-seq) rapidly accelerate the accumulation of genome-wide omics data in diverse interrelated biological scenarios (e.g. cells, tissues and conditions). Integration and differential analysis are two common paradigms for exploring and analyzing such data. However, current integrative methods usually ignore the differential part, and typical differential analysis methods either fail to identify combinatorial patterns of difference or require matched dimensions of the data. Here, we propose a flexible framework CSMF to combine them into one paradigm to simultaneously reveal Common and Specific patterns via Matrix Factorization from data generated under interrelated biological scenarios. We demonstrate the effectiveness of CSMF with four representative applications including pairwise ChIP-seq data describing the chromatin modification map between K562 and Huvec cell lines; pairwise RNA-seq data representing the expression profiles of two different cancers; RNA-seq data of three breast cancer subtypes; and single-cell RNA-seq data of human embryonic stem cell differentiation at six time points. Extensive analysis yields novel insights into hidden combinatorial patterns in these multi-modal data. Results demonstrate that CSMF is a powerful tool to uncover common and specific patterns with significant biological implications from data of interrelated biological scenarios.
Affiliation(s)
- Lihua Zhang
- NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Shihua Zhang
- NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China; Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China
15
Sheu K, Luecke S, Hoffmann A. Stimulus-specificity in the Responses of Immune Sentinel Cells. Curr Opin Syst Biol 2019; 18:53-61. [PMID: 32864512 DOI: 10.1016/j.coisb.2019.10.011] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 12/12/2022]
Abstract
Innate immune sentinel cells must initiate and orchestrate appropriate immune responses for myriad pathogens. These stimulus-specific gene expression responses are mediated by combinatorial and temporal coding within a handful of immune response signaling pathways. We outline the scope of our current understanding and indicate pressing outstanding questions. The innate immune response is a first-line defense against invading pathogens and coordinates the activation and recruitment of specialized immune cells, thereby initiating the adaptive immune response. While the adaptive immune system is capable of highly pathogen-specific immunity through the process of genetic recombination and clonal selection, innate immunity is frequently viewed as a catch-all system that initiates general immune activation. In this review, we are re-examining this view, as we are distinguishing between immune sentinel functions mediated by macrophages and dendritic cells and innate immune effector functions mediated by cells such as neutrophils, NK cells, etc. Given pathogen diversity, including modes of entry, replication cycles, and strategies of immune evasion and spread, all successive waves of the immune response ought to be tailored to the specific immune threat, leading us to postulate that immune sentinel functions by macrophages and dendritic cells ought to be highly stimulus-specific. Here we review the experimental evidence for stimulus-specific responses by immune sentinel cells which initiate and coordinate immune responses, as well as the mechanisms by which this specificity may be achieved.
Affiliation(s)
- Katherine Sheu
- Institute for Quantitative and Computational Biosciences and Department for Microbiology, Immunology and Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095
- Stefanie Luecke
- Institute for Quantitative and Computational Biosciences and Department for Microbiology, Immunology and Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095
- Alexander Hoffmann
- Institute for Quantitative and Computational Biosciences and Department for Microbiology, Immunology and Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095
16
Fang J. Tightly integrated genomic and epigenomic data mining using tensor decomposition. Bioinformatics 2019; 35:112-118. [PMID: 29939222 DOI: 10.1093/bioinformatics/bty513] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 01/17/2018] [Accepted: 06/21/2018] [Indexed: 12/12/2022] Open
Abstract
Motivation: Complex diseases such as cancers often involve multiple types of genomic and/or epigenomic abnormalities. Rapid accumulation of multiple types of omics data demands methods for integrating the multidimensional data in order to elucidate complex relationships among different types of genomic and epigenomic abnormalities. Results: In the present study, we propose a tightly integrated approach based on tensor decomposition. Multiple types of data, including mRNA, methylation, copy number variations and somatic mutations, are merged into a high-order tensor which is used to develop predictive models for overall survival. The weight tensors of the models are constrained using CANDECOMP/PARAFAC (CP) tensor decomposition and learned using support tensor machine regression (STR) and ridge tensor regression (RTR). The results demonstrate that the tensor decomposition based approaches can achieve better performance than the models based on individual data types and the concatenation approach. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jianwen Fang
- Computational & Systems Biology Branch, Biometric Research Program, Division of Cancer Treatment and Diagnosis, National Cancer Institute, 9609 Medical Center Dr., Rockville, MD, USA
17
Lou J, Cheung YM. Robust Low-Rank Tensor Minimization via a New Tensor Spectral k-Support Norm. IEEE Trans Image Process 2019; 29:2314-2327. [PMID: 31634129 DOI: 10.1109/tip.2019.2946445] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/10/2023]
Abstract
Recently, based on a new tensor algebraic framework for third-order tensors, the tensor singular value decomposition (t-SVD) and its associated tubal rank definition have shed new light on low-rank tensor modeling. Its applications to robust image/video recovery and background modeling show promising performance due to its superior capability in modeling cross-channel/frame information. Under the t-SVD framework, we propose a new tensor norm called tensor spectral k-support norm (TSP-k) by an alternative convex relaxation. As an interpolation between the existing tensor nuclear norm (TNN) and tensor Frobenius norm (TFN), it is able to simultaneously drive minor singular values to zero to induce low-rankness, and to capture more global information for better preserving intrinsic structure. We provide the proximal operator and the polar operator for the TSP-k norm as key optimization blocks, along with two showcase optimization algorithms for medium-and large-size tensors. Experiments on synthetic, image and video datasets in medium and large sizes, all verify the superiority of the TSP-k norm and the effectiveness of both optimization methods in comparison with the existing counterparts.
18
Hériché JK, Alexander S, Ellenberg J. Integrating Imaging and Omics: Computational Methods and Challenges. Annu Rev Biomed Data Sci 2019. [DOI: 10.1146/annurev-biodatasci-080917-013328] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Indexed: 11/09/2022]
Abstract
Fluorescence microscopy imaging has long been complementary to DNA sequencing- and mass spectrometry–based omics in biomedical research, but these approaches are now converging. On the one hand, omics methods are moving from in vitro methods that average across large cell populations to in situ molecular characterization tools with single-cell sensitivity. On the other hand, fluorescence microscopy imaging has moved from a morphological description of tissues and cells to quantitative molecular profiling with single-molecule resolution. Recent technological developments underpinned by computational methods have started to blur the lines between imaging and omics and have made their direct correlation and seamless integration an exciting possibility. As this trend continues rapidly, it will allow us to create comprehensive molecular profiles of living systems with spatial and temporal context and subcellular resolution. Key to achieving this ambitious goal will be novel computational methods and successfully dealing with the challenges of data integration and sharing as well as cloud-enabled big data analysis.
Affiliation(s)
- Jean-Karim Hériché
- Cell Biology and Biophysics Unit, European Molecular Biology Laboratory (EMBL), 69117 Heidelberg, Germany
- Stephanie Alexander
- Cell Biology and Biophysics Unit, European Molecular Biology Laboratory (EMBL), 69117 Heidelberg, Germany
- Jan Ellenberg
- Cell Biology and Biophysics Unit, European Molecular Biology Laboratory (EMBL), 69117 Heidelberg, Germany
19
Wang M, Fischer J, Song YS. Three-way clustering of multi-tissue multi-individual gene expression data using semi-nonnegative tensor decomposition. Ann Appl Stat 2019; 13:1103-1127. [PMID: 33381253 PMCID: PMC7771883 DOI: 10.1214/18-aoas1228] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 01/17/2023]
Abstract
The advent of high-throughput sequencing technologies has led to an increasing availability of large multi-tissue data sets which contain gene expression measurements across different tissues and individuals. In this setting, variation in expression levels arises due to contributions specific to genes, tissues, individuals, and interactions thereof. Classical clustering methods are ill-suited to explore these three-way interactions and struggle to fully extract the insights into transcriptome complexity contained in the data. We propose a new statistical method, called MultiCluster, based on semi-nonnegative tensor decomposition which permits the investigation of transcriptome variation across individuals and tissues simultaneously. We further develop a tensor projection procedure which detects covariate-related genes with high power, demonstrating the advantage of tensor-based methods in incorporating information across similar tissues. Through simulation and application to the GTEx RNA-seq data from 53 human tissues, we show that MultiCluster identifies three-way interactions with high accuracy and robustness.
Affiliation(s)
- Miaoyan Wang
- University of Wisconsin, Madison and University of California, Berkeley
- Jonathan Fischer
- University of Wisconsin, Madison and University of California, Berkeley
- Yun S Song
- University of Wisconsin, Madison and University of California, Berkeley
20
Esposito F, Gillis N, Del Buono N. Orthogonal joint sparse NMF for microarray data analysis. J Math Biol 2019; 79:223-247. [PMID: 31004215 DOI: 10.1007/s00285-019-01355-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 03/22/2018] [Revised: 03/29/2019] [Indexed: 12/20/2022]
Abstract
The 3D microarrays, generally known as gene-sample-time microarrays, couple the information on different time points collected by 2D microarrays that measure gene expression levels among different samples. Their analysis is useful in several biomedical applications, like monitoring dose or drug treatment responses of patients over time in pharmacogenomics studies. Many statistical and data analysis tools have been used to extract useful information. In particular, nonnegative matrix factorization (NMF), with its natural nonnegativity constraints, has demonstrated its ability to extract from 2D microarrays relevant information on specific genes involved in the particular biological process. In this paper, we propose a new NMF model, namely Orthogonal Joint Sparse NMF, to extract relevant information from 3D microarrays containing the time evolution of a 2D microarray, by adding additional constraints to enforce important biological properties useful for further biological analysis. We develop multiplicative update rules that decrease the objective function monotonically, and compare our approach to state-of-the-art NMF algorithms on both synthetic and real data sets.
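For orientation, the multiplicative-update idea named in this abstract can be illustrated with the classical Lee-Seung updates for plain Frobenius-norm NMF. This is a minimal NumPy sketch and not the Orthogonal Joint Sparse NMF model proposed by the authors: it imposes no orthogonality or sparsity constraints, and the function name `nmf_multiplicative`, the rank, the iteration count, and the small constant `eps` are all assumptions chosen for illustration.

```python
import numpy as np

def nmf_multiplicative(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a nonnegative matrix V (e.g. genes x samples) as W @ H.

    Classical Lee-Seung multiplicative updates for ||V - W H||_F^2.
    Hypothetical helper for illustration, not the constrained model
    described in the abstract.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    errors = []
    for _ in range(n_iter):
        # Element-wise ratios of nonnegative terms keep W and H nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(V - W @ H))
    return W, H, errors

# Toy nonnegative matrix built from two latent patterns (synthetic data).
rng = np.random.default_rng(1)
V = rng.random((30, 2)) @ rng.random((2, 12))
W, H, errors = nmf_multiplicative(V, rank=2)
```

Because each update multiplies the current factor by a ratio of nonnegative quantities, nonnegativity is preserved without any projection step, which is the property that makes this family of update rules attractive for expression data.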
Affiliation(s)
- Flavia Esposito
- Department of Mathematics, University of Bari Aldo Moro, via E. Orabona 4, 70125, Bari, Italy; INDAM Research Group GNCS, Roma, Italy.
- Nicolas Gillis
- Department of Mathematics and Operational Research, Université de Mons, Rue de Houdain 9, 7000, Mons, Belgium
- Nicoletta Del Buono
- Department of Mathematics, University of Bari Aldo Moro, via E. Orabona 4, 70125, Bari, Italy; INDAM Research Group GNCS, Roma, Italy
21
Li Y, Wu FX, Ngom A. A review on machine learning principles for multi-view biological data integration. Brief Bioinform 2019; 19:325-340. [PMID: 28011753 DOI: 10.1093/bib/bbw113] [Citation(s) in RCA: 126] [Impact Index Per Article: 25.2] [Received: 07/26/2016] [Indexed: 01/08/2023] Open
Abstract
Driven by high-throughput sequencing techniques, modern genomic and clinical studies are in a strong need of integrative machine learning models for better use of vast volumes of heterogeneous information in the deep understanding of biological systems and the development of predictive models. How data from multiple sources (called multi-view data) are incorporated in a learning system is a key step for successful analysis. In this article, we provide a comprehensive review on omics and clinical data integration techniques, from a machine learning perspective, for various analyses such as prediction, clustering, dimension reduction and association. We shall show that Bayesian models are able to use prior information and model measurements with various distributions; tree-based methods can either build a tree with all features or collectively make a final decision based on trees learned from each view; kernel methods fuse the similarity matrices learned from individual views together for a final similarity matrix or learning model; network-based fusion methods are capable of inferring direct and indirect associations in a heterogeneous network; matrix factorization models have potential to learn interactions among features from different views; and a range of deep neural networks can be integrated in multi-modal learning for capturing the complex mechanism of biological systems.
Affiliation(s)
- Yifeng Li
- Information and Communications Technologies, National Research Council Canada, Ottawa, Ontario, Canada
- Fang-Xiang Wu
- Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
- Alioune Ngom
- School of Computer Science, University of Windsor, Windsor, Ontario, Canada
22
Williams AH, Kim TH, Wang F, Vyas S, Ryu SI, Shenoy KV, Schnitzer M, Kolda TG, Ganguli S. Unsupervised Discovery of Demixed, Low-Dimensional Neural Dynamics across Multiple Timescales through Tensor Component Analysis. Neuron 2018; 98:1099-1115.e8. [PMID: 29887338 DOI: 10.1016/j.neuron.2018.05.015] [Citation(s) in RCA: 132] [Impact Index Per Article: 22.0] [Received: 11/13/2017] [Revised: 03/18/2018] [Accepted: 05/08/2018] [Indexed: 01/19/2023]
Abstract
Perceptions, thoughts, and actions unfold over millisecond timescales, while learned behaviors can require many days to mature. While recent experimental advances enable large-scale and long-term neural recordings with high temporal fidelity, it remains a formidable challenge to extract unbiased and interpretable descriptions of how rapid single-trial circuit dynamics change slowly over many trials to mediate learning. We demonstrate a simple tensor component analysis (TCA) can meet this challenge by extracting three interconnected, low-dimensional descriptions of neural data: neuron factors, reflecting cell assemblies; temporal factors, reflecting rapid circuit dynamics mediating perceptions, thoughts, and actions within each trial; and trial factors, describing both long-term learning and trial-to-trial changes in cognitive state. We demonstrate the broad applicability of TCA by revealing insights into diverse datasets derived from artificial neural networks, large-scale calcium imaging of rodent prefrontal cortex during maze navigation, and multielectrode recordings of macaque motor cortex during brain machine interface learning.
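The three factor sets described in this abstract (neuron, temporal, and trial factors) have the shape of a CP (CANDECOMP/PARAFAC) model of a neurons × time × trials array. The following is a minimal NumPy sketch of fitting such a model by alternating least squares on synthetic data. It is an illustration under stated assumptions, not the authors' implementation: the helper names `khatri_rao`, `unfold`, and `cp_als`, the rank, the iteration count, and the toy dimensions are all hypothetical.

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product of B (J x R) and C (K x R): (J*K) x R."""
    J, R = B.shape
    K, _ = C.shape
    return np.einsum("jr,kr->jkr", B, C).reshape(J * K, R)

def unfold(X, mode):
    """Matricize X with `mode` as rows; remaining modes merged in C order."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def cp_als(X, rank, n_iter=50, seed=0):
    """Fit a rank-`rank` CP model to a 3-way array by alternating least squares."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in X.shape]
    errors = []
    for _ in range(n_iter):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            design = khatri_rao(others[0], others[1])
            # Exact least-squares update of one factor, the other two held fixed.
            sol = np.linalg.lstsq(design, unfold(X, mode).T, rcond=None)[0]
            factors[mode] = sol.T
        A, B, C = factors
        X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
        errors.append(np.linalg.norm(X - X_hat))
    return factors, errors

# Synthetic "neurons x time x trials" array with two underlying components.
rng = np.random.default_rng(1)
true_factors = [rng.random((n, 2)) for n in (8, 10, 6)]
X = np.einsum("ir,jr,kr->ijk", *true_factors)
(neuron_f, time_f, trial_f), errors = cp_als(X, rank=2)
```

Each column of `neuron_f`, `time_f`, and `trial_f` belongs to one component, so a component can be read as a cell assembly, its within-trial time course, and its amplitude across trials. Since every block update is an exact least-squares solve, the reconstruction error recorded after each sweep does not increase.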
Affiliation(s)
- Alex H Williams
- Neurosciences Graduate Program, Stanford University, Stanford, CA 94305, USA.
- Tony Hyun Kim
- Electrical Engineering Department, Stanford University, Stanford, CA 94305, USA
- Forea Wang
- Neurosciences Graduate Program, Stanford University, Stanford, CA 94305, USA
- Saurabh Vyas
- Electrical Engineering Department, Stanford University, Stanford, CA 94305, USA; Bioengineering Department, Stanford University, Stanford, CA 94305, USA
- Stephen I Ryu
- Electrical Engineering Department, Stanford University, Stanford, CA 94305, USA; Department of Neurosurgery, Palo Alto Medical Foundation, Palo Alto, CA 94301, USA
- Krishna V Shenoy
- Electrical Engineering Department, Stanford University, Stanford, CA 94305, USA; Bioengineering Department, Stanford University, Stanford, CA 94305, USA; Neurobiology Department, Stanford University, Stanford, CA 94305, USA; Bio-X Program, Stanford University, Stanford, CA 94305, USA; Stanford Neurosciences Institute, Stanford University, Stanford, CA 94305, USA; Howard Hughes Medical Institute, Stanford University, Stanford, CA 94305, USA
- Mark Schnitzer
- Applied Physics Department, Stanford University, Stanford, CA 94305, USA; Biology Department, Stanford University, Stanford, CA 94305, USA; Bio-X Program, Stanford University, Stanford, CA 94305, USA; Howard Hughes Medical Institute, Stanford University, Stanford, CA 94305, USA; CNC Program, Stanford University, Stanford, CA 94305, USA
- Surya Ganguli
- Applied Physics Department, Stanford University, Stanford, CA 94305, USA; Neurobiology Department, Stanford University, Stanford, CA 94305, USA; Bio-X Program, Stanford University, Stanford, CA 94305, USA; Stanford Neurosciences Institute, Stanford University, Stanford, CA 94305, USA.
23
Frelat R, Lindegren M, Denker TS, Floeter J, Fock HO, Sguotti C, Stäbler M, Otto SA, Möllmann C. Community ecology in 3D: Tensor decomposition reveals spatio-temporal dynamics of large ecological communities. PLoS One 2017; 12:e0188205. [PMID: 29136658 PMCID: PMC5685633 DOI: 10.1371/journal.pone.0188205] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 08/10/2017] [Accepted: 11/02/2017] [Indexed: 11/19/2022] Open
Abstract
Understanding spatio-temporal dynamics of biotic communities containing large numbers of species is crucial to guide ecosystem management and conservation efforts. However, traditional approaches usually focus on studying community dynamics either in space or in time, often failing to fully account for interlinked spatio-temporal changes. In this study, we demonstrate and promote the use of tensor decomposition for disentangling spatio-temporal community dynamics in long-term monitoring data. Tensor decomposition builds on traditional multivariate statistics (e.g. Principal Component Analysis) but extends it to multiple dimensions. This extension allows for the synchronized study of multiple ecological variables measured repeatedly in time and space. We applied this comprehensive approach to explore the spatio-temporal dynamics of 65 demersal fish species in the North Sea, a marine ecosystem strongly altered by human activities and climate change. Our case study demonstrates how tensor decomposition can successfully (i) characterize the main spatio-temporal patterns and trends in species abundances, (ii) identify sub-communities of species that share similar spatial distribution and temporal dynamics, and (iii) reveal external drivers of change. Our results revealed a strong spatial structure in fish assemblages persistent over time and linked to differences in depth, primary production and seasonality. Furthermore, we simultaneously characterized important temporal distribution changes related to the low frequency temperature variability inherent in the Atlantic Multidecadal Oscillation. Finally, we identified six major sub-communities composed of species sharing similar spatial distribution patterns and temporal dynamics. Our case study demonstrates the application and benefits of using tensor decomposition for studying complex community data sets usually derived from large-scale monitoring programs.
Affiliation(s)
- Romain Frelat
- University of Hamburg, Institute for Hydrobiology and Fisheries Science, Center for Earth System Research and Sustainability (CEN), KlimaCampus Hamburg, Große Elbstraße 133, Hamburg, Germany
- Martin Lindegren
- Centre for Ocean Life, National Institute of Aquatic Resources, Technical University of Denmark, Kemitorvet, Bygning 202, Kgs. Lyngby, Denmark
- Tim Spaanheden Denker
- Centre for Ocean Life, National Institute of Aquatic Resources, Technical University of Denmark, Kemitorvet, Bygning 202, Kgs. Lyngby, Denmark
- Jens Floeter
- University of Hamburg, Institute for Hydrobiology and Fisheries Science, Center for Earth System Research and Sustainability (CEN), KlimaCampus Hamburg, Große Elbstraße 133, Hamburg, Germany
- Heino O. Fock
- Thünen-Institute of Sea Fisheries, Palmaille 9, Hamburg, Germany
- Camilla Sguotti
- University of Hamburg, Institute for Hydrobiology and Fisheries Science, Center for Earth System Research and Sustainability (CEN), KlimaCampus Hamburg, Große Elbstraße 133, Hamburg, Germany
- Moritz Stäbler
- Leibniz-Centre for Tropical Marine Ecology, Fahrenheitstraße 6, Bremen, Germany
- Saskia A. Otto
- University of Hamburg, Institute for Hydrobiology and Fisheries Science, Center for Earth System Research and Sustainability (CEN), KlimaCampus Hamburg, Große Elbstraße 133, Hamburg, Germany
- Christian Möllmann
- University of Hamburg, Institute for Hydrobiology and Fisheries Science, Center for Earth System Research and Sustainability (CEN), KlimaCampus Hamburg, Große Elbstraße 133, Hamburg, Germany
24
Wu M, Huang J, Ma S. Identifying gene-gene interactions using penalized tensor regression. Stat Med 2017; 37:598-610. [PMID: 29034516 DOI: 10.1002/sim.7523] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 02/01/2017] [Revised: 09/08/2017] [Accepted: 09/12/2017] [Indexed: 12/15/2022]
Abstract
Gene-gene (G×G) interactions have been shown to be critical for the fundamental mechanisms and development of complex diseases beyond main genetic effects. The commonly adopted marginal analysis is limited by considering only a small number of G factors at a time. With the "main effects, interactions" hierarchical constraint, many of the existing joint analysis methods suffer from prohibitively high computational cost. In this study, we propose a new method for identifying important G×G interactions under joint modeling. The proposed method adopts tensor regression to accommodate high data dimensionality and the penalization technique for selection. It naturally accommodates the strong hierarchical structure without imposing additional constraints, making optimization much simpler and faster than in the existing studies. It outperforms multiple alternatives in simulation. The analysis of The Cancer Genome Atlas (TCGA) data on lung cancer and melanoma demonstrates that it can identify markers with important implications and better prediction performance.
Affiliation(s)
- Mengyun Wu
- School of Statistics and Management, Shanghai University of Finance and Economics, 777 Guoding Road, Shanghai 200433, China; Department of Biostatistics, School of Public Health, Yale University, 60 College Street, New Haven, CT 06520, USA
- Jian Huang
- Department of Applied Mathematics, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
- Shuangge Ma
- Department of Biostatistics, School of Public Health, Yale University, 60 College Street, New Haven, CT 06520, USA
25
Roy S, Yun D, Madahian B, Berry MW, Deng LY, Goldowitz D, Homayouni R. Navigating the Functional Landscape of Transcription Factors via Non-Negative Tensor Factorization Analysis of MEDLINE Abstracts. Front Bioeng Biotechnol 2017; 5:48. [PMID: 28894735 PMCID: PMC5581332 DOI: 10.3389/fbioe.2017.00048] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 05/18/2017] [Accepted: 07/31/2017] [Indexed: 01/09/2023] Open
Abstract
In this study, we developed and evaluated a novel text-mining approach, using non-negative tensor factorization (NTF), to simultaneously extract and functionally annotate transcriptional modules consisting of sets of genes, transcription factors (TFs), and terms from MEDLINE abstracts. A sparse 3-mode term × gene × TF tensor was constructed that contained weighted frequencies of 106,895 terms in 26,781 abstracts shared among 7,695 genes and 994 TFs. The tensor was decomposed into sub-tensors using non-negative tensor factorization (NTF) across 16 different approximation ranks. Dominant entries of each of 2,861 sub-tensors were extracted to form term–gene–TF annotated transcriptional modules (ATMs). More than 94% of the ATMs were found to be enriched in at least one KEGG pathway or GO category, suggesting that the ATMs are functionally relevant. One advantage of this method is that it can discover potentially new gene–TF associations from the literature. Using a set of microarray and ChIP-Seq datasets as gold standard, we show that the precision of our method for predicting gene–TF associations is significantly higher than chance. In addition, we demonstrate that the terms in each ATM can be used to suggest new GO classifications to genes and TFs. Taken together, our results indicate that NTF is useful for simultaneous extraction and functional annotation of transcriptional regulatory networks from unstructured text, as well as for literature based discovery. A web tool called Transcriptional Regulatory Modules Extracted from Literature (TREMEL), available at http://binf1.memphis.edu/tremel, was built to enable browsing and searching of ATMs.
Affiliation(s)
- Sujoy Roy
- Bioinformatics Program, University of Memphis, Memphis, TN, United States; Center for Translational Informatics, University of Memphis, Memphis, TN, United States
- Daqing Yun
- Computer and Information Sciences Program, Harrisburg University of Science and Technology, Harrisburg, PA, United States
- Behrouz Madahian
- Department of Mathematical Sciences, University of Memphis, Memphis, TN, United States
- Michael W Berry
- Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, United States
- Lih-Yuan Deng
- Department of Mathematical Sciences, University of Memphis, Memphis, TN, United States
- Daniel Goldowitz
- Center for Molecular Medicine and Therapeutics, University of British Columbia, Vancouver, BC, Canada
- Ramin Homayouni
- Bioinformatics Program, University of Memphis, Memphis, TN, United States; Center for Translational Informatics, University of Memphis, Memphis, TN, United States; Department of Biological Sciences, University of Memphis, Memphis, TN, United States
26
Taguchi YH. Tensor decomposition-based unsupervised feature extraction applied to matrix products for multi-view data processing. PLoS One 2017; 12:e0183933. [PMID: 28841719 PMCID: PMC5571984 DOI: 10.1371/journal.pone.0183933] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 02/20/2017] [Accepted: 08/04/2017] [Indexed: 01/17/2023] Open
Abstract
In the current era of big data, the amount of data available is continuously increasing. Both the number and types of samples, or features, are on the rise. The mixing of distinct features often makes interpretation more difficult. However, separate analysis of individual types requires subsequent integration. A tensor is a useful framework to deal with distinct types of features in an integrated manner without mixing them. On the other hand, tensor data is not easy to obtain since it requires the measurements of huge numbers of combinations of distinct features; if there are m kinds of features, each of which has N dimensions, the number of measurements needed is as large as N^m, which is often too large to measure. In this paper, I propose a new method where a tensor is generated from individual features without combinatorial measurements, and the generated tensor was decomposed back to matrices, by which unsupervised feature extraction was performed. In order to demonstrate the usefulness of the proposed strategy, it was applied to synthetic data, as well as three omics datasets. It outperformed other matrix-based methodologies.
Affiliation(s)
- Y-h. Taguchi
- Department of Physics, Chuo University, 1-13-27 Kasuga, Bunkyo-ku, Tokyo 112-8551, Japan
27
Wang M, Duc KD, Fischer J, Song YS. Operator norm inequalities between tensor unfoldings on the partition lattice. Linear Algebra Appl 2017; 520:44-66. [PMID: 28286347 PMCID: PMC5340277 DOI: 10.1016/j.laa.2017.01.017] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 06/06/2023]
Abstract
Interest in higher-order tensors has recently surged in data-intensive fields, with a wide range of applications including image processing, blind source separation, community detection, and feature extraction. A common paradigm in tensor-related algorithms advocates unfolding (or flattening) the tensor into a matrix and applying classical methods developed for matrices. Despite the popularity of such techniques, how the functional properties of a tensor change upon unfolding is currently not well understood. In contrast to the body of existing work which has focused almost exclusively on matricizations, we here consider all possible unfoldings of an order-k tensor, which are in one-to-one correspondence with the set of partitions of {1, …, k}. We derive general inequalities between the l^p-norms of arbitrary unfoldings defined on the partition lattice. In particular, we demonstrate how the spectral norm (p = 2) of a tensor is bounded by that of its unfoldings, and obtain an improved upper bound on the ratio of the Frobenius norm to the spectral norm of an arbitrary tensor. For specially-structured tensors satisfying a generalized definition of orthogonal decomposability, we prove that the spectral norm remains invariant under specific subsets of unfolding operations.
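To make the unfolding operation in this abstract concrete, here is a minimal NumPy sketch that flattens an order-4 array into matrices according to two-block partitions of its modes. It covers only matrix unfoldings (two blocks), not the general multi-block unfoldings the paper studies, and it does not attempt to compute the tensor spectral norm itself. The helper name `unfold`, the array shape, and the zero-based mode numbering are assumptions made for illustration.

```python
import numpy as np

def unfold(T, row_modes):
    """Flatten tensor T into a matrix whose rows index `row_modes`.

    All remaining modes are merged into the columns. Mode-n
    matricization is the special case of a single row mode.
    """
    row_modes = list(row_modes)
    col_modes = [m for m in range(T.ndim) if m not in row_modes]
    n_rows = int(np.prod([T.shape[m] for m in row_modes]))
    return np.transpose(T, row_modes + col_modes).reshape(n_rows, -1)

rng = np.random.default_rng(0)
T = rng.standard_normal((3, 4, 5, 2))

M0 = unfold(T, [0])       # modes {0} | {1, 2, 3}: a 3 x 40 matrix
M01 = unfold(T, [0, 1])   # modes {0, 1} | {2, 3}: a 12 x 10 matrix

fro_T = np.linalg.norm(T)            # Frobenius norm of the tensor
spec_M0 = np.linalg.norm(M0, 2)      # spectral norm (largest singular value)
spec_M01 = np.linalg.norm(M01, 2)
```

Unfolding only rearranges entries, so the Frobenius norm is the same for `T`, `M0`, and `M01`, whereas the spectral norms of different unfoldings generally differ. That dependence on the choice of unfolding is the kind of behavior the inequalities in the abstract are about.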
Collapse
Affiliation(s)
- Miaoyan Wang
- Department of Mathematics, University of Pennsylvania
| | - Khanh Dao Duc
- Department of Mathematics, University of Pennsylvania
| | | | - Yun S Song
- Department of Mathematics, University of Pennsylvania; Department of Statistics, University of California, Berkeley; Computer Science Division, University of California, Berkeley
28
Luo Y, Wang F, Szolovits P. Tensor factorization toward precision medicine. Brief Bioinform 2017; 18:511-514. [PMID: 26994614 PMCID: PMC6078180 DOI: 10.1093/bib/bbw026] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 12/07/2015] [Revised: 01/08/2016] [Indexed: 11/13/2022]
Abstract
Precision medicine initiatives come amid the rapid growth in quantity and variety of biomedical data, which exceeds the capacity of matrix-oriented data representations and many current analysis algorithms. Tensor factorizations extend the matrix view to multiple modalities and support dimensionality reduction methods that identify latent groups of data for meaningful summarization of both features and instances. In this opinion article, we analyze the modest literature on applying tensor factorization to various biomedical fields including genotyping and phenotyping. Based on the cited work including work of our own, we suggest that tensor applications could serve as an effective tool to enable frequent updating of medical knowledge based on the continually growing scientific and clinical evidence. We encourage extensive experimental studies to tackle challenges including design choice of factorizations, integrating temporality and algorithm scalability.
29
Schmitz G, Madsen NK, Christiansen O. Atomic-batched tensor decomposed two-electron repulsion integrals. J Chem Phys 2017; 146:134112. [DOI: 10.1063/1.4979571] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 11/14/2022]
Affiliation(s)
- Gunnar Schmitz
- Department of Chemistry, Aarhus Universitet, DK-8000 Aarhus, Denmark
- Ove Christiansen
- Department of Chemistry, Aarhus Universitet, DK-8000 Aarhus, Denmark
30
31
Zhao H, Wang DD, Chen L, Liu X, Yan H. Identifying Multi-Dimensional Co-Clusters in Tensors Based on Hyperplane Detection in Singular Vector Spaces. PLoS One 2016; 11:e0162293. [PMID: 27598575 PMCID: PMC5012624 DOI: 10.1371/journal.pone.0162293] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 02/28/2016] [Accepted: 08/19/2016] [Indexed: 11/18/2022]
Abstract
Co-clustering, often called biclustering for two-dimensional data, has found many applications, such as gene expression data analysis and text mining. Nowadays, a variety of multi-dimensional arrays (tensors) frequently occur in data analysis tasks, and co-clustering techniques play a key role in dealing with such datasets. Co-clusters represent coherent patterns and exhibit important properties along all the modes. Development of robust co-clustering techniques is important for the detection and analysis of these patterns. In this paper, a co-clustering method based on hyperplane detection in singular vector spaces (HDSVS) is proposed. Specifically in this method, higher-order singular value decomposition (HOSVD) transforms a tensor into a core part and a singular vector matrix along each mode, whose row vectors can be clustered by a linear grouping algorithm (LGA). Meanwhile, hyperplanar patterns are extracted and successfully support the identification of multi-dimensional co-clusters. To validate HDSVS, a number of synthetic and biological tensors were adopted. The synthetic tensors attested a favorable performance of this algorithm on noisy or overlapped data. Experiments with gene expression data and lineage data of embryonic cells further verified the reliability of HDSVS on practical problems. Moreover, the detected co-clusters are consistent with important genetic pathways and gene ontology annotations. Finally, a series of comparisons between HDSVS and state-of-the-art methods on synthetic tensors and a yeast gene expression tensor were implemented, verifying the robust and stable performance of our method.
Affiliation(s)
- Hongya Zhao
- Industrial Center, Shenzhen Polytechnic, Shenzhen, China
- Debby D. Wang
- Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong
- Caritas Institute of Higher Education, New Territories, Hong Kong
- Long Chen
- Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong
- Xinyu Liu
- Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong
- Hong Yan
- Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong
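The HOSVD step named in the abstract above (a core tensor plus one singular vector matrix per mode) can be sketched in NumPy as follows. This is a generic, untruncated HOSVD under stated assumptions, not the authors' HDSVS implementation: the helper names `unfold`, `mode_product`, `hosvd`, and `reconstruct` and the random tensor are hypothetical, and the linear grouping and hyperplane detection steps are not shown.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-`mode` matricization: rows index `mode`, columns index all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` (shape J x I_mode) along axis `mode`."""
    moved = np.moveaxis(tensor, mode, -1)   # bring the target mode last
    result = moved @ matrix.T               # contract it against the matrix
    return np.moveaxis(result, -1, mode)    # restore the axis order

def hosvd(tensor):
    """Untruncated HOSVD: left singular vectors of each unfolding, then the core."""
    factors = [
        np.linalg.svd(unfold(tensor, m), full_matrices=False)[0]
        for m in range(tensor.ndim)
    ]
    core = tensor
    for m, U in enumerate(factors):
        core = mode_product(core, U.T, m)   # project each mode onto its basis
    return core, factors

def reconstruct(core, factors):
    """Invert the projection by multiplying the core with each factor matrix."""
    out = core
    for m, U in enumerate(factors):
        out = mode_product(out, U, m)
    return out

# Hypothetical gene x condition x time tensor, used only for illustration.
rng = np.random.default_rng(1)
T = rng.standard_normal((4, 5, 6))
core, factors = hosvd(T)
print("core shape:", core.shape)
print("max reconstruction error:", np.abs(reconstruct(core, factors) - T).max())
```

In a co-clustering setting like the one the abstract describes, the rows of each matrix in `factors` would be the objects passed to the clustering step for that mode.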
32
Nguyen NAT, Yang HJ, Kim S. Hidden discriminative features extraction for supervised high-order time series modeling. Comput Biol Med 2016; 78:81-90. [PMID: 27665534 DOI: 10.1016/j.compbiomed.2016.08.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2016] [Revised: 08/24/2016] [Accepted: 08/25/2016] [Indexed: 10/21/2022]
Abstract
In this paper, an orthogonal Tucker-decomposition-based extraction of high-order discriminative subspaces from a tensor-based time series data structure is presented, named as Tensor Discriminative Feature Extraction (TDFE). TDFE relies on the employment of category information for the maximization of the between-class scatter and the minimization of the within-class scatter to extract optimal hidden discriminative feature subspaces that are simultaneously spanned by every modality for supervised tensor modeling. In this context, the proposed tensor-decomposition method provides the following benefits: i) reduces dimensionality while robustly mining the underlying discriminative features, ii) results in effective interpretable features that lead to an improved classification and visualization, and iii) reduces the processing time during the training stage and the filtering of the projection by solving the generalized eigenvalue issue at each alternation step. Two real third-order tensor-structures of time series datasets (an epilepsy electroencephalogram (EEG) that is modeled as channel×frequency bin×time frame and a microarray data that is modeled as gene×sample×time) were used for the evaluation of the TDFE. The experiment results corroborate the advantages of the proposed method with averages of 98.26% and 89.63% for the classification accuracies of the epilepsy dataset and the microarray dataset, respectively. These performance averages represent an improvement on those of the matrix-based algorithms and recent tensor-based, discriminant-decomposition approaches; this is especially the case considering the small number of samples that are used in practice.
Affiliation(s)
- Ngoc Anh Thi Nguyen
- Department of Computer Science, Chonnam National University, Gwangju 500-757, South Korea; Faculty of Information Technology, University of Education, The University of Danang, Vietnam.
- Hyung-Jeong Yang
- Department of Computer Science, Chonnam National University, Gwangju 500-757, South Korea.
- Sunhee Kim
- Department of Brain and Cognitive Engineering, Korea University, Seoul 136-713, South Korea.
33
Chitforoushzadeh Z, Ye Z, Sheng Z, LaRue S, Fry RC, Lauffenburger DA, Janes KA. TNF-insulin crosstalk at the transcription factor GATA6 is revealed by a model that links signaling and transcriptomic data tensors. Sci Signal 2016; 9:ra59. [PMID: 27273097 DOI: 10.1126/scisignal.aad3373] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Indexed: 12/15/2022]
Abstract
Signal transduction networks coordinate transcriptional programs activated by diverse extracellular stimuli, such as growth factors and cytokines. Cells receive multiple stimuli simultaneously, and mapping how activation of the integrated signaling network affects gene expression is a challenge. We stimulated colon adenocarcinoma cells with various combinations of the cytokine tumor necrosis factor (TNF) and the growth factors insulin and epidermal growth factor (EGF) to investigate signal integration and transcriptional crosstalk. We quantitatively linked the proteomic and transcriptomic data sets by implementing a structured computational approach called tensor partial least squares regression. This statistical model accurately predicted transcriptional signatures from signaling arising from single and combined stimuli and also predicted time-dependent contributions of signaling events. Specifically, the model predicted that an early-phase, AKT-associated signal downstream of insulin repressed a set of transcripts induced by TNF. Through bioinformatics and cell-based experiments, we identified the AKT-repressed signal as glycogen synthase kinase 3 (GSK3)-catalyzed phosphorylation of Ser(37) on the long form of the transcription factor GATA6. Phosphorylation of GATA6 on Ser(37) promoted its degradation, thereby preventing GATA6 from repressing transcripts that are induced by TNF and attenuated by insulin. Our analysis showed that predictive tensor modeling of proteomic and transcriptomic data sets can uncover pathway crosstalk that produces specific patterns of gene expression in cells receiving multiple stimuli.
Affiliation(s)
- Zeinab Chitforoushzadeh
- Department of Biomedical Engineering, University of Virginia, Charlottesville, VA 22908, USA. Department of Pharmacology, University of Virginia, Charlottesville, VA 22908, USA
- Zi Ye
- Department of Biomedical Engineering, University of Virginia, Charlottesville, VA 22908, USA
- Ziran Sheng
- Department of Biomedical Engineering, University of Virginia, Charlottesville, VA 22908, USA
- Silvia LaRue
- Department of Biomedical Engineering, University of Virginia, Charlottesville, VA 22908, USA
- Rebecca C Fry
- Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Douglas A Lauffenburger
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Kevin A Janes
- Department of Biomedical Engineering, University of Virginia, Charlottesville, VA 22908, USA.
34
McManus J, Cheng Z, Vogel C. Next-generation analysis of gene expression regulation--comparing the roles of synthesis and degradation. Molecular BioSystems 2015; 11:2680-9. [PMID: 26259698 PMCID: PMC4573910 DOI: 10.1039/c5mb00310e] [Citation(s) in RCA: 66] [Impact Index Per Article: 7.3] [Indexed: 12/14/2022]
Abstract
Technological advances now enable routine measurement of mRNA and protein abundances, and estimates of their rates of synthesis and degradation that inform on their values and the degree of change in response to stimuli. Importantly, more and more data on time-series experiments are emerging, e.g. of cells responding to stress, enabling first insights into a new dimension of gene expression regulation - its dynamics and how it allows for very different response signals across genes. This review discusses recently published methods and datasets, their impact on what we now know about the relationships between concentrations and synthesis rates of mRNAs and proteins in yeast and mammalian cells, their evolution, and new hypotheses on translation regulatory mechanisms generated by approaches that involve ribosome footprinting.
Affiliation(s)
- Joel McManus
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, USA.
35
Luo Y, Xin Y, Hochberg E, Joshi R, Uzuner O, Szolovits P. Subgraph augmented non-negative tensor factorization (SANTF) for modeling clinical narrative text. J Am Med Inform Assoc 2015; 22:1009-19. [PMID: 25862765 PMCID: PMC4986663 DOI: 10.1093/jamia/ocv016] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Received: 08/21/2014] [Revised: 01/18/2015] [Accepted: 02/16/2015] [Indexed: 02/04/2023]
Abstract
OBJECTIVE Extracting medical knowledge from electronic medical records requires automated approaches to combat scalability limitations and selection biases. However, existing machine learning approaches are often regarded by clinicians as black boxes. Moreover, training data for these automated approaches are often sparsely annotated at best. The authors target unsupervised learning for modeling clinical narrative text, aiming at improving both accuracy and interpretability. METHODS The authors introduce a novel framework named subgraph augmented non-negative tensor factorization (SANTF). In addition to relying on atomic features (e.g., words in clinical narrative text), SANTF automatically mines higher-order features (e.g., relations of lymphoid cells expressing antigens) from clinical narrative text by converting sentences into a graph representation and identifying important subgraphs. The authors compose a tensor using patients, higher-order features, and atomic features as its respective modes. We then apply non-negative tensor factorization to cluster patients, and simultaneously identify latent groups of higher-order features that link to patient clusters, as in clinical guidelines where a panel of immunophenotypic features and laboratory results are used to specify diagnostic criteria. RESULTS AND CONCLUSION SANTF demonstrated over 10% improvement in averaged F-measure on patient clustering compared to widely used non-negative matrix factorization (NMF) and k-means clustering methods. Multiple baselines were established by modeling patient data using patient-by-features matrices with different feature configurations and then performing NMF or k-means to cluster patients. Feature analysis identified latent groups of higher-order features that lead to medical insights. We also found that the latent groups of atomic features help to better correlate the latent groups of higher-order features.
Affiliation(s)
- Yuan Luo
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology
- Yu Xin
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology
- Ephraim Hochberg
- Center for Lymphoma, Massachusetts General Hospital and Department of Medicine, Harvard Medical School
- Rohit Joshi
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology
- Ozlem Uzuner
- Department of Information Studies, State University of New York at Albany
- Peter Szolovits
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology
36
Sankaranarayanan P, Schomay TE, Aiello KA, Alter O. Tensor GSVD of patient- and platform-matched tumor and normal DNA copy-number profiles uncovers chromosome arm-wide patterns of tumor-exclusive platform-consistent alterations encoding for cell transformation and predicting ovarian cancer survival. PLoS One 2015; 10:e0121396. [PMID: 25875127 PMCID: PMC4398562 DOI: 10.1371/journal.pone.0121396] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 10/21/2014] [Accepted: 01/31/2015] [Indexed: 11/28/2022]
Abstract
The number of large-scale high-dimensional datasets recording different aspects of a single disease is growing, accompanied by a need for frameworks that can create one coherent model from multiple tensors of matched columns, e.g., patients and platforms, but independent rows, e.g., probes. We define and prove the mathematical properties of a novel tensor generalized singular value decomposition (GSVD), which can simultaneously find the similarities and dissimilarities, i.e., patterns of varying relative significance, between any two such tensors. We demonstrate the tensor GSVD in comparative modeling of patient- and platform-matched but probe-independent ovarian serous cystadenocarcinoma (OV) tumor, mostly high-grade, and normal DNA copy-number profiles, across each chromosome arm, and combination of two arms, separately. The modeling uncovers previously unrecognized patterns of tumor-exclusive platform-consistent co-occurring copy-number alterations (CNAs). We find, first, and validate that each of the patterns across only 7p and Xq, and the combination of 6p+12p, is correlated with a patient’s prognosis, is independent of the tumor’s stage, the best predictor of OV survival to date, and together with stage makes a better predictor than stage alone. Second, these patterns include most known OV-associated CNAs that map to these chromosome arms, as well as several previously unreported, yet frequent focal CNAs. Third, differential mRNA, microRNA, and protein expression consistently map to the DNA CNAs. A coherent picture emerges for each pattern, suggesting roles for the CNAs in OV pathogenesis and personalized therapy. In 6p+12p, deletion of the p21-encoding CDKN1A and p38-encoding MAPK14 and amplification of RAD51AP1 and KRAS encode for human cell transformation, and are correlated with a cell’s immortality, and a patient’s shorter survival time. In 7p, RPA3 deletion and POLD2 amplification are correlated with DNA stability, and a longer survival. 
In Xq, PABPC5 deletion and BCAP31 amplification are correlated with a cellular immune response, and a longer survival.
MESH Headings
- Carcinoma, Ovarian Epithelial
- Cell Transformation, Neoplastic/genetics
- Chromosome Mapping
- Chromosomes/genetics
- Cystadenocarcinoma, Serous/diagnosis
- Cystadenocarcinoma, Serous/genetics
- Cystadenocarcinoma, Serous/pathology
- DNA Copy Number Variations/genetics
- Female
- Gene Expression Regulation, Neoplastic
- Humans
- MicroRNAs/biosynthesis
- Models, Theoretical
- Mutation
- Neoplasm Proteins/biosynthesis
- Neoplasms, Glandular and Epithelial/diagnosis
- Neoplasms, Glandular and Epithelial/genetics
- Neoplasms, Glandular and Epithelial/pathology
- Ovarian Neoplasms/diagnosis
- Ovarian Neoplasms/genetics
- Ovarian Neoplasms/pathology
- Prognosis
- RNA, Messenger/biosynthesis
- RNA, Messenger/genetics
- Survival Analysis
Affiliation(s)
- Preethi Sankaranarayanan
- Scientific Computing and Imaging (SCI) Institute, University of Utah, Salt Lake City, Utah, United States of America
- Department of Bioengineering, University of Utah, Salt Lake City, Utah, United States of America
- Theodore E. Schomay
- Scientific Computing and Imaging (SCI) Institute, University of Utah, Salt Lake City, Utah, United States of America
- Department of Bioengineering, University of Utah, Salt Lake City, Utah, United States of America
- Katherine A. Aiello
- Scientific Computing and Imaging (SCI) Institute, University of Utah, Salt Lake City, Utah, United States of America
- Department of Bioengineering, University of Utah, Salt Lake City, Utah, United States of America
- Orly Alter
- Scientific Computing and Imaging (SCI) Institute, University of Utah, Salt Lake City, Utah, United States of America
- Department of Bioengineering, University of Utah, Salt Lake City, Utah, United States of America
- Department of Human Genetics, University of Utah, Salt Lake City, Utah, United States of America
37
Samuels BA, Leonardo ED, Dranovsky A, Williams A, Wong E, Nesbitt AMI, McCurdy RD, Hen R, Alter M. Global state measures of the dentate gyrus gene expression system predict antidepressant-sensitive behaviors. PLoS One 2014; 9:e85136. [PMID: 24465494 PMCID: PMC3894967 DOI: 10.1371/journal.pone.0085136] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 08/21/2013] [Accepted: 11/23/2013] [Indexed: 11/18/2022]
Abstract
BACKGROUND Selective serotonin reuptake inhibitors (SSRIs) such as fluoxetine are the most common form of medication treatment for major depression. However, approximately 50% of depressed patients fail to achieve an effective treatment response. Understanding how gene expression systems respond to treatments may be critical for understanding antidepressant resistance. METHODS We take a novel approach to this problem by demonstrating that the gene expression system of the dentate gyrus responds to fluoxetine (FLX), a commonly used antidepressant medication, in a stereotyped-manner involving changes in the expression levels of thousands of genes. The aggregate behavior of this large-scale systemic response was quantified with principal components analysis (PCA) yielding a single quantitative measure of the global gene expression system state. RESULTS Quantitative measures of system state were highly correlated with variability in levels of antidepressant-sensitive behaviors in a mouse model of depression treated with fluoxetine. Analysis of dorsal and ventral dentate samples in the same mice indicated that system state co-varied across these regions despite their reported functional differences. Aggregate measures of gene expression system state were very robust and remained unchanged when different microarray data processing algorithms were used and even when completely different sets of gene expression levels were used for their calculation. CONCLUSIONS System state measures provide a robust method to quantify and relate global gene expression system state variability to behavior and treatment. State variability also suggests that the diversity of reported changes in gene expression levels in response to treatments such as fluoxetine may represent different perspectives on unified but noisy global gene expression system state level responses. 
Studying regulation of gene expression systems at the state level may be useful in guiding new approaches to augmentation of traditional antidepressant treatments.
Affiliation(s)
- Benjamin A. Samuels
- Departments of Psychiatry and Neuroscience, Columbia University, New York, New York, United States of America
- E. David Leonardo
- Departments of Psychiatry and Neuroscience, Columbia University, New York, New York, United States of America
- Alex Dranovsky
- Departments of Psychiatry and Neuroscience, Columbia University, New York, New York, United States of America
- Amanda Williams
- AstraZeneca Pharmaceuticals, CNS Discovery, Wilmington, Delaware, United States of America
- Erik Wong
- AstraZeneca Pharmaceuticals, CNS Discovery, Wilmington, Delaware, United States of America
- Addie May I. Nesbitt
- Center for Neurobiology and Behavior, Department of Psychiatry, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Richard D. McCurdy
- Center for Neurobiology and Behavior, Department of Psychiatry, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Rene Hen
- Departments of Psychiatry and Neuroscience, Columbia University, New York, New York, United States of America
- Mark Alter
- Center for Neurobiology and Behavior, Department of Psychiatry, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
38
Abstract
BACKGROUND Identifying modules from time series biological data helps us understand biological functionalities of a group of proteins/genes interacting together and how responses of these proteins/genes dynamically change with respect to time. With rapid acquisition of time series biological data from different laboratories or databases, new challenges are posed for the identification task and powerful methods which are able to detect modules with integrative analysis are urgently called for. To accomplish such integrative analysis, we assemble multiple time series biological data into a higher-order form, e.g., a gene × condition × time tensor. It is interesting and useful to develop methods to identify modules from this tensor. RESULTS In this paper, we present MultiFacTV, a new method to find modules from higher-order time series biological data. This method employs a tensor factorization objective function where a time-related total variation regularization term is incorporated. According to factorization results, MultiFacTV extracts modules that are composed of some genes, conditions and time-points. We have performed MultiFacTV on synthetic datasets and the results have shown that MultiFacTV outperforms existing methods EDISA and Metafac. Moreover, we have applied MultiFacTV to Arabidopsis thaliana root(shoot) tissue dataset represented as a gene × condition × time tensor of size 2395 × 9 × 6(3454 × 8 × 6), to Yeast dataset and Homo sapiens dataset represented as tensors of sizes 4425 × 6 × 6 and 2920 × 14 × 9 respectively. The results have shown that MultiFacTV indeed identifies some interesting modules in these datasets, which have been validated and explained by Gene Ontology analysis with DAVID or other analysis. CONCLUSION Experimental results on both synthetic datasets and real datasets show that the proposed MultiFacTV is effective in identifying modules for higher-order time series biological data. 
It provides, compared to traditional non-integrative analysis methods, a more comprehensive and better view on biological process since modules composed of more than two types of biological variables could be identified and analyzed.
Affiliation(s)
- Xutao Li
- Department of Computer Science, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, 518055, China
- Shenzhen Key Laboratory of Internet Information Collaboration, Shenzhen, 518055, China
- Yunming Ye
- Department of Computer Science, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, 518055, China
- Shenzhen Key Laboratory of Internet Information Collaboration, Shenzhen, 518055, China
- Michael Ng
- Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
- Qingyao Wu
- Department of Computer Science, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, 518055, China
- Shenzhen Key Laboratory of Internet Information Collaboration, Shenzhen, 518055, China
39
Strakova E, Bobek J, Zikova A, Vohradsky J. Global features of gene expression on the proteome and transcriptome levels in S. coelicolor during germination. PLoS One 2013; 8:e72842. [PMID: 24039809 PMCID: PMC3767685 DOI: 10.1371/journal.pone.0072842] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 06/12/2013] [Accepted: 07/15/2013] [Indexed: 11/18/2022]
Abstract
Streptomycetes have been studied mostly as producers of secondary metabolites, while the transition from dormant spores to an exponentially growing culture has largely been ignored. Here, we focus on a comparative analysis of fluorescently and radioactively labeled proteome and microarray acquired transcriptome expressed during the germination of Streptomyces coelicolor. The time-dynamics is considered, starting from dormant spores through 5.5 hours of growth with 13 time points. Time series of the gene expressions were analyzed using correlation, principal components analysis and an analysis of coding genes utilization. Principal component analysis was used to identify principal kinetic trends in gene expression and the corresponding genes driving S. coelicolor germination. In contrast with the correlation analysis, global trends in the gene/protein expression reflected by the first principal components showed that the prominent patterns in both the protein and the mRNA domains are surprisingly well correlated. Analysis of the number of expressed genes identified functional groups activated during different time intervals of the germination.
Affiliation(s)
- Eva Strakova
- Laboratory of Bioinformatics, Institute of Microbiology, Academy of Sciences of the Czech Republic, Prague, Czech Republic
- Jan Bobek
- Laboratory of Bioinformatics, Institute of Microbiology, Academy of Sciences of the Czech Republic, Prague, Czech Republic
- Institute of Immunology and Microbiology, First Faculty of Medicine, Charles University in Prague, Prague, Czech Republic
- Alice Zikova
- Laboratory of Bioinformatics, Institute of Microbiology, Academy of Sciences of the Czech Republic, Prague, Czech Republic
- Jiri Vohradsky
- Laboratory of Bioinformatics, Institute of Microbiology, Academy of Sciences of the Czech Republic, Prague, Czech Republic
40
Jacklin N, Ding Z, Chen W, Chang C. Noniterative convex optimization methods for network component analysis. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2012; 9:1472-1481. [PMID: 22641712 DOI: 10.1109/tcbb.2012.81] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 06/01/2023]
Abstract
This work studies the reconstruction of gene regulatory networks by the means of network component analysis (NCA). We will expound a family of convex optimization-based methods for estimating the transcription factor control strengths and the transcription factor activities (TFAs). The approach taken in this work is to decompose the problem into a network connectivity strength estimation phase and a transcription factor activity estimation phase. In the control strength estimation phase, we formulate a new subspace-based method incorporating a choice of multiple error metrics. For the source estimation phase we propose a total least squares (TLS) formulation that generalizes many existing methods. Both estimation procedures are noniterative and yield the optimal estimates according to various proposed error metrics. We test the performance of the proposed algorithms on simulated data and experimental gene expression data for the yeast Saccharomyces cerevisiae and demonstrate that the proposed algorithms have superior effectiveness in comparison with both Bayesian Decomposition (BD) and our previous FastNCA approach, while the computational complexity is still orders of magnitude less than BD.
Affiliation(s)
- Neil Jacklin
- Department of Electrical and Computer Engineering, University of California, Davis, CA 95616, USA.
41
Li W, Zhang S, Liu CC, Zhou XJ. Identifying multi-layer gene regulatory modules from multi-dimensional genomic data. Bioinformatics 2012; 28:2458-66. [PMID: 22863767 PMCID: PMC3463121 DOI: 10.1093/bioinformatics/bts476] [Citation(s) in RCA: 96] [Impact Index Per Article: 8.0] [Indexed: 02/06/2023]
Abstract
Motivation: Eukaryotic gene expression (GE) is subjected to precisely coordinated multi-layer controls, across the levels of epigenetic, transcriptional and post-transcriptional regulations. Recently, the emerging multi-dimensional genomic dataset has provided unprecedented opportunities to study the cross-layer regulatory interplay. In these datasets, the same set of samples is profiled on several layers of genomic activities, e.g. copy number variation (CNV), DNA methylation (DM), GE and microRNA expression (ME). However, suitable analysis methods for such data are currently sparse. Results: In this article, we introduced a sparse Multi-Block Partial Least Squares (sMBPLS) regression method to identify multi-dimensional regulatory modules from this new type of data. A multi-dimensional regulatory module contains sets of regulatory factors from different layers that are likely to jointly contribute to a local ‘gene expression factory’. We demonstrated the performance of our method on the simulated data as well as on The Cancer Genomic Atlas Ovarian Cancer datasets including the CNV, DM, ME and GE data measured on 230 samples. We showed that the majority of identified modules have significant functional and transcriptional enrichment, higher than that observed in modules identified using only a single type of genomic data. Our network analysis of the modules revealed that the CNV, DM and microRNA can have coupled impact on expression of important oncogenes and tumor suppressor genes. Availability and implementation: The source code implemented in MATLAB is freely available at: http://zhoulab.usc.edu/sMBPLS/. Contact: xjzhou@usc.edu Supplementary information: Supplementary material is available at Bioinformatics online.
Affiliation(s)
- Wenyuan Li
- Program in Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
|
42
|
Lee CH, Alpert BO, Sankaranarayanan P, Alter O. GSVD comparison of patient-matched normal and tumor aCGH profiles reveals global copy-number alterations predicting glioblastoma multiforme survival. PLoS One 2012; 7:e30098. [PMID: 22291905 PMCID: PMC3264559 DOI: 10.1371/journal.pone.0030098] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Received: 09/06/2011] [Accepted: 12/09/2011] [Indexed: 11/18/2022]
Abstract
Despite recent large-scale profiling efforts, the best prognostic predictor of glioblastoma multiforme (GBM) remains the patient's age at diagnosis. We describe a global pattern of tumor-exclusive co-occurring copy-number alterations (CNAs) that is correlated, possibly coordinated with GBM patients' survival and response to chemotherapy. The pattern is revealed by GSVD comparison of patient-matched but probe-independent GBM and normal aCGH datasets from The Cancer Genome Atlas (TCGA). We find that, first, the GSVD, formulated as a framework for comparatively modeling two composite datasets, removes from the pattern copy-number variations (CNVs) that occur in the normal human genome (e.g., female-specific X chromosome amplification) and experimental variations (e.g., in tissue batch, genomic center, hybridization date and scanner), without a-priori knowledge of these variations. Second, the pattern includes most known GBM-associated changes in chromosome numbers and focal CNAs, as well as several previously unreported CNAs in >3% of the patients. These include the biochemically putative drug target, cell cycle-regulated serine/threonine kinase-encoding TLK2, the cyclin E1-encoding CCNE1, and the Rb-binding histone demethylase-encoding KDM5A. Third, the pattern provides a better prognostic predictor than the chromosome numbers or any one focal CNA that it identifies, suggesting that the GBM survival phenotype is an outcome of its global genotype. The pattern is independent of age, and combined with age, makes a better predictor than age alone. GSVD comparison of matched profiles of a larger set of TCGA patients, inclusive of the initial set, confirms the global pattern. GSVD classification of the GBM profiles of an independent set of patients validates the prognostic contribution of the pattern.
Affiliation(s)
- Cheng H. Lee
- Scientific Computing and Imaging (SCI) Institute, University of Utah, Salt Lake City, Utah, United States of America
- Benjamin O. Alpert
- Scientific Computing and Imaging (SCI) Institute, University of Utah, Salt Lake City, Utah, United States of America
- Department of Bioengineering, University of Utah, Salt Lake City, Utah, United States of America
- Preethi Sankaranarayanan
- Scientific Computing and Imaging (SCI) Institute, University of Utah, Salt Lake City, Utah, United States of America
- Department of Bioengineering, University of Utah, Salt Lake City, Utah, United States of America
- Orly Alter
- Scientific Computing and Imaging (SCI) Institute, University of Utah, Salt Lake City, Utah, United States of America
- Department of Bioengineering, University of Utah, Salt Lake City, Utah, United States of America
- Department of Human Genetics, University of Utah, Salt Lake City, Utah, United States of America
|
43
|
Lee HC, Lin BL, Chang WH, Tu IP. Toward automated denoising of single molecular Förster resonance energy transfer data. Journal of Biomedical Optics 2012; 17:011007. [PMID: 22352641 DOI: 10.1117/1.jbo.17.1.011007] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 05/31/2023]
Abstract
A wide-field two-channel fluorescence microscope is a powerful tool, as it allows the study of the conformational dynamics of hundreds to thousands of immobilized single molecules through Förster resonance energy transfer (FRET) signals. To date, the data reduction from a movie to a final set containing meaningful single-molecule FRET (smFRET) traces involves human inspection and intervention at several critical steps, greatly hampering efficiency at the post-imaging stage. To facilitate the data reduction from smFRET movies to smFRET traces and to address noise-limited issues, we developed a statistical denoising system toward fully automated processing. This data reduction system embeds several novel approaches. First, for background subtraction, the high-order singular value decomposition (HOSVD) method is employed to extract spatial and temporal features. Second, to register and map the two color channels, the spots representing bleed-through from the donor channel to the acceptor channel are used. Finally, correlation analysis and a likelihood ratio statistic for change point detection (CPD) are developed to study the two channels simultaneously, resolve FRET states, and report the dwell time of each state. The performance of our method has been checked using both simulated and real data.
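To make the HOSVD step mentioned in this abstract concrete, here is a minimal NumPy sketch of a truncated HOSVD applied to a three-way array (for instance, an x-by-y-by-time movie). It is an illustration only, not the authors' pipeline: the function names `unfold`, `truncated_hosvd`, and `reconstruct`, and the idea of treating a low multilinear-rank reconstruction as a smooth background estimate, are assumptions made for this example.

```python
import numpy as np

def unfold(tensor, mode):
    """Matricize `tensor` so that axis `mode` indexes the rows."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along axis `mode`."""
    out = np.tensordot(matrix, tensor, axes=(1, mode))
    return np.moveaxis(out, 0, mode)

def truncated_hosvd(tensor, ranks):
    """Return (core, factors) of a truncated HOSVD with the given mode ranks."""
    factors = []
    for mode, r in enumerate(ranks):
        # Leading left singular vectors of each unfolding span that mode's subspace.
        U, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(U[:, :r])
    core = tensor
    for mode, U in enumerate(factors):
        core = mode_product(core, U.T, mode)
    return core, factors

def reconstruct(core, factors):
    """Map the core back to the original space."""
    out = core
    for mode, U in enumerate(factors):
        out = mode_product(out, U, mode)
    return out

# Hypothetical usage: a low-rank "background" plus noise, shape (x, y, time).
rng = np.random.default_rng(0)
background = np.einsum("i,j,k->ijk", rng.random(16), rng.random(16), rng.random(40))
movie = background + 0.01 * rng.standard_normal(background.shape)
core, factors = truncated_hosvd(movie, ranks=(1, 1, 1))
residual = movie - reconstruct(core, factors)  # what remains after removing the smooth part
```

Under these assumptions, the rank choice controls how much spatial and temporal structure is attributed to the background; choosing it for real data would require the kind of validation the abstract describes.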
Affiliation(s)
- Hao-Chih Lee
- Academia Sinica, Institute of Statistical Science, Taipei, Taiwan
|
44
|
Ponnapalli SP, Saunders MA, Van Loan CF, Alter O. A higher-order generalized singular value decomposition for comparison of global mRNA expression from multiple organisms. PLoS One 2011; 6:e28072. [PMID: 22216090 PMCID: PMC3245232 DOI: 10.1371/journal.pone.0028072] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.7] [Received: 09/22/2011] [Accepted: 10/31/2011] [Indexed: 11/18/2022]
Abstract
The number of high-dimensional datasets recording multiple aspects of a single phenomenon is increasing in many areas of science, accompanied by a need for mathematical frameworks that can compare multiple large-scale matrices with different row dimensions. The only such framework to date, the generalized singular value decomposition (GSVD), is limited to two matrices. We mathematically define a higher-order GSVD (HO GSVD) for $N \geq 2$ matrices $D_i \in \mathbb{R}^{m_i \times n}$, each with full column rank. Each matrix is exactly factored as $D_i = U_i \Sigma_i V^T$, where $V$, identical in all factorizations, is obtained from the eigensystem $SV = V\Lambda$ of the arithmetic mean $S$ of all pairwise quotients $A_i A_j^{-1}$ of the matrices $A_i = D_i^T D_i$, $i \neq j$. We prove that this decomposition extends to higher orders almost all of the mathematical properties of the GSVD. The matrix $S$ is nondefective with $V$ and $\Lambda$ real. Its eigenvalues satisfy $\lambda_k \geq 1$. Equality holds if and only if the corresponding eigenvector $v_k$ is a right basis vector of equal significance in all matrices $D_i$ and $D_j$, that is $\sigma_{i,k}/\sigma_{j,k} = 1$ for all $i$ and $j$, and the corresponding left basis vector $u_{i,k}$ is orthogonal to all other vectors in $U_i$ for all $i$. The eigenvalues $\lambda_k = 1$, therefore, define the “common HO GSVD subspace.” We illustrate the HO GSVD with a comparison of genome-scale cell-cycle mRNA expression from S. pombe, S. cerevisiae and human. Unlike existing algorithms, a mapping among the genes of these disparate organisms is not required. We find that the approximately common HO GSVD subspace represents the cell-cycle mRNA expression oscillations, which are similar among the datasets. Simultaneous reconstruction in the common subspace, therefore, removes the experimental artifacts, which are dissimilar, from the datasets. In the simultaneous sequence-independent classification of the genes of the three organisms in this common subspace, genes of highly conserved sequences but significantly different cell-cycle peak times are correctly classified.
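The construction described in this abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `ho_gsvd` is hypothetical, the Gram matrices $A_i = D_i^T D_i$ are inverted explicitly for readability (which assumes each $D_i$ has full column rank and is reasonably well conditioned), and the eigenvalues are assumed to be numerically real, as the abstract states they are in exact arithmetic.

```python
import numpy as np

def ho_gsvd(matrices):
    """Sketch of the HO GSVD: D_i = U_i diag(sigma_i) V^T with a shared V.

    `matrices` is a list of arrays with the same number of columns,
    each assumed to have full column rank.
    """
    N = len(matrices)
    A = [D.T @ D for D in matrices]          # Gram matrices A_i = D_i^T D_i
    A_inv = [np.linalg.inv(a) for a in A]
    n = A[0].shape[0]

    # Arithmetic mean of all pairwise quotients A_i A_j^{-1}, i != j.
    S = np.zeros((n, n))
    for i in range(N):
        for j in range(N):
            if i != j:
                S += A[i] @ A_inv[j]
    S /= N * (N - 1)

    # Shared right basis V from the eigensystem S V = V Lambda.
    eigvals, V = np.linalg.eig(S)
    eigvals, V = eigvals.real, V.real        # assumption: imaginary parts are negligible

    # Solve D_i = B_i V^T for B_i, then split each column into norm and direction.
    V_inv_T = np.linalg.inv(V).T
    Us, sigmas = [], []
    for D in matrices:
        B = D @ V_inv_T
        sigma = np.linalg.norm(B, axis=0)
        Us.append(B / sigma)
        sigmas.append(sigma)
    return Us, sigmas, V, eigvals

# Hypothetical usage: three matrices with different row counts, same column count.
rng = np.random.default_rng(0)
Ds = [rng.standard_normal((m, 4)) for m in (8, 10, 6)]
Us, sigmas, V, lam = ho_gsvd(Ds)
```

In this sketch, columns of `V` whose eigenvalues in `lam` are close to 1 would correspond to the approximately common subspace; how close is "close" is a modeling choice not fixed by the code.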
Affiliation(s)
- Sri Priya Ponnapalli
- Department of Electrical and Computer Engineering, University of Texas at Austin, Texas, United States of America
- Michael A. Saunders
- Department of Management Science and Engineering, Stanford University, Stanford, California, United States of America
- Charles F. Van Loan
- Department of Computer Science, Cornell University, Ithaca, New York, United States of America
- Orly Alter
- Scientific Computing and Imaging (SCI) Institute and Departments of Bioengineering and Human Genetics, University of Utah, Salt Lake City, Utah, United States of America
|
45
|
|
46
|
Ji S. Computational network analysis of the anatomical and genetic organizations in the mouse brain. Bioinformatics 2011; 27:3293-9. [DOI: 10.1093/bioinformatics/btr558] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 11/13/2022]
|
47
|
Ozcaglar C, Shabbeer A, Vandenberg S, Yener B, Bennett KP. Sublineage structure analysis of Mycobacterium tuberculosis complex strains using multiple-biomarker tensors. BMC Genomics 2011; 12 Suppl 2:S1. [PMID: 21988942 PMCID: PMC3194230 DOI: 10.1186/1471-2164-12-s2-s1] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Strains of Mycobacterium tuberculosis complex (MTBC) can be classified into major lineages based on their genotype. Further subdivision of major lineages into sublineages requires multiple biomarkers along with methods to combine and analyze multiple sources of information in one unsupervised learning model. Typically, spacer oligonucleotide type (spoligotype) and mycobacterial interspersed repetitive units (MIRU) are used for TB genotyping and surveillance. Here, we examine the sublineage structure of MTBC strains with multiple biomarkers simultaneously, by employing a tensor clustering framework (TCF) on multiple-biomarker tensors. RESULTS: Simultaneous analysis of the spoligotype and MIRU type of strains using TCF on multiple-biomarker tensors leads to coherent sublineages of major lineages with clear and distinctive spoligotype and MIRU signatures. Comparison of tensor sublineages with SpolDB4 families either supports tensor sublineages, or suggests subdivision or merging of SpolDB4 families. High prediction accuracy of major lineage classification with supervised tensor learning on multiple-biomarker tensors validates our unsupervised analysis of sublineages on multiple-biomarker tensors. CONCLUSIONS: TCF on multiple-biomarker tensors achieves simultaneous analysis of multiple biomarkers and suggests a new putative sublineage structure for each major lineage. Analysis of multiple-biomarker tensors gives insight into the sublineage structure of MTBC at the genomic level.
Affiliation(s)
- Cagri Ozcaglar
- Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY, USA
- Amina Shabbeer
- Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY, USA
- Scott Vandenberg
- Computer Science Department, Siena College, Loudonville, NY, USA
- Bülent Yener
- Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY, USA
- Kristin P Bennett
- Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY, USA
- Mathematical Sciences Department, Rensselaer Polytechnic Institute, Troy, NY, USA
|
48
|
Ding Q, MacAlpine DM. Defining the replication program through the chromatin landscape. Crit Rev Biochem Mol Biol 2011; 46:165-79. [PMID: 21417598 DOI: 10.3109/10409238.2011.560139] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Indexed: 11/13/2022]
Abstract
DNA replication is an essential cell cycle event required for the accurate and timely duplication of the chromosomes. It is essential that the genome is replicated accurately and completely within the confines of S-phase. Failure to completely copy the genome has the potential to result in catastrophic genomic instability. Replication initiates in a coordinated manner from multiple locations, termed origins of replication, distributed across each of the chromosomes. The selection of these origins of replication is a dynamic process responding to both developmental and tissue-specific signals. In this review, we explore the role of the local chromatin environment in regulating the DNA replication program at the level of origin selection and activation. Finally, there is increasing molecular evidence that the DNA replication program itself affects the chromatin landscape, suggesting that DNA replication is critical for both genetic and epigenetic inheritance.
Affiliation(s)
- Queying Ding
- Department of Pharmacology and Cancer Biology, Duke University Medical Center, Durham, NC 27710, USA
|
49
|
Li W, Liu CC, Zhang T, Li H, Waterman MS, Zhou XJ. Integrative analysis of many weighted co-expression networks using tensor computation. PLoS Comput Biol 2011; 7:e1001106. [PMID: 21698123 PMCID: PMC3116899 DOI: 10.1371/journal.pcbi.1001106] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.0] [Received: 07/16/2010] [Accepted: 02/08/2011] [Indexed: 11/18/2022]
Abstract
The rapid accumulation of biological networks poses new challenges and calls for powerful integrative analysis tools. Most existing methods capable of simultaneously analyzing a large number of networks were primarily designed for unweighted networks, and cannot easily be extended to weighted networks. However, it is known that transforming weighted into unweighted networks by dichotomizing the edges of weighted networks with a threshold generally leads to information loss. We have developed a novel, tensor-based computational framework for mining recurrent heavy subgraphs in a large set of massive weighted networks. Specifically, we formulate the recurrent heavy subgraph identification problem as a heavy 3D subtensor discovery problem with sparse constraints. We describe an effective approach to solving this problem by designing a multi-stage, convex relaxation protocol, and a non-uniform edge sampling technique. We applied our method to 130 co-expression networks, and identified 11,394 recurrent heavy subgraphs, grouped into 2,810 families. We demonstrated that the identified subgraphs represent meaningful biological modules by validating against a large set of compiled biological knowledge bases. We also showed that the likelihood for a heavy subgraph to be meaningful increases significantly with its recurrence in multiple networks, highlighting the importance of the integrative approach to biological network analysis. Moreover, our approach based on weighted graphs detects many patterns that would be overlooked using unweighted graphs. In addition, we identified a large number of modules that occur predominately under specific phenotypes. This analysis resulted in a genome-wide mapping of gene network modules onto the phenome. Finally, by comparing module activities across many datasets, we discovered high-order dynamic cooperativeness in protein complex networks and transcriptional regulatory networks.
Affiliation(s)
- Wenyuan Li
- Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, California, United States of America
- Chun-Chi Liu
- Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, California, United States of America
- Tong Zhang
- Department of Statistics, Rutgers University, New Brunswick, New Jersey, United States of America
- Haifeng Li
- Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, California, United States of America
- Michael S. Waterman
- Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, California, United States of America
- Xianghong Jasmine Zhou
- Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, California, United States of America
|
50
|
Li R, Ackerman WE, Summerfield TL, Yu L, Gulati P, Zhang J, Huang K, Romero R, Kniss DA. Inflammatory gene regulatory networks in amnion cells following cytokine stimulation: translational systems approach to modeling human parturition. PLoS One 2011; 6:e20560. [PMID: 21655103 PMCID: PMC3107214 DOI: 10.1371/journal.pone.0020560] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 01/21/2011] [Accepted: 05/05/2011] [Indexed: 11/18/2022]
Abstract
A majority of the studies examining the molecular regulation of human labor have been conducted using single gene approaches. While the technology to produce multi-dimensional datasets is readily available, the means for facile analysis of such data are limited. The objective of this study was to develop a systems approach to infer regulatory mechanisms governing global gene expression in cytokine-challenged cells in vitro, and to apply these methods to predict gene regulatory networks (GRNs) in intrauterine tissues during term parturition. To this end, microarray analysis was applied to human amnion mesenchymal cells (AMCs) stimulated with interleukin-1β, and differentially expressed transcripts were subjected to hierarchical clustering, temporal expression profiling, and motif enrichment analysis, from which a GRN was constructed. These methods were then applied to fetal membrane specimens collected in the absence or presence of spontaneous term labor. Analysis of cytokine-responsive genes in AMCs revealed a sterile immune response signature, with promoters enriched in response elements for several inflammation-associated transcription factors. In comparison to the fetal membrane dataset, there were 34 genes commonly upregulated, many of which were part of an acute inflammation gene expression signature. Binding motifs for nuclear factor-κB were prominent in the gene interaction and regulatory networks for both datasets; however, we found little evidence to support the utilization of pathogen-associated molecular pattern (PAMP) signaling. The tissue specimens were also enriched for transcripts governed by hypoxia-inducible factor. The approach presented here provides an uncomplicated means to infer global relationships among gene clusters involved in cellular responses to labor-associated signals.
Affiliation(s)
- Ruth Li
- Division of Maternal-Fetal Medicine and Laboratory of Perinatal Research, Department of Obstetrics and Gynecology, The Ohio State University, Columbus, Ohio, United States of America
- William E. Ackerman
- Division of Maternal-Fetal Medicine and Laboratory of Perinatal Research, Department of Obstetrics and Gynecology, The Ohio State University, Columbus, Ohio, United States of America
- Taryn L. Summerfield
- Division of Maternal-Fetal Medicine and Laboratory of Perinatal Research, Department of Obstetrics and Gynecology, The Ohio State University, Columbus, Ohio, United States of America
- Lianbo Yu
- Center for Biostatistics, The Ohio State University, Columbus, Ohio, United States of America
- Parul Gulati
- Center for Biostatistics, The Ohio State University, Columbus, Ohio, United States of America
- Jie Zhang
- Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio, United States of America
- Kun Huang
- Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio, United States of America
- Roberto Romero
- Perinatology Research Branch, Intramural Division, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Department of Health and Human Services, Bethesda, Maryland, United States of America
- Hutzel Women's Hospital, Detroit, Michigan, United States of America
- Douglas A. Kniss
- Division of Maternal-Fetal Medicine and Laboratory of Perinatal Research, Department of Obstetrics and Gynecology, The Ohio State University, Columbus, Ohio, United States of America
- Department of Biomedical Engineering, The Ohio State University, Columbus, Ohio, United States of America
|