1
Stein-O'Brien GL, Arora R, Culhane AC, Favorov AV, Garmire LX, Greene CS, Goff LA, Li Y, Ngom A, Ochs MF, Xu Y, Fertig EJ. Enter the Matrix: Factorization Uncovers Knowledge from Omics. Trends Genet 2018; 34:790-805. [PMID: 30143323 PMCID: PMC6309559 DOI: 10.1016/j.tig.2018.07.003] [Citation(s) in RCA: 100] [Impact Index Per Article: 16.7] [Received: 03/23/2018] [Revised: 06/01/2018] [Accepted: 07/16/2018] [Indexed: 12/20/2022]
Abstract
Omics data contain signals from the molecular, physical, and kinetic inter- and intracellular interactions that control biological systems. Matrix factorization (MF) techniques can reveal low-dimensional structure from high-dimensional data that reflect these interactions. These techniques can uncover new biological knowledge from diverse high-throughput omics data in applications ranging from pathway discovery to timecourse analysis. We review exemplary applications of MF for systems-level analyses. We discuss appropriate applications of these methods, their limitations, and focus on the analysis of results to facilitate optimal biological interpretation. The inference of biologically relevant features with MF enables discovery from high-throughput data beyond the limits of current biological knowledge - answering questions from high-dimensional data that we have not yet thought to ask.
Affiliation(s)
- Genevieve L Stein-O'Brien
- Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD, USA; Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA; McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Raman Arora
- Department of Computer Science, Institute for Data Intensive Engineering and Science, Johns Hopkins University, Baltimore, MD, USA
- Aedin C Culhane
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA; Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, MA, USA
- Alexander V Favorov
- Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD, USA; Vavilov Institute of General Genetics, Moscow, Russia
- Casey S Greene
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, PA, USA; Childhood Cancer Data Lab, Alex's Lemonade Stand Foundation, PA, USA
- Loyal A Goff
- Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA; McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Yifeng Li
- Digital Technologies Research Centre, National Research Council of Canada, Ottawa, ON, Canada
- Aloune Ngom
- School of Computer Science, University of Windsor, Windsor, ON, Canada
- Michael F Ochs
- Department of Mathematics and Statistics, The College of New Jersey, Ewing, NJ, USA
- Yanxun Xu
- Department of Applied Mathematics and Statistics, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, USA
- Elana J Fertig
- Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD, USA.
2
3
Etges WJ, Trotter MV, de Oliveira CC, Rajpurohit S, Gibbs AG, Tuljapurkar S. Deciphering life history transcriptomes in different environments. Mol Ecol 2014; 24:151-79. [PMID: 25442828 DOI: 10.1111/mec.13017] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 03/11/2014] [Revised: 10/27/2014] [Accepted: 11/22/2014] [Indexed: 12/25/2022]
Abstract
We compared whole transcriptome variation in six pre-adult stages and seven adult female ages in two populations of cactophilic Drosophila mojavensis reared on two host plants to understand how differences in gene expression influence standing life history variation. We used singular value decomposition (SVD) to identify dominant trajectories of life cycle gene expression variation, performed pairwise comparisons of stage and age differences in gene expression across the life cycle, identified when genes exhibited maximum levels of life cycle gene expression, and assessed population and host cactus effects on gene expression. Life cycle SVD analysis returned four significant components of transcriptional variation, revealing functional enrichment of genes responsible for growth, metabolic function, sensory perception, neural function, translation and ageing. Host cactus effects on female gene expression revealed population- and stage-specific differences, including significant host plant effects on larval metabolism and development, as well as adult neurotransmitter binding and courtship behaviour gene expression levels. In 3- to 6-day-old virgin females, significant upregulation of genes associated with meiosis and oogenesis was accompanied by downregulation of genes associated with somatic maintenance, evidence for a life history trade-off. The transcriptome of D. mojavensis reared in natural environments throughout its life cycle revealed core developmental transitions and genome-wide influences on life history variation in natural populations.
Affiliation(s)
- William J Etges
- Program in Ecology and Evolutionary Biology, Dept. of Biological Sciences, University of Arkansas, Fayetteville, AR, 72701, USA
4
Hütt MT. Understanding genetic variation - the value of systems biology. Br J Clin Pharmacol 2014; 77:597-605. [PMID: 24725073 DOI: 10.1111/bcp.12266] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Received: 05/13/2013] [Accepted: 09/25/2013] [Indexed: 12/20/2022]
Abstract
Pharmacology is currently transformed by the vast amounts of genome-associated information available for system-level interpretation. Here I review the potential of systems biology to facilitate this interpretation, thus paving the way for the emerging field of systems pharmacology. In particular, I will show how gene regulatory and metabolic networks can serve as a framework for interpreting high throughput data and as an interface to detailed dynamical models. In addition to the established connectivity analyses of effective networks, I suggest here to also analyze higher order architectural properties of effective networks.
Affiliation(s)
- Marc-Thorsten Hütt
- School of Engineering and Science, Jacobs University Bremen, Campus Ring 1, D-28759, Bremen, Germany
5
Coordinated metabolic transitions during Drosophila embryogenesis and the onset of aerobic glycolysis. G3: Genes, Genomes, Genetics 2014; 4:839-50. [PMID: 24622332 PMCID: PMC4025483 DOI: 10.1534/g3.114.010652] [Citation(s) in RCA: 92] [Impact Index Per Article: 9.2] [Indexed: 11/18/2022]
Abstract
Rapidly proliferating cells such as cancer cells and embryonic stem cells rely on a specialized metabolic program known as aerobic glycolysis, which supports biomass production from carbohydrates. The fruit fly Drosophila melanogaster also utilizes aerobic glycolysis to support the rapid growth that occurs during larval development. Here we use singular value decomposition analysis of modENCODE RNA-seq data combined with GC-MS-based metabolomic analysis to analyze the changes in gene expression and metabolism that occur during Drosophila embryogenesis, spanning the onset of aerobic glycolysis. Unexpectedly, we find that the most common pattern of co-expressed genes in embryos includes the global switch to glycolytic gene expression that occurs midway through embryogenesis. In contrast to the canonical aerobic glycolytic pathway, however, which is accompanied by reduced mitochondrial oxidative metabolism, the expression of genes involved in the tricarboxylic cycle (TCA cycle) and the electron transport chain are also upregulated at this time. Mitochondrial activity, however, appears to be attenuated, as embryos exhibit a block in the TCA cycle that results in elevated levels of citrate, isocitrate, and α-ketoglutarate. We also find that genes involved in lipid breakdown and β-oxidation are upregulated prior to the transcriptional initiation of glycolysis, but are downregulated before the onset of larval development, revealing coordinated use of lipids and carbohydrates during development. These observations demonstrate the efficient use of nutrient stores to support embryonic development, define sequential metabolic transitions during this stage, and demonstrate striking similarities between the metabolic state of late-stage fly embryos and tumor cells.
6
SVD identifies transcript length distribution functions from DNA microarray data and reveals evolutionary forces globally affecting GBM metabolism. PLoS One 2013; 8:e78913. [PMID: 24282503 PMCID: PMC3839928 DOI: 10.1371/journal.pone.0078913] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 07/08/2013] [Accepted: 09/25/2013] [Indexed: 01/10/2023]
Abstract
To search for evolutionary forces that might act upon transcript length, we use the singular value decomposition (SVD) to identify the length distribution functions of sets and subsets of human and yeast transcripts from profiles of mRNA abundance levels across gel electrophoresis migration distances that were previously measured by DNA microarrays. We show that the SVD identifies the transcript length distribution functions as “asymmetric generalized coherent states” from the DNA microarray data and with no a-priori assumptions. Comparing subsets of human and yeast transcripts of the same gene ontology annotations, we find that in both disparate eukaryotes, transcripts involved in protein synthesis or mitochondrial metabolism are significantly shorter than typical, and in particular, significantly shorter than those involved in glucose metabolism. Comparing the subsets of human transcripts that are overexpressed in glioblastoma multiforme (GBM) or normal brain tissue samples from The Cancer Genome Atlas, we find that GBM maintains normal brain overexpression of significantly short transcripts, enriched in transcripts that are involved in protein synthesis or mitochondrial metabolism, but suppresses normal overexpression of significantly longer transcripts, enriched in transcripts that are involved in glucose metabolism and brain activity. These global relations among transcript length, cellular metabolism and tumor development suggest a previously unrecognized physical mode for tumor and normal cells to differentially regulate metabolism in a transcript length-dependent manner. The identified distribution functions support a previous hypothesis from mathematical modeling of evolutionary forces that act upon transcript length in the manner of the restoring force of the harmonic oscillator.
7
Bernardini C, Censi F, Lattanzi W, Calcagnini G, Giuliani A. Gene regulation networks in early phase of Duchenne muscular dystrophy. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2013; 10:393-400. [PMID: 23929863 DOI: 10.1109/tcbb.2013.24] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/02/2023]
Abstract
The aim of this study was to analyze previously published gene expression data of skeletal muscle biopsies of Duchenne muscular dystrophy (DMD) patients and controls (gene expression omnibus database, accession #GSE6011) using systems biology approaches. We applied an unsupervised method to discriminate patient and control populations, based on principal component analysis, using the gene expressions as units and patients as variables. The genes having the highest absolute scores in the discrimination between the groups, were then analyzed in terms of gene expression networks, on the basis of their mutual correlation in the two groups. The correlation network structures suggest two different modes of gene regulation in the two groups, reminiscent of important aspects of DMD pathogenesis.
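The PCA step described in this abstract, with genes as units (rows) and patients as variables (columns), can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function names `pca_gene_scores` and `top_genes`, the column-centering choice, and the use of a plain SVD are all assumptions, and which component actually separates patients from controls would have to be read off the loadings.

```python
import numpy as np

def pca_gene_scores(expression):
    """PCA of a genes x patients array (hypothetical helper, not from the paper).

    Rows (genes) are the statistical units and columns (patients) the variables.
    Returns gene scores, patient loadings, and explained-variance fractions.
    """
    centered = expression - expression.mean(axis=0)  # center each patient column
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u * s                                   # gene coordinates per component
    explained = s**2 / np.sum(s**2)                  # fraction of variance per component
    return scores, vt, explained

def top_genes(scores, component, k):
    """Indices of the k genes with the largest absolute score on one component."""
    order = np.argsort(-np.abs(scores[:, component]))
    return order[:k]
```

In use, one would inspect the rows of the loadings matrix to find the component whose patient loadings differ between the two groups, then pass that component index to `top_genes` to obtain candidate genes for the correlation-network step.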
8
Gross A, Li CM, Remacle F, Levine RD. Free energy rhythms in Saccharomyces cerevisiae: a dynamic perspective with implications for ribosomal biogenesis. Biochemistry 2013; 52:1641-8. [PMID: 23379300 DOI: 10.1021/bi3016982] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]
Abstract
To describe the time course of cellular systems, we integrate ideas from thermodynamics and information theory to discuss the work needed to change the state of the cell. The biological example analyzed is experimental microarray transcription level oscillations of yeast in the different phases as characterized by oxygen consumption. Surprisal analysis was applied to identify groups of transcripts that oscillate in concert and thereby to compute changes in free energy with time. Three dominant transcript groups were identified by surprisal analysis. The groups correspond to the respiratory, early, and late reductive phases. Genes involved in ribosome biogenesis peaked at the respiratory phase. The work to prepare the state is shown to be the sum of the contributions of these groups. We paid particular attention to work requirements during ribosomal building, and the correlation with ATP levels and dissolved oxygen. The suggestion that cells in the respiratory phase likely build ribosomes, an energy intensive process, in preparation for protein production during the S phase of the cell cycle is validated by an experiment. Surprisal analysis thereby provided a useful tool for determining the synchronization of transcription events and energetics in a cell in real time.
Affiliation(s)
- A Gross
- The Fritz Haber Research Center, Hebrew University, Jerusalem 91904, Israel
9
Reuveni E, Giuliani A. A novel multi-scale modeling approach to infer whole genome divergence. Evol Bioinform Online 2012. [PMID: 23189028 PMCID: PMC3503470 DOI: 10.4137/ebo.s10194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
We propose a novel and simple approach to elucidate genomic patterns of divergence using principal component analysis (PCA). We applied this methodology to the metric space generated by M. musculus genome-wide SNPs. Distance profiles were computed between M. musculus and its closely related species, M. spretus, which was used as external reference. While the speciation dynamics were apparent in the first principal component, the within M. musculus differentiation dimensions gave rise to three minor components. We were unable to obtain a clear divergence signature discriminating laboratory strains, suggesting a stronger effect of genetic drift. These results were at odds with wild strains which exhibit defined deterministic signals of divergence. Finally, we were able to rank novel and previously known genes according to their likelihood to be under selective pressure. In conclusion, we posit PCA as a robust methodology to unravel diverging DNA regions without any a priori forcing.
Affiliation(s)
- Eli Reuveni
- Mouse Biology Unit, European Molecular Biology Laboratory (EMBL), via Ramarini 32, 00015 Monterotondo, Italy
10
Lee CH, Alpert BO, Sankaranarayanan P, Alter O. GSVD comparison of patient-matched normal and tumor aCGH profiles reveals global copy-number alterations predicting glioblastoma multiforme survival. PLoS One 2012; 7:e30098. [PMID: 22291905 PMCID: PMC3264559 DOI: 10.1371/journal.pone.0030098] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Received: 09/06/2011] [Accepted: 12/09/2011] [Indexed: 11/18/2022]
Abstract
Despite recent large-scale profiling efforts, the best prognostic predictor of glioblastoma multiforme (GBM) remains the patient's age at diagnosis. We describe a global pattern of tumor-exclusive co-occurring copy-number alterations (CNAs) that is correlated, possibly coordinated with GBM patients' survival and response to chemotherapy. The pattern is revealed by GSVD comparison of patient-matched but probe-independent GBM and normal aCGH datasets from The Cancer Genome Atlas (TCGA). We find that, first, the GSVD, formulated as a framework for comparatively modeling two composite datasets, removes from the pattern copy-number variations (CNVs) that occur in the normal human genome (e.g., female-specific X chromosome amplification) and experimental variations (e.g., in tissue batch, genomic center, hybridization date and scanner), without a-priori knowledge of these variations. Second, the pattern includes most known GBM-associated changes in chromosome numbers and focal CNAs, as well as several previously unreported CNAs in >3% of the patients. These include the biochemically putative drug target, cell cycle-regulated serine/threonine kinase-encoding TLK2, the cyclin E1-encoding CCNE1, and the Rb-binding histone demethylase-encoding KDM5A. Third, the pattern provides a better prognostic predictor than the chromosome numbers or any one focal CNA that it identifies, suggesting that the GBM survival phenotype is an outcome of its global genotype. The pattern is independent of age, and combined with age, makes a better predictor than age alone. GSVD comparison of matched profiles of a larger set of TCGA patients, inclusive of the initial set, confirms the global pattern. GSVD classification of the GBM profiles of an independent set of patients validates the prognostic contribution of the pattern.
Affiliation(s)
- Cheng H. Lee
- Scientific Computing and Imaging (SCI) Institute, University of Utah, Salt Lake City, Utah, United States of America
- Benjamin O. Alpert
- Scientific Computing and Imaging (SCI) Institute, University of Utah, Salt Lake City, Utah, United States of America
- Department of Bioengineering, University of Utah, Salt Lake City, Utah, United States of America
- Preethi Sankaranarayanan
- Scientific Computing and Imaging (SCI) Institute, University of Utah, Salt Lake City, Utah, United States of America
- Department of Bioengineering, University of Utah, Salt Lake City, Utah, United States of America
- Orly Alter
- Scientific Computing and Imaging (SCI) Institute, University of Utah, Salt Lake City, Utah, United States of America
- Department of Bioengineering, University of Utah, Salt Lake City, Utah, United States of America
- Department of Human Genetics, University of Utah, Salt Lake City, Utah, United States of America
11
Ponnapalli SP, Saunders MA, Van Loan CF, Alter O. A higher-order generalized singular value decomposition for comparison of global mRNA expression from multiple organisms. PLoS One 2011; 6:e28072. [PMID: 22216090 PMCID: PMC3245232 DOI: 10.1371/journal.pone.0028072] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.7] [Received: 09/22/2011] [Accepted: 10/31/2011] [Indexed: 11/18/2022]
Abstract
The number of high-dimensional datasets recording multiple aspects of a single phenomenon is increasing in many areas of science, accompanied by a need for mathematical frameworks that can compare multiple large-scale matrices with different row dimensions. The only such framework to date, the generalized singular value decomposition (GSVD), is limited to two matrices. We mathematically define a higher-order GSVD (HO GSVD) for $N \geq 2$ matrices $D_i \in \mathbb{R}^{m_i \times n}$, each with full column rank. Each matrix is exactly factored as $D_i = U_i \Sigma_i V^T$, where $V$, identical in all factorizations, is obtained from the eigensystem $SV = V\Lambda$ of the arithmetic mean $S$ of all pairwise quotients $A_i A_j^{-1}$ of the matrices $A_i = D_i^T D_i$, $i \neq j$. We prove that this decomposition extends to higher orders almost all of the mathematical properties of the GSVD. The matrix $S$ is nondefective with $V$ and $\Lambda$ real. Its eigenvalues satisfy $\lambda_k \geq 1$. Equality holds if and only if the corresponding eigenvector $v_k$ is a right basis vector of equal significance in all matrices $D_i$ and $D_j$, that is $\sigma_{i,k}/\sigma_{j,k} = 1$ for all $i$ and $j$, and the corresponding left basis vector $u_{i,k}$ is orthogonal to all other vectors in $U_i$ for all $i$. The eigenvalues $\lambda_k = 1$, therefore, define the “common HO GSVD subspace.” We illustrate the HO GSVD with a comparison of genome-scale cell-cycle mRNA expression from S. pombe, S. cerevisiae and human. Unlike existing algorithms, a mapping among the genes of these disparate organisms is not required. We find that the approximately common HO GSVD subspace represents the cell-cycle mRNA expression oscillations, which are similar among the datasets. Simultaneous reconstruction in the common subspace, therefore, removes the experimental artifacts, which are dissimilar, from the datasets. In the simultaneous sequence-independent classification of the genes of the three organisms in this common subspace, genes of highly conserved sequences but significantly different cell-cycle peak times are correctly classified.
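The construction in this abstract can be sketched numerically as below. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: it takes the pairwise quotients to be $A_i A_j^{-1}$ with $A_i = D_i^T D_i$, uses explicit matrix inverses (acceptable for small, well-conditioned examples only), and the function name `ho_gsvd` and its return layout are hypothetical.

```python
import numpy as np

def ho_gsvd(matrices):
    """Sketch of an HO GSVD for N >= 2 matrices D_i (m_i x n), each of full column rank.

    Returns a list of (U_i, sigma_i) pairs, the shared V, and the eigenvalues of S,
    such that D_i = (U_i * sigma_i) @ V.T. Columns of U_i have unit norm but are
    not orthogonal in general.
    """
    n_mats = len(matrices)
    grams = [d.T @ d for d in matrices]               # A_i = D_i^T D_i
    inverses = [np.linalg.inv(a) for a in grams]
    n = grams[0].shape[0]

    # S is the arithmetic mean of the quotients A_i A_j^{-1} over all ordered pairs i != j
    s = np.zeros((n, n))
    for i in range(n_mats):
        for j in range(n_mats):
            if i != j:
                s += grams[i] @ inverses[j]
    s /= n_mats * (n_mats - 1)

    eigvals, v = np.linalg.eig(s)                     # S V = V Lambda
    eigvals = np.real(eigvals)                        # real in exact arithmetic
    v = np.real(v)
    v = v / np.linalg.norm(v, axis=0)                 # unit-norm columns of V

    factors = []
    for d in matrices:
        b = np.linalg.solve(v, d.T).T                 # B_i = U_i Sigma_i, from V B_i^T = D_i^T
        sigma = np.linalg.norm(b, axis=0)             # column norms give the sigma_{i,k}
        factors.append((b / sigma, sigma))
    return factors, v, eigvals
```

Columns of `v` whose eigenvalue is close to 1 would then be candidates for the approximately common subspace, and comparing `sigma` across datasets for a given column gives the relative significance ratios mentioned in the abstract.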
Affiliation(s)
- Sri Priya Ponnapalli
- Department of Electrical and Computer Engineering, University of Texas at Austin, Texas, United States of America
- Michael A. Saunders
- Department of Management Science and Engineering, Stanford University, Stanford, California, United States of America
- Charles F. Van Loan
- Department of Computer Science, Cornell University, Ithaca, New York, United States of America
- Orly Alter
- Scientific Computing and Imaging (SCI) Institute and Departments of Bioengineering and Human Genetics, University of Utah, Salt Lake City, Utah, United States of America
12
Tensor decomposition reveals concurrent evolutionary convergences and divergences and correlations with structural motifs in ribosomal RNA. PLoS One 2011; 6:e18768. [PMID: 21625625 PMCID: PMC3094155 DOI: 10.1371/journal.pone.0018768] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 01/07/2011] [Accepted: 03/17/2011] [Indexed: 11/19/2022]
Abstract
Evolutionary relationships among organisms are commonly described by using a hierarchy derived from comparisons of ribosomal RNA (rRNA) sequences. We propose that even on the level of a single rRNA molecule, an organism's evolution is composed of multiple pathways due to concurrent forces that act independently upon different rRNA degrees of freedom. Relationships among organisms are then compositions of coexisting pathway-dependent similarities and dissimilarities, which cannot be described by a single hierarchy. We computationally test this hypothesis in comparative analyses of 16S and 23S rRNA sequence alignments by using a tensor decomposition, i.e., a framework for modeling composite data. Each alignment is encoded in a cuboid, i.e., a third-order tensor, where nucleotides, positions and organisms, each represent a degree of freedom. A tensor mode-1 higher-order singular value decomposition (HOSVD) is formulated such that it separates each cuboid into combinations of patterns of nucleotide frequency variation across organisms and positions, i.e., "eigenpositions" and corresponding nucleotide-specific segments of "eigenorganisms," respectively, independent of a-priori knowledge of the taxonomic groups or rRNA structures. We find, in support of our hypothesis that, first, the significant eigenpositions reveal multiple similarities and dissimilarities among the taxonomic groups. Second, the corresponding eigenorganisms identify insertions or deletions of nucleotides exclusively conserved within the corresponding groups, that map out entire substructures and are enriched in adenosines, unpaired in the rRNA secondary structure, that participate in tertiary structure interactions. This demonstrates that structural motifs involved in rRNA folding and function are evolutionary degrees of freedom. 
Third, two previously unknown coexisting subgenic relationships between Microsporidia and Archaea are revealed in both the 16S and 23S rRNA alignments, a convergence and a divergence, conferred by insertions and deletions of these motifs, which cannot be described by a single hierarchy. This shows that mode-1 HOSVD modeling of rRNA alignments might be used to computationally predict evolutionary mechanisms.
13
Omberg L, Meyerson JR, Kobayashi K, Drury LS, Diffley JFX, Alter O. Global effects of DNA replication and DNA replication origin activity on eukaryotic gene expression. Mol Syst Biol 2009; 5:312. [PMID: 19888207 PMCID: PMC2779084 DOI: 10.1038/msb.2009.70] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Received: 03/04/2009] [Accepted: 08/19/2009] [Indexed: 11/09/2022]
Abstract
This report provides a global view of how gene expression is affected by DNA replication. We analyzed synchronized cultures of Saccharomyces cerevisiae under conditions that prevent DNA replication initiation without delaying cell cycle progression. We use a higher-order singular value decomposition to integrate the global mRNA expression measured in the multiple time courses, detect and remove experimental artifacts and identify significant combinations of patterns of expression variation across the genes, time points and conditions. We find that, first, approximately 88% of the global mRNA expression is independent of DNA replication. Second, the requirement of DNA replication for efficient histone gene expression is independent of conditions that elicit DNA damage checkpoint responses. Third, origin licensing decreases the expression of genes with origins near their 3' ends, revealing that downstream origins can regulate the expression of upstream genes. This confirms previous predictions from mathematical modeling of a global causal coordination between DNA replication origin activity and mRNA expression, and shows that mathematical modeling of DNA microarray data can be used to correctly predict previously unknown biological modes of regulation.
Affiliation(s)
- Larsson Omberg
- Department of Biomedical Engineering, University of Texas, Austin, TX 78712, USA
14
Abstract
The term health is commonplace in both everyday parlance and professional discourse. Unfortunately, the term has little objective specification, especially in physiologic terms. When critically examined, even time-honored terms such as homeostasis lack specific measurable referents. The last three decades, however, have witnessed an explosion of information from diverse fields regarding the dynamical basis of biology. This brief review explores a few main ideas, which appear to be coming together to provide biosignatures of health.
Affiliation(s)
- Joseph P Zbilut
- Adult Health Nursing, College of Nursing, and Molecular Biophysics and Physiology, Rush Medical College, Chicago, Illinois, USA
15
16
A tensor higher-order singular value decomposition for integrative analysis of DNA microarray data from different studies. Proc Natl Acad Sci U S A 2007; 104:18371-6. [PMID: 18003902 DOI: 10.1073/pnas.0709146104] [Citation(s) in RCA: 105] [Impact Index Per Article: 6.2] [Indexed: 11/18/2022]
Abstract
We describe the use of a higher-order singular value decomposition (HOSVD) in transforming a data tensor of genes × "x-settings" (that is, different settings of the experimental variable x) × "y-settings," which tabulates DNA microarray data from different studies, to a "core tensor" of "eigenarrays" × "x-eigengenes" × "y-eigengenes." Reformulating this multilinear HOSVD such that it decomposes the data tensor into a linear superposition of all outer products of an eigenarray, an x- and a y-eigengene, that is, rank-1 "subtensors," we define the significance of each subtensor in terms of the fraction of the overall information in the data tensor that it captures. We illustrate this HOSVD with an integration of genome-scale mRNA expression data from three yeast cell cycle time courses, two of which are under exposure to either hydrogen peroxide or menadione. We find that significant subtensors represent independent biological programs or experimental phenomena. The picture that emerges suggests that the conserved genes YKU70, MRE11, AIF1, and ZWF1, and the processes of retrotransposition, apoptosis, and the oxidative pentose phosphate pathway that these genes are involved in, may play significant, yet previously unrecognized, roles in the differential effects of hydrogen peroxide and menadione on cell cycle progression. A genome-scale correlation between DNA replication initiation and RNA transcription, which is equivalent to a recently discovered correlation and might be due to a previously unknown mechanism of regulation, is independently uncovered.
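A third-order HOSVD of this kind can be sketched with mode unfoldings and ordinary SVDs, as below. This is a hedged illustration rather than the paper's code: the function names `unfold`, `hosvd3`, `reconstruct3` and `subtensor_fractions` are hypothetical, and measuring each subtensor's significance as its squared core entry divided by the sum of all squared core entries is an assumed reading of "fraction of the overall information."

```python
import numpy as np

def unfold(tensor, mode):
    """Matricize a tensor: the chosen mode indexes rows, all other modes are flattened."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def hosvd3(tensor):
    """HOSVD of a third-order tensor (e.g. genes x x-settings x y-settings).

    Returns the core tensor and one orthonormal basis per mode, taken from the
    left singular vectors of each mode unfolding.
    """
    bases = [np.linalg.svd(unfold(tensor, mode), full_matrices=False)[0]
             for mode in range(3)]
    # Project the data onto the three bases to obtain the core tensor
    core = np.einsum('ijk,ia,jb,kc->abc', tensor, *bases)
    return core, bases

def reconstruct3(core, bases):
    """Superpose all rank-1 subtensors, each weighted by its core entry."""
    return np.einsum('abc,ia,jb,kc->ijk', core, *bases)

def subtensor_fractions(core):
    """Assumed significance measure: squared core entries normalized to sum to 1."""
    energy = core ** 2
    return energy / energy.sum()
```

Sorting the entries of `subtensor_fractions(core)` in descending order gives one way to rank the combinations of an eigenarray, an x-eigengene and a y-eigengene before interpreting the leading ones.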
17
Abstract
DNA microarrays make it possible, for the first time, to record the complete genomic signals that guide the progression of cellular processes. Future discovery in biology and medicine will come from the mathematical modeling of these data, which hold the key to fundamental understanding of life on the molecular level, as well as answers to questions regarding diagnosis, treatment, and drug development. This chapter reviews the first data-driven models that were created from these genome-scale data, through adaptations and generalizations of mathematical frameworks from matrix algebra that have proven successful in describing the physical world, in such diverse areas as mechanics and perception: the singular value decomposition model, the generalized singular value decomposition comparative model, and the pseudoinverse projection integrative model. These models provide mathematical descriptions of the genetic networks that generate and sense the measured data, where the mathematical variables and operations represent biological reality. The variables, patterns uncovered in the data, correlate with activities of cellular elements such as regulators or transcription factors that drive the measured signals and cellular states where these elements are active. The operations, such as data reconstruction, rotation, and classification in subspaces of selected patterns, simulate experimental observation of only the cellular programs that these patterns represent. These models are illustrated in the analyses of RNA expression data from yeast and human during their cell cycle programs and DNA-binding data from yeast cell cycle transcription factors and replication initiation proteins.
Two alternative pictures of RNA expression oscillations during the cell cycle that emerge from these analyses, which parallel well-known designs of physical oscillators, convey the capacity of the models to elucidate the design principles of cellular systems, as well as guide the design of synthetic ones. In these analyses, the power of the models to predict previously unknown biological principles is demonstrated with a prediction of a novel mechanism of regulation that correlates DNA replication initiation with cell cycle-regulated RNA transcription in yeast. These models may become the foundation of a future in which biological systems are modeled as physical systems are today.
Affiliation(s)
- Orly Alter
- Department of Biomedical Engineering, Institute for Cellular and Molecular Biology and Institute for Computational Engineering and Sciences, University of Texas at Austin, Austin, TX, USA