1. Rivera-Mulia JC, Kim S, Gabr H, Chakraborty A, Ay F, Kahveci T, Gilbert DM. Replication timing networks reveal a link between transcription regulatory circuits and replication timing control. Genome Res 2019; 29:1415-1428. [PMID: 31434679 PMCID: PMC6724675 DOI: 10.1101/gr.247049.118] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 12/09/2018] [Accepted: 08/05/2019] [Indexed: 12/11/2022]
Abstract
DNA replication occurs in a defined temporal order known as the replication timing (RT) program and is regulated during development, coordinated with 3D genome organization and transcriptional activity. However, transcription and RT are not sufficiently coordinated to predict each other, suggesting an indirect relationship. Here, we exploit genome-wide RT profiles from 15 human cell types and intermediate differentiation stages derived from human embryonic stem cells to construct different types of RT regulatory networks. First, we constructed networks based on the coordinated RT changes during cell fate commitment to create highly complex RT networks composed of thousands of interactions that form specific functional subnetwork communities. We also constructed directional regulatory networks based on the order of RT changes within cell lineages, and identified master regulators of differentiation pathways. Finally, we explored relationships between RT networks and transcriptional regulatory networks (TRNs) by combining them into more complex circuitries of composite and bipartite networks. Results identified novel trans interactions linking transcription factors that are core to the regulatory circuitry of each cell type to RT changes occurring in those cell types. These core transcription factors were found to bind cooperatively to sites in the affected replication domains, providing provocative evidence that they constitute biologically significant directional interactions. Our findings suggest a regulatory link between the establishment of cell-type-specific TRNs and RT control during lineage specification.
Affiliation(s)
- Juan Carlos Rivera-Mulia
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota Medical School, Minneapolis, Minnesota 55455, USA
- Sebo Kim
- Department of Computer and Information Sciences and Engineering, University of Florida, Gainesville, Florida 32611, USA
- Haitham Gabr
- Department of Computer and Information Sciences and Engineering, University of Florida, Gainesville, Florida 32611, USA
- Abhijit Chakraborty
- La Jolla Institute for Allergy and Immunology, La Jolla, California 92037, USA
- Ferhat Ay
- La Jolla Institute for Allergy and Immunology, La Jolla, California 92037, USA; School of Medicine, University of California San Diego, La Jolla, California 92093, USA
- Tamer Kahveci
- Department of Computer and Information Sciences and Engineering, University of Florida, Gainesville, Florida 32611, USA
- David M Gilbert
- Department of Biological Science, Florida State University, Tallahassee, Florida, 32306-4295, USA; Center for Genomics and Personalized Medicine, Florida State University, Tallahassee, Florida 32306, USA
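The abstract does not spell out how coordinated RT changes become network edges. As a minimal sketch, assuming each replication domain is described by one RT value per cell type, the snippet below links two domains when their RT values are strongly correlated across cell types. The function name `rt_cochange_network`, the use of Pearson correlation, and the 0.8 threshold are illustrative assumptions, not the authors' procedure.

```python
import numpy as np

def rt_cochange_network(rt, threshold=0.8):
    """Link domains whose RT values co-vary across cell types (hypothetical sketch).

    rt: array of shape (n_domains, n_cell_types), one RT value per domain per cell type.
    Returns the set of undirected edges (i, j), i < j, with Pearson r >= threshold.
    Domains with constant RT give NaN correlations, which never pass the threshold.
    """
    corr = np.corrcoef(rt)  # domain-by-domain Pearson correlation matrix
    n = corr.shape[0]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if corr[i, j] >= threshold:
                edges.add((i, j))
    return edges

# Toy input: domains 0 and 1 shift early-to-late together; domain 2 does the opposite.
rt = np.array([
    [1.0, 0.5, -0.5, -1.0],
    [0.9, 0.4, -0.6, -1.1],
    [-1.0, -0.5, 0.5, 1.0],
])
edges = rt_cochange_network(rt)
```

A correlation threshold keeps the sketch simple; a real analysis would likely need a significance test and multiple-testing control before calling an edge.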
2. Gabr H, Rivera-Mulia JC, Gilbert DM, Kahveci T. Computing interaction probabilities in signaling networks. EURASIP J Bioinform Syst Biol 2015; 2015:10. [PMID: 26587014 PMCID: PMC4642599 DOI: 10.1186/s13637-015-0031-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 08/12/2015] [Accepted: 10/30/2015] [Indexed: 01/17/2023]
Abstract
Biological networks inherently have uncertain topologies. This arises from many factors. For instance, interactions between molecules may or may not take place under varying conditions. Genetic or epigenetic mutations may also alter biological processes like transcription or translation. This uncertainty is often modeled by associating each interaction with a probability value. Studying biological networks under this probabilistic model has already been shown to yield accurate and insightful analysis of interaction data. However, the problem of assigning accurate probability values to interactions remains unresolved. In this paper, we present a novel method for computing interaction probabilities in signaling networks based on transcription levels of genes. The transcription levels define the signal reachability probability between membrane receptors and transcription factors. Our method computes the interaction probabilities that minimize the gap between the observed and the computed signal reachability probabilities. We evaluate our method on four signaling networks from the Kyoto Encyclopedia of Genes and Genomes (KEGG). For each network, we compute its edge probabilities using the gene expression profiles for seven major leukemia subtypes. We use these values to analyze how the stress induced by different leukemia subtypes affects signaling interactions.
Affiliation(s)
- Haitham Gabr
- Department of Computer & Information Science & Engineering, University of Florida, Gainesville, Florida, USA
- David M. Gilbert
- Department of Biological Science, Florida State University, Tallahassee, Florida, USA
- Tamer Kahveci
- Department of Computer & Information Science & Engineering, University of Florida, Gainesville, Florida, USA
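To make the "signal reachability probability" concrete, here is a minimal sketch assuming independent directed edges, each present with its own probability. It computes exactly the probability that a receptor reaches a transcription factor by enumerating every edge subset, plus a squared gap to observed values of the kind such a method would minimize. The function names and the squared-error form of the gap are assumptions; the paper's own optimization is not reproduced here, and exhaustive enumeration is only practical for toy networks.

```python
from itertools import product

def reachability_probability(edges, probs, source, target):
    """Exact P(target reachable from source) when each directed edge (u, v)
    is present independently with probability probs[k] (hypothetical sketch).
    Enumerates all 2^m edge subsets, so it is only usable for small networks."""
    total = 0.0
    for present in product([False, True], repeat=len(edges)):
        weight = 1.0
        adj = {}
        for (u, v), p, on in zip(edges, probs, present):
            weight *= p if on else (1.0 - p)
            if on:
                adj.setdefault(u, []).append(v)
        # Depth-first search over this realization of the network.
        seen, stack = {source}, [source]
        while stack:
            node = stack.pop()
            for nxt in adj.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if target in seen:
            total += weight
    return total

def reachability_gap(edges, probs, observed):
    """Sum of squared differences between observed and computed reachability.
    observed maps (receptor, transcription_factor) pairs to target probabilities."""
    return sum(
        (reachability_probability(edges, probs, s, t) - obs) ** 2
        for (s, t), obs in observed.items()
    )

# Toy network: receptor R reaches factor T directly or through A.
edges = [("R", "A"), ("A", "T"), ("R", "T")]
probs = [0.5, 0.5, 0.5]
p_rt = reachability_probability(edges, probs, "R", "T")
```

An optimizer would adjust `probs` to drive `reachability_gap` toward zero; for realistic network sizes the exact enumeration would need to be replaced by sampling or a smarter algorithm.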
3. Pathway correlation profile of gene-gene co-expression for identifying pathway perturbation. PLoS One 2012; 7:e52127. [PMID: 23284898 PMCID: PMC3527387 DOI: 10.1371/journal.pone.0052127] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 09/11/2012] [Accepted: 11/14/2012] [Indexed: 11/29/2022]
Abstract
Identifying perturbed or dysregulated pathways is critical to understanding the biological processes that change within an experiment. Previous methods identified important pathways that are significantly enriched among differentially expressed genes; however, these methods cannot account for small, coordinated changes in gene expression that amass across a whole pathway. In order to overcome this limitation, we use microarray gene expression data to identify pathway perturbation based on pathway correlation profiles. By identifying the distribution of gene-gene pair correlations within a pathway, we can rank the pathways based on the level of perturbation and dysregulation. We have shown this successfully for differences between two experimental conditions in Escherichia coli and changes within time series data in Saccharomyces cerevisiae, as well as two estrogen receptor response classes of breast cancer. Overall, our method made significant predictions as to the pathway perturbations that are involved in the experimental conditions.
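The abstract describes ranking pathways from the distribution of gene-gene pair correlations within each pathway. A minimal sketch of that idea, assuming expression data for two conditions: compute every within-pathway Pearson correlation in each condition and rank pathways by the mean absolute change. This summary score is a simplified stand-in chosen for illustration, not the paper's statistic, and the function names are hypothetical.

```python
import numpy as np

def pair_correlations(expr, genes):
    """Upper-triangle Pearson correlations for all gene pairs in `genes`.
    expr maps gene name -> 1-D array of expression values across samples.
    Requires at least two genes."""
    mat = np.vstack([expr[g] for g in genes])
    corr = np.corrcoef(mat)
    iu = np.triu_indices(len(genes), k=1)  # each unordered pair once
    return corr[iu]

def rank_pathways(expr_a, expr_b, pathways):
    """Rank pathways by mean |change| in gene-gene correlation from condition A
    to condition B (a hypothetical perturbation score). Highest first."""
    scores = {}
    for name, genes in pathways.items():
        delta = pair_correlations(expr_b, genes) - pair_correlations(expr_a, genes)
        scores[name] = float(np.mean(np.abs(delta)))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy data: g1/g2 flip from correlated to anti-correlated; g3/g4 are unchanged.
expr_a = {
    "g1": np.array([1.0, 2.0, 3.0, 4.0]),
    "g2": np.array([2.0, 4.0, 6.0, 8.0]),
    "g3": np.array([1.0, 2.0, 3.0, 4.0]),
    "g4": np.array([4.0, 3.0, 2.0, 1.0]),
}
expr_b = dict(expr_a, g2=np.array([8.0, 6.0, 4.0, 2.0]))
ranked = rank_pathways(expr_a, expr_b, {"P1": ["g1", "g2"], "P2": ["g3", "g4"]})
```

Using the whole set of pair correlations, rather than per-gene fold changes, is what lets small coordinated shifts across a pathway add up, which is the limitation the abstract says earlier enrichment methods miss.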
4. Applications of different weighting schemes to improve pathway-based analysis. Comp Funct Genomics 2011; 2011:463645. [PMID: 21687588 PMCID: PMC3114410 DOI: 10.1155/2011/463645] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 10/27/2010] [Accepted: 02/26/2011] [Indexed: 11/28/2022]
Abstract
Conventionally, pathway-based analysis assumes that genes in a pathway contribute equally to a biological function, thus assigning uniform weight to genes. However, this assumption has been shown to be incorrect, and applying uniform weight in pathway analysis may not be appropriate for tasks such as molecular classification of diseases, as genes in a functional group may have different predictive power. Hence, we propose assigning different weights to genes in pathway-based analysis and devise four weighting schemes. We applied them in two existing pathway analysis methods using both real and simulated gene expression data for pathways. Among all schemes, the random weighting scheme, which generates random weights and selects optimal weights minimizing an objective function, performs best in terms of P value or error rate reduction. Weighting changes pathway scoring and brings up some new significant pathways, leading to the detection of disease-related genes that are missed under uniform weight.
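The random weighting scheme is described only at a high level. A minimal sketch, assuming a pathway score equal to the weighted sum of its genes' expression values and, as the objective, the training error rate of the rule "score > 0 predicts the disease class": draw random non-negative weights normalized to sum to one and keep the best set found. The score, threshold, normalization and function names are all assumptions made for illustration.

```python
import numpy as np

def error_rate(scores, labels, threshold=0.0):
    """Fraction of samples misclassified by the rule `score > threshold`
    (a hypothetical objective; labels is a boolean array)."""
    return float(np.mean((scores > threshold) != labels))

def random_weighting(X, objective, n_trials=1000, seed=0):
    """Random search over gene weights for one pathway (hypothetical sketch).

    X: array (n_samples, n_genes) of expression values.
    objective: callable mapping per-sample pathway scores to a value to minimize.
    Returns the best weight vector found and its objective value.
    """
    rng = np.random.default_rng(seed)
    n_genes = X.shape[1]
    best_w = np.full(n_genes, 1.0 / n_genes)  # uniform weights as the baseline
    best_val = objective(X @ best_w)
    for _ in range(n_trials):
        w = rng.random(n_genes)  # random non-negative weights
        w /= w.sum()             # normalize so the weights sum to 1
        val = objective(X @ w)
        if val < best_val:
            best_w, best_val = w, val
    return best_w, best_val

# Toy pathway: gene 0 separates the classes, gene 1 is noise.
X = np.array([[-1.0, 5.0], [-1.0, -5.0], [1.0, 5.0], [1.0, -5.0]])
y = np.array([False, False, True, True])
w, err = random_weighting(X, lambda s: error_rate(s, y))
```

With uniform weights this toy pathway misclassifies half of the samples; the search should settle on weights that put most of the weight on gene 0. Keeping the uniform baseline as the starting point guarantees the result is never worse than uniform weighting on the chosen objective.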
5. Demir E, Cary MP, Paley S, Fukuda K, Lemer C, Vastrik I, Wu G, D'Eustachio P, Schaefer C, Luciano J, Schacherer F, Martinez-Flores I, Hu Z, Jimenez-Jacinto V, Joshi-Tope G, Kandasamy K, Lopez-Fuentes AC, Mi H, Pichler E, Rodchenkov I, Splendiani A, Tkachev S, Zucker J, Gopinath G, Rajasimha H, Ramakrishnan R, Shah I, Syed M, Anwar N, Babur O, Blinov M, Brauner E, Corwin D, Donaldson S, Gibbons F, Goldberg R, Hornbeck P, Luna A, Murray-Rust P, Neumann E, Ruebenacker O, Reubenacker O, Samwald M, van Iersel M, Wimalaratne S, Allen K, Braun B, Whirl-Carrillo M, Cheung KH, Dahlquist K, Finney A, Gillespie M, Glass E, Gong L, Haw R, Honig M, Hubaut O, Kane D, Krupa S, Kutmon M, Leonard J, Marks D, Merberg D, Petri V, Pico A, Ravenscroft D, Ren L, Shah N, Sunshine M, Tang R, Whaley R, Letovksy S, Buetow KH, Rzhetsky A, Schachter V, Sobral BS, Dogrusoz U, McWeeney S, Aladjem M, Birney E, Collado-Vides J, Goto S, Hucka M, Le Novère N, Maltsev N, Pandey A, Thomas P, Wingender E, Karp PD, Sander C, Bader GD. The BioPAX community standard for pathway data sharing. Nat Biotechnol 2010; 28:935-42. [PMID: 20829833 PMCID: PMC3001121 DOI: 10.1038/nbt.1666] [Citation(s) in RCA: 432] [Impact Index Per Article: 30.9] [Indexed: 12/20/2022]
Abstract
BioPAX (Biological Pathway Exchange) is a standard language to represent biological pathways at the molecular and cellular level. Its major use is to facilitate the exchange of pathway data (http://www.biopax.org). Pathway data captures our understanding of biological processes, but its rapid growth necessitates development of databases and computational tools to aid interpretation. However, the current fragmentation of pathway information across many databases with incompatible formats presents barriers to its effective use. BioPAX solves this problem by making pathway data substantially easier to collect, index, interpret and share. BioPAX can represent metabolic and signaling pathways, molecular and genetic interactions and gene regulation networks. BioPAX was created through a community process. Through BioPAX, millions of interactions organized into thousands of pathways across many organisms, from a growing number of sources, are available. Thus, large amounts of pathway data are available in a computable form to support visualization, analysis and biological discovery.
Affiliation(s)
- Emek Demir
- Computational Biology, Memorial Sloan-Kettering Cancer Center, New York, New York, USA
6. Bayesian network expansion identifies new ROS and biofilm regulators. PLoS One 2010; 5:e9513. [PMID: 20209085 PMCID: PMC2831072 DOI: 10.1371/journal.pone.0009513] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Received: 12/23/2009] [Accepted: 02/07/2010] [Indexed: 11/19/2022]
Abstract
Signaling and regulatory pathways that guide gene expression have only been partially defined for most organisms. However, given the increasing number of microarray measurements, it may be possible to reconstruct such pathways and uncover missing connections directly from experimental data. Using a compendium of microarray gene expression data obtained from Escherichia coli, we constructed a series of Bayesian network models for the reactive oxygen species (ROS) pathway as defined by EcoCyc. A consensus Bayesian network model was generated using those networks sharing the top recovered score. This microarray-based network only partially agreed with the known ROS pathway curated from the literature and databases. A top network was then expanded to predict genes that could enhance the Bayesian network model using an algorithm we termed ‘BN+1’. This expansion procedure predicted many stress-related genes (e.g., dusB and uspE), and their possible interactions with other ROS pathway genes. A term enrichment method discovered that biofilm-associated microarray data usually contained high expression levels of both uspE and gadX. The predicted involvement of gene uspE in the ROS pathway and interactions between uspE and gadX were confirmed experimentally using E. coli reporter strains. Genes gadX and uspE showed a feedback relationship in regulating each other's expression. Both genes were verified to regulate biofilm formation through gene knockout experiments. These data suggest that the BN+1 expansion method can faithfully uncover hidden or unknown genes for a selected pathway with significant biological roles. The presently reported BN+1 expansion method is a generalized approach applicable to the characterization and expansion of other biological pathways and living systems.
7. Li GG, Wang ZZ. Evaluation of similarity measures for gene expression data and their correspondent combined measures. Interdiscip Sci 2009; 1:72-80. [DOI: 10.1007/s12539-008-0005-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/03/2008] [Revised: 08/10/2008] [Accepted: 08/10/2008] [Indexed: 11/30/2022]
8. Chikayama E, Suto M, Nishihara T, Shinozaki K, Hirayama T, Kikuchi J. Systematic NMR analysis of stable isotope labeled metabolite mixtures in plant and animal systems: coarse grained views of metabolic pathways. PLoS One 2008; 3:e3805. [PMID: 19030231 PMCID: PMC2583929 DOI: 10.1371/journal.pone.0003805] [Citation(s) in RCA: 72] [Impact Index Per Article: 4.5] [Received: 05/01/2008] [Accepted: 10/21/2008] [Indexed: 11/23/2022]
Abstract
Background: Metabolic phenotyping has become an important ‘bird's-eye-view’ technology which can be applied to higher organisms, such as model plant and animal systems in the post-genomics and proteomics era. Although genotyping technology has expanded greatly over the past decade, metabolic phenotyping has languished due to the difficulty of ‘top-down’ chemical analyses. Here, we describe a systematic NMR methodology for stable isotope-labeling and analysis of metabolite mixtures in plant and animal systems.
Methodology/Principal Findings: The analysis method includes a stable isotope labeling technique for use in living organisms; a systematic method for simultaneously identifying a large number of metabolites by using a newly developed HSQC-based metabolite chemical shift database combined with heteronuclear multidimensional NMR spectroscopy; Principal Components Analysis; and a visualization method using a coarse-grained overview of the metabolic system. The database contains more than 1000 ¹H and ¹³C chemical shifts corresponding to 142 metabolites measured under identical physicochemical conditions. Using the stable isotope labeling technique in Arabidopsis T87 cultured cells and Bombyx mori, we systematically detected >450 HSQC peaks in each ¹³C-HSQC spectrum derived from model plant, Arabidopsis T87 cultured cells and the invertebrate animal model Bombyx mori. Furthermore, for the first time, efficient ¹³C labeling has allowed reliable signal assignment using analytical separation techniques such as 3D HCCH-COSY spectra in higher organism extracts.
Conclusions/Significance: Overall physiological changes could be detected and categorized in relation to a critical developmental phase change in B. mori by coarse-grained representations in which the organization of metabolic pathways related to a specific developmental phase was visualized on the basis of constituent changes of 56 identified metabolites. Based on the observed intensities of ¹³C atoms of given metabolites on development-dependent changes in the 56 identified ¹³C-HSQC signals, we have determined the changes in metabolic networks that are associated with energy and nitrogen metabolism.
9. Gene module level analysis: identification to networks and dynamics. Curr Opin Biotechnol 2008; 19:482-91. [PMID: 18725293 DOI: 10.1016/j.copbio.2008.07.011] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.8] [Received: 05/31/2008] [Revised: 07/25/2008] [Accepted: 07/29/2008] [Indexed: 12/23/2022]
Abstract
Nature exhibits modular design in biological systems. Gene module level analysis is based on this module concept, aiming to understand biological network design and systems behavior in disease and development by emphasizing modules of genes rather than individual genes. Module level analysis has been extensively applied in genome-wide analysis, exploring the organization of biological systems from identifying modules to reconstructing module networks and analyzing module dynamics. Such a module-level perspective provides a high-level representation of the regulatory scenario and design of biological systems, promising to revolutionize our view of systems biology, genetic engineering as well as disease mechanisms and molecular medicine.
10. Chen YPP, Chen F. Identifying targets for drug discovery using bioinformatics. Expert Opin Ther Targets 2008; 12:383-9. [PMID: 18348676 DOI: 10.1517/14728222.12.4.383] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.4] [Indexed: 01/20/2023]
Abstract
Background: Drug discovery is the process of discovering and designing drugs; it includes target identification, target validation, lead identification, lead optimization and introduction of the new drugs to the public. This process is very important: it involves analyzing the causes of diseases and finding ways to tackle them.
Objective: The problems we must face include: i) the process is so long and expensive that it might cost millions of dollars and take a dozen years; and ii) targets are not identified accurately enough, which in turn delays the process. Introducing bioinformatics into the drug discovery process could contribute much to it. Bioinformatics is a booming field that combines biology with computer science. It can explore the causes of diseases at the molecular level, explain disease phenomena from the perspective of genes, and apply computational techniques such as data mining and machine learning to narrow the scope of analysis and improve the accuracy of the results, thereby reducing cost and time.
Methods: Here we describe recent studies on how to apply bioinformatics techniques in the four phases of drug discovery, how these techniques improve the drug discovery process, and some possible difficulties that should be dealt with.
Results: We conclude that combining bioinformatics with drug discovery is a very promising approach, although it currently faces many problems.
11. Panteris E, Swift S, Payne A, Liu X. Mining pathway signatures from microarray data and relevant biological knowledge. J Biomed Inform 2007; 40:698-706. [PMID: 17395545 DOI: 10.1016/j.jbi.2007.01.004] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 09/21/2006] [Revised: 01/05/2007] [Accepted: 01/31/2007] [Indexed: 10/23/2022]
Abstract
High-throughput technologies such as DNA microarray are in the process of revolutionizing the way modern biological research is being done. Bioinformatics tools are becoming increasingly important to assist biomedical scientists in their quest to understand complex biological processes. Gene expression analysis has attracted a large amount of attention over the last few years, mostly in the form of algorithms that explore cluster and regulatory relationships among genes of interest, and programs that try to display the multidimensional microarray data in appropriate formats so that they make biological sense. To reduce the dimensionality of microarray data and make the corresponding analysis more biologically relevant, in this paper we propose a biologically-led approach to biochemical pathway analysis using microarray data and relevant biological knowledge. The method selects a subset of genes for each pathway that describes the behaviour of the pathway at a given experimental condition, and transforms them into pathway signatures. The metabolic pathways of Escherichia coli are used as a case study.
Affiliation(s)
- Eleftherios Panteris
- School of Information Systems, Computing and Mathematics, Brunel University, Uxbridge, Middlesex UB8 3PH, UK.
12. GenMAPP 2: new features and resources for pathway analysis. BMC Bioinformatics 2007; 8:217. [PMID: 17588266 PMCID: PMC1924866 DOI: 10.1186/1471-2105-8-217] [Citation(s) in RCA: 205] [Impact Index Per Article: 12.1] [Received: 11/16/2006] [Accepted: 06/24/2007] [Indexed: 12/03/2022]
Abstract
Background: Microarray technologies have evolved rapidly, enabling biologists to quantify genome-wide levels of gene expression, alternative splicing, and sequence variations for a variety of species. Analyzing and displaying these data present a significant challenge. Pathway-based approaches for analyzing microarray data have proven useful for presenting data and for generating testable hypotheses.
Results: To address the growing needs of the microarray community we have released version 2 of Gene Map Annotator and Pathway Profiler (GenMAPP), a new GenMAPP database schema, and integrated resources for pathway analysis. We have redesigned the GenMAPP database to support multiple gene annotations and species as well as custom species database creation for a potentially unlimited number of species. We have expanded our pathway resources by utilizing homology information to translate pathway content between species and extending existing pathways with data derived from conserved protein interactions and coexpression. We have implemented a new mode of data visualization to support analysis of complex data, including time-course, single nucleotide polymorphism (SNP), and splicing. GenMAPP version 2 also offers innovative ways to display and share data by incorporating HTML export of analyses for entire sets of pathways as organized web pages.
Conclusion: GenMAPP version 2 provides a means to rapidly interrogate complex experimental data for pathway-level changes in a diverse range of organisms.