1
Chockalingam SP, Aluru M, Aluru S. SCEMENT: scalable and memory efficient integration of large-scale single-cell RNA-sequencing data. Bioinformatics (Oxford, England) 2025; 41:btaf057. [PMID: 39985442 DOI: 10.1093/bioinformatics/btaf057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2024] [Revised: 11/18/2024] [Accepted: 02/20/2025] [Indexed: 02/24/2025]
Abstract
MOTIVATION: Integrative analysis of large-scale single-cell data collected from diverse cell populations promises an improved understanding of complex biological systems. While several algorithms have been developed for single-cell RNA-sequencing data integration, many lack the scalability to handle large numbers of datasets and/or millions of cells due to their memory and run time requirements. The few tools that can handle large data do so by reducing the computational burden through strategies such as subsampling of the data or selecting a reference dataset. Such shortcuts, however, hamper the accuracy of downstream analyses, especially those requiring quantitative gene expression information.
RESULTS: We present SCEMENT, a SCalablE and Memory-Efficient iNTegration method, to overcome these limitations. Our new parallel algorithm builds upon and extends the linear regression model previously applied in ComBat to an unsupervised sparse matrix setting to enable accurate integration of diverse and large collections of single-cell RNA-sequencing data. Using tens to hundreds of real single-cell RNA-seq datasets, we show that SCEMENT outperforms ComBat as well as FastIntegration and Scanorama in runtime (up to 214× faster) and memory usage (up to 17.5× less). It not only performs batch correction and integration of millions of cells in under 25 min, but also facilitates the discovery of new rare cell types and more robust reconstruction of gene regulatory networks with full quantitative gene expression information.
AVAILABILITY AND IMPLEMENTATION: Source code is freely available for download at https://github.com/AluruLab/scement, implemented in C++ and supported on Linux.
Affiliation(s)
- Sriram P Chockalingam
  - Institute for Data Engineering and Science, Georgia Institute of Technology, Atlanta, GA-30332, United States
- Maneesha Aluru
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA-30332, United States
- Srinivas Aluru
  - School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA-30332, United States
2
Jia C, Grima R. Holimap: an accurate and efficient method for solving stochastic gene network dynamics. Nat Commun 2024; 15:6557. [PMID: 39095346 PMCID: PMC11297302 DOI: 10.1038/s41467-024-50716-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2024] [Accepted: 07/13/2024] [Indexed: 08/04/2024] Open
Abstract
Gene-gene interactions are crucial to the control of sub-cellular processes but our understanding of their stochastic dynamics is hindered by the lack of simulation methods that can accurately and efficiently predict how the distributions of gene product numbers vary across parameter space. To overcome these difficulties, here we present Holimap (high-order linear-mapping approximation), an approach that approximates the protein or mRNA number distributions of a complex gene regulatory network by the distributions of a much simpler reaction system. We demonstrate Holimap's computational advantages over conventional methods by applying it to predict the stochastic time-dependent dynamics of various gene networks, including transcriptional networks ranging from simple autoregulatory loops to complex randomly connected networks, post-transcriptional networks, and post-translational networks. Holimap is ideally suited to study how the intricate network of gene-gene interactions results in precise coordination and control of gene expression.
Affiliation(s)
- Chen Jia
  - Applied and Computational Mathematics Division, Beijing Computational Science Research Center, Beijing, China
- Ramon Grima
  - School of Biological Sciences, University of Edinburgh, Edinburgh, UK
3
Chee FT, Harun S, Mohd Daud K, Sulaiman S, Nor Muhammad NA. Exploring gene regulation and biological processes in insects: Insights from omics data using gene regulatory network models. Progress in Biophysics and Molecular Biology 2024; 189:1-12. [PMID: 38604435 DOI: 10.1016/j.pbiomolbio.2024.04.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 12/18/2023] [Accepted: 04/03/2024] [Indexed: 04/13/2024]
Abstract
A gene regulatory network (GRN) comprises complicated yet intertwined gene-regulator relationships. Understanding GRN dynamics will unravel the complexity behind observed gene expression. Insect gene regulation is often complicated due to insects' complex life cycles and diverse ecological adaptations. The main aim of this review is to provide an update on current mathematical modelling methods for GRNs as applied to insect science. Several popular GRN architecture models are discussed, together with examples of applications in insect science. In the last part of this review, the models are compared across several aspects, including network scalability, computational complexity, robustness to noise and biological relevance.
Affiliation(s)
- Fong Ting Chee
  - Institute of Systems Biology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia
- Sarahani Harun
  - Institute of Systems Biology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia
- Kauthar Mohd Daud
  - Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 43600, UKM Bangi, Selangor, Malaysia
- Suhaila Sulaiman
  - FGV R&D Sdn Bhd, FGV Innovation Center, PT23417 Lengkuk Teknologi, Bandar Baru Enstek, 71760 Nilai, Negeri Sembilan, Malaysia
- Nor Azlan Nor Muhammad
  - Institute of Systems Biology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia
4
Wang A. Conceptual breakthroughs of the long noncoding RNA functional system and its endogenous regulatory role in the cancerous regime. Exploration of Targeted Anti-tumor Therapy 2024; 5:170-186. [PMID: 38464381 PMCID: PMC10918237 DOI: 10.37349/etat.2024.00211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Accepted: 12/18/2023] [Indexed: 03/12/2024] Open
Abstract
Long noncoding RNAs (lncRNAs) derived from noncoding regions in the human genome were once regarded as junk with no biological significance, but recent studies have shown that these molecules are highly functional, prompting an explosion of studies on their biology. However, these recent efforts have only begun to recognize the biological significance of a small fraction (< 1%) of the lncRNAs. The basic concept of these lncRNA functions remains controversial. This controversy arises primarily from conventional biased observations based on limited datasets. Fortunately, emerging big data provides a promising path to circumvent conventional bias, obtain an unbiased big picture of lncRNA biology, and advance its fundamental principles. This review focuses on big data studies that break through the critical concepts of the lncRNA functional system and its endogenous regulatory roles in all cancers. lncRNAs have unique functional systems distinct from proteins, such as transcriptional initiation and regulation, and they abundantly interact with mitochondria and consume less energy. lncRNAs, rather than proteins as traditionally thought, function as the most critical endogenous regulators of all cancers. lncRNAs regulate the cancer regulatory regime by governing the endogenous regulatory network of all cancers. This is accomplished by dominating the regulatory network module and serving as a key hub and top inducer. These critical conceptual breakthroughs lay a blueprint for a comprehensive functional picture of the human genome. They also lay a blueprint for combating human diseases that are regulated by lncRNAs.
Affiliation(s)
- Anyou Wang
  - Feinstone Center for Genomic Research, University of Memphis, Memphis, TN 38152, USA
5
Altay G, Zapardiel-Gonzalo J, Peters B. RNA-seq preprocessing and sample size considerations for gene network inference. bioRxiv: the preprint server for biology 2023:2023.01.02.522518. [PMID: 36711979 PMCID: PMC9881880 DOI: 10.1101/2023.01.02.522518] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
Background: Gene network inference (GNI) methods have the potential to reveal functional relationships between different genes and their products. Most GNI algorithms have been developed for microarray gene expression datasets and their application to RNA-seq data is relatively recent. As the characteristics of RNA-seq data are different from microarray data, it is an unanswered question which preprocessing methods for RNA-seq data should be applied prior to GNI to attain optimal performance, or what sample size is required for RNA-seq data to obtain reliable GNI estimates.
Results: We ran 9144 analyses of 7 different RNA-seq datasets to evaluate 300 different preprocessing combinations that include data transformations, normalizations and association estimators. We found that there was no single best performing preprocessing combination but that there were several good ones. The performance varied widely across datasets, which emphasizes the importance of choosing an appropriate preprocessing configuration before GNI. Two preprocessing combinations appeared promising in general: first, Log-2 TPM (transcripts per million) with variance-stabilizing transformation (VST) and the Pearson Correlation Coefficient (PCC) association estimator; second, raw RNA-seq count data with PCC. Along with these two, we also identified 18 other good preprocessing combinations. Any of these algorithms might perform best in different datasets. Therefore, the GNI performance of these approaches should be measured on any new dataset to select the best performing one for it. In terms of the required biological sample size of RNA-seq data, we found that between 30 and 85 samples were required to generate reliable GNI estimates.
Conclusions: This study provides practical recommendations on default choices for data preprocessing prior to GNI analysis of RNA-seq data to obtain optimal performance results.
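As a rough illustration of the kind of preprocessing combination named in the abstract above, the following minimal sketch converts raw counts to log2 TPM and then computes a gene-by-gene Pearson correlation matrix. It is not the authors' pipeline: the function names, the pseudocount of 1, and the toy data are assumptions made for illustration, and the VST step is deliberately omitted.

```python
import math

def log2_tpm(counts, lengths_kb):
    """Convert one sample's raw counts to log2(TPM + 1).

    counts: raw read counts, one per gene.
    lengths_kb: gene lengths in kilobases, same order as counts.
    The +1 pseudocount is an assumption to keep log2 defined at zero.
    """
    rates = [c / l for c, l in zip(counts, lengths_kb)]  # reads per kilobase
    total = sum(rates)
    return [math.log2(r / total * 1e6 + 1.0) for r in rates]

def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def coexpression(samples, lengths_kb):
    """samples: list of per-sample count vectors. Returns a gene-by-gene PCC matrix."""
    expr = [log2_tpm(s, lengths_kb) for s in samples]
    genes = list(zip(*expr))  # one tuple of expression values per gene
    g = len(genes)
    return [[pearson(genes[i], genes[j]) for j in range(g)] for i in range(g)]

# Hypothetical toy data: 4 samples x 3 genes.
toy_samples = [[10, 20, 5], [20, 40, 8], [30, 55, 2], [40, 85, 9]]
toy_lengths_kb = [1.0, 2.0, 0.5]
pcc = coexpression(toy_samples, toy_lengths_kb)
```

In a real analysis the resulting matrix would be thresholded or ranked to obtain network edges; how to do that is outside what the abstract specifies.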
Affiliation(s)
- Gökmen Altay
  - La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
- Bjoern Peters
  - La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
6
Galindez G, Sadegh S, Baumbach J, Kacprowski T, List M. Network-based approaches for modeling disease regulation and progression. Comput Struct Biotechnol J 2022; 21:780-795. [PMID: 36698974 PMCID: PMC9841310 DOI: 10.1016/j.csbj.2022.12.022] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 08/15/2022] [Revised: 12/14/2022] [Accepted: 12/14/2022] [Indexed: 12/23/2022] Open
Abstract
Molecular interaction networks lay the foundation for studying how biological functions are controlled by the complex interplay of genes and proteins. Investigating perturbed processes using biological networks has been instrumental in uncovering mechanisms that underlie complex disease phenotypes. Rapid advances in omics technologies have prompted the generation of high-throughput datasets, enabling large-scale, network-based analyses. Consequently, various modeling techniques, including network enrichment, differential network extraction, and network inference, have proven to be useful for gaining new mechanistic insights. We provide an overview of recent network-based methods and their core ideas to facilitate the discovery of disease modules or candidate mechanisms. Knowledge generated from these computational efforts will benefit biomedical research, especially drug development and precision medicine. We further discuss current challenges and provide perspectives in the field, highlighting the need for more integrative and dynamic network approaches to model disease development and progression.
Affiliation(s)
- Gihanna Galindez
  - Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics of Technische Universität Braunschweig and Hannover Medical School, Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
- Sepideh Sadegh
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
  - Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Jan Baumbach
  - Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
  - Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
- Tim Kacprowski
  - Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics of Technische Universität Braunschweig and Hannover Medical School, Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
- Markus List
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
7
Foo M, Dony L, He F. Data-driven dynamical modelling of a pathogen-infected plant gene regulatory network: A comparative analysis. Biosystems 2022; 219:104732. [PMID: 35781035 DOI: 10.1016/j.biosystems.2022.104732] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2022] [Revised: 05/30/2022] [Accepted: 06/22/2022] [Indexed: 11/02/2022]
Abstract
Recent advances in synthetic biology have enabled the design of genetic feedback control circuits that could be implemented to build plants that are resilient against pathogen attacks. To facilitate the proper design of these genetic feedback control circuits, an accurate model that is able to capture the vital dynamical behaviour of the pathogen-infected plant is required. In this study, using a data-driven modelling approach, we develop and compare four dynamical models (i.e. linear, Michaelis-Menten with Hill coefficient (Hill Function), standard S-System and extended S-System) of a pathogen-infected plant gene regulatory network (GRN). These models are then assessed across several criteria, i.e. ease of identifying the type of gene regulation, predictive capability, Akaike Information Criterion (AIC) and robustness to parameter uncertainty, to determine their viability in balancing biological complexity and accuracy when modelling the pathogen-infected plant GRN. Using our defined ranking score, we obtain the following insights into the modelling of GRNs. Our analyses show that, despite being commonly used and providing biological relevance, the Hill Function model ranks the lowest while the extended S-System model ranks highest in the overall comparison. Interestingly, the performance of the linear model is more consistent throughout the comparison, making it the preferred model for this pathogen-infected plant GRN when considering a data-driven modelling approach.
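For orientation, the model families named in the abstract above are commonly written in the textbook forms below. These are generic parametrizations assumed here for illustration; the paper's exact equations and symbols may differ, and the extended S-System variant is not reproduced because the abstract does not define it. Here $x_i$ is the expression level of gene $i$, $a_{ij}$, $b_i$, $v_i$, $\gamma_i$, $\alpha_i$ and $\beta_i$ are rate parameters, $K$ is a half-saturation constant, $n$ is the Hill coefficient, and $g_{ij}$, $h_{ij}$ are kinetic orders.

```latex
\begin{align}
\text{Linear:}\quad
  & \frac{dx_i}{dt} = \sum_{j} a_{ij}\, x_j + b_i \\
\text{Hill function (activation by gene } j\text{):}\quad
  & \frac{dx_i}{dt} = v_i\, \frac{x_j^{\,n}}{K^{n} + x_j^{\,n}} - \gamma_i\, x_i \\
\text{S-System:}\quad
  & \frac{dx_i}{dt} = \alpha_i \prod_{j} x_j^{\,g_{ij}} - \beta_i \prod_{j} x_j^{\,h_{ij}}
\end{align}
```

In these forms the sign of $a_{ij}$ (linear) or of $g_{ij}$ and $h_{ij}$ (S-System) indicates activation or repression, which is one way to read the "ease of identifying the type of gene regulation" criterion.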
Affiliation(s)
- Mathias Foo
  - School of Engineering, University of Warwick, CV4 7AL, Coventry, UK
- Leander Dony
  - Institute of Computational Biology, Helmholtz Munich, 85764, Neuherberg, Germany
  - Department of Translational Psychiatry, Max Planck Institute of Psychiatry, International Max Planck Research School for Translational Psychiatry (IMPRS-TP), 80804, Munich, Germany
  - TUM School of Life Sciences Weihenstephan, Technical University of Munich, 85354, Freising, Germany
- Fei He
  - Centre for Computational Science and Mathematical Modelling, Coventry University, CV1 2JH, Coventry, UK
8
Inference on the structure of gene regulatory networks. J Theor Biol 2022; 539:111055. [PMID: 35150721 DOI: 10.1016/j.jtbi.2022.111055] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/07/2021] [Revised: 01/29/2022] [Accepted: 02/03/2022] [Indexed: 11/20/2022]
Abstract
In this paper, we conduct theoretical analyses of inferring the structure of gene regulatory networks. Depending on the experimental method and data type, the inference problem is classified into 20 different scenarios. For each scenario, we discuss what can be inferred about the structure, given enough data, and under what assumptions. For scenarios that have been covered in the literature, we provide a brief review. For scenarios that have not been covered in the literature, if the structure can be inferred, we propose new mathematical inference methods and evaluate them on simulated data. Otherwise, we prove that the structure cannot be inferred.
9
Dorantes-Gilardi R, García-Cortés D, Hernández-Lemus E, Espinal-Enríquez J. k-core genes underpin structural features of breast cancer. Sci Rep 2021; 11:16284. [PMID: 34381069 PMCID: PMC8358063 DOI: 10.1038/s41598-021-95313-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 02/19/2021] [Accepted: 07/12/2021] [Indexed: 02/07/2023] Open
Abstract
Gene co-expression networks (GCNs) have been developed as relevant analytical tools for the study of the gene expression patterns behind complex phenotypes. Determining the association between structure and function in GCNs is a current challenge in biomedical research. Several structural differences between GCNs of breast cancer and healthy phenotypes have been reported. In a previous study, using co-expression multilayer networks, we have shown that there are abrupt differences in the connectivity patterns of the GCN of basal-like breast cancer between top co-expressed gene-pairs and the remaining gene-pairs. Here, we compared the top-100,000-interaction networks for the four breast cancer phenotypes (Luminal-A, Luminal-B, Her2+ and Basal) in terms of structural properties. For this purpose, we used the graph-theoretical k-core of a network (maximal sub-network with nodes of degree at least k). We developed a comprehensive analysis of the network k-core ([Formula: see text]) structures in cancer, and their relationship with biological functions. We found that in the Top-100,000-edges networks, the majority of interactions in breast cancer networks are intra-chromosome, while inter-chromosome interactions serve as connecting bridges between clusters. Moreover, core genes in the healthy network are strongly associated with processes such as metabolism and cell cycle. In breast cancer, only the core of Luminal A is related to those processes, and genes in its core are over-expressed. The intersection of the core nodes in all subtypes of cancer is composed only of genes in the chr8q24.3 region. This region has previously been observed to be highly amplified in several cancers, and its appearance in the intersection of the four breast cancer k-cores may suggest that local co-expression is a conserved phenomenon in cancer. Considering the many intricacies associated with these phenomena and the vast amount of research in epigenomic regulation that is currently under way, there is a need for further research on the epigenomic effects on the structure and function of gene co-expression networks in cancer.
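The k-core definition quoted in the abstract above (maximal sub-network in which every node has degree at least k) can be computed by repeatedly peeling off nodes whose degree falls below k. The sketch below is a generic illustration of that standard peeling procedure, not the authors' code; the function name and the toy edge list are hypothetical.

```python
from collections import defaultdict

def k_core(edges, k):
    """Return the set of nodes in the k-core of an undirected simple graph.

    edges: iterable of (u, v) pairs; self-loops are ignored.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    alive = set(adj)
    # Start with every node whose degree is already below k.
    queue = [n for n in alive if len(adj[n]) < k]
    while queue:
        n = queue.pop()
        if n not in alive:
            continue  # already removed via an earlier queue entry
        alive.discard(n)
        for m in adj[n]:
            if m in alive:
                adj[m].discard(n)  # n no longer counts towards m's degree
                if len(adj[m]) < k:
                    queue.append(m)
    return alive

# Hypothetical toy network: a triangle A-B-C with a pendant node D attached to C.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
core2 = k_core(toy_edges, 2)
```

On the toy network, the pendant node is peeled away for k = 2 and only the triangle remains; for k = 3 nothing survives, since no node keeps three neighbours once peeling starts.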
Affiliation(s)
- Rodrigo Dorantes-Gilardi
  - Network Science Institute and Department of Physics, Northeastern University, Boston, MA 02115 USA
  - El Colegio de México, Tlalpan, Mexico City, 14110 Mexico
  - Computational Genomics Division, National Institute of Genomic Medicine (INMEGEN), Mexico City, 14610 Mexico
- Diana García-Cortés
  - Computational Genomics Division, National Institute of Genomic Medicine (INMEGEN), Mexico City, 14610 Mexico
- Enrique Hernández-Lemus
  - Computational Genomics Division, National Institute of Genomic Medicine (INMEGEN), Mexico City, 14610 Mexico
  - Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México (UNAM), Mexico City, 04510 Mexico
- Jesús Espinal-Enríquez
  - Computational Genomics Division, National Institute of Genomic Medicine (INMEGEN), Mexico City, 14610 Mexico
  - Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México (UNAM), Mexico City, 04510 Mexico
10
Weidemüller P, Kholmatov M, Petsalaki E, Zaugg JB. Transcription factors: Bridge between cell signaling and gene regulation. Proteomics 2021; 21:e2000034. [PMID: 34314098 DOI: 10.1002/pmic.202000034] [Citation(s) in RCA: 108] [Impact Index Per Article: 27.0] [Received: 03/19/2021] [Revised: 07/05/2021] [Accepted: 07/16/2021] [Indexed: 01/17/2023]
Abstract
Transcription factors (TFs) are key regulators of intrinsic cellular processes, such as differentiation and development, and of the cellular response to external perturbation through signaling pathways. In this review we focus on the role of TFs as a link between signaling pathways and gene regulation. Cell signaling tends to result in the modulation of a set of TFs that then lead to changes in the cell's transcriptional program. We highlight the molecular layers at which TF activity can be measured and the associated technical and conceptual challenges. These layers include post-translational modifications (PTMs) of the TF, regulation of TF binding to DNA through chromatin accessibility and epigenetics, and expression of target genes. We highlight that a large number of TFs are understudied in both signaling and gene regulation studies, and that our knowledge about known TF targets has a strong literature bias. We argue that TFs serve as a perfect bridge between the fields of gene regulation and signaling, and that separating these fields hinders our understanding of cell functions. Multi-omics approaches that measure multiple dimensions of TF activity are ideally suited to study the interplay of cell signaling and gene regulation using TFs as the anchor to link the two fields.
Affiliation(s)
- Paula Weidemüller
  - European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Maksim Kholmatov
  - Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstraße 1, Heidelberg, 69117, Germany
- Evangelia Petsalaki
  - European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Judith B Zaugg
  - Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstraße 1, Heidelberg, 69117, Germany
11
Kimura S, Fukutomi R, Tokuhisa M, Okada M. Inference of Genetic Networks From Time-Series and Static Gene Expression Data: Combining a Random-Forest-Based Inference Method With Feature Selection Methods. Front Genet 2021; 11:595912. [PMID: 33384716 PMCID: PMC7770182 DOI: 10.3389/fgene.2020.595912] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/17/2020] [Accepted: 11/23/2020] [Indexed: 11/17/2022] Open
Abstract
Several researchers have focused on random-forest-based inference methods because of their excellent performance. Some of these inference methods also have a useful ability to analyze both time-series and static gene expression data. However, they are only of use in ranking all of the candidate regulations by assigning them confidence values. None have been capable of detecting the regulations that actually affect a gene of interest. In this study, we propose a method to remove unpromising candidate regulations by combining the random-forest-based inference method with a series of feature selection methods. In addition to detecting unpromising regulations, our proposed method uses outputs from the feature selection methods to adjust the confidence values of all of the candidate regulations that have been computed by the random-forest-based inference method. Numerical experiments showed that the combined application with the feature selection methods improved the performance of the random-forest-based inference method on 99 of the 100 trials performed on the artificial problems. However, the improvement tends to be small, since our combined method succeeded in removing only 19% of the candidate regulations at most. The combined application with the feature selection methods moreover makes the computational cost higher. While a bigger improvement at a lower computational cost would be ideal, we see no impediments to our investigation, given that our aim is to extract as much useful information as possible from a limited amount of gene expression data.
Affiliation(s)
- Shuhei Kimura
  - Faculty of Engineering, Tottori University, Tottori, Japan
- Ryo Fukutomi
  - Graduate School of Sustainability Science, Tottori University, Tottori, Japan
- Mariko Okada
  - Laboratory of Cell Systems, Institute of Protein Research, Osaka University, Osaka, Japan
12
Mignone P, Pio G, Džeroski S, Ceci M. Multi-task learning for the simultaneous reconstruction of the human and mouse gene regulatory networks. Sci Rep 2020; 10:22295. [PMID: 33339842 PMCID: PMC7749184 DOI: 10.1038/s41598-020-78033-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 04/10/2020] [Accepted: 10/29/2020] [Indexed: 12/31/2022] Open
Abstract
The reconstruction of Gene Regulatory Networks (GRNs) from gene expression data, supported by machine learning approaches, has received increasing attention in recent years. The task at hand is to identify regulatory links between genes in a network. However, existing methods often suffer when the number of labeled examples is low or when no negative examples are available. In this paper we propose a multi-task method that is able to simultaneously reconstruct the human and the mouse GRNs using the similarities between the two. This is done by exploiting, in a transfer learning approach, possible dependencies that may exist among them. Simultaneously, we solve the issues arising from the limited availability of examples of links by relying on a novel clustering-based approach, able to estimate the degree of certainty of unlabeled examples of links, so that they can be exploited during the training together with the labeled examples. Our experiments show that the proposed method can reconstruct both the human and the mouse GRNs more effectively compared to reconstructing each network separately. Moreover, it significantly outperforms three state-of-the-art transfer learning approaches that, analogously to our method, can exploit the knowledge coming from both organisms. Finally, a specific robustness analysis reveals that, even when the number of labeled examples is very low with respect to the number of unlabeled examples, the proposed method is almost always able to outperform its single-task counterpart.
Affiliation(s)
- Paolo Mignone
  - Department of Computer Science, University of Bari Aldo Moro, Bari, 70125, Italy
- Gianvito Pio
  - Department of Computer Science, University of Bari Aldo Moro, Bari, 70125, Italy
- Sašo Džeroski
  - Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, 1000, Slovenia
- Michelangelo Ceci
  - Department of Computer Science, University of Bari Aldo Moro, Bari, 70125, Italy
  - Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, 1000, Slovenia
13
Abstract
OBJECTIVES: Numerous software tools have been developed to infer gene regulatory networks, a long-standing key topic in biology and computational biology. Yet the slowness and inaccuracy inherent in current software hamper their application to increasingly massive data. Here, we develop FINET (Fast Inferring NETwork), software that infers a network with high accuracy and speed from big data.
RESULTS: The high accuracy results from integrating algorithms with stability-selection, elastic-net, and parameter optimization. Tested on a known biological network, FINET infers interactions with over 94% precision. The high speed comes from parallel computations implemented in Julia, a new compiled language that runs much faster than existing languages used in the current software, such as R, Python, and MATLAB. Although FINET is implemented in Julia, users with no background in the language or computer science can easily operate it with a single user-friendly command line. In addition, FINET can infer other networks such as chemical networks and social networks. Overall, FINET provides a confident way to efficiently and accurately infer any type of network for any scale of data.
Affiliation(s)
- Anyou Wang
  - The Institute for Integrative Genome Biology, University of California at Riverside, Riverside, CA, 92521, USA
- Rong Hai
  - The Institute for Integrative Genome Biology, University of California at Riverside, Riverside, CA, 92521, USA
  - Department of Microbiology and Plant Pathology, University of California at Riverside, Riverside, CA, 92521, USA
14
García-Cortés D, de Anda-Jáuregui G, Fresno C, Hernández-Lemus E, Espinal-Enríquez J. Gene Co-expression Is Distance-Dependent in Breast Cancer. Front Oncol 2020; 10:1232. [PMID: 32850369 PMCID: PMC7396632 DOI: 10.3389/fonc.2020.01232] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 02/27/2020] [Accepted: 06/16/2020] [Indexed: 12/15/2022] Open
Abstract
Breast carcinomas are characterized by anomalous gene regulatory programs. As is well-known, gene expression programs are able to shape phenotypes. Hence, the understanding of gene co-expression may shed light on the underlying mechanisms behind the transcriptional regulatory programs affecting tumor development and evolution. For instance, in breast cancer, there is a clear loss of inter-chromosomal (trans-) co-expression, compared with healthy tissue. At the same time cis- (intra-chromosomal) interactions are favored in breast tumors. In order to have a deeper understanding of regulatory phenomena in cancer, here, we constructed Gene Co-expression Networks by using TCGA-derived RNA-seq whole-genome samples corresponding to the four breast cancer molecular subtypes, as well as healthy tissue. We quantify the cis-/trans- co-expression imbalance in all phenotypes. Additionally, we measured the association between co-expression and physical distance between genes, and characterized the ratio of intra/inter-cytoband interactions per phenotype. We confirmed loss of trans- co-expression in all molecular subtypes. We also observed that gene cis- co-expression decays abruptly with distance in all tumors in contrast with healthy tissue. We observed co-expressed gene hotspots, that tend to be connected at cytoband regions, and coincide accurately with already known copy number altered regions, such as Chr17q12, or Chr8q24.3 for all subtypes. Our methodology recovered different alterations already reported for specific breast cancer subtypes, showing how co-expression network approaches might help to capture distinct events that modify the cell regulatory program.
Affiliation(s)
- Diana García-Cortés
  - Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
  - Programa de Doctorado en Ciencias Biomédicas, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Cristóbal Fresno
  - Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
- Enrique Hernández-Lemus
  - Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
  - Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Jesús Espinal-Enríquez
  - Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
  - Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City, Mexico
15
Sauta E, Demartini A, Vitali F, Riva A, Bellazzi R. A Bayesian data fusion based approach for learning genome-wide transcriptional regulatory networks. BMC Bioinformatics 2020; 21:219. [PMID: 32471360 PMCID: PMC7257163 DOI: 10.1186/s12859-020-3510-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 01/09/2020] [Accepted: 04/22/2020] [Indexed: 11/29/2022] Open
Abstract
BACKGROUND Reverse engineering of transcriptional regulatory networks (TRNs) from genomics data has always represented a computational challenge in Systems Biology. The major issue is modeling the complex crosstalk among transcription factors (TFs) and their target genes with a method able to handle both the high number of interacting variables and the noise in the available heterogeneous experimental sources of information. RESULTS In this work, we propose a data fusion approach that exploits the integration of complementary omics data as prior knowledge within a Bayesian framework, in order to learn and model large-scale transcriptional networks. We develop a hybrid structure-learning algorithm able to jointly combine TF ChIP-sequencing data and gene expression compendia to reconstruct TRNs from a genome-wide perspective. Applying our method to high-throughput data, we verified its ability to deal with the complexity of a genomic TRN, providing a snapshot of the synergistic regulatory activity of TFs. Given the noisy nature of data-driven prior knowledge, which potentially contains incorrect information, we also tested the method's robustness to false priors on a benchmark dataset, comparing the proposed approach to other regulatory network reconstruction algorithms. We demonstrated the effectiveness of our framework by evaluating structural commonalities of our learned genomic network with other existing networks inferred by different DNA binding information-based methods. CONCLUSIONS This Bayesian omics-data fusion methodology provides a genome-wide picture of the transcriptional interplay, helping to unravel key hierarchical transcriptional interactions that could subsequently be investigated. It represents a promising learning approach suitable for multi-layered genomic data integration, given its robustness to noisy sources and its tailored framework for handling high-dimensional data.
Affiliation(s)
- Elisabetta Sauta
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Via Ferrata 5, 27100, Pavia, Italy
- Andrea Demartini
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Via Ferrata 5, 27100, Pavia, Italy
- Francesca Vitali
  - Center for Biomedical Informatics and Biostatistics, Dept. of Medicine, The University of Arizona Health Sciences, 1230 Cherry Ave, Tucson, AZ, 85719, USA
- Alberto Riva
  - Bioinformatics Core, Interdisciplinary Center for Biotechnology Research, University of Florida, Gainesville, FL, 32610, USA
- Riccardo Bellazzi
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Via Ferrata 5, 27100, Pavia, Italy
16
Albrecht M, Lucarelli P, Kulms D, Sauter T. Computational models of melanoma. Theor Biol Med Model 2020; 17:8. [PMID: 32410672 PMCID: PMC7222475 DOI: 10.1186/s12976-020-00126-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 07/05/2019] [Accepted: 04/29/2020] [Indexed: 02/08/2023] Open
Abstract
Genes, proteins, or cells influence each other and consequently create patterns, which can be increasingly better observed by experimental biology and medicine. Thereby, descriptive methods of statistics and bioinformatics sharpen and structure our perception. However, additionally considering the interconnectivity between biological elements promises a deeper and more coherent understanding of melanoma. For instance, integrative network-based tools and well-grounded inductive in silico research reveal disease mechanisms, stratify patients, and support treatment individualization. This review gives an overview of different modeling techniques beyond statistics, shows how different strategies align with the respective medical biology, and identifies possible areas of new computational melanoma research.
Affiliation(s)
- Marco Albrecht
  - Systems Biology Group, Life Science Research Unit, University of Luxembourg, 6, avenue du Swing, Belval, 4367 Luxembourg
- Philippe Lucarelli
  - Systems Biology Group, Life Science Research Unit, University of Luxembourg, 6, avenue du Swing, Belval, 4367 Luxembourg
- Dagmar Kulms
  - Experimental Dermatology, Department of Dermatology, Dresden University of Technology, Fetscherstraße 105, Dresden, 01307 Germany
- Thomas Sauter
  - Systems Biology Group, Life Science Research Unit, University of Luxembourg, 6, avenue du Swing, Belval, 4367 Luxembourg
17
Li W, Zhang W, Zhang J. A Novel Model Integration Network Inference Algorithm with Clustering and Hub Genes Finding. Mol Inform 2020; 39:e1900075. [DOI: 10.1002/minf.201900075] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 06/29/2019] [Accepted: 01/14/2020] [Indexed: 11/08/2022]
Affiliation(s)
- Wenchao Li
  - State Key Laboratory of Industrial Control Technology, Institute of Cyber-Systems and Control of Zhejiang University, Hangzhou, China
- Wei Zhang
  - State Key Laboratory of Industrial Control Technology, Institute of Cyber-Systems and Control of Zhejiang University, Hangzhou, China
- Jianming Zhang
  - State Key Laboratory of Industrial Control Technology, Institute of Cyber-Systems and Control of Zhejiang University, Hangzhou, China
18
Wang L, Audenaert P, Michoel T. High-Dimensional Bayesian Network Inference From Systems Genetics Data Using Genetic Node Ordering. Front Genet 2019; 10:1196. [PMID: 31921278 PMCID: PMC6933017 DOI: 10.3389/fgene.2019.01196] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 07/03/2019] [Accepted: 10/29/2019] [Indexed: 11/23/2022] Open
Abstract
Studying the impact of genetic variation on gene regulatory networks is essential to understand the biological mechanisms by which genetic variation causes variation in phenotypes. Bayesian networks provide an elegant statistical approach for multi-trait genetic mapping and modelling causal trait relationships. However, inferring Bayesian gene networks from high-dimensional genetics and genomics data is challenging, because the number of possible networks scales super-exponentially with the number of nodes, and the computational cost of conventional Bayesian network inference methods quickly becomes prohibitive. We propose an alternative method to infer high-quality Bayesian gene networks that easily scales to thousands of genes. Our method first reconstructs a node ordering by conducting pairwise causal inference tests between genes, which then allows a Bayesian network to be inferred via a series of independent variable selection problems, one for each gene. We demonstrate using simulated and real systems genetics data that this results in a Bayesian network with equal, and sometimes better, likelihood than the conventional methods, while having a significantly higher overlap with ground-truth networks and being orders of magnitude faster. Moreover, our method allows for a unified false discovery rate control across genes and individual edges, and thus a rigorous and easily interpretable way of tuning the sparsity level of the inferred network. Bayesian network inference using pairwise node ordering is a highly efficient approach for reconstructing gene regulatory networks when prior information for the inclusion of edges exists or can be inferred from the available data.
Affiliation(s)
- Lingfei Wang
  - Division of Genetics and Genomics, The Roslin Institute, The University of Edinburgh, Easter Bush Campus, Midlothian, United Kingdom
  - Broad Institute of Harvard and MIT, Cambridge, MA, United States
  - Department of Molecular Biology, Massachusetts General Hospital, Boston, MA, United States
- Pieter Audenaert
  - IDLab, Ghent University—imec, Ghent, Belgium
  - Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium
- Tom Michoel
  - Division of Genetics and Genomics, The Roslin Institute, The University of Edinburgh, Easter Bush Campus, Midlothian, United Kingdom
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
19
Schubert M, Colomé-Tatché M, Foijer F. Gene networks in cancer are biased by aneuploidies and sample impurities. BIOCHIMICA ET BIOPHYSICA ACTA-GENE REGULATORY MECHANISMS 2019; 1863:194444. [PMID: 31654805 DOI: 10.1016/j.bbagrm.2019.194444] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/31/2019] [Revised: 09/05/2019] [Accepted: 10/14/2019] [Indexed: 12/14/2022]
Abstract
Gene regulatory network inference is a standard technique for obtaining structured regulatory information from, for instance, gene expression measurements. Methods performing this task have been extensively evaluated on synthetic, and to a lesser extent real data sets. In contrast to these test evaluations, applications to gene expression data of human cancers are often limited by fewer samples and more potential regulatory links, and are biased by copy number aberrations as well as cell mixtures and sample impurities. Here, we take networks inferred from TCGA cohorts as an example to show that (1) transcription factor annotations are essential to obtain reliable networks, and (2) even for state of the art methods, we expect that between 20 and 80% of edges are caused by copy number changes and cell mixtures rather than transcription factor regulation.
Affiliation(s)
- Michael Schubert
  - European Research Institute for the Biology of Ageing, University of Groningen, University Medical Center Groningen, 9713 AV, Groningen, the Netherlands
  - Institute of Computational Biology, Helmholtz Zentrum München, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
- Maria Colomé-Tatché
  - European Research Institute for the Biology of Ageing, University of Groningen, University Medical Center Groningen, 9713 AV, Groningen, the Netherlands
  - Institute of Computational Biology, Helmholtz Zentrum München, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
  - TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
- Floris Foijer
  - European Research Institute for the Biology of Ageing, University of Groningen, University Medical Center Groningen, 9713 AV, Groningen, the Netherlands
20
Loskot P, Atitey K, Mihaylova L. Comprehensive Review of Models and Methods for Inferences in Bio-Chemical Reaction Networks. Front Genet 2019; 10:549. [PMID: 31258548 PMCID: PMC6588029 DOI: 10.3389/fgene.2019.00549] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 02/08/2019] [Accepted: 05/24/2019] [Indexed: 01/30/2023] Open
Abstract
The key processes in biological and chemical systems are described by networks of chemical reactions. From molecular biology to biotechnology applications, computational models of reaction networks are used extensively to elucidate their non-linear dynamics. The model dynamics are crucially dependent on the parameter values, which are often estimated from observations. Over the past decade, the interest in parameter and state estimation in models of (bio-) chemical reaction networks (BRNs) grew considerably. The related inference problems are also encountered in many other tasks, including model calibration, discrimination, identifiability, and checking, as well as optimum experiment design, sensitivity analysis, and bifurcation analysis. The aim of this review paper is to examine the developments in the literature to understand what BRN models are commonly used, and for what inference tasks and inference methods. The initial collection of about 700 documents concerning estimation problems in BRNs, excluding books and textbooks in computational biology and chemistry, was screened to select over 270 research papers and 20 graduate research theses. The paper selection was facilitated by text mining scripts to automate the search for relevant keywords and terms. The outcomes are presented in tables revealing the levels of interest in different inference tasks and methods for given models in the literature, and research trends are uncovered. Our findings indicate that many combinations of models, tasks and methods are still relatively unexplored, and there are many new research opportunities to explore combinations that have not been considered, perhaps for good reasons. The most common models of BRNs in the literature involve differential equations, Markov processes, mass action kinetics, and state space representations, whereas the most common tasks are parameter inference and model identification. The most common methods in the literature are Bayesian analysis, Monte Carlo sampling strategies, and model fitting to data using evolutionary algorithms. The new research problems which cannot be directly deduced from the text mining data are also discussed.
Affiliation(s)
- Pavel Loskot
  - College of Engineering, Swansea University, Swansea, United Kingdom
- Komlan Atitey
  - College of Engineering, Swansea University, Swansea, United Kingdom
- Lyudmila Mihaylova
  - Department of Automatic Control and Systems Engineering, University of Sheffield, Sheffield, United Kingdom
21
Abstract
Inferring gene regulatory networks from expression data is a very challenging problem that has raised the interest of the scientific community. Different algorithms have been proposed to try to solve this issue, but it has been shown that different methods have some particular biases and strengths, and none of them is the best across all types of data and datasets. As a result, the idea of aggregating various network inferences through a consensus mechanism naturally arises. In this chapter, a common framework to standardize already proposed consensus methods is presented, and based on this framework different proposals are introduced and analyzed in two different scenarios: Homogeneous and Heterogeneous. The first scenario reflects situations where the networks to be aggregated are rather similar because they are obtained with inference algorithms working on the same data, whereas the second scenario deals with very diverse networks because various sources of data are used to generate the individual networks. A procedure for combining multiple network inference algorithms is analyzed in a systematic way. The results show that there is a very significant difference between these two scenarios, and that the best way to combine networks in the Heterogeneous scenario is not the most commonly used. We show in particular that aggregation in the Heterogeneous scenario can be very beneficial if the individual networks are combined with our new proposed method ScaleLSum.
22
Liu W, Rajapakse JC. Fusing gene expressions and transitive protein-protein interactions for inference of gene regulatory networks. BMC SYSTEMS BIOLOGY 2019; 13:37. [PMID: 30953534 PMCID: PMC6449891 DOI: 10.1186/s12918-019-0695-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 12/02/2022]
Abstract
BACKGROUND Systematic fusion of multiple data sources for Gene Regulatory Network (GRN) inference remains a key challenge in systems biology. We incorporate information from protein-protein interaction networks (PPIN) into the process of GRN inference from gene expression (GE) data. However, existing PPIN remain sparse, and transitive protein interactions can help predict missing protein interactions. We therefore propose a systematic probabilistic framework for fusing GE data and transitive protein interaction data to coherently build GRN. RESULTS We use a Gaussian Mixture Model (GMM) to soft-cluster GE data, allowing overlapping cluster memberships. Next, a heuristic method is proposed to extend sparse PPIN by incorporating transitive linkages. We then propose a novel way to score extended protein interactions by combining topological properties of PPIN and correlations of GE. Following this, GE data and extended PPIN are fused using a Gaussian Hidden Markov Model (GHMM) in order to identify gene regulatory pathways and refine interaction scores that are then used to constrain the GRN structure. We employ a Bayesian Gaussian Mixture (BGM) model to refine the GRN derived from GE data by using the structural priors derived from GHMM. Experiments on real yeast regulatory networks demonstrate both the feasibility of the extended PPIN in predicting transitive protein interactions and its effectiveness in improving the coverage and accuracy of the proposed method of fusing PPIN and GE to build GRN. CONCLUSION The GE and PPIN fusion model outperforms both the state-of-the-art single data source models (CLR, GENIE3, TIGRESS) and existing fusion models under various constraints.
Affiliation(s)
- Wenting Liu
  - School of Public Health and Management, Hubei University of Medicine, Shiyan, Hubei, China
  - Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, CA, USA
- Jagath C. Rajapakse
  - School of Computer Engineering, Nanyang Technological University, Singapore, Singapore
23
Causal Queries from Observational Data in Biological Systems via Bayesian Networks: An Empirical Study in Small Networks. Methods Mol Biol 2018. [PMID: 30547398 DOI: 10.1007/978-1-4939-8882-2_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2023]
Abstract
Biological networks are a very convenient modeling and visualization tool to discover knowledge from modern high-throughput genomics and post-genomics data sets. Indeed, biological entities are not isolated but are components of complex multilevel systems. We go one step further and advocate for the consideration of causal representations of the interactions in living systems. We present the causal formalism and bring it out in the context of biological networks, when the data is observational. We also discuss its ability to decipher the causal information flow as observed in gene expression. We also illustrate our exploration by experiments on small simulated networks as well as on a real biological data set.
24
Azuaje F, Kaoma T, Jeanty C, Nazarov PV, Muller A, Kim SY, Dittmar G, Golebiewska A, Niclou SP. Hub genes in a pan-cancer co-expression network show potential for predicting drug responses. F1000Res 2018; 7:1906. [PMID: 30881689 PMCID: PMC6406180 DOI: 10.12688/f1000research.17149.2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Accepted: 02/22/2019] [Indexed: 12/15/2022] Open
Abstract
Background: The topological analysis of networks extracted from different types of "omics" data is a useful strategy for characterizing biologically meaningful properties of the complex systems underlying these networks. In particular, the biological significance of highly connected genes in diverse molecular networks has been previously determined using data from several model organisms and phenotypes. Despite such insights, the predictive potential of candidate hubs in gene co-expression networks in the specific context of cancer-related drug experiments remains to be deeply investigated. The examination of such associations may offer opportunities for the accurate prediction of anticancer drug responses. Methods: Here, we address this problem by: a) analyzing a co-expression network obtained from thousands of cancer cell lines, b) detecting significant network hubs, and c) assessing their capacity to predict drug sensitivity using data from thousands of drug experiments. We investigated the prediction capability of those genes using a multiple linear regression model, independent datasets, comparisons with other models and our own in vitro experiments. Results: These analyses led to the identification of 47 hub genes, which are implicated in a diverse range of cancer-relevant processes and pathways. Overall, encouraging agreements between predicted and observed drug sensitivities were observed in public datasets, as well as in our in vitro validations for four glioblastoma cell lines and four drugs. To facilitate further research, we share our hub-based drug sensitivity prediction model as an online tool. Conclusions: Our research shows that co-expression network hubs are biologically interesting and exhibit potential for predicting drug responses in vitro. These findings motivate further investigations about the relevance and application of our unbiased discovery approach in pre-clinical, translationally-oriented research.
Affiliation(s)
- Tony Kaoma
  - Luxembourg Institute of Health (LIH), Strassen, Luxembourg
- Céline Jeanty
  - Luxembourg Institute of Health (LIH), Strassen, Luxembourg
- Arnaud Muller
  - Luxembourg Institute of Health (LIH), Strassen, Luxembourg
- Sang-Yoon Kim
  - Luxembourg Institute of Health (LIH), Strassen, Luxembourg
- Gunnar Dittmar
  - Luxembourg Institute of Health (LIH), Strassen, Luxembourg
25
Azuaje F, Kaoma T, Jeanty C, Nazarov PV, Muller A, Kim SY, Dittmar G, Golebiewska A, Niclou SP. Hub genes in a pan-cancer co-expression network show potential for predicting drug responses. F1000Res 2018; 7:1906. [PMID: 30881689 PMCID: PMC6406180 DOI: 10.12688/f1000research.17149.1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Accepted: 11/30/2018] [Indexed: 10/07/2023] Open
Abstract
Background: The topological analysis of networks extracted from different types of "omics" data is a useful strategy for characterizing biologically meaningful properties of the complex systems underlying these networks. In particular, the biological significance of highly connected genes in diverse molecular networks has been previously determined using data from several model organisms and phenotypes. Despite such insights, the predictive potential of candidate hubs in gene co-expression networks in the specific context of cancer-related drug experiments remains to be deeply investigated. The examination of such associations may offer opportunities for the accurate prediction of anticancer drug responses. Methods: Here, we address this problem by: a) analyzing a co-expression network obtained from thousands of cancer cell lines, b) detecting significant network hubs, and c) assessing their capacity to predict drug sensitivity using data from thousands of drug experiments. We investigated the prediction capability of those genes using a multiple linear regression model, independent datasets, comparisons with other models and our own in vitro experiments. Results: These analyses led to the identification of 47 hub genes, which are implicated in a diverse range of cancer-relevant processes and pathways. Overall, encouraging agreements between predicted and observed drug sensitivities were observed in public datasets, as well as in our in vitro validations for four glioblastoma cell lines and four drugs. To facilitate further research, we share our hub-based drug sensitivity prediction model as an online tool. Conclusions: Our research shows that co-expression network hubs are biologically interesting and exhibit potential for predicting drug responses in vitro. These findings motivate further investigations about the relevance and application of our unbiased discovery approach in pre-clinical, translationally-oriented research.
Affiliation(s)
- Tony Kaoma
  - Luxembourg Institute of Health (LIH), Strassen, Luxembourg
- Céline Jeanty
  - Luxembourg Institute of Health (LIH), Strassen, Luxembourg
- Arnaud Muller
  - Luxembourg Institute of Health (LIH), Strassen, Luxembourg
- Sang-Yoon Kim
  - Luxembourg Institute of Health (LIH), Strassen, Luxembourg
- Gunnar Dittmar
  - Luxembourg Institute of Health (LIH), Strassen, Luxembourg
26
Kuzmanovski V, Todorovski L, Džeroski S. Extensive evaluation of the generalized relevance network approach to inferring gene regulatory networks. Gigascience 2018; 7:5099470. [PMID: 30239704 PMCID: PMC6420648 DOI: 10.1093/gigascience/giy118] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 09/03/2017] [Accepted: 09/11/2018] [Indexed: 01/15/2023] Open
Abstract
Background: The generalized relevance network approach to network inference reconstructs network links based on the strength of associations between data in individual network nodes. It can reconstruct undirected networks, i.e., relevance networks, sensu stricto, as well as directed networks, referred to as causal relevance networks. The generalized approach allows the use of an arbitrary measure of pairwise association between nodes, an arbitrary scoring scheme that transforms the associations into weights of the network links, and a method for inferring the directions of the links. While this makes the approach powerful and flexible, it introduces the challenge of finding a combination of components that would perform well on a given inference task. Results: We address this challenge by performing an extensive empirical analysis of the performance of 114 variants of the generalized relevance network approach on 47 tasks of gene network inference from time-series data and 39 tasks of gene network inference from steady-state data. We compare the different variants in a multi-objective manner, considering their ranking in terms of different performance metrics. The results suggest a set of recommendations that provide guidance for selecting an appropriate variant of the approach in different data settings. Conclusions: The association measures based on correlation, combined with a particular scoring scheme of asymmetric weighting, lead to optimal performance of the relevance network approach in the general case. In the two special cases of inference tasks involving short time-series data and/or large networks, association measures based on identifying qualitative trends in the time series are more appropriate.
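A minimal sketch of the simplest case described in this abstract, an undirected relevance network, is given here for orientation. It assumes Pearson correlation as the association measure and a plain absolute-value threshold as the scoring scheme; it does not implement the asymmetric weighting or link-direction inference evaluated in the paper. The names `pearson` and `relevance_network`, the gene labels, and the expression values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def relevance_network(expression, threshold):
    """Keep an undirected link for every gene pair whose |correlation| meets the threshold."""
    links = {}
    for gene_a, gene_b in combinations(sorted(expression), 2):
        weight = abs(pearson(expression[gene_a], expression[gene_b]))
        if weight >= threshold:
            links[(gene_a, gene_b)] = weight
    return links


# Toy, made-up expression profiles over four samples.
expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [1.0, -1.0, 1.0, -1.0],
}
network = relevance_network(expression, threshold=0.8)
print(network)
```

Swapping `pearson` for another pairwise measure, or replacing the threshold with a different scoring function, gives other variants in the same family.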
Affiliation(s)
- Vladimir Kuzmanovski
  - Department of Knowledge Technologies, Jožef Stefan Institute, Jamova cesta 39, 1000 Ljubljana, Slovenia
- Ljupco Todorovski
  - Department of Knowledge Technologies, Jožef Stefan Institute, Jamova cesta 39, 1000 Ljubljana, Slovenia
  - Faculty of Public Administration, University of Ljubljana, Gosarjeva ulica 5, 1000 Ljubljana, Slovenia
- Sašo Džeroski
  - Department of Knowledge Technologies, Jožef Stefan Institute, Jamova cesta 39, 1000 Ljubljana, Slovenia
27
Inference of Genome-Scale Gene Regulatory Networks: Are There Differences in Biological and Clinical Validations? MACHINE LEARNING AND KNOWLEDGE EXTRACTION 2018. [DOI: 10.3390/make1010008] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/14/2022]
Abstract
Causal networks, e.g., gene regulatory networks (GRNs) inferred from gene expression data, contain a wealth of information but are defying simple, straightforward and low-budget experimental validations. In this paper, we elaborate on this problem and discuss distinctions between biological and clinical validations. As a result, validation differences for GRNs reflect known differences between basic biological and clinical research questions making the validations context specific. Hence, the meaning of biologically and clinically meaningful GRNs can be very different. For a concerted approach to a problem of this size, we suggest the establishment of the HUMAN GENE REGULATORY NETWORK PROJECT which provides the information required for biological and clinical validations alike.
28
Carlin DE, Paull EO, Graim K, Wong CK, Bivol A, Ryabinin P, Ellrott K, Sokolov A, Stuart JM. Prophetic Granger Causality to infer gene regulatory networks. PLoS One 2017; 12:e0170340. [PMID: 29211761 PMCID: PMC5718405 DOI: 10.1371/journal.pone.0170340] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 07/21/2016] [Accepted: 10/26/2017] [Indexed: 01/09/2023] Open
Abstract
We introduce a novel method called Prophetic Granger Causality (PGC) for inferring gene regulatory networks (GRNs) from protein-level time series data. The method uses an L1-penalized regression adaptation of Granger Causality to model protein levels as a function of time, stimuli, and other perturbations. When combined with a data-independent network prior, the framework outperformed all other methods submitted to the HPN-DREAM 8 breast cancer network inference challenge. Our investigations reveal that PGC provides complementary information to other approaches, raising the performance of ensemble learners, while on its own achieves moderate performance. Thus, PGC serves as a valuable new tool in the bioinformatics toolkit for analyzing temporal datasets. We investigate the general and cell-specific interactions predicted by our method and find several novel interactions, demonstrating the utility of the approach in charting new tumor wiring.
Collapse
Affiliation(s)
- Daniel E. Carlin
- University of California San Diego, Department of Medicine, La Jolla, CA, United States of America
| | - Evan O. Paull
- University of California Santa Cruz, Department of Biomolecular Engineering, Santa Cruz, CA, United States of America
| | - Kiley Graim
- University of California Santa Cruz, Department of Biomolecular Engineering, Santa Cruz, CA, United States of America
| | - Christopher K. Wong
- University of California Santa Cruz, Department of Biomolecular Engineering, Santa Cruz, CA, United States of America
| | - Adrian Bivol
- University of California Santa Cruz, Department of Biomolecular Engineering, Santa Cruz, CA, United States of America
| | - Peter Ryabinin
- University of California Santa Cruz, Department of Biomolecular Engineering, Santa Cruz, CA, United States of America
| | - Kyle Ellrott
- Oregon Health Sciences University, Department of Biomedical Engineering, Portland, OR, United States of America
| | - Artem Sokolov
- University of California Santa Cruz, Department of Biomolecular Engineering, Santa Cruz, CA, United States of America
- * E-mail: (JMS); (AS)
| | - Joshua M. Stuart
- University of California Santa Cruz, Department of Biomolecular Engineering, Santa Cruz, CA, United States of America
- * E-mail: (JMS); (AS)
|
29
|
Bourdakou MM, Spyrou GM. Informed walks: whispering hints to gene hunters inside networks' jungle. BMC SYSTEMS BIOLOGY 2017; 11:97. [PMID: 29020948 PMCID: PMC5637247 DOI: 10.1186/s12918-017-0473-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 11/17/2016] [Accepted: 10/03/2017] [Indexed: 12/24/2022]
Abstract
Background Systemic approaches offer a different point of view on the analysis of several types of molecular associations as well as on the identification of specific gene communities in several cancer types. However, due to lack of sufficient data needed to construct networks based on experimental evidence, statistical gene co-expression networks are widely used instead. Many efforts have been made to exploit the information hidden in these networks. However, these approaches still need to capitalize comprehensively on the prior knowledge encoded in molecular pathway associations and to improve their efficiency in discovering both exclusive subnetworks as candidate biomarkers and conserved subnetworks that may uncover common origins of several cancer types. Methods In this study we present the development of the Informed Walks model, based on random walks that incorporate information from molecular pathways to mine candidate genes and gene-gene links. The proposed model has been applied to TCGA (The Cancer Genome Atlas) datasets from seven different cancer types, exploring the reconstructed co-expression networks of the whole set of genes and leading to highlighted subnetworks for each cancer type. Subsequently, we elucidated the impact of each subnetwork on the indication of underlying exclusive and common molecular mechanisms as well as on the short-listing of drugs that have the potential to suppress the corresponding cancer type through a drug-repurposing pipeline. Conclusions We have developed a method of gene subnetwork highlighting based on prior knowledge, capable of giving fruitful insights regarding the underlying molecular mechanisms and valuable input to drug-repurposing pipelines for a variety of cancer types.
Affiliation(s)
- Marilena M Bourdakou
- Bioinformatics ERA Chair, The Cyprus Institute of Neurology and Genetics, 6 International Airport Avenue, Ayios Dometios, 2370, Nicosia, Cyprus.,Center of Systems Biology, Biomedical Research Foundation, Academy of Athens, Soranou Ephessiou 4, 115 27, Athens, Greece
| | - George M Spyrou
- Bioinformatics ERA Chair, The Cyprus Institute of Neurology and Genetics, 6 International Airport Avenue, Ayios Dometios, 2370, Nicosia, Cyprus.
|
30
|
Yu B, Xu JM, Li S, Chen C, Chen RX, Wang L, Zhang Y, Wang MH. Inference of time-delayed gene regulatory networks based on dynamic Bayesian network hybrid learning method. Oncotarget 2017; 8:80373-80392. [PMID: 29113310 PMCID: PMC5655205 DOI: 10.18632/oncotarget.21268] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 07/20/2017] [Accepted: 08/27/2017] [Indexed: 01/31/2023]
Abstract
Gene regulatory network (GRN) research reveals complex life phenomena from the perspective of gene interaction and is an important research field in systems biology. Traditional Bayesian networks have a high computational complexity, and the network structure scoring model has a single feature. Information-based approaches cannot identify the direction of regulation. To make up for the shortcomings of these methods, this paper presents a novel hybrid learning method (DBNCS) based on the dynamic Bayesian network (DBN) to construct multiple time-delayed GRNs for the first time, combining the comprehensive score (CS) with the DBN model. The DBNCS algorithm first uses the CMI2NI (conditional mutual inclusive information-based network inference) algorithm to learn network structure profiles, namely to construct the search space. Redundant regulations are then removed using the recursive optimization (RO) algorithm, thereby reducing the false positive rate. Next, the network structure profiles are decomposed into a set of cliques without loss, which can significantly reduce the computational complexity. Finally, the DBN model is used to identify the direction of gene regulation within the cliques and to search for the optimal network structure. The performance of the DBNCS algorithm is evaluated on benchmark GRN datasets from the DREAM challenge as well as the SOS DNA repair network in Escherichia coli, and compared with other state-of-the-art methods. The experimental results show the rationality of the algorithm design and the outstanding performance of the GRNs.
Affiliation(s)
- Bin Yu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- CAS Key Laboratory of Geospace Environment, Department of Geophysics and Planetary Science, University of Science and Technology of China, Hefei 230026, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
| | - Jia-Meng Xu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
| | - Shan Li
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
| | - Cheng Chen
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
| | - Rui-Xin Chen
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
| | - Lei Wang
- Key Laboratory of Eco-chemical Engineering, Ministry of Education, Laboratory of Inorganic Synthesis and Applied Chemistry, College of Chemistry and Molecular Engineering, Qingdao University of Science and Technology, Qingdao 266042, China
| | - Yan Zhang
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
- College of Electromechanical Engineering, Qingdao University of Science and Technology, Qingdao 266061, China
| | - Ming-Hui Wang
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
- Bioinformatics and Systems Biology Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
|
31
|
Walker L, Boddington C, Jenkins D, Wang Y, Grønlund JT, Hulsmans J, Kumar S, Patel D, Moore JD, Carter A, Samavedam S, Bonomo G, Hersh DS, Coruzzi GM, Burroughs NJ, Gifford ML. Changes in Gene Expression in Space and Time Orchestrate Environmentally Mediated Shaping of Root Architecture. THE PLANT CELL 2017; 29:2393-2412. [PMID: 28893852 PMCID: PMC5774560 DOI: 10.1105/tpc.16.00961] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 12/28/2016] [Revised: 08/16/2017] [Accepted: 09/07/2017] [Indexed: 05/02/2023]
Abstract
Shaping of root architecture is a quintessential developmental response that involves the concerted action of many different cell types, is highly dynamic, and underpins root plasticity. To determine to what extent the environmental regulation of lateral root development is a product of cell-type preferential activities, we tracked transcriptomic responses to two different treatments that both change root development in Arabidopsis thaliana at an unprecedented level of temporal detail. We found that individual transcripts are expressed with a very high degree of temporal and spatial specificity, yet biological processes are commonly regulated, in a mechanism we term response nonredundancy. Using causative gene network inference to compare the genes regulated in different cell types and during responses to nitrogen and a biotic interaction, we found that common transcriptional modules often regulate the same gene families but control different individual members of these families, specific to response and cell type. This reinforces that the activity of a gene cannot be defined simply as molecular function; rather, it is a consequence of spatial location, expression timing, and environmental responsiveness.
Affiliation(s)
- Liam Walker
- School of Life Sciences, University of Warwick, Coventry CV4 7AL, United Kingdom
| | - Clare Boddington
- Warwick Systems Biology Centre, University of Warwick, Senate House, Coventry CV4 7AL, United Kingdom
| | - Dafyd Jenkins
- School of Life Sciences, University of Warwick, Coventry CV4 7AL, United Kingdom
- Warwick Systems Biology Centre, University of Warwick, Senate House, Coventry CV4 7AL, United Kingdom
| | - Ying Wang
- Warwick Systems Biology Centre, University of Warwick, Senate House, Coventry CV4 7AL, United Kingdom
| | - Jesper T Grønlund
- School of Life Sciences, University of Warwick, Coventry CV4 7AL, United Kingdom
| | - Jo Hulsmans
- School of Life Sciences, University of Warwick, Coventry CV4 7AL, United Kingdom
- Warwick Systems Biology Centre, University of Warwick, Senate House, Coventry CV4 7AL, United Kingdom
| | - Sanjeev Kumar
- School of Life Sciences, University of Warwick, Coventry CV4 7AL, United Kingdom
| | - Dhaval Patel
- School of Life Sciences, University of Warwick, Coventry CV4 7AL, United Kingdom
| | - Jonathan D Moore
- Warwick Systems Biology Centre, University of Warwick, Senate House, Coventry CV4 7AL, United Kingdom
| | - Anthony Carter
- School of Life Sciences, University of Warwick, Coventry CV4 7AL, United Kingdom
- Warwick Systems Biology Centre, University of Warwick, Senate House, Coventry CV4 7AL, United Kingdom
| | - Siva Samavedam
- School of Life Sciences, University of Warwick, Coventry CV4 7AL, United Kingdom
| | - Giovanni Bonomo
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, New York 10003
| | - David S Hersh
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, New York 10003
| | - Gloria M Coruzzi
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, New York 10003
| | - Nigel J Burroughs
- Warwick Systems Biology Centre, University of Warwick, Senate House, Coventry CV4 7AL, United Kingdom
- Warwick Mathematics Institute, University of Warwick, Zeeman Building, University of Warwick, Coventry CV4 7AL, United Kingdom
| | - Miriam L Gifford
- School of Life Sciences, University of Warwick, Coventry CV4 7AL, United Kingdom
|
32
|
Tripathi S, Lloyd-Price J, Ribeiro A, Yli-Harja O, Dehmer M, Emmert-Streib F. sgnesR: An R package for simulating gene expression data from an underlying real gene network structure considering delay parameters. BMC Bioinformatics 2017; 18:325. [PMID: 28676075 PMCID: PMC5496254 DOI: 10.1186/s12859-017-1731-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 05/23/2016] [Accepted: 06/15/2017] [Indexed: 01/04/2023]
Abstract
Background sgnesR (Stochastic Gene Network Expression Simulator in R) is an R package that provides an interface to simulate gene expression data from a given gene network using the stochastic simulation algorithm (SSA). The package allows various options for delay parameters, which can easily be included in reactions for promoter delay, RNA delay and protein delay. A user can tune these parameters to model various types of reactions within a cell. As examples, we present two network models to generate expression profiles. We also demonstrate the inference of networks and the evaluation of association measures of edge and non-edge components from the generated expression profiles. Results The purpose of sgnesR is to enable an easy-to-use and quick implementation for generating realistic gene expression data from biologically relevant networks that can be user selected. Conclusions sgnesR is freely available for academic use. The R package has been tested for R 3.2.0 under Linux, Windows and Mac OS X.
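To make the stochastic simulation algorithm (SSA) mentioned above concrete, here is a minimal Gillespie-style sketch in Python for a single transcript with constant production and first-order degradation. It does not use or reproduce sgnesR: the function name, the two-reaction model, and the rate values are assumptions for illustration, and the promoter, RNA and protein delays that the package supports are not modelled.

```python
import random

def gillespie_birth_death(k_prod, k_deg, m0, t_end, rng):
    """Simulate mRNA copy number with production (rate k_prod) and
    degradation (rate k_deg * m) using the direct-method SSA."""
    t, m = 0.0, m0
    times, counts = [t], [m]
    while True:
        a_prod = k_prod        # propensity of the production reaction
        a_deg = k_deg * m      # propensity of the degradation reaction
        a_total = a_prod + a_deg
        if a_total <= 0:
            break              # no reaction can fire any more
        t += rng.expovariate(a_total)   # waiting time to the next reaction
        if t > t_end:
            break
        # Choose which reaction fires, in proportion to its propensity.
        if rng.random() * a_total < a_prod:
            m += 1
        else:
            m -= 1
        times.append(t)
        counts.append(m)
    return times, counts

# Hypothetical parameters, chosen only for the demonstration.
times, counts = gillespie_birth_death(
    k_prod=5.0, k_deg=0.5, m0=0, t_end=20.0, rng=random.Random(1)
)
```

Each step draws an exponential waiting time from the total propensity and then picks one reaction to fire. A delayed variant would instead schedule the product of a reaction to appear at a later time, which is the part this sketch leaves out.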
Affiliation(s)
- Shailesh Tripathi
- Predictive Medicine and Data Analytics Lab, Department of Signal Processing, Tampere University of Technology, Tampere, Finland
| | - Jason Lloyd-Price
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University, Boston, USA.,Laboratory of Biosystem Dynamics, Department of Signal Processing, Tampere University of Technology, Tampere, Finland
| | - Andre Ribeiro
- Laboratory of Biosystem Dynamics, Department of Signal Processing, Tampere University of Technology, Tampere, Finland.,Institute of Biosciences and Medical Technology, Tampere, Finland
| | - Olli Yli-Harja
- Institute of Biosciences and Medical Technology, Tampere, Finland.,Computational Systems Biology, Department of Signal Processing, Tampere University of Technology, Tampere, Finland
| | - Matthias Dehmer
- Institute for Theoretical Informatics, Mathematics and Operations Research, Department of Computer Science, Universität der Bundeswehr München, Munich, Germany
| | - Frank Emmert-Streib
- Predictive Medicine and Data Analytics Lab, Department of Signal Processing, Tampere University of Technology, Tampere, Finland. .,Institute of Biosciences and Medical Technology, Tampere, Finland.
|
33
|
Moran B, Rahman A, Palonen K, Lanigan FT, Gallagher WM. Master Transcriptional Regulators in Cancer: Discovery via Reverse Engineering Approaches and Subsequent Validation. Cancer Res 2017; 77:2186-2190. [PMID: 28428271 DOI: 10.1158/0008-5472.can-16-1813] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 07/11/2016] [Revised: 09/08/2016] [Accepted: 02/22/2017] [Indexed: 11/16/2022]
Abstract
Reverse engineering of transcriptional networks using gene expression data enables identification of genes that underpin the development and progression of different cancers. Methods to this end have been available for over a decade and, with a critical mass of transcriptomic data in the oncology arena having been reached, they are ever more applicable. Extensive and complex networks can be distilled into a small set of key master transcriptional regulators (MTR), genes that are very highly connected and have been shown to be involved in processes of known importance in disease. Interpreting and validating the results of standardized bioinformatic methods is of crucial importance in determining the inherent value of MTRs. In this review, we briefly describe how MTRs are identified and focus on providing an overview of how MTRs can and have been validated for use in clinical decision making in malignant diseases, along with serving as tractable therapeutic targets. Cancer Res; 77(9); 2186-90. ©2017 AACR.
Affiliation(s)
- Bruce Moran
- Cancer Biology and Therapeutics Laboratory, UCD School of Biomolecular and Biomedical Research, UCD Conway Institute, University College Dublin, Dublin, Ireland.,OncoMark Limited, NovaUCD, Belfield Innovation Park, Belfield, Dublin, Ireland
| | - Arman Rahman
- Cancer Biology and Therapeutics Laboratory, UCD School of Biomolecular and Biomedical Research, UCD Conway Institute, University College Dublin, Dublin, Ireland.,OncoMark Limited, NovaUCD, Belfield Innovation Park, Belfield, Dublin, Ireland
| | - Katja Palonen
- Cancer Biology and Therapeutics Laboratory, UCD School of Biomolecular and Biomedical Research, UCD Conway Institute, University College Dublin, Dublin, Ireland.,OncoMark Limited, NovaUCD, Belfield Innovation Park, Belfield, Dublin, Ireland
| | - Fiona T Lanigan
- Cancer Biology and Therapeutics Laboratory, UCD School of Biomolecular and Biomedical Research, UCD Conway Institute, University College Dublin, Dublin, Ireland
| | - William M Gallagher
- Cancer Biology and Therapeutics Laboratory, UCD School of Biomolecular and Biomedical Research, UCD Conway Institute, University College Dublin, Dublin, Ireland. .,OncoMark Limited, NovaUCD, Belfield Innovation Park, Belfield, Dublin, Ireland
|
34
|
Monneret G, Jaffrézic F, Rau A, Zerjal T, Nuel G. Identification of marginal causal relationships in gene networks from observational and interventional expression data. PLoS One 2017; 12:e0171142. [PMID: 28301504 PMCID: PMC5354375 DOI: 10.1371/journal.pone.0171142] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/17/2016] [Accepted: 01/01/2017] [Indexed: 11/29/2022]
Abstract
Causal network inference is an important methodological challenge in biology as well as other areas of application. Although several causal network inference methods have been proposed in recent years, they are typically applicable for only a small number of genes, due to the large number of parameters to be estimated and the limited number of biological replicates available. In this work, we consider the specific case of transcriptomic studies made up of both observational and interventional data in which a single gene of biological interest is knocked out. We focus on a marginal causal estimation approach, based on the framework of Gaussian directed acyclic graphs, to infer causal relationships between the knocked-out gene and a large set of other genes. In a simulation study, we found that our proposed method accurately differentiates between downstream causal relationships and those that are upstream or simply associative. It also enables an estimation of the total causal effects between the gene of interest and the remaining genes. Our method performed very similarly to a classical differential analysis for experiments with a relatively large number of biological replicates, but has the advantage of providing a formal causal interpretation. Our proposed marginal causal approach is computationally efficient and may be applied to several thousands of genes simultaneously. In addition, it may help highlight subsets of genes of interest for a more thorough subsequent causal network inference. The method is implemented in an R package called MarginalCausality (available on GitHub).
Affiliation(s)
- Gilles Monneret
- UMR GABI, AgroParisTech, INRA, Université Paris-Saclay, 78350 Jouy-en-Josas, France
- LPMA, UMR CNRS 7599, UPMC, Sorbonne Universités, 4 place Jussieu, 75005 Paris, France
| | - Florence Jaffrézic
- UMR GABI, AgroParisTech, INRA, Université Paris-Saclay, 78350 Jouy-en-Josas, France
| | - Andrea Rau
- UMR GABI, AgroParisTech, INRA, Université Paris-Saclay, 78350 Jouy-en-Josas, France
| | - Tatiana Zerjal
- UMR GABI, AgroParisTech, INRA, Université Paris-Saclay, 78350 Jouy-en-Josas, France
| | - Grégory Nuel
- LPMA, UMR CNRS 7599, UPMC, Sorbonne Universités, 4 place Jussieu, 75005 Paris, France
|
35
|
Lai PY. Reconstructing network topology and coupling strengths in directed networks of discrete-time dynamics. Phys Rev E 2017; 95:022311. [PMID: 28297975 DOI: 10.1103/physreve.95.022311] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 10/10/2016] [Indexed: 05/22/2023]
Abstract
Reconstructing network connection topology and interaction strengths solely from measurement of the dynamics of the nodes is a challenging inverse problem of broad applicability in various areas of science and engineering. For a discrete-time step network under noises whose noise-free dynamics is stationary, we derive general analytic results relating the weighted connection matrix of the network to the correlation functions obtained from time-series measurements of the nodes for networks with one-dimensional intrinsic node dynamics. Information about the intrinsic node dynamics and the noise strengths acting on the nodes can also be obtained. Based on these results, we develop a scheme that can reconstruct the above information of the network using only the time-series measurements of node dynamics as input. Reconstruction formulas for higher-dimensional node dynamics are also derived and illustrated with a two-dimensional node dynamics network system. Furthermore, we extend our results and obtain a reconstruction scheme even for the cases when the noise-free dynamics is periodic. We demonstrate that our method can give accurate reconstruction results for weighted directed networks with linear or nonlinear node dynamics of various connection topologies, and with linear or nonlinear couplings.
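The link between a connection matrix and measured correlation functions can be illustrated with the simplest textbook case, a linear discrete-time network driven by white noise. This is a standard special case given only for orientation, not the general result derived in the paper; the symbols A (weighted connection matrix), x (node states measured as deviations from the stationary state), η (noise) and K (time-lagged correlation matrix) are notation assumed here.

```latex
\begin{align}
  x(t+1) &= A\,x(t) + \eta(t),
  \qquad \langle \eta(t) \rangle = 0,
  \qquad \langle \eta(t)\,x(t)^{\mathsf T} \rangle = 0, \\
  K(\tau) &\equiv \langle x(t+\tau)\,x(t)^{\mathsf T} \rangle, \\
  K(1) &= \langle \bigl(A\,x(t) + \eta(t)\bigr)\,x(t)^{\mathsf T} \rangle = A\,K(0), \\
  A &= K(1)\,K(0)^{-1}.
\end{align}
```

Under these assumptions both K(0) and K(1) can be estimated from the node time series alone, so A follows from a single matrix inversion, provided K(0) is invertible.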
Affiliation(s)
- Pik-Yin Lai
- Department of Physics and Center for Complex Systems, National Central University, Chung-Li District, Taoyuan City 320, Taiwan, Republic of China
|
36
|
Abstract
The inference of gene regulatory networks is an important process that contributes to a better understanding of biological and biomedical problems. These networks aim to capture the causal molecular interactions of biological processes and provide valuable information about normal cell physiology. In this book chapter, we introduce GNI methods, namely C3NET, RN, ARACNE, CLR, and MRNET and describe their components and working mechanisms. We present a comparison of the performance of these algorithms using the results of our previously published studies. According to the study results, which were obtained from simulated as well as expression data sets, the inference algorithm C3NET provides consistently better results than the other widely used methods.
|
37
|
He FQ, Ollert M. Network-Guided Key Gene Discovery for a Given Cellular Process. ADVANCES IN BIOCHEMICAL ENGINEERING/BIOTECHNOLOGY 2016. [PMID: 27783134 DOI: 10.1007/10_2016_39] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/10/2022]
Abstract
Identification of key genes for a given physiological or pathological process is an essential but still very challenging task for the entire biomedical research community. Statistics-based approaches, such as genome-wide association study (GWAS)- or quantitative trait locus (QTL)-related analysis, have already made enormous contributions to identifying key genes associated with a given disease or phenotype, but their success depends heavily on a very large number of samples. Recent advances in network biology, especially network inference directly from genome-scale data and the subsequent network analysis, open up new avenues to predict key genes driving a given biological process or cellular function. Here we review and compare the current approaches for predicting key genes that have no chance of standing out in classic differential expression analysis, using gene-regulatory, protein-protein interaction, or gene expression correlation networks. We elaborate on these network-based approaches mainly in the context of immunology and infection, and urge wider use of correlation network-based predictions. Such network-based key gene discovery approaches driven by information-enriched 'omics' data should be very useful for systematic key gene discoveries for any given biochemical process or cellular function, and also valuable for novel drug target discovery and novel diagnostic, prognostic and therapeutic-efficiency marker prediction for a specific disease or disorder.
Affiliation(s)
- Feng Q He
- Department of Infection and Immunity, Group of Immune Systems Biology, Luxembourg Institute of Health, 29, rue Henri Koch, 4354, Esch-sur-Alzette, Luxembourg.
| | - Markus Ollert
- Department of Infection and Immunity, Group of Allergy and Clinical Immunology, Luxembourg Institute of Health, 29, rue Henri Koch, 4354, Esch-sur-Alzette, Luxembourg
- Odense Research Center for Anaphylaxis, Department of Dermatology and Allergy Center, Odense University Hospital, University of Southern Denmark, 5000, Odense C, Denmark
|
38
|
Banf M, Rhee SY. Computational inference of gene regulatory networks: Approaches, limitations and opportunities. BIOCHIMICA ET BIOPHYSICA ACTA-GENE REGULATORY MECHANISMS 2016; 1860:41-52. [PMID: 27641093 DOI: 10.1016/j.bbagrm.2016.09.003] [Citation(s) in RCA: 64] [Impact Index Per Article: 7.1] [Received: 04/13/2016] [Revised: 09/08/2016] [Accepted: 09/08/2016] [Indexed: 10/21/2022]
Abstract
Gene regulatory networks lie at the core of cell function control. In E. coli and S. cerevisiae, the study of gene regulatory networks has led to the discovery of regulatory mechanisms responsible for the control of cell growth, differentiation and responses to environmental stimuli. In plants, computational rendering of gene regulatory networks is gaining momentum, thanks to the recent availability of high-quality genomes and transcriptomes and development of computational network inference approaches. Here, we review current techniques, challenges and trends in gene regulatory network inference and highlight challenges and opportunities for plant science. We provide plant-specific application examples to guide researchers in selecting methodologies that suit their particular research questions. Given the interdisciplinary nature of gene regulatory network inference, we tried to cater to both biologists and computer scientists to help them engage in a dialogue about concepts and caveats in network inference. Specifically, we discuss problems and opportunities in heterogeneous data integration for eukaryotic organisms and common caveats to be considered during network model evaluation. This article is part of a Special Issue entitled: Plant Gene Regulatory Mechanisms and Networks, edited by Dr. Erich Grotewold and Dr. Nathan Springer.
Affiliation(s)
- Michael Banf
- Department of Plant Biology, Carnegie Institution for Science, 260 Panama Street, Stanford 93405, United States.
| | - Seung Y Rhee
- Department of Plant Biology, Carnegie Institution for Science, 260 Panama Street, Stanford 93405, United States.
|
39
|
Hung LH, Kristiyanto D, Lee SB, Yeung KY. GUIdock: Using Docker Containers with a Common Graphics User Interface to Address the Reproducibility of Research. PLoS One 2016; 11:e0152686. [PMID: 27045593 PMCID: PMC4821530 DOI: 10.1371/journal.pone.0152686] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 09/22/2015] [Accepted: 03/17/2016] [Indexed: 12/03/2022]
Abstract
Reproducibility is vital in science. For complex computational methods, reproducing results often requires recreating not just the code but also the software and hardware environment. Virtual machines, and container software such as Docker, make it possible to reproduce the exact environment regardless of the underlying hardware and operating system. However, workflows that use Graphical User Interfaces (GUIs) remain difficult to replicate on different host systems as there is no high-level graphical software layer common to all platforms. GUIdock allows for the facile distribution of a systems biology application along with its graphics environment. Complex graphics-based workflows, ubiquitous in systems biology, can now be easily exported and reproduced on many different platforms. GUIdock uses Docker, an open-source project that provides a container with only the absolutely necessary software dependencies, and configures a common X Windows (X11) graphic interface on Linux, Macintosh and Windows platforms. As proof of concept, we present a Docker package that contains a Bioconductor application written in R and C++ called networkBMA for gene network inference. Our package also includes Cytoscape, a Java-based platform with a graphical user interface for visualizing and analyzing gene networks, and the CyNetworkBMA app, a Cytoscape app that allows the use of networkBMA via the user-friendly Cytoscape interface.
Affiliation(s)
- Ling-Hong Hung
- Institute of Technology, University of Washington, Tacoma, WA 98402, United States of America
| | - Daniel Kristiyanto
- Institute of Technology, University of Washington, Tacoma, WA 98402, United States of America
| | - Sung Bong Lee
- Institute of Technology, University of Washington, Tacoma, WA 98402, United States of America
| | - Ka Yee Yeung
- Institute of Technology, University of Washington, Tacoma, WA 98402, United States of America
|
40
|
Riccadonna S, Jurman G, Visintainer R, Filosi M, Furlanello C. DTW-MIC Coexpression Networks from Time-Course Data. PLoS One 2016; 11:e0152648. [PMID: 27031641 PMCID: PMC4816347 DOI: 10.1371/journal.pone.0152648] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 10/30/2014] [Accepted: 03/17/2016] [Indexed: 01/01/2023]
Abstract
When modeling coexpression networks from high-throughput time course data, Pearson Correlation Coefficient (PCC) is one of the most effective and popular similarity functions. However, its reliability is limited since it cannot capture non-linear interactions and time shifts. Here we propose to overcome these two issues by employing a novel similarity function, Dynamic Time Warping Maximal Information Coefficient (DTW-MIC), combining a measure taking care of functional interactions of signals (MIC) and a measure identifying time lag (DTW). By using the Hamming-Ipsen-Mikhailov (HIM) metric to quantify network differences, the effectiveness of the DTW-MIC approach is demonstrated on a set of four synthetic datasets and one transcriptomic dataset, also in comparison to TimeDelay ARACNE and Transfer Entropy.
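The time-lag component named above, dynamic time warping (DTW), can be sketched with the classic dynamic-programming recurrence. This shows only plain DTW with an absolute-difference local cost; how the authors combine it with MIC into the DTW-MIC similarity is not reproduced here, and the function name and example profiles are assumptions.

```python
def dtw_distance(a, b):
    """Classic DTW distance between two numeric sequences, using
    absolute difference as the local cost and no window constraint."""
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j] is the minimal cumulative cost of aligning a[:i] with b[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(
                D[i - 1][j],      # a advances while b stays on the same point
                D[i][j - 1],      # b advances while a stays on the same point
                D[i - 1][j - 1],  # both advance together
            )
    return D[n][m]

# Hypothetical expression profiles: the second is the first delayed by one step.
profile = [0, 1, 2, 3]
delayed = [0, 0, 1, 2, 3]
shift_cost = dtw_distance(profile, delayed)
```

Because DTW may align one time point with several points of the other series, a pure delay such as the one in `delayed` can be absorbed at little or no cost, whereas a point-by-point comparison would penalise it.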
Collapse
Affiliation(s)
| | - Giuseppe Jurman
- Research and Innovation Centre, Fondazione Edmund Mach, San Michele all’Adige, Italy
| | - Roberto Visintainer
- Research and Innovation Centre, Fondazione Edmund Mach, San Michele all’Adige, Italy
| | - Michele Filosi
- Research and Innovation Centre, Fondazione Edmund Mach, San Michele all’Adige, Italy
| | - Cesare Furlanello
- Research and Innovation Centre, Fondazione Edmund Mach, San Michele all’Adige, Italy
| |
|
41
|
Discovering gene re-ranking efficiency and conserved gene-gene relationships derived from gene co-expression network analysis on breast cancer data. Sci Rep 2016; 6:20518. [PMID: 26892392 PMCID: PMC4759568 DOI: 10.1038/srep20518] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 10/16/2015] [Accepted: 01/05/2016] [Indexed: 12/18/2022]
Abstract
Systemic approaches are essential in the discovery of disease-specific genes, offering a different perspective and new tools for the analysis of several types of molecular relationships, such as gene co-expression or protein-protein interactions. However, due to a lack of experimental information, this analysis is not fully applicable. The aim of this study is to reveal the multi-potent contribution of statistical network inference methods in highlighting significant genes and interactions. We have investigated the ability of statistical co-expression networks to highlight and prioritize genes for breast cancer subtypes and stages in terms of: (i) classification efficiency, (ii) gene network pattern conservation, (iii) indication of involved molecular mechanisms and (iv) systems-level momentum to drug repurposing pipelines. We have found that statistical network inference methods are advantageous in gene prioritization, can contribute to meaningful network signature discovery, give insights regarding the disease-related mechanisms and boost drug discovery pipelines from a systems point of view.
|
42
|
Pathania S, Bagler G, Ahuja PS. Differential Network Analysis Reveals Evolutionary Complexity in Secondary Metabolism of Rauvolfia serpentina over Catharanthus roseus. FRONTIERS IN PLANT SCIENCE 2016; 7:1229. [PMID: 27588023 PMCID: PMC4988974 DOI: 10.3389/fpls.2016.01229] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 01/09/2016] [Accepted: 08/02/2016] [Indexed: 05/07/2023]
Abstract
Comparative co-expression analysis of multiple species using high-throughput data is an integrative approach to determine the uniformity as well as the diversification of biological processes. Rauvolfia serpentina and Catharanthus roseus, both members of the Apocynaceae family, are reported to have remedial properties against multiple diseases. Despite sharing the upstream part of the terpenoid indole alkaloid pathway, these plants show significant diversity in tissue-specific synthesis and accumulation of specialized metabolites. This led us to implement comparative co-expression network analysis to investigate the modules and genes responsible for differential tissue-specific expression as well as species-specific synthesis of metabolites. Toward these goals, differential network analysis was implemented to identify candidate genes responsible for diversification of metabolite profiles. Three genes were identified with a significant difference in connectivity leading to differential regulatory behavior between these plants. These genes may be responsible for diversification of secondary metabolism, and thereby for species-specific metabolite synthesis. The network robustness of R. serpentina, determined based on topological properties, was also complemented by comparison of the gene-metabolite networks of both plants; R. serpentina may have evolved more complex metabolic mechanisms than C. roseus under the influence of various stimuli. This study reveals the evolution of complexity in the secondary metabolism of R. serpentina, and key genes that contribute toward diversification of specific metabolites.
Affiliation(s)
- Shivalika Pathania
- Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Council of Scientific and Industrial Research, Palampur, India
- *Correspondence: Shivalika Pathania
| | - Ganesh Bagler
- Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Council of Scientific and Industrial Research, Palampur, India
- Center for Computational Biology, Indraprastha Institute of Information Technology Delhi (IIIT-Delhi), New Delhi, India
- Centre for Biologically Inspired System Science, Indian Institute of Technology Jodhpur, Jodhpur, India
- Dhirubhai Ambani Institute of Information and Communication Technology, Gandhinagar, India
- Ganesh Bagler
| | - Paramvir S. Ahuja
- Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Council of Scientific and Industrial Research, Palampur, India
- Indian Institute of Science Education and Research (IISER) Mohali, Mohali, India
| |
|
43
|
Hsiao YT, Lee WP, Yang W, Müller S, Flamm C, Hofacker I, Kügler P. Practical Guidelines for Incorporating Knowledge-Based and Data-Driven Strategies into the Inference of Gene Regulatory Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2016; 13:64-75. [PMID: 26441429 DOI: 10.1109/tcbb.2015.2465954] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
Modeling gene regulatory networks (GRNs) is essential for conceptualizing how genes are expressed and how they influence each other. Typically, a reverse engineering approach is employed; this strategy is effective in reproducing possible fitting models of GRNs. To use this strategy, however, two daunting tasks must be undertaken: one task is to optimize the accuracy of inferred network behaviors; and the other task is to designate valid biological topologies for target networks. Although existing studies have addressed these two tasks for years, few of the studies can satisfy both of the requirements simultaneously. To address these difficulties, we propose an integrative modeling framework that combines knowledge-based and data-driven input sources to construct biological topologies with their corresponding network behaviors. To validate the proposed approach, a real dataset collected from the cell cycle of the yeast S. cerevisiae is used. The results show that the proposed framework can successfully infer solutions that meet the requirements of both the network behaviors and biological structures. Therefore, the outcomes are exploitable for future in vivo experimental design.
|
44
|
Ceci M, Pio G, Kuzmanovski V, Džeroski S. Semi-Supervised Multi-View Learning for Gene Network Reconstruction. PLoS One 2015; 10:e0144031. [PMID: 26641091 PMCID: PMC4671612 DOI: 10.1371/journal.pone.0144031] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 08/04/2015] [Accepted: 11/12/2015] [Indexed: 12/30/2022]
Abstract
The task of gene regulatory network reconstruction from high-throughput data has received increasing attention in recent years. As a consequence, many inference methods for solving this task have been proposed in the literature. It has been recently observed, however, that no single inference method performs optimally across all datasets. It has also been shown that the integration of predictions from multiple inference methods is more robust and shows high performance across diverse datasets. Inspired by this research, in this paper, we propose a machine learning solution which learns to combine predictions from multiple inference methods. While this approach adds complexity to the inference process, we expect it to also carry substantial benefits. These would come from the automatic adaptation to patterns in the outputs of individual inference methods, so that it is possible to identify regulatory interactions more reliably when these patterns occur. This article demonstrates the benefits (in terms of accuracy of the reconstructed networks) of the proposed method, which exploits an iterative, semi-supervised ensemble-based algorithm. The algorithm learns to combine the interactions predicted by many different inference methods in the multi-view learning setting. The empirical evaluation of the proposed algorithm on a prokaryotic model organism (E. coli) and on a eukaryotic model organism (S. cerevisiae) clearly shows improved performance over the state-of-the-art methods. The results indicate that gene regulatory network reconstruction for the real datasets is more difficult for S. cerevisiae than for E. coli. The software, all the datasets used in the experiments and all the results are available for download at the following link: http://figshare.com/articles/Semi_supervised_Multi_View_Learning_for_Gene_Network_Reconstruction/1604827.
Affiliation(s)
- Michelangelo Ceci
- Dept. of Computer Science, University of Bari Aldo Moro, Via Orabona 4, 70125 Bari, Italy
| | - Gianvito Pio
- Dept. of Computer Science, University of Bari Aldo Moro, Via Orabona 4, 70125 Bari, Italy
- * E-mail: (GP); (VK)
| | - Vladimir Kuzmanovski
- Dept. of Knowledge Technologies, Jožef Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia
- Jožef Stefan International Postgraduate School, Jamova 39, 1000 Ljubljana, Slovenia
- * E-mail: (GP); (VK)
| | - Sašo Džeroski
- Dept. of Knowledge Technologies, Jožef Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia
- Jožef Stefan International Postgraduate School, Jamova 39, 1000 Ljubljana, Slovenia
| |
|
45
|
Ghanbari M, Lasserre J, Vingron M. Reconstruction of gene networks using prior knowledge. BMC SYSTEMS BIOLOGY 2015; 9:84. [PMID: 26589494 PMCID: PMC4654848 DOI: 10.1186/s12918-015-0233-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 12/12/2014] [Accepted: 11/11/2015] [Indexed: 01/08/2023]
Abstract
Background Reconstructing gene regulatory networks (GRNs) from expression data is a challenging task that has become essential to the understanding of complex regulatory mechanisms in cells. The major issues are the usually very high ratio of the number of genes to the sample size, and the noise in the available data. Integrating biological prior knowledge into the learning process is a natural and promising way to partially compensate for the lack of reliable expression data and to increase the accuracy of network reconstruction algorithms. Results In this manuscript, we present PriorPC, a new algorithm based on the PC algorithm. The PC algorithm is one of the most popular methods for Bayesian network reconstruction. The result of PC is known to depend on the order in which conditional independence tests are processed, especially for large networks. PriorPC uses prior knowledge to exclude unlikely edges from network estimation and introduces a particular ordering for the conditional independence tests. We show on synthetic data that the structural accuracy of networks obtained with PriorPC is greatly improved compared to PC. Conclusion PriorPC improves the structural accuracy of inferred gene networks by using soft priors, which assign to each edge a probability of existence. It is robust to false priors, which are unavoidable in the context of biological data. PriorPC is also fast and scales well to large networks, which is important for its applicability to real data.
Affiliation(s)
- Mahsa Ghanbari
- Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestr. 63-73, Berlin, D-14195, Germany.
| | - Julia Lasserre
- Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestr. 63-73, Berlin, D-14195, Germany.
| | - Martin Vingron
- Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestr. 63-73, Berlin, D-14195, Germany.
| |
|
46
|
Wang X, Alshawaqfeh M, Dang X, Wajid B, Noor A, Qaraqe M, Serpedin E. An Overview of NCA-Based Algorithms for Transcriptional Regulatory Network Inference. MICROARRAYS 2015; 4:596-617. [PMID: 27600242 PMCID: PMC4996402 DOI: 10.3390/microarrays4040596] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 09/01/2015] [Revised: 10/07/2015] [Accepted: 11/11/2015] [Indexed: 01/08/2023]
Abstract
In systems biology, the regulation of gene expression involves a complex network of regulators. Transcription factors (TFs) represent an important component of this network: they are proteins that control which genes are turned on or off in the genome by binding to specific DNA sequences. Transcription regulatory networks (TRNs) describe gene expression as a function of regulatory inputs specified by interactions between proteins and DNA. A complete understanding of TRNs helps to predict a variety of biological processes and to diagnose and characterize diseases and eventually develop more effective therapies. Recent advances in biological high-throughput technologies, such as DNA microarrays and next-generation sequencing (NGS), have made the inference of transcription factor activities (TFAs) and TF-gene regulations possible. Network component analysis (NCA) represents an efficient computational framework for TRN inference from the information provided by microarrays, ChIP-on-chip and the prior information about TF-gene regulation. However, NCA suffers from several shortcomings. Recently, several algorithms based on the NCA framework have been proposed to overcome these shortcomings. This paper first overviews the computational principles behind NCA, and then surveys the state-of-the-art NCA-based algorithms proposed in the literature for TRN reconstruction.
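For orientation, the bilinear model that NCA is generally built on can be written as follows. The symbols are the conventional ones from the NCA literature rather than notation taken from this abstract, so the matrix names and dimensions below should be read as assumptions.

```latex
% Standard NCA decomposition (notation assumed, not taken from the abstract):
%   E      : N x M matrix of (log-ratio) expression for N genes over M samples
%   A      : N x L connectivity matrix, a_{ij} = strength of TF j acting on gene i
%   P      : L x M matrix of transcription factor activities (TFAs)
%   \Gamma : N x M noise term
\[
  E = A\,P + \Gamma ,
  \qquad
  a_{ij} = 0 \ \text{whenever prior data (e.g.\ ChIP-on-chip) report no binding of TF } j \text{ to gene } i .
\]
```

The zero pattern imposed on A is what the prior TF-gene information contributes; the nonzero entries of A and the activity matrix P are then estimated jointly from the expression data.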
Affiliation(s)
- Xu Wang
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
| | - Mustafa Alshawaqfeh
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
| | - Xuan Dang
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
| | - Bilal Wajid
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
| | - Amina Noor
- Institute of Genomic Medicine, University of California San Diego, La Jolla, CA 92093, USA.
| | - Marwa Qaraqe
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
| | - Erchin Serpedin
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
| |
|
47
|
Emmert-Streib F, Dehmer M. Biological networks: the microscope of the twenty-first century? Front Genet 2015; 6:307. [PMID: 26528327 PMCID: PMC4602153 DOI: 10.3389/fgene.2015.00307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2015] [Accepted: 09/23/2015] [Indexed: 11/13/2022]
Affiliation(s)
- Frank Emmert-Streib
- Computational Medicine and Statistical Learning Laboratory, Department of Signal Processing, Tampere University of Technology, Tampere, Finland; Institute of Biosciences and Medical Technology, Tampere, Finland
| | - Matthias Dehmer
- Department of Computer Science, Universität der Bundeswehr München, Germany; Department of Mechatronics and Biomedical Computer Science, UMIT, Hall in Tyrol, Austria
| |
|
48
|
A Glimpse to Background and Characteristics of Major Molecular Biological Networks. BIOMED RESEARCH INTERNATIONAL 2015; 2015:540297. [PMID: 26491677 PMCID: PMC4605226 DOI: 10.1155/2015/540297] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 05/01/2015] [Revised: 07/22/2015] [Accepted: 08/18/2015] [Indexed: 12/11/2022]
Abstract
Recently, biology has become a data-intensive science because of the huge datasets produced by high-throughput molecular biological experiments in diverse areas including the fields of genomics, transcriptomics, proteomics, and metabolomics. These huge datasets have paved the way for system-level analysis of the processes and subprocesses of the cell. For system-level understanding, the elements of a system are first connected based on their mutual relations to form a network. Among omics researchers, construction and analysis of biological networks have become highly popular. In this review, we briefly discuss both the biological background and topological properties of major types of omics networks to facilitate a comprehensive understanding and to conceptualize the foundation of network biology.
|
49
|
Data Integration for Microarrays: Enhanced Inference for Gene Regulatory Networks. MICROARRAYS 2015; 4:255-69. [PMID: 27600224 PMCID: PMC4996389 DOI: 10.3390/microarrays4020255] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 02/27/2015] [Accepted: 04/30/2015] [Indexed: 01/01/2023]
Abstract
Microarray technologies have been the basis of numerous important findings regarding gene expression in the last few decades. Studies have generated large amounts of data describing various processes, which, due to the existence of public databases, are widely available for further analysis. Given their lower cost and higher maturity compared to newer sequencing technologies, these data continue to be produced, even though data quality has been the subject of some debate. However, given the large volume of data generated, integration can help overcome some issues related, e.g., to noise or reduced time resolution, while providing additional insight on features not directly addressed by sequencing methods. Here, we present an integration test case based on public Drosophila melanogaster datasets (gene expression, binding site affinities, known interactions). Using an evolutionary computation framework, we show how integration can enhance the ability to recover transcriptional gene regulatory networks from these data, and indicate which data types are more important for quantitative and qualitative network inference. Our results show a clear improvement in performance when multiple datasets are integrated, indicating that microarray data will remain a valuable and viable resource for some time to come.
|
50
|
Liu ZP. Reverse Engineering of Genome-wide Gene Regulatory Networks from Gene Expression Data. Curr Genomics 2015; 16:3-22. [PMID: 25937810 PMCID: PMC4412962 DOI: 10.2174/1389202915666141110210634] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Received: 03/07/2014] [Revised: 09/05/2014] [Accepted: 09/05/2014] [Indexed: 12/17/2022]
Abstract
Transcriptional regulation plays vital roles in many fundamental biological processes. Reverse engineering of genome-wide regulatory networks from high-throughput transcriptomic data provides a promising way to characterize the global landscape of regulatory relationships between regulators and their targets. In this review, we summarize and categorize the main frameworks and methods currently available for inferring transcriptional regulatory networks from microarray gene expression profiling data. We give an overview of each strategy and introduce representative methods for each. Their assumptions, advantages, shortcomings, and possible improvements and extensions are also discussed.
Affiliation(s)
- Zhi-Ping Liu
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
| |
|