1
Gao Z, Tang J, Xia J, Zheng CH, Wei PJ. CNNGRN: A Convolutional Neural Network-Based Method for Gene Regulatory Network Inference From Bulk Time-Series Expression Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:2853-2861. [PMID: 37267145 DOI: 10.1109/tcbb.2023.3282212] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/04/2023]
Abstract
Gene regulatory networks (GRNs) participate in many biological processes, and reconstructing them plays an important role in systems biology. Although many advanced methods have been proposed for GRN reconstruction, their predictive performance is far from the ideal standard, so it is urgent to design a more effective method to reconstruct GRN. Moreover, most methods only consider the gene expression data, ignoring the network structure information contained in GRN. In this study, we propose a supervised model named CNNGRN, which infers GRN from bulk time-series expression data via convolutional neural network (CNN) model, with a more informative feature. Bulk time series gene expression data imply the intricate regulatory associations between genes, and the network structure feature of ground-truth GRN contains rich neighbor information. Hence, CNNGRN integrates the above two features as model inputs. In addition, CNN is adopted to extract intricate features of genes and infer the potential associations between regulators and target genes. Moreover, feature importance visualization experiments are implemented to seek the key features. Experimental results show that CNNGRN achieved competitive performance on benchmark datasets compared to the state-of-the-art computational methods. Finally, hub genes identified based on CNNGRN have been confirmed to be involved in biological processes through literature.
2
Marku M, Pancaldi V. From time-series transcriptomics to gene regulatory networks: A review on inference methods. PLoS Comput Biol 2023; 19:e1011254. [PMID: 37561790 PMCID: PMC10414591 DOI: 10.1371/journal.pcbi.1011254] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 08/12/2023]
Abstract
Inference of gene regulatory networks has been an active area of research for around 20 years, leading to the development of sophisticated inference algorithms based on a variety of assumptions and approaches. With the ever increasing demand for more accurate and powerful models, the inference problem remains of broad scientific interest. The abstract representation of biological systems through gene regulatory networks represents a powerful method to study such systems, encoding different amounts and types of information. In this review, we summarize the different types of inference algorithms specifically based on time-series transcriptomics, giving an overview of the main applications of gene regulatory networks in computational biology. This review is intended to give an updated reference of regulatory networks inference tools to biologists and researchers new to the topic and guide them in selecting the appropriate inference method that best fits their questions, aims, and experimental data.
Affiliation(s)
- Malvina Marku
  - CRCT, Université de Toulouse, Inserm, CNRS, Université Toulouse III-Paul Sabatier, Centre de Recherches en Cancérologie de Toulouse, Toulouse, France
- Vera Pancaldi
  - CRCT, Université de Toulouse, Inserm, CNRS, Université Toulouse III-Paul Sabatier, Centre de Recherches en Cancérologie de Toulouse, Toulouse, France
  - Barcelona Supercomputing Center, Barcelona, Spain
3
Segura-Ortiz A, García-Nieto J, Aldana-Montes JF, Navas-Delgado I. GENECI: A novel evolutionary machine learning consensus-based approach for the inference of gene regulatory networks. Comput Biol Med 2023; 155:106653. [PMID: 36803795 DOI: 10.1016/j.compbiomed.2023.106653] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2022] [Revised: 01/09/2023] [Accepted: 02/08/2023] [Indexed: 02/16/2023]
Abstract
Gene regulatory networks define the interactions between DNA products and other substances in cells. Increasing knowledge of these networks improves the level of detail with which the processes that trigger different diseases are described and fosters the development of new therapeutic targets. These networks are usually represented by graphs, and the primary sources for their correct construction are usually time series from differential expression data. The inference of networks from this data type has been approached differently in the literature. Mostly, computational learning techniques have been implemented, which have finally shown some specialization in specific datasets. For this reason, the need arises to create new and more robust strategies for reaching a consensus based on previous results to gain a particular capacity for generalization. This paper presents GENECI (GEne NEtwork Consensus Inference), an evolutionary machine learning approach that acts as an organizer for constructing ensembles to process the results of the main inference techniques reported in the literature and to optimize the consensus network derived from them, according to their confidence levels and topological characteristics. After its design, the proposal was confronted with datasets collected from academic benchmarks (DREAM challenges and IRMA network) to quantify its accuracy. Subsequently, it was applied to a real-world biological network of melanoma patients whose results could be contrasted with medical research collected in the literature. Finally, it has been proved that its ability to optimize the consensus of several networks leads to outstanding robustness and accuracy, gaining a certain generalization capacity after facing the inference of multiple datasets. The source code is hosted in a public repository at GitHub under MIT license: https://github.com/AdrianSeguraOrtiz/GENECI. 
Moreover, to facilitate its installation and use, the software associated with this implementation has been encapsulated in a python package available at PyPI: https://pypi.org/project/geneci/.
Affiliation(s)
- Adrián Segura-Ortiz
  - Dept. de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain
- José García-Nieto
  - Dept. de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain; Biomedical Research Institute of Málaga (IBIMA), Universidad de Málaga, Málaga, Spain
- José F Aldana-Montes
  - Dept. de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain; Biomedical Research Institute of Málaga (IBIMA), Universidad de Málaga, Málaga, Spain
- Ismael Navas-Delgado
  - Dept. de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain; Biomedical Research Institute of Málaga (IBIMA), Universidad de Málaga, Málaga, Spain
4
Majumder S, Thakran Y, Pal V, Singh K. Fuzzy and Rough Set Theory Based Computational Framework for Mining Genetic Interaction Triplets From Gene Expression Profiles for Lung Adenocarcinoma. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:3469-3481. [PMID: 34665736 DOI: 10.1109/tcbb.2021.3120844] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/13/2023]
Abstract
Genetic interactions are very helpful in understanding different diseases and discovering drugs for them. Compared to the gene pairs that represent the genetic interactions between two genes, gene triplets are more informative and useful. However, existing works on genetic interactions among gene triplets have primarily focused on detecting gene triplets from time series gene expression profiles. Generating time series gene expression profiles for humans is largely impracticable, but labeled gene expression profiles are available for different human diseases. In this paper, a computational framework has been proposed to detect gene triplets from labeled gene expression profiles. First, it employs Rough Set Theory for extracting the key genes and then designs a fuzzy inference system for generating possible gene triplets. Further, the Root Mean Squared Error measure has been used to prune out the irrelevant gene triplets. In the present work, the proposed computational framework has been applied to a labeled lung adenocarcinoma dataset and can be applied to any other labeled gene expression dataset. The extracted gene triplets and their functionalities have been verified with existing biological literature and benchmark databases, and the results of verification signify that the proposed framework is promising in terms of finding useful genetic triplets. Further, the proposed framework has been found to be more efficient than an existing mutual information-based technique in terms of detecting known genetic interactions.
5
Kelly J, Berzuini C, Keavney B, Tomaszewski M, Guo H. A review of causal discovery methods for molecular network analysis. Mol Genet Genomic Med 2022; 10:e2055. [PMID: 36087049 PMCID: PMC9544222 DOI: 10.1002/mgg3.2055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2022] [Revised: 07/12/2022] [Accepted: 08/18/2022] [Indexed: 11/08/2022]
Abstract
BACKGROUND With the increasing availability and size of multi-omics datasets, investigating the causal relationships between molecular phenotypes has become an important aspect of exploring underlying biology and genetics. An increasing number of methodologies have been developed and applied to molecular networks to investigate these causal interactions. METHODS We have introduced and reviewed the available methods for building large-scale causal molecular networks that have been developed and applied in the past decade. RESULTS In this review we have identified and summarized the existing methods for inferring causality in large-scale causal molecular networks, and discussed important factors that will need to be considered in future research in this area. CONCLUSION Existing methods for inferring causal molecular networks have their own strengths and limitations, so there is no one best approach, and the choice is instead down to the discretion of the researcher. This review also discusses some of the current limitations to biological interpretation of these networks, and important factors to consider for future studies on molecular networks.
Affiliation(s)
- Jack Kelly
  - Centre for Biostatistics, School of Health Sciences, Faculty of Medicine, Biology and Health, University of Manchester, Manchester, UK
- Carlo Berzuini
  - Centre for Biostatistics, School of Health Sciences, Faculty of Medicine, Biology and Health, University of Manchester, Manchester, UK
- Bernard Keavney
  - Division of Cardiovascular Sciences, Faculty of Medicine, Biology and Health, University of Manchester, Manchester, UK
  - Division of Cardiology and Manchester Academic Health Science Centre, Manchester University NHS Foundation Trust, Manchester, UK
- Maciej Tomaszewski
  - Division of Cardiovascular Sciences, Faculty of Medicine, Biology and Health, University of Manchester, Manchester, UK
  - Manchester Heart Centre and Manchester Academic Health Science Centre, Manchester University NHS Foundation Trust, Manchester, UK
- Hui Guo
  - Centre for Biostatistics, School of Health Sciences, Faculty of Medicine, Biology and Health, University of Manchester, Manchester, UK
6
Cerutti C, Zhang L, Tribollet V, Shi JR, Brillet R, Gillet B, Hughes S, Forcet C, Shi TL, Vanacker JM. Computational identification of new potential transcriptional partners of ERRα in breast cancer cells: specific partners for specific targets. Sci Rep 2022; 12:3826. [PMID: 35264626 PMCID: PMC8907200 DOI: 10.1038/s41598-022-07744-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/27/2021] [Accepted: 02/17/2022] [Indexed: 12/26/2022]
Abstract
Estrogen related receptors (ERRs) are orphan members of the nuclear receptor superfamily acting as transcription factors (TFs). In contrast to classical nuclear receptors, the activities of the ERRs are not controlled by a natural ligand. Regulation of their activities thus relies on the availability of transcriptional co-regulators. In this paper, we focus on ERRα, whose involvement in cancer progression has been broadly demonstrated. We propose a new approach to identify potential co-activators, starting from previously identified ERRα-activated genes in a breast cancer (BC) cell line. Considering mRNA gene expression from two sets of human BC cells as the major endpoint, we used sparse partial least squares modeling to uncover new transcriptional regulators associated with ERRα. Among them, DDX21, MYBBP1A, NFKB1, and SETD7 are functionally relevant in MDA-MB-231 cells, specifically activating the expression of subsets of ERRα-activated genes. We studied SET7 in more detail and showed its co-localization with ERRα and its ERRα-dependent transcriptional and phenotypic effects. Our results thus demonstrate the ability of a modeling approach to identify new transcriptional partners from gene expression. Finally, experimental results show that ERRα cooperates with distinct co-regulators to control the expression of distinct sets of target genes, thus reinforcing the combinatorial specificity of transcription.
Affiliation(s)
- Catherine Cerutti
  - Institut de Génomique Fonctionnelle de Lyon, Université de Lyon, Université Lyon 1, CNRS UMR5242, Ecole Normale Supérieure de Lyon, 32-34 Avenue Tony Garnier, 69007, Lyon, France
- Ling Zhang
  - Institut de Génomique Fonctionnelle de Lyon, Université de Lyon, Université Lyon 1, CNRS UMR5242, Ecole Normale Supérieure de Lyon, 32-34 Avenue Tony Garnier, 69007, Lyon, France
- Violaine Tribollet
  - Institut de Génomique Fonctionnelle de Lyon, Université de Lyon, Université Lyon 1, CNRS UMR5242, Ecole Normale Supérieure de Lyon, 32-34 Avenue Tony Garnier, 69007, Lyon, France
- Jing-Ru Shi
  - Institut de Génomique Fonctionnelle de Lyon, Université de Lyon, Université Lyon 1, CNRS UMR5242, Ecole Normale Supérieure de Lyon, 32-34 Avenue Tony Garnier, 69007, Lyon, France
  - The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China
- Riwan Brillet
  - Institut de Génomique Fonctionnelle de Lyon, Université de Lyon, Université Lyon 1, CNRS UMR5242, Ecole Normale Supérieure de Lyon, 32-34 Avenue Tony Garnier, 69007, Lyon, France
- Benjamin Gillet
  - Institut de Génomique Fonctionnelle de Lyon, Université de Lyon, Université Lyon 1, CNRS UMR5242, Ecole Normale Supérieure de Lyon, 32-34 Avenue Tony Garnier, 69007, Lyon, France
- Sandrine Hughes
  - Institut de Génomique Fonctionnelle de Lyon, Université de Lyon, Université Lyon 1, CNRS UMR5242, Ecole Normale Supérieure de Lyon, 32-34 Avenue Tony Garnier, 69007, Lyon, France
- Christelle Forcet
  - Institut de Génomique Fonctionnelle de Lyon, Université de Lyon, Université Lyon 1, CNRS UMR5242, Ecole Normale Supérieure de Lyon, 32-34 Avenue Tony Garnier, 69007, Lyon, France
- Tie-Liu Shi
  - The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China
- Jean-Marc Vanacker
  - Institut de Génomique Fonctionnelle de Lyon, Université de Lyon, Université Lyon 1, CNRS UMR5242, Ecole Normale Supérieure de Lyon, 32-34 Avenue Tony Garnier, 69007, Lyon, France
7
Kashima M, Shida Y, Yamashiro T, Hirata H, Kurosaka H. Intracellular and Intercellular Gene Regulatory Network Inference From Time-Course Individual RNA-Seq. Frontiers in Bioinformatics 2021; 1:777299. [PMID: 36303726 PMCID: PMC9580923 DOI: 10.3389/fbinf.2021.777299] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2021] [Accepted: 10/26/2021] [Indexed: 11/13/2022]
Abstract
Gene regulatory network (GRN) inference is an effective approach to understand the molecular mechanisms underlying biological events. Generally, GRN inference mainly targets intracellular regulatory relationships such as transcription factors and their associated targets. In multicellular organisms, there are both intracellular and intercellular regulatory mechanisms. Thus, we hypothesize that GRNs inferred from time-course individual (whole embryo) RNA-Seq during development can reveal intercellular regulatory relationships (signaling pathways) underlying the development. Here, we conducted time-course bulk RNA-Seq of individual mouse embryos during early development, followed by pseudo-time analysis and GRN inference. The results demonstrated that GRN inference from RNA-Seq with pseudo-time can be applied for individual bulk RNA-Seq similar to scRNA-Seq. Validation using an experimental-source-based database showed that our approach could significantly infer GRN for all transcription factors in the database. Furthermore, the inferred ligand-related and receptor-related downstream genes were significantly overlapped. Thus, the inferred GRN based on whole organism could include intercellular regulatory relationships, which cannot be inferred from scRNA-Seq based only on gene expression data. Overall, inferring GRN from time-course bulk RNA-Seq is an effective approach to understand the regulatory relationships underlying biological events in multicellular organisms.
Affiliation(s)
- Makoto Kashima
  - College of Science and Engineering, Aoyama Gakuin University, Sagamihara, Japan
- Yuki Shida
  - Department of Orthodontics and Dentofacial Orthopedics, Osaka University, Suita, Japan
- Takashi Yamashiro
  - Department of Orthodontics and Dentofacial Orthopedics, Osaka University, Suita, Japan
- Hiromi Hirata
  - College of Science and Engineering, Aoyama Gakuin University, Sagamihara, Japan
- Hiroshi Kurosaka
  - Department of Orthodontics and Dentofacial Orthopedics, Osaka University, Suita, Japan
8
Abstract
Cancer is a genetic disease in which multiple genes are perturbed. Thus, information about the regulatory relationships between genes is necessary for the identification of biomarkers and therapeutic targets. In this review, methods for inference of gene regulatory networks (GRNs) from transcriptomics data that are used in cancer research are introduced. The methods are classified into three categories according to the analysis model. The first category includes methods that use pair-wise measures between genes, including correlation coefficient and mutual information. The second category includes methods that determine the genetic regulatory relationship using multivariate measures, which consider the expression profiles of all genes concurrently. The third category includes methods using supervised and integrative approaches. The supervised approach estimates the regulatory relationship using a supervised learning method that constructs a regression or classification model for predicting whether there is a regulatory relationship between genes with input data of gene expression profiles and class labels of prior biological knowledge. The integrative method is an expansion of the supervised method and uses more data and biological knowledge for predicting the regulatory relationship. Furthermore, simulation and experimental validation of the estimated GRNs are also discussed in this review. This review identified that most GRN inference methods are not specific for cancer transcriptome data, and such methods are required for better understanding of cancer pathophysiology. In addition, more systematic methods for validation of the estimated GRNs need to be developed in the context of cancer biology.
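The first category described above (pair-wise measures between genes) can be illustrated with a minimal sketch. It assumes a toy expression dictionary, a Pearson correlation measure, and an arbitrary cutoff of 0.8; the names `pearson`, `correlation_network`, and `toy` are hypothetical and do not come from any of the reviewed tools.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / math.sqrt(var_x * var_y)


def correlation_network(expr, threshold=0.8):
    """Return undirected edges (gene1, gene2, r) with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


# Toy data (assumed for illustration): geneB tracks geneA, geneC does not.
toy = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],
}
edges = correlation_network(toy)
for g1, g2, r in edges:
    print(f"{g1} -- {g2}  r={r:.3f}")
```

A pair-wise score of this kind yields undirected co-expression edges only; it cannot separate direct from indirect regulation, which is the motivation for the multivariate and supervised categories that the review goes on to describe.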
9
KBoost: a new method to infer gene regulatory networks from gene expression data. Sci Rep 2021; 11:15461. [PMID: 34326402 PMCID: PMC8322418 DOI: 10.1038/s41598-021-94919-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/17/2021] [Accepted: 07/06/2021] [Indexed: 01/11/2023]
Abstract
Reconstructing gene regulatory networks is crucial to understand biological processes and holds potential for developing personalized treatment. Yet, it is still an open problem as state-of-the-art algorithms are often not able to process large amounts of data within reasonable time. Furthermore, many of the existing methods predict numerous false positives and have limited capabilities to integrate other sources of information, such as previously known interactions. Here we introduce KBoost, an algorithm that uses kernel PCA regression, boosting and Bayesian model averaging for fast and accurate reconstruction of gene regulatory networks. We have benchmarked KBoost against other high performing algorithms using three different datasets. The results show that our method compares favorably to other methods across datasets. We have also applied KBoost to a large cohort of close to 2000 breast cancer patients and 24,000 genes in less than 2 h on standard hardware. Our results show that molecularly defined breast cancer subtypes also feature differences in their GRNs. An implementation of KBoost in the form of an R package is available at: https://github.com/Luisiglm/KBoost and as a Bioconductor software package.
10
11
Zhang Y, Chang X, Liu X. Inference of gene regulatory networks using pseudo-time series data. Bioinformatics 2021; 37:2423-2431. [PMID: 33576787 DOI: 10.1093/bioinformatics/btab099] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 11/04/2020] [Revised: 01/18/2021] [Accepted: 02/10/2021] [Indexed: 11/12/2022]
Abstract
MOTIVATION Inferring gene regulatory networks (GRNs) from high-throughput data is an important and challenging problem in systems biology. Although numerous GRN methods have been developed, most have focused on the verification of the specific data set. However, it is difficult to establish directed topological networks that are both suitable for time-series and non-time-series datasets due to the complexity and diversity of biological networks. RESULTS Here, we proposed a novel method, GNIPLR (Gene networks inference based on projection and lagged regression) to infer GRNs from time-series or non-time-series gene expression data. GNIPLR projected gene data twice using the LASSO projection (LSP) algorithm and the linear projection (LP) approximation to produce a linear and monotonous pseudo-time series, and then determined the direction of regulation in combination with lagged regression analyses. The proposed algorithm was validated using simulated and real biological data. Moreover, we also applied the GNIPLR algorithm to the liver hepatocellular carcinoma (LIHC) and bladder urothelial carcinoma (BLCA) cancer expression datasets. These analyses revealed significantly higher accuracy and AUC values than other popular methods. AVAILABILITY The GNIPLR tool is freely available at https://github.com/zyllluck/GNIPLR. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
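The idea of using lagged regression to decide the direction of regulation can be sketched as follows. This is not the GNIPLR implementation: it omits the projection steps entirely and simply assumes that the direction whose one-step-lagged linear fit explains more variance is the more plausible one. The function names and the toy series are hypothetical.

```python
def r_squared(x, y):
    """R^2 of the simple linear regression y ~ a + b*x (squared Pearson r)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return (sxy * sxy) / (sxx * syy)


def lagged_direction(x, y, lag=1):
    """Compare how well x(t) predicts y(t+lag) against the reverse."""
    forward = r_squared(x[:-lag], y[lag:])   # x regulates y
    backward = r_squared(y[:-lag], x[lag:])  # y regulates x
    direction = "x->y" if forward >= backward else "y->x"
    return direction, forward, backward


# Toy series (assumed): y follows x with a delay of one time step.
x = [1.0, 4.0, 2.0, 7.0, 3.0, 8.0, 1.0, 6.0]
y = [0.0] + [0.9 * value for value in x[:-1]]

direction, forward, backward = lagged_direction(x, y)
print(direction, round(forward, 3), round(backward, 3))
```

On real data the two fits are rarely this well separated, so a practical method would need a significance test or a margin before committing to a direction.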
Affiliation(s)
- Yuelei Zhang
  - Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310012, China
  - Institute of Statistics and Applied Mathematics, Anhui University of Finance and Economics, Bengbu, 233030, China
  - School of Mathematics and Statistics, Shandong University, Weihai, Shandong, 264209, China
- Xiao Chang
  - Institute of Statistics and Applied Mathematics, Anhui University of Finance and Economics, Bengbu, 233030, China
- Xiaoping Liu
  - Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310012, China
  - School of Mathematics and Statistics, Shandong University, Weihai, Shandong, 264209, China
12
GENAVOS: A New Tool for Modelling and Analyzing Cancer Gene Regulatory Networks Using Delayed Nonlinear Variable Order Fractional System. Symmetry (Basel) 2021. [DOI: 10.3390/sym13020295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Gene regulatory networks (GRN) are one of the etiologies associated with cancer. Their dysregulation can be associated with cancer formation and asymmetric cellular functions in cancer stem cells, leading to disease persistence and resistance to treatment. Systems that model the complex dynamics of these networks along with adapting to partially known real omics data are closer to reality and may be useful to understand the mechanisms underlying neoplastic phenomena. In this paper, for the first time, modelling of GRNs is performed using delayed nonlinear variable order fractional (VOF) systems in the state space by a new tool called GENAVOS. Although the tool uses gene expression time series data to identify and optimize system parameters, it also models possible epigenetic signals, and the results show that the nonlinear VOF systems have very good flexibility in adapting to real data. We found that GRNs in cancer cells actually have a larger delay parameter than in normal cells. It is also possible to create weak chaotic, periodic, and quasi-periodic oscillations by changing the parameters. Chaos can be associated with the onset of cancer. Our findings indicate a profound effect of time-varying orders on these networks, which may be related to a type of cellular epigenetic memory. By changing the delay parameter and the variable order functions (possible epigenetics signals) for a normal cell system, its behaviour becomes quite similar to the behaviour of a cancer cell. This work confirms the effective role of the miR-17-92 cluster as an epigenetic factor in the cancer cell cycle.
13
Ajmal HB, Madden MG. Inferring dynamic gene regulatory networks with low-order conditional independencies – an evaluation of the method. Stat Appl Genet Mol Biol 2020. [DOI: 10.1515/sagmb-2020-0051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Over a decade ago, Lèbre (2009) proposed an inference method, G1DBN, to learn the structure of gene regulatory networks (GRNs) from high dimensional, sparse time-series gene expression data. Their approach is based on the concept of low-order conditional independence graphs that they extend to dynamic Bayesian networks (DBNs). They present results to demonstrate that their method yields better structural accuracy compared to the related Lasso and Shrinkage methods, particularly where the data is sparse, that is, the number of time measurements n is much smaller than the number of genes p. This paper challenges these claims using a careful experimental analysis, to show that the GRNs reverse engineered from time-series data using the G1DBN approach are less accurate than claimed by Lèbre (2009). We also show that the Lasso method yields higher structural accuracy for graphs learned from the simulated data, compared to the G1DBN method, particularly when the data is sparse ($n \ll p$). The Lasso method is also better than G1DBN at identifying the transcription factors (TFs) involved in the cell cycle of Saccharomyces cerevisiae.
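For readers unfamiliar with the Lasso baseline mentioned above, the following is a minimal sketch of Lasso regression solved by cyclic coordinate descent, where each column of `X` stands for a candidate regulator and `y` for a target gene. It is a generic textbook formulation, not the code evaluated in the paper; the objective scaling $(1/2n)\lVert y - X\beta \rVert^2 + \lambda \lVert \beta \rVert_1$, the function names, and the toy data are all assumptions made for illustration.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0  # correlation of column j with the partial residual
            z = 0.0    # mean squared value of column j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            beta[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return beta


# Toy design (assumed): regulator 0 drives the target strongly, regulator 1 barely.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]
beta = lasso_coordinate_descent(X, y, lam=0.1)
print(beta)  # the weak regulator's coefficient is shrunk exactly to zero
```

Non-zero coefficients are read as candidate regulator-to-target edges. In a time-series setting one would typically regress the target at time t+1 on the regulators at time t, which this toy example does not do.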
Affiliation(s)
- Hamda B. Ajmal
  - School of Computer Science, National University of Ireland, Galway, Ireland
- Michael G. Madden
  - School of Computer Science, National University of Ireland, Galway, Ireland
14
Ma B, Fang M, Jiao X. Inference of gene regulatory networks based on nonlinear ordinary differential equations. Bioinformatics 2020; 36:4885-4893. [DOI: 10.1093/bioinformatics/btaa032] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 09/10/2019] [Revised: 12/30/2019] [Accepted: 01/15/2020] [Indexed: 01/05/2023]
Abstract
Motivation
Gene regulatory networks (GRNs) capture the regulatory interactions between genes, resulting from the fundamental biological process of transcription and translation. In some cases, the topology of GRNs is not known, and has to be inferred from gene expression data. Most of the existing GRNs reconstruction algorithms are either applied to time-series data or steady-state data. Although time-series data include more information about the system dynamics, steady-state data imply stability of the underlying regulatory networks.
Results
In this article, we propose a method for inferring GRNs from time-series and steady-state data jointly. We make use of a non-linear ordinary differential equations framework to model dynamic gene regulation and an importance measurement strategy to infer all putative regulatory links efficiently. The proposed method is evaluated extensively on the artificial DREAM4 dataset and two real gene expression datasets of yeast and Escherichia coli. Based on public benchmark datasets, the proposed method outperforms other popular inference algorithms in terms of overall score. By comparing the performance on the datasets with different scales, the results show that our method still keeps good robustness and accuracy at a low computational complexity.
Availability and implementation
The proposed method is written in the Python language, and is available at: https://github.com/lab319/GRNs_nonlinear_ODEs
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Baoshan Ma
  - College of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Mingkun Fang
  - College of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Xiangtian Jiao
  - College of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
15
Razaghi-Moghadam Z, Nikoloski Z. Supervised learning of gene-regulatory networks based on graph distance profiles of transcriptomics data. NPJ Syst Biol Appl 2020; 6:21. [PMID: 32606380 PMCID: PMC7327016 DOI: 10.1038/s41540-020-0140-1] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 07/08/2019] [Accepted: 06/09/2020] [Indexed: 02/07/2023]
Abstract
Characterisation of gene-regulatory network (GRN) interactions provides a stepping stone to understanding how genes affect cellular phenotypes. Yet, despite advances in profiling technologies, GRN reconstruction from gene expression data remains a pressing problem in systems biology. Here, we devise a supervised learning approach, GRADIS, which utilises a support vector machine to reconstruct GRNs based on distance profiles obtained from a graph representation of transcriptomics data. By employing the data from Escherichia coli and Saccharomyces cerevisiae as well as synthetic networks from the DREAM4 and five network inference challenges, we demonstrate that our GRADIS approach outperforms the state-of-the-art supervised and unsupervised approaches. This holds when predictions about target genes for individual transcription factors as well as for the entire network are considered. We employ experimentally verified GRNs from E. coli and S. cerevisiae to validate the predictions and obtain further insights into the performance of the proposed approach. Our GRADIS approach offers the possibility for usage of other network-based representations of large-scale data, and can be readily extended to help the characterisation of other cellular networks, including protein–protein and protein–metabolite interactions.
Affiliation(s)
- Zahra Razaghi-Moghadam
- Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476, Potsdam, Germany; Systems Biology and Mathematical Modeling group, Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476, Potsdam, Germany
- Zoran Nikoloski
- Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476, Potsdam, Germany; Systems Biology and Mathematical Modeling group, Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476, Potsdam, Germany
|
16
|
Morais-Rodrigues F, Silvério-Machado R, Kato RB, Rodrigues DLN, Valdez-Baez J, Fonseca V, San EJ, Gomes LGR, Dos Santos RG, Vinicius Canário Viana M, da Cruz Ferraz Dutra J, Teixeira Dornelles Parise M, Parise D, Campos FF, de Souza SJ, Ortega JM, Barh D, Ghosh P, Azevedo VAC, Dos Santos MA. Analysis of the microarray gene expression for breast cancer progression after the application modified logistic regression. Gene 2019; 726:144168. [PMID: 31759986 DOI: 10.1016/j.gene.2019.144168] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 07/05/2019] [Revised: 09/21/2019] [Accepted: 10/11/2019] [Indexed: 01/02/2023]
Abstract
Methods based around statistics and linear algebra have been increasingly used in attempts to address emerging questions in microarray literature. Microarray technology is a long-used tool in the global analysis of gene expression, allowing for the simultaneous investigation of hundreds or thousands of genes in a sample. It is characterized by a low sample size and a large number of features, which create a non-square matrix, and by incomplete rank, which can generate countless additional solutions in classifiers. To avoid the problem of the 'curse of dimensionality', many authors have performed feature selection or reduced the size of the data matrix. In this work, we introduce a new logistic regression-based model to classify breast cancer tumor samples based on microarray expression data, including all features of gene expression and without reducing the microarray data matrix. If the user still deems it necessary to perform feature reduction, it can be done after the application of the methodology, still maintaining a good classification. This methodology allowed the correct classification of breast cancer sample data sets from Gene Expression Omnibus (GEO) data series GSE65194, GSE20711, and GSE25055, which contain the microarray data of said breast cancer samples. Classification had a minimum performance of 80% (sensitivity and specificity), and explored all possible data combinations, including breast cancer subtypes. This methodology highlighted genes not yet studied in breast cancer, some of which have been observed in Gene Regulatory Networks (GRNs). In this work we examine the patterns and features of a GRN composed of transcription factors (TFs) in MCF-7 breast cancer cell lines, providing valuable information regarding breast cancer. In particular, some genes had associated αᵢ* parameter values that were extremely positive or extremely negative and, as such, can be identified as breast cancer prediction genes. We indicate that the PKN2, MKL1, MED23, CUL5 and GLI genes demonstrate a tumor suppressor profile, and that the MTR, ITGA2B, TELO2, MRPL9, MTTL1, WIPI1, KLHL20, PI4KB, FOLR1 and SHC1 genes demonstrate an oncogenic profile. We propose that these may serve as potential breast cancer prediction genes, and should be prioritized for further clinical studies on breast cancer. This new model allows for the assignment of values to the αᵢ* parameters associated with gene expression. It was noted that some αᵢ* parameters are associated with genes previously described as breast cancer biomarkers, as well as other genes not yet studied in relation to this disease.
Affiliation(s)
- Francielly Morais-Rodrigues
- Institute of Biological Sciences, Federal University of Minas Gerais, Brazil. Av. Antônio Carlos, 6627, Belo Horizonte, MG 31270-901, Brazil
- Rita Silvério-Machado
- Institute of Biological Sciences, Federal University of Minas Gerais, Brazil. Av. Antônio Carlos, 6627, Belo Horizonte, MG 31270-901, Brazil
- Rodrigo Bentes Kato
- Institute of Biological Sciences, Federal University of Minas Gerais, Brazil. Av. Antônio Carlos, 6627, Belo Horizonte, MG 31270-901, Brazil
- Diego Lucas Neres Rodrigues
- Institute of Biological Sciences, Federal University of Minas Gerais, Brazil. Av. Antônio Carlos, 6627, Belo Horizonte, MG 31270-901, Brazil
- Juan Valdez-Baez
- Institute of Biological Sciences, Federal University of Minas Gerais, Brazil. Av. Antônio Carlos, 6627, Belo Horizonte, MG 31270-901, Brazil
- Vagner Fonseca
- Institute of Biological Sciences, Federal University of Minas Gerais, Brazil. Av. Antônio Carlos, 6627, Belo Horizonte, MG 31270-901, Brazil; KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), College of Health Sciences, University of KwaZulu-Natal, Durban 4001, South Africa
- Emmanuel James San
- KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), College of Health Sciences, University of KwaZulu-Natal, Durban 4001, South Africa
- Lucas Gabriel Rodrigues Gomes
- Institute of Biological Sciences, Federal University of Minas Gerais, Brazil. Av. Antônio Carlos, 6627, Belo Horizonte, MG 31270-901, Brazil
- Roselane Gonçalves Dos Santos
- Institute of Biological Sciences, Federal University of Minas Gerais, Brazil. Av. Antônio Carlos, 6627, Belo Horizonte, MG 31270-901, Brazil
- Marcus Vinicius Canário Viana
- Institute of Biological Sciences, Federal University of Minas Gerais, Brazil. Av. Antônio Carlos, 6627, Belo Horizonte, MG 31270-901, Brazil; Federal University of Pará, UFPA, Brazil
- Joyce da Cruz Ferraz Dutra
- Institute of Biological Sciences, Federal University of Minas Gerais, Brazil. Av. Antônio Carlos, 6627, Belo Horizonte, MG 31270-901, Brazil
- Mariana Teixeira Dornelles Parise
- Institute of Biological Sciences, Federal University of Minas Gerais, Brazil. Av. Antônio Carlos, 6627, Belo Horizonte, MG 31270-901, Brazil
- Doglas Parise
- Institute of Biological Sciences, Federal University of Minas Gerais, Brazil. Av. Antônio Carlos, 6627, Belo Horizonte, MG 31270-901, Brazil
- Frederico F Campos
- Department of Computer Science, Federal University of Minas Gerais, Brazil. Av. Antônio Carlos, 6627, Belo Horizonte, MG 31270-901, Brazil
- José Miguel Ortega
- Institute of Biological Sciences, Federal University of Minas Gerais, Brazil. Av. Antônio Carlos, 6627, Belo Horizonte, MG 31270-901, Brazil
- Debmalya Barh
- Centre for Genomics and Applied Gene Technology, Institute of Integrative Omics and Applied Biotechnology (IIOAB), Nonakuri, Purba Medinipur, West Bengal 721172, India
- Preetam Ghosh
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Vasco A C Azevedo
- Institute of Biological Sciences, Federal University of Minas Gerais, Brazil. Av. Antônio Carlos, 6627, Belo Horizonte, MG 31270-901, Brazil
- Marcos A Dos Santos
- Department of Computer Science, Federal University of Minas Gerais, Brazil. Av. Antônio Carlos, 6627, Belo Horizonte, MG 31270-901, Brazil
|
17
|
Spies D, Renz PF, Beyer TA, Ciaudo C. Comparative analysis of differential gene expression tools for RNA sequencing time course data. Brief Bioinform 2019; 20:288-298. [PMID: 29028903 PMCID: PMC6357553 DOI: 10.1093/bib/bbx115] [Citation(s) in RCA: 66] [Impact Index Per Article: 13.2] [Received: 05/06/2017] [Indexed: 02/05/2023] Open
Abstract
RNA sequencing (RNA-seq) has become a standard procedure to investigate transcriptional changes between conditions and is routinely used in research and clinics. While standard differential expression (DE) analysis between two conditions has been extensively studied and improved over the past decades, RNA-seq time course (TC) DE analysis algorithms are still in their early stages. In this study, we compare, for the first time, existing TC RNA-seq tools on an extensive simulation data set and validate the best performing tools on published data. Surprisingly, TC tools were outperformed by the classical pairwise comparison approach on short time series (<8 time points) in terms of overall performance and robustness to noise, mostly because of a high number of false positives, with the exception of ImpulseDE2. Overlapping candidate lists between tools mitigated this shortcoming, as the majority of false-positive, but not true-positive, candidates were unique to each method. On longer time series, the pairwise approach was less efficient in overall performance than splineTC and maSigPro, which did not identify any false-positive candidates.
Affiliation(s)
- Daniel Spies
- Swiss Federal Institute of Technology Zurich, Department of Biology, IMHS, Zurich, Switzerland; Life Science Zurich Graduate School, Molecular Life Science program, University of Zürich, Switzerland
- Peter F Renz
- Swiss Federal Institute of Technology Zurich, Department of Biology, IMHS, Zurich, Switzerland; Life Science Zurich Graduate School, Molecular Life Science program, University of Zürich, Switzerland
- Tobias A Beyer
- Swiss Federal Institute of Technology Zurich, Department of Biology, IMHS, Zurich, Switzerland
- Constance Ciaudo
- Swiss Federal Institute of Technology Zurich, Department of Biology, IMHS, Zurich, Switzerland
|
18
|
Ahn H, Jo K, Jeong D, Pak M, Hur J, Jung W, Kim S. PropaNet: Time-Varying Condition-Specific Transcriptional Network Construction by Network Propagation. FRONTIERS IN PLANT SCIENCE 2019; 10:698. [PMID: 31258543 PMCID: PMC6587906 DOI: 10.3389/fpls.2019.00698] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 01/04/2019] [Accepted: 05/09/2019] [Indexed: 06/09/2023]
Abstract
Transcription factors (TFs) have a significant influence on the state of a cell by regulating multiple downstream genes. Thus, experimental and computational biologists have made great efforts to construct TF gene networks for regulatory interactions between TFs and their target genes. Now, an important research question is how to utilize TF networks to investigate the response of a plant to stress at the transcription control level using time-series transcriptome data. In this article, we present a new computational network, PropaNet, to investigate dynamics of TF networks from time-series transcriptome data using two state-of-the-art network analysis techniques, influence maximization and network propagation. PropaNet uses the influence maximization technique to produce a ranked list of TFs, ordered by how well each TF explains differentially expressed genes (DEGs) at each time point. Then, a network propagation technique is used to select the group of TFs that best explains DEGs as a whole. For the analysis of Arabidopsis time series datasets from AtGenExpress, we used PlantRegMap as a template TF network and performed PropaNet analysis to investigate transcriptional dynamics of Arabidopsis under cold and heat stress. The time-varying TF networks showed that Arabidopsis responded to cold and heat stress quite differently. For cold stress, bHLH and bZIP type TFs were the first responding TFs, and the cold signal influenced histone variants and various genes involved in cell architecture, osmosis and restructuring of cells. However, the consequences for plants under heat stress were up-regulation of genes related to accelerating differentiation and starting re-differentiation. In terms of energy metabolism, plants under heat stress show elevated metabolic processes, resulting in an exhausted status. We believe that PropaNet will be useful for the construction of condition-specific time-varying TF networks for time-series data analysis in response to stress. PropaNet is available at http://biohealth.snu.ac.kr/software/PropaNet.
Affiliation(s)
- Hongryul Ahn
- Bioinformatics Institute, Seoul National University, Seoul, South Korea
- Kyuri Jo
- Bioinformatics Institute, Seoul National University, Seoul, South Korea
- Dabin Jeong
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea
- Minwoo Pak
- Department of Computer Science and Engineering, Seoul National University, Seoul, South Korea
- Jihye Hur
- Department of Crop Science, Konkuk University, Seoul, South Korea
- Woosuk Jung
- Department of Crop Science, Konkuk University, Seoul, South Korea
- Sun Kim
- Bioinformatics Institute, Seoul National University, Seoul, South Korea
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea
- Department of Computer Science and Engineering, Seoul National University, Seoul, South Korea
|
19
|
Jurman G, Filosi M, Visintainer R, Riccadonna S, Furlanello C. Stability in GRN Inference. Methods Mol Biol 2019; 1883:323-346. [PMID: 30547407 DOI: 10.1007/978-1-4939-8882-2_14] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/18/2022]
Abstract
Reconstructing a gene regulatory network from one or more sets of omics measurements has been a major task of computational biology in the last 20 years. Despite an overwhelming number of algorithms proposed to solve the network inference problem either in the general scenario or in an ad-hoc tailored situation, assessing the stability of reconstruction is still an uncharted territory and exploratory studies mainly tackled theoretical aspects. We introduce here empirical stability, which is induced by variability of reconstruction as a function of data subsampling. By evaluating differences between networks that are inferred using different subsets of the same data we obtain quantitative indicators of the robustness of the algorithm, of the noise level affecting the data, and, overall, of the reliability of the reconstructed graph. We show that empirical stability can be used whenever no ground truth is available to compute a direct measure of the similarity between the inferred structure and the true network. The main ingredient here is a suite of indicators, called NetSI, providing statistics of distances between graphs generated by a given algorithm fed with different data subsets, where the chosen metric is the Hamming-Ipsen-Mikhailov (HIM) distance evaluating dissimilarity of graph topologies with shared nodes. Operatively, the NetSI family is demonstrated here on synthetic and high-throughput datasets, inferring graphs at different resolution levels (topology, direction, weight), showing how the stability indicators can be effectively used for the quantitative comparison of the stability of different reconstruction algorithms.
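As a rough illustration of the subsampling idea in this abstract, the sketch below infers a network from several random subsets of the same samples and averages the pairwise distances between the resulting graphs. It is not the NetSI implementation: a plain normalized Hamming distance stands in for the HIM distance, a correlation threshold stands in for the inference algorithm, and all function names and parameters (`infer_edges`, `empirical_stability`, `fraction`, `threshold`) are hypothetical.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def infer_edges(samples, threshold=0.8):
    """Stand-in inference step (an assumption, not a method from the abstract):
    connect genes i < j when |Pearson r| exceeds the threshold.
    `samples` is a list of rows, one row of per-gene expression values per sample."""
    n_genes = len(samples[0])
    cols = [[row[g] for row in samples] for g in range(n_genes)]
    return {
        (i, j)
        for i, j in itertools.combinations(range(n_genes), 2)
        if abs(pearson(cols[i], cols[j])) > threshold
    }


def hamming(edges_a, edges_b, n_genes):
    """Normalized Hamming distance: fraction of gene pairs on which two graphs disagree."""
    n_pairs = n_genes * (n_genes - 1) // 2
    return len(edges_a ^ edges_b) / n_pairs


def empirical_stability(samples, n_resamples=20, fraction=0.8, threshold=0.8, seed=0):
    """Mean pairwise distance between networks inferred from random data subsets.
    Lower values mean a more stable reconstruction. Requires n_resamples >= 2."""
    rng = random.Random(seed)
    k = min(len(samples), max(3, round(fraction * len(samples))))
    n_genes = len(samples[0])
    graphs = [infer_edges(rng.sample(samples, k), threshold) for _ in range(n_resamples)]
    dists = [hamming(a, b, n_genes) for a, b in itertools.combinations(graphs, 2)]
    return sum(dists) / len(dists)
```

A score near 0 means every subset yields almost the same graph; larger scores indicate that the reconstruction depends strongly on which samples were drawn. Because no ground-truth network appears anywhere in the computation, the score can be evaluated on real data where the true network is unknown.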
Affiliation(s)
- Roberto Visintainer
- The Microsoft Research - University of Trento Centre for Computational and Systems Biology (COSBI), Rovereto, Italy
|
20
|
Pirgazi J, Khanteymoori AR. A robust gene regulatory network inference method base on Kalman filter and linear regression. PLoS One 2018; 13:e0200094. [PMID: 30001352 PMCID: PMC6044105 DOI: 10.1371/journal.pone.0200094] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 01/15/2018] [Accepted: 06/19/2018] [Indexed: 11/24/2022] Open
Abstract
The reconstruction of the topology of gene regulatory networks (GRNs) using high-throughput genomic data such as microarray gene expression data is an important problem in systems biology. The main challenge in gene expression data is the high number of genes and low number of samples; also, the data are often contaminated with noise. In this paper, to deal with the noisy data, a Kalman filter based method that can use prior knowledge when learning the network was used. In the proposed method, namely KFLR, in the first phase, the noisy regulations with low correlations were removed by using mutual information. The proposed method utilized a new closed-form solution to compute the posterior probabilities of the edges from regulators to the target gene within a hybrid framework of Bayesian model averaging and linear regression methods. To show its efficiency, the proposed method was compared with several well-known methods. The results of the evaluation indicate that the inference accuracy was improved by the proposed method, which also recovered better regulatory relations from the noisy data.
Affiliation(s)
- Jamshid Pirgazi
- Department of Computer Engineering, Engineering Faculty, University of Zanjan, Zanjan, Iran
- Ali Reza Khanteymoori
- Department of Computer Engineering, Engineering Faculty, University of Zanjan, Zanjan, Iran
|
21
|
Luo WM, Wang ZY, Zhang X. Identification of four differentially methylated genes as prognostic signatures for stage I lung adenocarcinoma. Cancer Cell Int 2018; 18:60. [PMID: 29713243 PMCID: PMC5909272 DOI: 10.1186/s12935-018-0547-6] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 08/16/2017] [Accepted: 03/22/2018] [Indexed: 12/20/2022] Open
Abstract
Background: Lung adenocarcinoma (LUAD) is the main subtype of non-small cell lung cancer, with a low survival prognosis. We aimed to generate a prognostic model for the postoperative recurrence of LUAD. Methods: The methylated DNA data of LUAD patients were downloaded from The Cancer Genome Atlas (TCGA). The differentially methylated genes were identified and a protein–protein interaction (PPI) network was constructed, with which a prognostic signature of this cancer was generated. Survival and functional pathway analyses were used to evaluate the clustering ability of the prognostic signature. Results: We identified 151 differentially methylated genes related to relapse-free survival of patients with LUAD. Nine hub genes were identified in the PPI network, from which a 4-gene pair signature was selected as the prognostic signature. The potential functions of 6 genes (JDP2, SERPINA5, PLG, SEMG2, RFX5, and POLR3B) in the 4-gene pair signature were enriched in intracellular protein synthesis and transportation. Conclusion: The 4-gene pair signature can predict the prognosis of patients with stage I LUAD. Our study provides a reference for patients with postoperative adjuvant therapy.
Affiliation(s)
- Wei-Ming Luo
- Department of Radiation Oncology, Shanghai Minhang District Cancer Hospital, 106 Ruili Road, Shanghai, 200240 China
- Zheng-Yu Wang
- Department of Pharmacy, The Affiliated Huai'an Hospital of Xuzhou Medical University and The Second People's Hospital of Huai'an, 62 South Huai'hai Road, Huai'an, China
- Xin Zhang
- Department of Medical Imaging, The Fourth People's Hospital of Huai'an, Huai'an, Jiangsu China
|
22
|
BTNET: boosted tree based gene regulatory network inference algorithm using time-course measurement data. BMC SYSTEMS BIOLOGY 2018; 12:20. [PMID: 29560827 PMCID: PMC5861501 DOI: 10.1186/s12918-018-0547-0] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Indexed: 12/17/2022]
Abstract
Background: Identifying gene regulatory networks is an important task for understanding biological systems. Time-course measurement data have become a valuable resource for inferring gene regulatory networks. Various methods have been presented for reconstructing the networks from time-course measurement data. However, existing methods have been validated on only a limited number of benchmark datasets, and rarely verified on real biological systems. Results: We first integrated benchmark time-course gene expression datasets from previous studies and reassessed the baseline methods. We observed that GENIE3-time, a tree-based ensemble method, achieved the best performance among the baselines. In this study, we introduce BTNET, a boosted tree based gene regulatory network inference algorithm which improves on the state of the art. We quantitatively validated BTNET on the integrated benchmark dataset. The AUROC and AUPR scores of BTNET were higher than those of the baselines. We also qualitatively validated the results of BTNET through an experiment on neuroblastoma cells treated with an antidepressant. The inferred regulatory network from BTNET showed that brachyury, a transcription factor, was regulated by fluoxetine, an antidepressant, which was verified by the expression of its downstream genes. Conclusions: We present BTNET, which infers a GRN from time-course measurement data using boosting algorithms. Our model achieved the highest AUROC and AUPR scores on the integrated benchmark dataset. We further validated BTNET qualitatively through a wet-lab experiment and showed that BTNET can produce biologically meaningful results.
|
23
|
Yang G, Wang L, Wang X. Reconstruction of Complex Directional Networks with Group Lasso Nonlinear Conditional Granger Causality. Sci Rep 2017; 7:2991. [PMID: 28592807 PMCID: PMC5462833 DOI: 10.1038/s41598-017-02762-5] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 02/08/2017] [Accepted: 04/18/2017] [Indexed: 12/19/2022] Open
Abstract
Reconstruction of networks underlying complex systems is one of the most crucial problems in many areas of engineering and science. In this paper, rather than identifying parameters of complex systems governed by pre-defined models or taking some polynomial and rational functions as a priori information for subsequent model selection, we put forward a general framework for nonlinear causal network reconstruction from time series with limited observations. By obtaining multi-source datasets based on the data-fusion strategy, we propose a novel method to handle nonlinearity and directionality of complex networked systems, namely group lasso nonlinear conditional Granger causality. Specifically, our method can exploit different sets of radial basis functions to approximate the nonlinear interactions between each pair of nodes and integrate sparsity into grouped variable selection. The performance of our approach is first assessed with two types of simulated datasets, from a nonlinear vector autoregressive model and from nonlinear dynamic models, and then verified on the benchmark datasets from DREAM3 Challenge 4. Effects of data size and noise intensity are also discussed. All of the results demonstrate that the proposed method achieves a higher area under the precision-recall curve.
Affiliation(s)
- Guanxue Yang
- Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, P. R. China
- Lin Wang
- Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, P. R. China
- Xiaofan Wang
- Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, P. R. China
|
24
|
Santra T, Roche S, Conlon N, O’Donovan N, Crown J, O’Connor R, Kolch W. Identification of potential new treatment response markers and therapeutic targets using a Gaussian process-based method in lapatinib insensitive breast cancer models. PLoS One 2017; 12:e0177058. [PMID: 28481952 PMCID: PMC5421758 DOI: 10.1371/journal.pone.0177058] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/12/2016] [Accepted: 04/23/2017] [Indexed: 12/15/2022] Open
Abstract
Molecularly targeted therapeutics hold promise of revolutionizing treatments of advanced malignancies. However, a large number of patients do not respond to these treatments. Here, we take a systems biology approach to understand the molecular mechanisms that prevent breast cancer (BC) cells from responding to lapatinib, a dual kinase inhibitor that targets human epidermal growth factor receptor 2 (HER2) and epidermal growth factor receptor (EGFR). To this end, we analysed temporal gene expression profiles of four BC cell lines, two of which respond to lapatinib and two of which do not. For this analysis, we developed a Gaussian process based algorithm which can accurately find differentially expressed genes by analysing time course gene expression profiles at a fraction of the computational cost of other state-of-the-art algorithms. Our analysis identified 519 potential genes which are characteristic of lapatinib non-responsiveness in the tested cell lines. Data from the Genomics of Drug Sensitivity in Cancer (GDSC) database suggested that the basal expressions of 120 of the above genes correlate with the response of BC cells to HER2 and/or EGFR targeted therapies. We selected 27 genes from the larger panel of 519 genes for experimental verification, and 16 of these were successfully validated. Further bioinformatics analysis identified vitamin D receptor (VDR) as a potential target of interest for lapatinib non-responsive BC cells. Experimentally, calcitriol, a commonly used reagent for VDR targeted therapy, in combination with lapatinib additively inhibited proliferation in two HER2 positive cell lines, lapatinib insensitive MDA-MB-453 and lapatinib resistant HCC 1954-L cells.
Affiliation(s)
- Tapesh Santra
- Systems Biology Ireland, University College Dublin, Belfield, Dublin, Ireland
- Sandra Roche
- National Institute for Cellular Biotechnology, Dublin City University, Dublin, Ireland
- Neil Conlon
- National Institute for Cellular Biotechnology, Dublin City University, Dublin, Ireland
- Norma O’Donovan
- National Institute for Cellular Biotechnology, Dublin City University, Dublin, Ireland
- John Crown
- National Institute for Cellular Biotechnology, Dublin City University, Dublin, Ireland
- Department of Medical Oncology, St Vincent’s University Hospital, Dublin, Elm Park, Ireland
- Robert O’Connor
- National Institute for Cellular Biotechnology, Dublin City University, Dublin, Ireland
- Walter Kolch
- Systems Biology Ireland, University College Dublin, Belfield, Dublin, Ireland
- Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, Dublin, Ireland
- School of Medicine, University College Dublin, Belfield, Dublin, Ireland
|