1
Borisov N, Tkachev V, Simonov A, Sorokin M, Kim E, Kuzmin D, Karademir-Yilmaz B, Buzdin A. Uniformly shaped harmonization combines human transcriptomic data from different platforms while retaining their biological properties and differential gene expression patterns. Front Mol Biosci 2023; 10:1237129. [PMID: 37745690 PMCID: PMC10511763 DOI: 10.3389/fmolb.2023.1237129] [Received: 06/08/2023] [Accepted: 08/28/2023]
Abstract
Introduction: Co-normalization of RNA profiles obtained using different experimental platforms and protocols opens an avenue for comprehensive comparison of relevant features, such as differentially expressed genes associated with disease. Currently, most bioinformatic tools perform normalization in a flexible format that depends on the individual datasets under analysis. As a result, the outputs of such normalizations are poorly compatible with each other. We recently proposed a new approach to gene expression data normalization, termed Shambhala, which returns harmonized data in a uniform shape, where every expression profile is transformed into a pre-defined universal format. We previously showed that after shambhalization of human RNA profiles, overall tissue-specific clustering features are strongly retained, while platform-specific clustering is dramatically reduced. Methods: Here, we tested how well Shambhala retains fold-change gene expression features and other functional characteristics of gene clusters, such as pathway activation levels and predicted cancer drug activity scores. Results: Using 6,793 cancer and 11,135 normal tissue gene expression profiles from the literature and experimental datasets, we applied twelve performance criteria to different versions of Shambhala and to other methods of transcriptomic harmonization with a flexible output data format. These criteria covered biological type classifiers, hierarchical clustering, correlation/regression properties, stability of drug efficiency scores, and data quality for machine learning classifiers. Discussion: The Shambhala-2 harmonizer demonstrated the best results, with correlation and linear regression coefficients close to 1 when comparing training vs. validation datasets, and less than half the instability of other methods when calculating drug efficiency scores.
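To make the idea of a "uniform shape" concrete, the sketch below maps each expression profile onto a fixed reference distribution by rank (quantile mapping). This is a simpler, swapped-in technique, not the Shambhala algorithm, whose internals the abstract does not describe; the function name `quantile_map` and the toy numbers are hypothetical. Whatever the original platform scale, every output profile contains exactly the reference values, while the within-profile ordering of genes is preserved.

```python
def quantile_map(profile, reference):
    """Return `profile` re-expressed on the scale of `reference` (hypothetical helper).

    Each value is replaced by the reference value of the same rank, so every
    output contains exactly the values of `reference` (a "uniform shape").
    Ties keep their input order because Python's sort is stable.
    """
    if len(profile) != len(reference):
        raise ValueError("profile and reference must have the same length")
    ref_sorted = sorted(reference)
    # Gene indices ordered from lowest to highest expression in this profile.
    order = sorted(range(len(profile)), key=lambda i: profile[i])
    mapped = [0.0] * len(profile)
    for rank, gene_idx in enumerate(order):
        mapped[gene_idx] = ref_sorted[rank]
    return mapped


# Toy example (hypothetical numbers): a log-scale microarray profile and a
# raw-count RNA-seq profile for the same four genes.
reference = [0.0, 1.0, 2.0, 3.0]
microarray = [7.1, 5.2, 9.8, 6.0]
rnaseq = [120, 15, 900, 40]
print(quantile_map(microarray, reference))  # [2.0, 0.0, 3.0, 1.0]
print(quantile_map(rnaseq, reference))      # [2.0, 0.0, 3.0, 1.0]
```

Rank-only mapping keeps gene order but discards magnitudes, which is exactly why retention of fold-change features, as tested in the abstract, is a meaningful benchmark for any harmonizer with a fixed output shape.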
Affiliation(s)
- Nicolas Borisov
  - Omicsway Corp, Walnut, CA, United States
  - Moscow Institute of Physics and Technology, Dolgoprudny, Russia
- Alexander Simonov
  - Moscow Institute of Physics and Technology, Dolgoprudny, Russia
  - Oncobox Ltd., Moscow, Russia
- Maxim Sorokin
  - Moscow Institute of Physics and Technology, Dolgoprudny, Russia
  - Oncobox Ltd., Moscow, Russia
  - World-Class Research Center “Digital Biodesign and Personalized Healthcare”, Sechenov First Moscow State Medical University, Moscow, Russia
- Ella Kim
  - Clinic for Neurosurgery, Laboratory of Experimental Neurooncology, Johannes Gutenberg University Medical Centre, Mainz, Germany
- Denis Kuzmin
  - Moscow Institute of Physics and Technology, Dolgoprudny, Russia
- Betul Karademir-Yilmaz
  - Department of Biochemistry, School of Medicine/Genetic and Metabolic Diseases Research and Investigation Center (GEMHAM), Marmara University, Istanbul, Türkiye
- Anton Buzdin
  - Moscow Institute of Physics and Technology, Dolgoprudny, Russia
  - World-Class Research Center “Digital Biodesign and Personalized Healthcare”, Sechenov First Moscow State Medical University, Moscow, Russia
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
  - PathoBiology Group, European Organization for Research and Treatment of Cancer (EORTC), Brussels, Belgium

2
Tihagam RD, Bhatnagar S. A multi-platform normalization method for meta-analysis of gene expression data. Methods 2023:S1046-2023(23)00110-X. [PMID: 37423473 DOI: 10.1016/j.ymeth.2023.06.012] [Received: 03/25/2023] [Revised: 06/21/2023] [Accepted: 06/29/2023]
Abstract
Transcriptomic profiling is a mainstay of translational cancer research and is often used to identify cancer subtypes, stratify patients into responders vs. non-responders, predict survival, and identify potential targets for therapeutic intervention. Analysis of gene expression data gathered by RNA sequencing (RNA-seq) and microarrays is generally the first step in identifying and characterizing cancer-associated molecular determinants. Methodological advances and the reduced cost of transcriptomic profiling have increased the number of publicly available gene expression profiles for cancer subtypes. Data from multiple datasets are routinely integrated to increase the number of samples, improve statistical power, and provide better insight into the heterogeneity of the biological determinant. However, utilizing raw data from multiple platforms, species, and sources introduces systematic variation due to noise, batch effects, and biases. The integrated data are therefore mathematically adjusted through normalization, which allows direct comparison of expression measures among studies while minimizing technical and systemic variation. This study applied meta-analysis to multiple independent Affymetrix microarray and Illumina RNA-seq datasets available through the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA). We previously identified tripartite motif-containing 37 (TRIM37) as a breast cancer oncogene that drives tumorigenesis and metastasis in triple-negative breast cancer. In this article, we adapted and assessed the validity of Stouffer's z-score normalization method to interrogate TRIM37 expression across different cancer types using multiple large-scale datasets.
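Stouffer's method combines standardized scores from k independent studies as Z = Σ z_i / √k, or Σ w_i z_i / √(Σ w_i²) when studies are weighted. The sketch below standardizes a gene's expression within each dataset, so that values from different platforms share a common scale, and combines per-study z-statistics with Stouffer's formula. The abstract does not describe how the authors adapted the method for TRIM37, so the function names and toy numbers here are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def zscore_within_dataset(values):
    """Standardize one dataset's expression values to mean 0 and SD 1."""
    m, s = mean(values), stdev(values)
    if s == 0:
        raise ValueError("cannot standardize a constant vector")
    return [(v - m) / s for v in values]


def stouffer(z_values, weights=None):
    """Combine per-study z-statistics into a single z-statistic (Stouffer's method)."""
    if weights is None:
        weights = [1.0] * len(z_values)
    numerator = sum(w * z for w, z in zip(weights, z_values))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


print(zscore_within_dataset([1.0, 2.0, 3.0]))  # [-1.0, 0.0, 1.0]

# Hypothetical per-study z-statistics for one gene (e.g. tumor vs normal).
z_combined = stouffer([1.0, 2.0, 1.5])
p_one_sided = 1.0 - NormalDist().cdf(z_combined)  # upper-tail p-value
print(round(z_combined, 3))  # 2.598
```

A common weighting choice is w_i = √n_i (the square root of each study's sample size), so larger datasets contribute more; with equal weights the formula reduces to Σ z_i / √k.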
Affiliation(s)
- Rachisan Djiake Tihagam
  - Department of Medical Microbiology and Immunology, The University of California Davis School of Medicine, Davis, CA 95616, USA
- Sanchita Bhatnagar
  - Department of Medical Microbiology and Immunology, The University of California Davis School of Medicine, Davis, CA 95616, USA

3
Scott MA, Woolums AR, Swiderski CE, Perkins AD, Nanduri B. Genes and regulatory mechanisms associated with experimentally-induced bovine respiratory disease identified using supervised machine learning methodology. Sci Rep 2021; 11:22916. [PMID: 34824337 PMCID: PMC8616896 DOI: 10.1038/s41598-021-02343-7] [Received: 08/07/2021] [Accepted: 11/08/2021]
Abstract
Bovine respiratory disease (BRD) is a multifactorial disease involving complex host immune interactions shaped by pathogenic agents and environmental factors. Advances in RNA sequencing and associated analytical methods are improving our understanding of the host response in BRD pathophysiology. Supervised machine learning (ML) is one such method for analyzing new and previously published transcriptome data to identify novel disease-associated genes and mechanisms. Our objective was to apply ML models to lung and immunological tissue datasets acquired from previous clinical BRD experiments to identify genes that classify disease with high accuracy. Raw mRNA sequencing reads from 151 bovine datasets (n = 123 BRD, n = 28 control) were downloaded from NCBI-GEO. Quality-filtered reads were assembled in a HISAT2/Stringtie2 pipeline. Raw gene counts for ML analysis were normalized, transformed, and analyzed with MLSeq, utilizing six ML models. Cross-validation (fivefold, repeated 10 times) was applied to 70% of the compiled datasets for ML model training and parameter tuning; optimized ML models were tested on the remaining 30%. Downstream analysis of significant genes identified by the top ML models, based on classification accuracy for each etiological association, was performed within WebGestalt and Reactome (FDR ≤ 0.05). Nearest shrunken centroid and Poisson linear discriminant analysis with power transformation models identified 154 and 195 significant genes for infectious bovine rhinotracheitis (IBR) and bovine respiratory syncytial virus (BRSV), respectively; using these genes, the two ML models discriminated IBR and BRSV from sham controls with 100% accuracy. Significant genes classified by the top ML models in IBR (154) and BRSV (195), but not bovine viral diarrhea virus (BVDV; 74), were related to type I interferon production and IL-8 secretion, specifically in lymphoid tissue and not in homogenized lung tissue. Genes identified in Mannheimia haemolytica infections (97) were involved in activating the classical and alternative pathways of complement. Novel findings included the expression of genes related to reduced mitochondrial oxygenation and ATP synthesis in consolidated lung tissue. Genes identified in each analysis represent distinct genomic events relevant to understanding and predicting clinical BRD. Our analysis demonstrates the utility of ML with published datasets for discovering functional information to support the prediction and understanding of clinical BRD.
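The evaluation design described above, a 70/30 train/test split with fivefold cross-validation repeated 10 times inside the training portion, can be sketched with the standard library. The study used MLSeq in R; this is an independent illustration, and the stratification by class, the seeds and the function names are assumptions, chosen here because the classes are imbalanced (123 BRD vs 28 control).

```python
import random
from collections import defaultdict


def _indices_by_class(indices, labels):
    """Group sample indices by their class label."""
    groups = defaultdict(list)
    for i in indices:
        groups[labels[i]].append(i)
    return groups


def stratified_holdout(labels, test_frac=0.3, seed=0):
    """Split sample indices into train/test sets, keeping class proportions."""
    rng = random.Random(seed)
    train, test = [], []
    for idx in _indices_by_class(range(len(labels)), labels).values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def repeated_stratified_kfold(indices, labels, k=5, repeats=10, seed=0):
    """Yield (repeat, train_idx, val_idx) for k-fold CV repeated `repeats` times."""
    rng = random.Random(seed)
    for r in range(repeats):
        folds = [[] for _ in range(k)]
        pos = 0
        for idx in _indices_by_class(indices, labels).values():
            rng.shuffle(idx)
            for i in idx:  # deal samples round-robin so each fold gets a share of every class
                folds[pos % k].append(i)
                pos += 1
        for f in range(k):
            val = sorted(folds[f])
            tr = sorted(i for g in range(k) if g != f for i in folds[g])
            yield r, tr, val


labels = ["BRD"] * 123 + ["control"] * 28  # class sizes from the abstract
train, test = stratified_holdout(labels)
print(len(train), len(test))  # 106 45
n_splits = sum(1 for _ in repeated_stratified_kfold(train, labels))
print(n_splits)  # 50
```

The test indices never enter the cross-validation loop, so the final 30% remains an untouched estimate of performance after tuning.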
Affiliation(s)
- Matthew A Scott
  - Veterinary Education, Research, and Outreach Center, Texas A&M University and West Texas A&M University, Canyon, TX, USA
- Amelia R Woolums
  - Department of Pathobiology and Population Medicine, Mississippi State University, Mississippi State, MS, USA
- Cyprianna E Swiderski
  - Department of Pathobiology and Population Medicine, Mississippi State University, Mississippi State, MS, USA
- Andy D Perkins
  - Department of Computer Science and Engineering, Mississippi State University, Mississippi State, MS, USA
- Bindu Nanduri
  - Department of Comparative Biomedical Sciences, Mississippi State University, Mississippi State, MS, USA

4
Quintero E, Isla J, Jordano P. Methodological overview and data-merging approaches in the study of plant–frugivore interactions. Oikos 2021. [DOI: 10.1111/oik.08379]
Affiliation(s)
- Jorge Isla
  - Estación Biológica de Doñana, CSIC, Sevilla, Spain
- Pedro Jordano
  - Estación Biológica de Doñana, CSIC, Sevilla, Spain
  - Dept Biología Vegetal y Ecología, Univ. de Sevilla, Sevilla, Spain

5
Garbulowski M, Smolinska K, Diamanti K, Pan G, Maqbool K, Feuk L, Komorowski J. Interpretable Machine Learning Reveals Dissimilarities Between Subtypes of Autism Spectrum Disorder. Front Genet 2021; 12:618277. [PMID: 33719335 PMCID: PMC7946989 DOI: 10.3389/fgene.2021.618277] [Received: 10/16/2020] [Accepted: 01/12/2021]
Abstract
Autism spectrum disorder (ASD) is a heterogeneous neuropsychiatric disorder with a complex genetic background. Analysis of altered molecular processes in ASD patients requires linear and nonlinear methods that provide interpretable solutions. Interpretable machine learning provides legible models that help explain biological mechanisms and support the analysis of clinical subgroups. In this work, we investigated several case-control studies of gene expression measurements in ASD individuals. We constructed a rule-based learning model from three independent datasets and visualized it as a nonlinear gene-gene co-predictive network. To find dissimilarities between ASD subtypes, we scrutinized the topological structure of the network and estimated a centrality distance. Our analysis revealed that autism is the most severe subtype of ASD, while pervasive developmental disorder-not otherwise specified and Asperger syndrome are closely related and milder ASD subtypes. Furthermore, we analyzed the most important ASD-related features, described in terms of gene co-predictors. Among others, we found a strong co-predictive mechanism between EMC4 and TMEM30A, which may suggest co-regulation of these genes. The present study demonstrates the potential of applying interpretable machine learning in bioinformatics analyses. Although the proposed methodology was designed for transcriptomics data, it can be applied to other omics disciplines.
Affiliation(s)
- Mateusz Garbulowski
  - Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
- Karolina Smolinska
  - Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
- Klev Diamanti
  - Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
- Gang Pan
  - Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
- Khurram Maqbool
  - Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
- Lars Feuk
  - Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
- Jan Komorowski
  - Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
  - Swedish Collegium for Advanced Study, Uppsala, Sweden
  - Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
  - Washington National Primate Research Center, Seattle, WA, United States

6
Myall AC, Perkins S, Rushton D, David J, Spencer P, Jones AR, Antczak P. An OMICs based meta-analysis to support infection state stratification. Bioinformatics 2021; 37:2347-2355. [PMID: 33560295 PMCID: PMC8388022 DOI: 10.1093/bioinformatics/btab089] [Received: 09/02/2020] [Revised: 01/06/2021] [Accepted: 01/24/2021]
Abstract
MOTIVATION: A fundamental problem for disease treatment is that while antibiotics are a powerful counter to bacteria, they are ineffective against viruses. Bacterial and viral infections are often confused because of their similar symptoms and the lack of rapid diagnostics. With many clinicians relying primarily on symptoms for diagnosis, overuse and misuse of modern antibiotics are rife, contributing to growing antibiotic resistance. To ensure that an individual receives optimal treatment for their disease state and to reduce over-prescription of antibiotics, the host response could in theory be measured quickly to distinguish between the two states. To establish a predictive biomarker panel of disease state (viral/bacterial/no infection), we conducted a meta-analysis of human blood infection studies using machine learning (ML). RESULTS: We focused on publicly available gene expression data from two widely used platforms, Affymetrix and Illumina microarrays, as they represented a significant proportion of the available data. We developed multi-class models with high accuracy; our best model correctly predicted 93% of bacterial and 89% of viral samples. To compare the features selected for each technology, we reverse engineered the underlying molecular regulatory network and explored the neighbourhood of the selected features. The networks showed that although the models differed at the gene level, they contained genes from the same areas of the network. Specifically, the models converged on pathways including the Type I interferon signalling pathway, chemotaxis, apoptotic processes, and the inflammatory/innate response. AVAILABILITY: Data and code are available on the Gene Expression Omnibus and GitHub. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
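Figures such as "93% of bacterial and 89% of viral samples predicted correctly" are per-class recall values: the fraction of each true class that the model labels correctly. They can differ from overall accuracy when classes are imbalanced. The minimal sketch below computes both; the labels are hypothetical toy data, not from the study.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Fraction of samples of each true class that were predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {cls: hits[cls] / n for cls, n in totals.items()}


def accuracy(y_true, y_pred):
    """Fraction of all samples predicted correctly, regardless of class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels (hypothetical, not from the study).
y_true = ["bacterial"] * 4 + ["viral"] * 2 + ["none"] * 2
y_pred = ["bacterial", "bacterial", "bacterial", "viral",
          "viral", "none", "none", "none"]
print(per_class_recall(y_true, y_pred))  # {'bacterial': 0.75, 'viral': 0.5, 'none': 1.0}
print(accuracy(y_true, y_pred))          # 0.75
```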
Affiliation(s)
- Ashleigh C Myall
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, United Kingdom
  - Department of Mathematics, Imperial College London, London, United Kingdom
- Simon Perkins
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, United Kingdom
- David Rushton
  - Defence and Security Analysis Division, Defence Science and Technology Laboratory (DSTL), Porton Down, Salisbury, United Kingdom
- Jonathan David
  - Chemical, Biological and Radiological Division, Defence Science and Technology Laboratory (DSTL), Porton Down, Salisbury, United Kingdom
- Phillippa Spencer
  - Cyber and Information Systems Division, Defence Science and Technology Laboratory (DSTL), Porton Down, Salisbury, United Kingdom
- Andrew R Jones
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, United Kingdom
- Philipp Antczak
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, United Kingdom
  - Center for Molecular Medicine, University of Cologne, Cologne, Germany

7
Machine learning approach to integrated endometrial transcriptomic datasets reveals biomarkers predicting uterine receptivity in cattle at seven days after estrous. Sci Rep 2020; 10:16981. [PMID: 33046742 PMCID: PMC7550564 DOI: 10.1038/s41598-020-72988-3] [Received: 04/17/2020] [Accepted: 09/07/2020]
Abstract
The main goal was to apply machine learning (ML) methods to integrated multi-transcriptomic data to identify endometrial genes whose expression patterns predict uterine receptivity in the cow. Public data from five studies were re-analyzed. In all of them, endometrial samples were obtained on day 6–7 of the estrous cycle from cows or heifers of four different European breeds, classified as pregnant (n = 26) or not pregnant (n = 26). First, gene selection was performed through supervised and unsupervised ML algorithms. Then, the predictive ability of the potential key genes was evaluated with a support vector machine classifier, trained on the expression levels of samples from all breeds but one and tested on the samples from the held-out breed. Finally, the biological meaning of the key genes was explored. Fifty genes were identified, and they predicted uterine receptivity with an overall accuracy of 96.1%, regardless of the animal’s breed and category. Genes with higher expression in the pregnant cows were related to circadian rhythm, the Wnt receptor signaling pathway, and embryonic development. This novel and robust combination of computational tools identified a group of biologically relevant endometrial genes that could support pregnancy in cattle.
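The validation scheme described above, training on all breeds but one and testing on the held-out breed, is leave-one-group-out cross-validation. The sketch below assumes a generic `fit_predict(train_X, train_y, test_X)` callable; the nearest-centroid classifier is a hypothetical stand-in for the support vector machine used in the study, and the toy data are invented.

```python
import math


def leave_one_group_out(groups):
    """Yield (held_out_group, train_idx, test_idx), one split per group."""
    for g in sorted(set(groups)):
        train = [i for i, x in enumerate(groups) if x != g]
        test = [i for i, x in enumerate(groups) if x == g]
        yield g, train, test


def nearest_centroid(train_X, train_y, test_X):
    """Hypothetical stand-in classifier: assign each sample to the closest class mean."""
    sums, counts = {}, {}
    for x, y in zip(train_X, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}
    return [min(centroids, key=lambda c: math.dist(x, centroids[c])) for x in test_X]


def logo_accuracy(X, y, groups, fit_predict):
    """Pooled accuracy over all held-out samples, plus accuracy per held-out group."""
    correct, per_group = 0, {}
    for g, tr, te in leave_one_group_out(groups):
        preds = fit_predict([X[i] for i in tr], [y[i] for i in tr], [X[i] for i in te])
        hits = sum(p == y[i] for p, i in zip(preds, te))
        per_group[g] = hits / len(te)
        correct += hits
    return correct / len(y), per_group


X = [[5, 5], [0, 0], [5, 6], [1, 0], [6, 5], [0, 1]]  # toy two-gene profiles
y = ["P", "NP", "P", "NP", "P", "NP"]                 # pregnant / not pregnant
breeds = ["A", "A", "B", "B", "C", "C"]
print(logo_accuracy(X, y, breeds, nearest_centroid))
# (1.0, {'A': 1.0, 'B': 1.0, 'C': 1.0})
```

Because each breed is tested only by a model that never saw it, a high pooled accuracy supports the claim that the signature generalizes across breeds rather than memorizing breed-specific expression.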

8
Rabaglino MB, Conrad KP. Evidence for shared molecular pathways of dysregulated decidualization in preeclampsia and endometrial disorders revealed by microarray data integration. FASEB J 2019; 33:11682-11695. [PMID: 31356122 DOI: 10.1096/fj.201900662r]
Abstract
Microarray data of chorionic villous samples (CVSs) obtained at ∼11.5 weeks of gestation from women who developed preeclampsia with severe features (sPE; PE-CVS) revealed a molecular signature of impaired endometrial maturation (decidualization) before and during early pregnancy. Because endometrial disorders are also associated with aberrant decidualization, we asked whether they share molecular features with sPE. We employed microarray data integration to compare the molecular pathologies of PE-CVS and endometrial disorders, as well as of decidua obtained postpartum from women with sPE. Eight public databases were reanalyzed with R software to determine differentially expressed genes (DEGs) in pathologic tissues relative to normal controls. DEGs were then compared to explore overlap. Shared DEGs were examined for enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Principal component and network analyses were subsequently applied to selected DEGs. There was significant overlap of DEGs changing in the same direction for PE-CVS and endometrial disorders, suggesting common molecular pathways. Shared DEGs were enriched for cytokine-cytokine receptor interaction. Genes in this pathway showed expression patterns forming 2 distinct clusters, one for normal and the other for pathologic endometrium. The most affected hub genes were related to decidualization and NK cell function. Few DEGs were shared by PE-CVS and PE decidua obtained postpartum. sPE may be part of a biologic continuum of "endometrial spectrum disorders."
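The abstract reports significant overlap of DEGs changing in the same direction but does not state which test was used. One common choice, assumed here, is a one-sided hypergeometric test: given N genes measured in both comparisons, K DEGs in one and n in the other, it gives the probability of an overlap at least as large as the observed one arising by chance. The helper names, gene identifiers and numbers below are hypothetical.

```python
from math import comb


def same_direction_overlap(deg_a, deg_b):
    """Genes that are DEGs in both comparisons with the same sign (+1 up, -1 down)."""
    return {g for g in deg_a.keys() & deg_b.keys() if deg_a[g] == deg_b[g]}


def overlap_pvalue(n_universe, n_a, n_b, n_overlap):
    """P(X >= n_overlap) for X ~ Hypergeometric(N=n_universe, K=n_a, n=n_b)."""
    total = comb(n_universe, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_universe - n_a, n_b - k)
        for k in range(n_overlap, min(n_a, n_b) + 1)
    )
    return tail / total


# Hypothetical DEG calls: gene -> direction of change vs. controls.
comparison_1 = {"g1": -1, "g2": -1, "g3": -1, "g4": 1}
comparison_2 = {"g1": -1, "g3": -1, "g4": -1, "g5": 1}
print(sorted(same_direction_overlap(comparison_1, comparison_2)))  # ['g1', 'g3']

# Hypothetical sizes: 20,000 genes, 500 and 400 DEGs, 30 shared (about 10 expected by chance).
p = overlap_pvalue(20000, 500, 400, 30)
```

Requiring sign agreement (g4 is up in one comparison and down in the other, so it is excluded) matches the abstract's focus on DEGs "changing in the same direction".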
Affiliation(s)
- Maria Belen Rabaglino
  - Instituto de Investigación en Ciencias de la Salud (INICSA), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Córdoba, Argentina
- Kirk P Conrad
  - Department of Physiology and Functional Genomics, University of Florida, Gainesville, Florida, USA
  - Department of Obstetrics and Gynecology, University of Florida, Gainesville, Florida, USA

9
Abstract
Feature (or variable) selection is the process of identifying the minimal set of features with the highest predictive performance on the target variable of interest. Numerous feature selection algorithms have been developed over the years, but only a few have been implemented in R as packages. The R package MXM is one such example: it not only offers a variety of feature selection algorithms but also has unique features that give it an advantage over its competitors: a) it contains feature selection algorithms that can handle numerous types of target variables, including continuous, percentages, time to event (survival), binary, nominal, ordinal, clustered, counts, left censored, etc.; b) it contains a variety of regression models to plug into the feature selection algorithms; c) it includes an algorithm for detecting multiple solutions (many sets of equivalent features); and d) it includes memory-efficient algorithms for high-volume data that cannot be loaded into R. In this paper, we qualitatively compare MXM with other relevant packages and discuss its advantages and disadvantages. We also demonstrate its algorithms using real high-dimensional data from various applications.
Affiliation(s)
- Michail Tsagris
  - Department of Economics, University of Crete, Rethymnon, 74100, Greece
  - Department of Computer Science, University of Crete, Heraklion, Crete, 70013, Greece
  - Statistical Learning Lab, Foundation of Research and Technology Hellas, Heraklion, Crete, 70013, Greece
- Ioannis Tsamardinos
  - Department of Computer Science, University of Crete, Heraklion, Crete, 70013, Greece
  - Institute of Applied and Computational Mathematics, Foundation of Research and Technology Hellas, Heraklion, Crete, 70013, Greece
  - Gnosis Data Analysis (PC), Heraklion, Crete, 71305, Greece

10
Abstract
Feature (or variable) selection is the process of identifying the minimal set of features with the highest predictive performance on the target variable of interest. Numerous feature selection algorithms have been developed over the years, but only a few have been implemented in R and made publicly available as packages, and those offer few options. The R package MXM offers a variety of feature selection algorithms and has unique features that give it an advantage over its competitors: a) it contains feature selection algorithms that can handle numerous types of target variables, including continuous, percentages, time to event (survival), binary, nominal, ordinal, clustered, counts, left censored, etc.; b) it contains a variety of regression models that can be plugged into the feature selection algorithms (for example, with time-to-event data the user can choose among Cox, Weibull, log-logistic or exponential regression); c) it includes an algorithm for detecting multiple solutions (many sets of statistically equivalent features; plainly speaking, two features carry statistically equivalent information when substituting one for the other does not affect the inference or the conclusions); and d) it includes memory-efficient algorithms for high-volume data that cannot be loaded into R (for example, on a machine with 16 GB of RAM, R cannot directly load a 16 GB dataset; by utilizing the proper package, we load the data and then perform feature selection). In this paper, we qualitatively compare MXM with other relevant feature selection packages and discuss its advantages and disadvantages. Further, we demonstrate MXM's algorithms using real high-dimensional data from various applications.
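The key idea behind processing data too large to load at once is that many screening statistics can be accumulated in a single pass over the rows. The sketch below streams a CSV file row by row and computes the Pearson correlation of every feature with a continuous target from running sums. It is a generic illustration, not MXM's algorithm; the column layout (a header row, one target column, all other columns numeric features) and the function name are assumptions.

```python
import csv
import io
import math


def streaming_correlations(rows, target_col):
    """One-pass Pearson correlation of each feature with the target column.

    `rows` is any iterator of string lists whose first item is the header
    (e.g. a csv.reader), so only one row needs to be in memory at a time.
    """
    header = next(rows)
    t = header.index(target_col)
    feats = [j for j in range(len(header)) if j != t]
    n, sy, syy = 0, 0.0, 0.0
    sx = [0.0] * len(feats)
    sxx = [0.0] * len(feats)
    sxy = [0.0] * len(feats)
    for row in rows:
        y = float(row[t])
        n += 1
        sy += y
        syy += y * y
        for k, j in enumerate(feats):
            x = float(row[j])
            sx[k] += x
            sxx[k] += x * x
            sxy[k] += x * y
    if n < 2:
        raise ValueError("need at least two data rows")
    vy = syy - sy * sy / n
    result = {}
    for k, j in enumerate(feats):
        cov = sxy[k] - sx[k] * sy / n
        vx = sxx[k] - sx[k] * sx[k] / n
        # NaN when a feature (or the target) is constant.
        result[header[j]] = cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else float("nan")
    return result


data = io.StringIO("g1,g2,y\n1,3,1\n2,2,2\n3,1,3\n")
print(streaming_correlations(csv.reader(data), "y"))  # {'g1': 1.0, 'g2': -1.0}
```

On a real file the same function would be fed `csv.reader(f)` from `open(path, newline="")`. Raw sums of squares can lose precision for large values; a production version would use a numerically stabler update such as Welford's algorithm.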
Affiliation(s)
- Michail Tsagris
  - Department of Economics, University of Crete, Rethymnon, 74100, Greece
  - Department of Computer Science, University of Crete, Heraklion, Crete, 70013, Greece
  - Statistical Learning Lab, Foundation of Research and Technology Hellas, Heraklion, Crete, 70013, Greece
- Ioannis Tsamardinos
  - Department of Computer Science, University of Crete, Heraklion, Crete, 70013, Greece
  - Institute of Applied and Computational Mathematics, Foundation of Research and Technology Hellas, Heraklion, Crete, 70013, Greece
  - Gnosis Data Analysis (PC), Heraklion, Crete, 71305, Greece

11
Wong PS, Tashiro K, Kuhara S, Aburatani S. Elucidation of the sequential transcriptional activity in Escherichia coli using time-series RNA-seq data. Bioinformation 2017; 13:25-30. [PMID: 28479747 PMCID: PMC5405090 DOI: 10.6026/97320630013025] [Received: 12/26/2016] [Accepted: 01/25/2017]
Abstract
Functional genomics and gene regulation inference have greatly expanded our knowledge and understanding of how gene interactions regulate expression. Advances in time-series transcriptome sequencing make it possible to study sequential changes in the transcriptome. Here, we present a new method that augments regulatory networks accumulated in the literature with transcriptome data from time-series experiments to construct a sequential representation of transcription factor activity. We apply our method to a time-series RNA-Seq dataset of Escherichia coli as it transitions from growth to stationary phase over five hours, and investigate gene regulatory activity by using the correlation between regulatory gene pairs to examine their activity on a dynamic network. We analyse the changes in metabolic activity of the pagP gene and its associated transcription factors during the phase transition, and visualize the sequential transcriptional activity to describe the change in metabolic pathway activity originating from the pagP transcription factor, phoP. We observe a shift from amino acid and nucleic acid metabolism to energy metabolism during the transition to stationary phase in E. coli.
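One simple way to see when a regulator-target pair is active along a time course is to compute their correlation in a sliding window over consecutive time points. The sketch below illustrates that general idea of correlation-based activity on a dynamic network; it is not the authors' exact method, and the window size, threshold, function names and data are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return float("nan")
    return cov / math.sqrt(va * vb)


def windowed_correlation(tf, target, window):
    """Correlation of TF and target expression in each window of consecutive time points."""
    if len(tf) != len(target) or not 2 <= window <= len(tf):
        raise ValueError("need equal-length series and 2 <= window <= series length")
    return [pearson(tf[i:i + window], target[i:i + window])
            for i in range(len(tf) - window + 1)]


def active_windows(tf, target, window, threshold=0.8):
    """Start indices of windows where |r| >= threshold (pair treated as 'active')."""
    rs = windowed_correlation(tf, target, window)
    return [i for i, r in enumerate(rs) if not math.isnan(r) and abs(r) >= threshold]


tf = [1, 2, 3, 4, 3, 2, 1]      # hypothetical TF expression over 7 time points
target = [1, 2, 3, 4, 5, 6, 7]  # hypothetical target expression
print(active_windows(tf, target, window=3))  # [0, 1, 3, 4]
```

In this toy series the pair is positively correlated early (windows 0 and 1) and negatively correlated late (windows 3 and 4), so the sign of r distinguishes activation-like from repression-like phases along the time course.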
Affiliation(s)
- Pui Shan Wong
  - Biotechnology Research Institute for Drug Discovery, National Institute of AIST, Tokyo, Japan
- Kosuke Tashiro
  - Graduate School of Bioresource and Bioenvironmental Sciences, Kyushu University, Fukuoka, Japan
- Satoru Kuhara
  - Graduate School of Bioresource and Bioenvironmental Sciences, Kyushu University, Fukuoka, Japan
- Sachiyo Aburatani
  - Biotechnology Research Institute for Drug Discovery, National Institute of AIST, Tokyo, Japan
  - Com. Bio Big Data Open Innovation Lab. (CBBD-OIL), National Institute of AIST, Tokyo, Japan

12
Karathanasis N, Tsamardinos I, Lagani V. omicsNPC: Applying the Non-Parametric Combination Methodology to the Integrative Analysis of Heterogeneous Omics Data. PLoS One 2016; 11:e0165545. [PMID: 27812137 PMCID: PMC5094732 DOI: 10.1371/journal.pone.0165545] [Received: 02/21/2016] [Accepted: 10/13/2016]
Abstract
Background: The advance of omics technologies has made it possible to measure several data modalities on a system of interest. In this work, we illustrate how the Non-Parametric Combination (NPC) methodology can be used to simultaneously assess the association of different molecular quantities with an outcome of interest. We argue that NPC methods have several potential applications in integrating heterogeneous omics technologies, for example identifying genes whose methylation and transcriptional levels are jointly deregulated, or finding proteins whose abundance shows the same trends as the expression of their encoding genes. Results: We implemented the NPC methodology within “omicsNPC”, an R function specifically tailored to the characteristics of omics data. We compare omicsNPC against a range of alternative methods on simulated as well as real data. Comparisons on simulated data show that omicsNPC produces unbiased/calibrated p-values and performs equally well or significantly better than the other methods included in the study; furthermore, the analysis of real data shows that omicsNPC (a) exhibits higher statistical power than other methods, (b) is easily applicable in a number of different scenarios, and (c) yields results with improved biological interpretability. Conclusions: The omicsNPC function performs competitively in all comparisons conducted in this study. Given that the method (i) requires minimal assumptions, (ii) can be used with different study designs and (iii) captures the dependences among heterogeneous data modalities, omicsNPC provides a flexible and statistically powerful solution for the integrative analysis of different omics data.
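The core of the general NPC procedure is to permute sample labels jointly across all data modalities, which preserves the dependence between them; to convert each modality's test statistic into a permutation p-value; and to combine those partial p-values with a combining function. The sketch below uses Fisher's combining function, -2 Σ log p, with an absolute difference in group means as the partial test. These choices and all names are assumptions for illustration, not the omicsNPC implementation, which is an R function.

```python
import math
import random


def abs_mean_diff(values, labels):
    """Partial test statistic: |mean(group 1) - mean(group 0)|."""
    g1 = [v for v, lab in zip(values, labels) if lab == 1]
    g0 = [v for v, lab in zip(values, labels) if lab == 0]
    if not g1 or not g0:
        raise ValueError("both groups must be non-empty")
    return abs(sum(g1) / len(g1) - sum(g0) / len(g0))


def npc_fisher(modalities, labels, n_perm=999, seed=0):
    """Global permutation p-value combining several modalities measured on the same samples."""
    rng = random.Random(seed)
    perms = [list(labels)]          # row 0 holds the observed labelling
    for _ in range(n_perm):
        p = list(labels)
        rng.shuffle(p)              # one shuffle shared by all modalities
        perms.append(p)
    stats = [[abs_mean_diff(m, p) for m in modalities] for p in perms]
    b_total = len(perms)
    # Partial p-value of every permutation within each modality (never zero,
    # because each permutation counts itself).
    partial = [[sum(stats[c][k] >= stats[b][k] for c in range(b_total)) / b_total
                for k in range(len(modalities))]
               for b in range(b_total)]
    # Fisher combining function: larger values mean stronger joint evidence.
    combined = [-2.0 * sum(math.log(q) for q in row) for row in partial]
    return sum(c >= combined[0] for c in combined) / b_total


labels = [1] * 6 + [0] * 6
methylation = [5.1, 4.9, 5.3, 5.0, 4.8, 5.2, 1.0, 1.2, 0.9, 1.1, 0.8, 1.3]  # hypothetical
expression = [2.0, 2.2, 1.9, 2.1, 2.3, 1.8, 0.1, 0.0, 0.2, 0.1, 0.3, 0.0]   # hypothetical
p = npc_fisher([methylation, expression], labels, n_perm=199)
print(f"global p = {p:.3f}")
```

Computing the partial p-values of every permutation on the same permutation set is what lets the combined statistic be calibrated without any assumption about how the modalities depend on each other.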
Affiliation(s)
- Nestoras Karathanasis
  - Institute of Computer Science, Foundation for Research and Technology - Hellas, Heraklion, Greece
- Vincenzo Lagani
  - Department of Computer Science, University of Crete, Heraklion, Greece

13
Lagani V, Karozou AD, Gomez-Cabrero D, Silberberg G, Tsamardinos I. Erratum to: A comparative evaluation of data-merging and meta-analysis methods for reconstructing gene-gene interactions. BMC Bioinformatics 2016; 17:290. [PMID: 27465624 PMCID: PMC4963931 DOI: 10.1186/s12859-016-1153-z]
Affiliation(s)
- Vincenzo Lagani
  - Institute of Computer Science, Foundation for Research and Technology - Hellas, Heraklion, Greece
  - Computer Science Department, University of Crete, Heraklion, Greece
- Argyro D Karozou
  - Institute of Computer Science, Foundation for Research and Technology - Hellas, Heraklion, Greece
- David Gomez-Cabrero
  - Unit of Computational Medicine, Department of Medicine, Karolinska Institutet, 171 77, Stockholm, Sweden
  - Center for Molecular Medicine, Karolinska Institutet, 171 77, Stockholm, Sweden
  - Unit of Clinical Epidemiology, Department of Medicine, Karolinska University Hospital, L8, 17176, Stockholm, Sweden
  - Science for Life Laboratory, 17121, Solna, Sweden
- Gilad Silberberg
  - Unit of Computational Medicine, Department of Medicine, Karolinska Institutet, 171 77, Stockholm, Sweden
  - Center for Molecular Medicine, Karolinska Institutet, 171 77, Stockholm, Sweden
  - Unit of Clinical Epidemiology, Department of Medicine, Karolinska University Hospital, L8, 17176, Stockholm, Sweden
  - Science for Life Laboratory, 17121, Solna, Sweden
- Ioannis Tsamardinos
  - Institute of Computer Science, Foundation for Research and Technology - Hellas, Heraklion, Greece
  - Computer Science Department, University of Crete, Heraklion, Greece