1
Singh V, Singh V. Inferring Interaction Networks from Transcriptomic Data: Methods and Applications. Methods Mol Biol 2024; 2812:11-37. [PMID: 39068355 DOI: 10.1007/978-1-0716-3886-6_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/30/2024]
Abstract
Transcriptomic data is a treasure trove in modern molecular biology, as it offers a comprehensive viewpoint into the intricate nuances of gene expression dynamics underlying biological systems. This genetic information must be utilized to infer biomolecular interaction networks that can provide insights into the complex regulatory mechanisms underpinning the dynamic cellular processes. Gene regulatory networks and protein-protein interaction networks are two major classes of such networks. This chapter thoroughly investigates the wide range of methodologies used for distilling insightful revelations from transcriptomic data, including association-based methods (based on correlation among expression vectors), probabilistic models (using Bayesian and Gaussian models), and interologous methods. We review different approaches for evaluating the significance of interactions based on the network topology and biological functions of the interacting molecules and discuss various strategies for the identification of functional modules. The chapter concludes by highlighting network-based techniques for prioritizing key genes, outlining the centrality-based, diffusion-based, and subgraph-based methods. The chapter provides a meticulous framework for investigating transcriptomic data to uncover the assembly of complex molecular networks and to support their adaptable analysis across a broad spectrum of biological domains.
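The association-based idea mentioned in the abstract (correlation among expression vectors) can be illustrated with a minimal sketch. It is not taken from the chapter: the function name `correlation_network`, the toy expression matrix, and the cutoff of 0.9 are all assumptions chosen for illustration, and Pearson correlation is only one of several possible association measures.

```python
import numpy as np

def correlation_network(expr, gene_names, threshold=0.8):
    """Return edges (gene_a, gene_b, r) whose absolute Pearson r meets the threshold.

    expr: 2-D array with one row per gene and one column per sample.
    The threshold is an illustrative assumption, not a recommended value.
    """
    expr = np.asarray(expr, dtype=float)
    corr = np.corrcoef(expr)  # gene-by-gene Pearson correlation matrix
    edges = []
    n = len(gene_names)
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only: undirected edges
            r = corr[i, j]
            if abs(r) >= threshold:
                edges.append((gene_names[i], gene_names[j], float(r)))
    return edges

# Hypothetical toy data: 4 genes measured in 4 samples.
expr = np.array([
    [1.0, 2.0, 3.0, 4.0],  # geneA
    [2.0, 4.0, 6.0, 8.1],  # geneB, nearly proportional to geneA
    [4.0, 3.0, 2.0, 1.0],  # geneC, anti-correlated with geneA
    [1.0, 3.0, 1.0, 3.0],  # geneD, only weakly related to the others
])
names = ["geneA", "geneB", "geneC", "geneD"]
edges = correlation_network(expr, names, threshold=0.9)
for a, b, r in edges:
    print(f"{a} -- {b}: r = {r:.3f}")
```

Thresholding the absolute value keeps strongly anti-correlated pairs as well, which may or may not be desirable depending on the biological question.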
Affiliation(s)
- Vikram Singh
- Centre for Computational Biology and Bioinformatics, Central University of Himachal Pradesh, Dharamshala, Himachal Pradesh, India
- Vikram Singh
- Centre for Computational Biology and Bioinformatics, Central University of Himachal Pradesh, Dharamshala, Himachal Pradesh, India.
2
Bi-EB: Empirical Bayesian Biclustering for Multi-Omics Data Integration Pattern Identification among Species. Genes (Basel) 2022; 13:genes13111982. [DOI: 10.3390/genes13111982] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2022] [Revised: 09/20/2022] [Accepted: 09/23/2022] [Indexed: 11/16/2022] Open
Abstract
Although several biclustering algorithms have been studied, few are used for cross-pattern identification across species using multi-omics data mining. A fast empirical Bayesian biclustering (Bi-EB) algorithm is developed to detect the patterns shared from both integrated omics data and between species. The Bi-EB algorithm addresses a clinically critical translational question using a bioinformatics strategy: how modules of genotype variation associated with phenotype from cancer cell screening data can be identified, and how these findings can be directly translated to a cancer patient subpopulation. Empirical Bayesian probabilistic interpretation and ratio strategy are proposed in Bi-EB for the first time to detect the pairwise regulation patterns among species and variations in multiple omics on a gene level, such as proteins and mRNA. An expectation–maximization (EM) optimal algorithm is used to extract the foreground co-current variations out of the background noise data by adjusting parameters with the bicluster membership probability threshold Ac and the bicluster average probability p. Three simulation experiments and two real biology mRNA and protein data analyses conducted on the well-known Cancer Genome Atlas (TCGA) and Cancer Cell Line Encyclopedia (CCLE) verify that the proposed Bi-EB algorithm can significantly improve the clustering recovery and relevance accuracy, outperforming the other seven biclustering methods, namely Cheng and Church (CC), xMOTIFs, BiMax, Plaid, Spectral, FABIA, and QUBIC, with a recovery score of 0.98 and a relevance score of 0.99. At the same time, the Bi-EB algorithm is used to determine the shared causality patterns of mRNA to protein between patients and cancer cells in TCGA and CCLE breast cancer.
The clinically well-known treatment target protein module, comprising estrogen receptor (ER), ER (p118), AR, BCL2, cyclin E1, and IGFBP2, is identified in accordance with the corresponding mRNA expression variations in the luminal-like subtype. Ten genes, including CCNB1, CDH1, KDR, RAB25, and PRKCA, are found for the first time to maintain high mRNA–protein accordance in both breast cancer patients and cell lines in basal-like subtypes. Bi-EB provides a useful biclustering analysis tool to discover the cross patterns hidden both in multiple data matrices (omics) and species. The implementation of the Bi-EB method in the clinical setting will have a direct impact on administrating translational research based on the cancer cell screening guidance.
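The abstract reports recovery and relevance scores without defining them. As a rough illustration only, the sketch below uses a common Jaccard-based convention from the biclustering literature: recovery averages, over the true biclusters, the best overlap with any found bicluster, and relevance does the same from the found side. Whether Bi-EB uses exactly this definition is an assumption; the function names and the toy biclusters are hypothetical.

```python
def cells(bicluster):
    """Expand a (rows, cols) bicluster into its set of matrix cells."""
    rows, cols = bicluster
    return {(r, c) for r in rows for c in cols}

def jaccard(a, b):
    """Jaccard overlap of two biclusters, measured on matrix cells."""
    ca, cb = cells(a), cells(b)
    union = ca | cb
    return len(ca & cb) / len(union) if union else 0.0

def match_score(source, target):
    """Average, over source biclusters, of the best Jaccard match in target."""
    if not source:
        return 0.0
    best = [max((jaccard(s, t) for t in target), default=0.0) for s in source]
    return sum(best) / len(source)

def recovery(true_biclusters, found_biclusters):
    """How well the planted (true) biclusters are recovered."""
    return match_score(true_biclusters, found_biclusters)

def relevance(true_biclusters, found_biclusters):
    """How well the found biclusters correspond to true ones."""
    return match_score(found_biclusters, true_biclusters)

# Hypothetical example: two planted biclusters, only the first one is found.
true_bcs = [({0, 1}, {0, 1}), ({2, 3}, {2, 3})]
found_bcs = [({0, 1}, {0, 1})]
rec = recovery(true_bcs, found_bcs)
rel = relevance(true_bcs, found_bcs)
print(rec, rel)
```

Under this convention a method that finds one planted bicluster perfectly and misses the other scores high on relevance but only moderately on recovery, which is why the two numbers are usually reported together.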
3
Wang J, Wang X, Yu G, Domeniconi C, Yu Z, Zhang Z. Discovering Multiple Co-Clusterings With Matrix Factorization. IEEE Transactions on Cybernetics 2021; 51:3576-3587. [PMID: 31751260 DOI: 10.1109/tcyb.2019.2950568] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/10/2023]
Abstract
Clustering is a fundamental data exploration task which aims at discovering the hidden grouping structure in the data. The traditional clustering methods typically compute a single partition. However, there often exist different and equally meaningful clusterings in complex data. To solve this issue, multiple clustering approaches have emerged with the goal of exploring alternative clusterings from different perspectives. Existing solutions to this problem mainly focus on one-way clustering, that is, they cluster either the samples or the features. However, for many practical tasks, it is meaningful and desirable to explore alternative two-way clusterings (or co-clusterings), which capture not only the sample cluster structure but also the feature cluster structure. To tackle this interesting and unresolved task, we introduce an approach, called multiple co-clusterings (MultiCC), to generate multiple alternative co-clusterings at the same time. MultiCC takes advantage of matrix tri-factorization to seek the co-clustering indicator matrices for samples and features and defines the row and column redundancy quantification terms to enforce diversity among co-clusterings based on these indicator matrices. After that, it integrates matrix tri-factorization and two nonredundancy terms into a unified objective function and gives an alternative optimization procedure to optimize the objective function. Extensive experimental results demonstrate that MultiCC performs significantly better than the existing multiple clustering methods. In addition, MultiCC can find interesting co-clusters that the compared methods cannot.
4
5
Zolotareva O, Khakabimamaghani S, Isaeva OI, Chervontseva Z, Savchik A, Ester M. Identification Of Differentially Expressed Gene Modules In Heterogeneous Diseases. Bioinformatics 2020; 37:1691-1698. [PMID: 33325506 DOI: 10.1093/bioinformatics/btaa1038] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/05/2020] [Revised: 11/25/2020] [Accepted: 12/02/2020] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Identification of differentially expressed genes is necessary for unraveling disease pathogenesis. This task is complicated by the fact that many diseases are heterogeneous at the molecular level and samples representing distinct disease subtypes may demonstrate different patterns of dysregulation. Biclustering methods are capable of identifying genes that follow a similar expression pattern only in a subset of samples and hence can consider disease heterogeneity. However, identifying biologically significant and reproducible sets of genes and samples remains challenging for the existing tools. Many recent studies have shown that the integration of gene expression and protein interaction data improves the robustness of prediction and classification and advances biomarker discovery.
RESULTS: Here we present DESMOND, a new method for identification of Differentially ExpreSsed gene MOdules iN Diseases. DESMOND performs network-constrained biclustering on gene expression data and identifies gene modules - connected sets of genes up- or down-regulated in subsets of samples. We applied DESMOND on expression profiles of samples from two large breast cancer cohorts and have shown that the capability of DESMOND to incorporate protein interactions allows identifying the biologically meaningful gene and sample subsets and improves the reproducibility of the results.
AVAILABILITY: https://github.com/ozolotareva/DESMOND.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Olga Zolotareva
- International Research Training Group "Computational Methods for the Analysis of the Diversity and Dynamics of Genomes" and Genome Informatics, Faculty of Technology and Center for Biotechnology, Bielefeld University, Germany.
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany.
- Olga I Isaeva
- Center of Life Sciences, Skolkovo Institute of Science and Technology, Russia.
- BostonGene LLC, Lincoln, Massachusetts, USA.
- Divisions of Molecular Oncology & Immunology; Tumor Biology & Immunology; Molecular Carcinogenesis, The Netherlands Cancer Institute, Amsterdam, the Netherlands.
- Zoe Chervontseva
- Center of Life Sciences, Skolkovo Institute of Science and Technology, Russia.
- A.A.Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences (RAS), Moscow, Russia.
- Alexey Savchik
- A.A.Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences (RAS), Moscow, Russia.
- Martin Ester
- School of Computing Science, Simon Fraser University, Burnaby, BC, Canada.
- Vancouver Prostate Centre, Vancouver, BC, Canada.
6
Yu G, Yu X, Wang J. Network-aided Bi-Clustering for discovering cancer subtypes. Sci Rep 2017; 7:1046. [PMID: 28432308 PMCID: PMC5430742 DOI: 10.1038/s41598-017-01064-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 10/27/2016] [Accepted: 03/28/2017] [Indexed: 12/18/2022] Open
Abstract
Bi-clustering is a widely used data mining technique for analyzing gene expression data. It simultaneously groups genes and samples of an input gene expression data matrix to discover bi-clusters in which relevant samples exhibit similar gene expression profiles over a subset of genes. The discovered bi-clusters bring insights for categorization of cancer subtypes, gene treatments and others. Most existing bi-clustering approaches can only enumerate bi-clusters with constant values. Gene interaction networks can help to understand the pattern of cancer subtypes, but they are rarely integrated with gene expression data for exploring cancer subtypes. In this paper, we propose a novel method called Network-aided Bi-Clustering (NetBC). NetBC assigns weights to genes based on the structure of the gene interaction network, and it iteratively optimizes the sum-squared residue to obtain the row and column indicative matrices of bi-clusters by matrix factorization. NetBC can efficiently discover not only bi-clusters with constant values but also bi-clusters with coherent trends. An empirical study on large-scale cancer gene expression datasets demonstrates that NetBC can discover cancer subtypes more accurately than other related algorithms.
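To make the notion of a residue-based objective concrete, the sketch below computes the classic sum-squared residue of a single bicluster in the style of Cheng and Church: each cell minus its row mean and column mean, plus the overall mean. This is an assumption about the general form only; NetBC's actual objective additionally involves network-derived gene weights and matrix factorization, which are not reproduced here. The function name and the toy matrix are hypothetical.

```python
import numpy as np

def sum_squared_residue(matrix, rows, cols):
    """Sum of squared residues of the bicluster given by row and column indices.

    The residue of a cell is value - row mean - column mean + overall mean,
    all computed within the bicluster. It is zero for constant biclusters and
    for additive "coherent trend" biclusters.
    """
    sub = np.asarray(matrix, dtype=float)[np.ix_(rows, cols)]
    row_means = sub.mean(axis=1, keepdims=True)
    col_means = sub.mean(axis=0, keepdims=True)
    residue = sub - row_means - col_means + sub.mean()
    return float((residue ** 2).sum())

# Hypothetical expression matrix: rows are genes, columns are samples.
data = np.array([
    [1, 2, 3, 9],
    [2, 3, 4, 0],
    [4, 5, 6, 5],
    [7, 1, 8, 2],
])

# Genes 0-2 over samples 0-2 follow an additive pattern (row offset + column offset).
coherent = sum_squared_residue(data, [0, 1, 2], [0, 1, 2])
whole = sum_squared_residue(data, [0, 1, 2, 3], [0, 1, 2, 3])
print(coherent, whole)
```

The coherent block has a residue of essentially zero even though its values are not constant, which illustrates why a residue objective can capture coherent trends and not only constant-value bi-clusters.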
Affiliation(s)
- Guoxian Yu
- College of Computer and Information Science, Southwest University, Chongqing, China
- Xianxue Yu
- College of Computer and Information Science, Southwest University, Chongqing, China
- Jun Wang
- College of Computer and Information Science, Southwest University, Chongqing, China.
7
Hauschild AC, Frisch T, Baumbach JI, Baumbach J. Carotta: Revealing Hidden Confounder Markers in Metabolic Breath Profiles. Metabolites 2015; 5:344-63. [PMID: 26065494 PMCID: PMC4495376 DOI: 10.3390/metabo5020344] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 03/13/2015] [Revised: 05/20/2015] [Accepted: 05/25/2015] [Indexed: 12/20/2022] Open
Abstract
Computational breath analysis is a growing research area aiming at identifying volatile organic compounds (VOCs) in human breath to assist medical diagnostics of the next generation. While inexpensive and non-invasive bioanalytical technologies for metabolite detection in exhaled air and bacterial/fungal vapor exist and the first studies on the power of supervised machine learning methods for profiling of the resulting data were conducted, we lack methods to extract hidden data features emerging from confounding factors. Here, we present Carotta, a new cluster analysis framework dedicated to uncovering such hidden substructures by sophisticated unsupervised statistical learning methods. We study the power of transitivity clustering and hierarchical clustering to identify groups of VOCs with similar expression behavior over most patient breath samples and/or groups of patients with a similar VOC intensity pattern. This enables the discovery of dependencies between metabolites. On the one hand, this allows us to eliminate the effect of potential confounding factors hindering disease classification, such as smoking. On the other hand, we may also identify VOCs associated with disease subtypes or concomitant diseases. Carotta is an open source software with an intuitive graphical user interface promoting data handling, analysis and visualization. The back-end is designed to be modular, allowing for easy extensions with plugins in the future, such as new clustering methods and statistics. It does not require much prior knowledge or technical skills to operate. We demonstrate its power and applicability by means of one artificial dataset. We also apply Carotta exemplarily to a real-world example dataset on chronic obstructive pulmonary disease (COPD). 
While the artificial data are utilized as a proof of concept, we will demonstrate how Carotta finds candidate markers in our real dataset associated with confounders rather than the primary disease (COPD) and bronchial carcinoma (BC). Carotta is publicly available at http://carotta.compbio.sdu.dk [1].
Affiliation(s)
- Anne-Christin Hauschild
- Computational Systems Biology Group, Max Planck Institute for Informatics, Saarbrücken 66123, Germany.
- Computational Biology Group, Department of Mathematics and Computer Science, University of Southern Denmark, Odense 5230, Denmark.
- Tobias Frisch
- Computational Systems Biology Group, Max Planck Institute for Informatics, Saarbrücken 66123, Germany.
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin 14195, Germany.
- Jörg Ingo Baumbach
- Faculty of Applied Chemistry, Reutlingen University, Reutlingen 72762, Germany.
- Jan Baumbach
- Computational Biology Group, Department of Mathematics and Computer Science, University of Southern Denmark, Odense 5230, Denmark.
8
Deveci M, Küçüktunç O, Eren K, Bozdağ D, Kaya K, Çatalyürek ÜV. Querying Co-regulated Genes on Diverse Gene Expression Datasets Via Biclustering. Methods Mol Biol 2015; 1375:55-74. [PMID: 26626937 DOI: 10.1007/7651_2015_246] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/21/2023]
Abstract
Rapid development and increasing popularity of gene expression microarrays have resulted in a number of studies on the discovery of co-regulated genes. One important way of discovering such co-regulations is the query-based search since gene co-expressions may indicate a shared role in a biological process. Although there exist promising query-driven search methods adapting clustering, they fail to capture many genes that function in the same biological pathway because microarray datasets are fraught with spurious samples or samples of diverse origin, or the pathways might be regulated under only a subset of samples. On the other hand, a class of clustering algorithms known as biclustering algorithms which simultaneously cluster both the items and their features are useful while analyzing gene expression data, or any data in which items are related in only a subset of their samples. This means that genes need not be related in all samples to be clustered together. Because many genes only interact under specific circumstances, biclustering may recover the relationships that traditional clustering algorithms can easily miss. In this chapter, we briefly summarize the literature using biclustering for querying co-regulated genes. Then we present a novel biclustering approach and evaluate its performance by a thorough experimental analysis.
Affiliation(s)
- Mehmet Deveci
- Computer Science and Engineering, The Ohio State University, Columbus, OH, USA
- Onur Küçüktunç
- Computer Science and Engineering, The Ohio State University, Columbus, OH, USA
- Kemal Eren
- Computer Science and Engineering, The Ohio State University, Columbus, OH, USA
- Doruk Bozdağ
- Biomedical Informatics, The Ohio State University, Columbus, OH, USA
- Kamer Kaya
- Computer Science and Engineering, Sabancı University, Istanbul, Turkey
- Ümit V Çatalyürek
- Biomedical Informatics, Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH, USA.