1
|
Zhao L, Ma Y, Chen S, Zhou J. Multi-view co-clustering with multi-similarity. APPL INTELL 2022. [DOI: 10.1007/s10489-022-04385-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
|
2
|
Wang J, Zhang H, Ren W, Guo M, Yu G. EpiMC: Detecting Epistatic Interactions Using Multiple Clusterings. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:243-254. [PMID: 33989157 DOI: 10.1109/tcbb.2021.3080462] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/12/2023]
Abstract
Detecting single nucleotide polymorphism (SNP) interactions is crucial to identifying susceptibility genes associated with complex human diseases in genome-wide association studies. Clustering-based approaches are widely used to reduce the search space and explore potential relationships between SNPs in epistasis analysis. However, these approaches all use only a single measure to filter out nonsignificant SNP combinations, which may be significant from another perspective. In this paper, we propose a two-stage approach named EpiMC (Epistatic Interactions detection based on Multiple Clusterings) that employs multiple clusterings to obtain more precise candidate sets and more comprehensively detect high-order interactions based on these sets. In the first stage, EpiMC proposes a matrix factorization based multiple clusterings algorithm to generate multiple diverse clusterings, each of which divides all SNPs into different clusters. This stage aims to reduce the chance of filtering out potential candidates overlooked by a single clustering and groups associated SNPs together from different clustering perspectives. In the next stage, EpiMC considers both the single-locus effects and interaction effects to select high-quality disease-associated SNPs, and then uses Jaccard similarity to obtain candidate sets. Finally, EpiMC uses exhaustive search on the resulting small candidate sets to precisely detect epistatic interactions. Extensive simulation experiments show that EpiMC performs better in detecting high-order interactions than state-of-the-art solutions. On the Wellcome Trust Case Control Consortium (WTCCC) dataset, EpiMC detects several significant epistatic interactions associated with breast cancer (BC) and age-related macular degeneration (AMD), which again corroborates the effectiveness of EpiMC.
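The abstract mentions Jaccard similarity for forming candidate sets but does not say how it is applied. As a minimal sketch of the measure itself only, the snippet below compares two hypothetical SNP clusters; the function name, the SNP identifiers, and the convention for two empty sets are illustrative assumptions, not part of EpiMC.

```python
def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumed convention for two empty sets
    return len(a & b) / len(union)


# Hypothetical SNP clusters taken from two different clusterings.
cluster_a = {"rs1", "rs2", "rs3", "rs4"}
cluster_b = {"rs3", "rs4", "rs5"}

# Two shared SNPs out of five distinct SNPs overall.
similarity = jaccard(cluster_a, cluster_b)
```

A value near 1 means the two clusters contain almost the same SNPs; a value near 0 means they barely overlap.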
|
3
|
Fratello M, Cattelani L, Federico A, Pavel A, Scala G, Serra A, Greco D. Unsupervised Algorithms for Microarray Sample Stratification. Methods Mol Biol 2022; 2401:121-146. [PMID: 34902126 DOI: 10.1007/978-1-0716-1839-4_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
The amount of data made available by microarrays gives researchers the opportunity to delve into the complexity of biological systems. However, the noisy and extremely high-dimensional nature of this kind of data poses significant challenges. Microarrays allow for the parallel measurement of thousands of molecular objects spanning different layers of interactions. In order to be able to discover hidden patterns, the most disparate analytical techniques have been proposed. Here, we describe the basic methodologies to approach the analysis of microarray datasets that focus on the task of (sub)group discovery.
Affiliation(s)
- Michele Fratello
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
| | - Luca Cattelani
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
| | - Antonio Federico
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
| | - Alisa Pavel
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
| | - Giovanni Scala
- Department of Biology, University of Naples Federico II, Naples, Italy
| | - Angela Serra
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
| | - Dario Greco
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland.
- BioMediTech Institute, Tampere University, Tampere, Finland.
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland.
- Institute of Biotechnology, University of Helsinki, Helsinki, Finland.
|
4
|
Network Approaches for Precision Oncology. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2022; 1361:199-213. [DOI: 10.1007/978-3-030-91836-1_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
|
5
|
Framework for classification of cancer gene expression data using Bayesian hyper-parameter optimization. Med Biol Eng Comput 2021; 59:2353-2371. [PMID: 34609687 DOI: 10.1007/s11517-021-02442-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/25/2021] [Accepted: 09/13/2021] [Indexed: 10/20/2022]
Abstract
Computational classification of cancers is an important research problem. Gene expression data has thousands of features, very few samples, and a class imbalance problem. In this paper, we propose a framework for the classification of cancer gene expression profiles. The framework consists of a pipeline of methods for data pre-processing, feature selection, and classification. Data pre-processing is done by standard scaling and normalization of the features. Feature selection is performed in two steps. First, recursive feature elimination (RFE) is used; then, a genetic algorithm is applied only if RFE yields a feature subset larger than a specific threshold. Next comes a meta-pool of diverse individual and ensemble classifiers. Hyper-parameters of each member of the meta-pool are optimized using Bayesian optimization. An algorithm is developed to select the best classifier from the meta-pool based on classification accuracy and computation time. We evaluated the framework on 6 publicly available microarray datasets and the PAN-Cancer RNA Sequencing dataset. We found that the classifier selected by the proposed framework produced a significant improvement in classification accuracy and in the computation time required to predict labels for test datasets. A detailed comparison with state-of-the-art methods shows that the proposed framework outperforms all of them.
|
6
|
Song J, Peng W, Wang F. Identifying cancer patient subgroups by finding co-modules from the driver mutation profiles and downstream gene expression profiles. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; PP:2863-2872. [PMID: 34415837 DOI: 10.1109/tcbb.2021.3106344] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 06/13/2023]
Abstract
Identifying cancer subtypes sheds new light on effective personalized cancer medicine, future therapeutic strategies, and minimizing treatment-related costs. Recently, many clustering methods have been proposed for categorizing cancer patients. However, these methods still fail to fully use prior biological knowledge in the model design process to improve precision and efficiency. It is acknowledged that a driver gene always regulates its downstream genes in the network to perform a certain function. By analyzing known clinical cancer subtype data, we found some special co-pathways between the driver genes and the downstream genes in cancer patients of the same subgroup. Hence, we propose a novel model named DDCMNMF (Driver and Downstream gene Co-Module Assisted Multiple Non-negative Matrix Factorization model) that first stratifies cancer subtypes by identifying co-modules of driver genes and downstream genes. We applied our model to lung and breast cancer datasets and compared it with four other state-of-the-art models. The final results show that our model can identify cancer subtypes with high compactness and separateness and achieve a high degree of consistency with the known cancer subtypes. The survival time analysis further proves the significant clinical characteristics of the cancer subgroups identified by our model.
|
7
|
Fan Y, Han Z, Lu X, Arbab AAI, Nazar M, Yang Y, Yang Z. Short Time-Series Expression Transcriptome Data Reveal the Gene Expression Patterns of Dairy Cow Mammary Gland as Milk Yield Decreased Process. Genes (Basel) 2021; 12:genes12060942. [PMID: 34203058 PMCID: PMC8235497 DOI: 10.3390/genes12060942] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/15/2021] [Revised: 06/14/2021] [Accepted: 06/18/2021] [Indexed: 12/29/2022] Open
Abstract
The existing research on dairy cow mammary gland genes is extensive, but there have been few reports about dynamic changes in dairy cow mammary gland genes as milk yield decreases. For the first time, transcriptome analysis based on short time-series expression miner (STEM) and histological observations were performed using the Holstein dairy cow mammary gland to explore gene expression patterns during this decrease (at peak, mid-, and late lactation). Histological observations suggested that the number of mammary acinous cells at peak/mid-lactation was significantly higher than that at mid-/late lactation, and the area of lipid droplets secreted by dairy cows was almost unaltered across the three stages of lactation (p > 0.05). Totals of 882 and 1439 genes were differentially expressed at mid- and late lactation, respectively, compared to peak lactation. Function analysis showed that differentially expressed genes (DEGs) were mainly related to apoptosis and energy metabolism (fold change ≥ 2 or fold change ≤ 0.5, p-value ≤ 0.05). Transcriptome analysis based on STEM identified 16 profiles of differential gene expression patterns, including 5 significant profiles (false discovery rate, FDR ≤ 0.05). Function analysis revealed that DEGs involved in milk fat synthesis were downregulated in Profile 0 and that DEGs in Profile 12 were associated with protein synthesis. These findings provide a foundation for future studies on the molecular mechanisms underlying mammary gland development in dairy cows.
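The DEG criterion quoted in the abstract (fold change ≥ 2 or ≤ 0.5, with p-value ≤ 0.05) can be written as a simple filter. The sketch below is illustrative only: the gene names and values are made up, and the function name and default thresholds are assumptions that mirror the stated cutoffs rather than the authors' actual pipeline.

```python
def is_deg(fold_change, p_value, fc_cut=2.0, p_cut=0.05):
    """Return True if a gene passes both the fold-change and p-value cutoffs."""
    # Up-regulated (>= fc_cut) or down-regulated (<= 1 / fc_cut, i.e. 0.5).
    changed = fold_change >= fc_cut or fold_change <= 1.0 / fc_cut
    return changed and p_value <= p_cut


# Hypothetical (gene, fold change vs. peak lactation, p-value) records.
genes = [
    ("GENE_A", 2.5, 0.01),   # up-regulated and significant
    ("GENE_B", 0.4, 0.03),   # down-regulated and significant
    ("GENE_C", 1.3, 0.001),  # significant but fold change too small
    ("GENE_D", 3.0, 0.20),   # large fold change but not significant
]

degs = [name for name, fc, p in genes if is_deg(fc, p)]
```

Only `GENE_A` and `GENE_B` satisfy both conditions in this toy input.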
Affiliation(s)
- Yongliang Fan
- College of Animal Science and Technology, Yangzhou University, Yangzhou 225009, China; (Y.F.); (Z.H.); (X.L.); (A.A.I.A.); (M.N.)
- Joint International Research Laboratory of Agriculture & Agri-Product Safety, Ministry of Education, Yangzhou University, Yangzhou 225009, China
| | - Ziyin Han
- College of Animal Science and Technology, Yangzhou University, Yangzhou 225009, China; (Y.F.); (Z.H.); (X.L.); (A.A.I.A.); (M.N.)
- Joint International Research Laboratory of Agriculture & Agri-Product Safety, Ministry of Education, Yangzhou University, Yangzhou 225009, China
| | - Xubin Lu
- College of Animal Science and Technology, Yangzhou University, Yangzhou 225009, China; (Y.F.); (Z.H.); (X.L.); (A.A.I.A.); (M.N.)
- Joint International Research Laboratory of Agriculture & Agri-Product Safety, Ministry of Education, Yangzhou University, Yangzhou 225009, China
| | - Abdelaziz Adam Idriss Arbab
- College of Animal Science and Technology, Yangzhou University, Yangzhou 225009, China; (Y.F.); (Z.H.); (X.L.); (A.A.I.A.); (M.N.)
- Joint International Research Laboratory of Agriculture & Agri-Product Safety, Ministry of Education, Yangzhou University, Yangzhou 225009, China
| | - Mudasir Nazar
- College of Animal Science and Technology, Yangzhou University, Yangzhou 225009, China; (Y.F.); (Z.H.); (X.L.); (A.A.I.A.); (M.N.)
- Joint International Research Laboratory of Agriculture & Agri-Product Safety, Ministry of Education, Yangzhou University, Yangzhou 225009, China
| | - Yi Yang
- Jiangsu Co-Innovation Center for the Prevention and Control of Important Animal Infectious Diseases and Zoonoses, Yangzhou University College of Veterinary Medicine, Yangzhou 225009, China;
| | - Zhangping Yang
- College of Animal Science and Technology, Yangzhou University, Yangzhou 225009, China; (Y.F.); (Z.H.); (X.L.); (A.A.I.A.); (M.N.)
- Joint International Research Laboratory of Agriculture & Agri-Product Safety, Ministry of Education, Yangzhou University, Yangzhou 225009, China
- Correspondence: ; Tel.: +86-0514-87979269
|
8
|
Lazareva O, Canzar S, Yuan K, Baumbach J, Blumenthal DB, Tieri P, Kacprowski T, List M. BiCoN: network-constrained biclustering of patients and omics data. Bioinformatics 2020; 37:2398-2404. [PMID: 33367514 DOI: 10.1093/bioinformatics/btaa1076] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 07/20/2020] [Revised: 11/25/2020] [Accepted: 12/15/2020] [Indexed: 12/21/2022] Open
Abstract
Motivation
Unsupervised learning approaches are frequently used to stratify patients into clinically relevant subgroups and to identify biomarkers such as disease-associated genes. However, clustering and biclustering techniques are oblivious to the functional relationship of genes and are thus not ideally suited to pinpoint molecular mechanisms along with patient subgroups.
Results
We developed the network-constrained biclustering approach Biclustering Constrained by Networks (BiCoN), which (i) restricts biclusters to functionally related genes connected in molecular interaction networks and (ii) maximizes the difference in gene expression between two subgroups of patients. This allows BiCoN to simultaneously pinpoint molecular mechanisms responsible for the patient grouping. Network-constrained clustering of genes makes BiCoN more robust to noise and batch effects than typical clustering and biclustering methods. BiCoN can faithfully reproduce known disease subtypes as well as novel, clinically relevant patient subgroups, as we demonstrate using breast and lung cancer datasets. In summary, BiCoN is a novel systems medicine tool that combines several heuristic optimization strategies for robust disease mechanism extraction. BiCoN is well documented and freely available as a Python package or a web interface.
Availability and implementation
PyPI package: https://pypi.org/project/bicon.
Web interface: https://exbio.wzw.tum.de/bicon.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Olga Lazareva
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Weihenstephan, 80333 Munich, Germany
| | - Stefan Canzar
- Gene Center, Ludwig-Maximilians-University of Munich, 81377 Munich, Germany
| | - Kevin Yuan
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Weihenstephan, 80333 Munich, Germany
| | - Jan Baumbach
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Weihenstephan, 80333 Munich, Germany
| | - David B Blumenthal
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Weihenstephan, 80333 Munich, Germany
| | - Paolo Tieri
- CNR National Research Council, IAC Institute for Applied Computing, Rome 00185, Italy
- La Sapienza University of Rome, Rome 00185, Italy
| | - Tim Kacprowski
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Weihenstephan, 80333 Munich, Germany
- Division of Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics, TU Braunschweig and Hannover Medical School, Brunswick 38106, Germany
| | - Markus List
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Weihenstephan, 80333 Munich, Germany
|
9
|
Li Z, Chang C, Kundu S, Long Q. Bayesian generalized biclustering analysis via adaptive structured shrinkage. Biostatistics 2020; 21:610-624. [PMID: 30596887 PMCID: PMC7307984 DOI: 10.1093/biostatistics/kxy081] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/14/2018] [Revised: 09/18/2018] [Accepted: 11/21/2018] [Indexed: 12/13/2022] Open
Abstract
Biclustering techniques can identify local patterns in a data matrix by clustering the feature space and the sample space at the same time. Various biclustering methods have been proposed and successfully applied to the analysis of gene expression data. While existing biclustering methods have many desirable features, most of them were developed for continuous data, and few of them can efficiently handle -omics data of various types, for example, binomial data as in single nucleotide polymorphism data or negative binomial data as in RNA-seq data. In addition, none of the existing methods can utilize biological information such as that from functional genomics or proteomics. Recent work has shown that incorporating biological information can improve variable selection and prediction performance in analyses such as linear regression and multivariate analysis. In this article, we propose a novel Bayesian biclustering method that can handle multiple data types including Gaussian, Binomial, and Negative Binomial. In addition, our method uses a Bayesian adaptive structured shrinkage prior that enables feature selection guided by existing biological information. Our simulation studies and application to multi-omics datasets demonstrate the robust and superior performance of the proposed method compared to other existing biclustering methods.
Affiliation(s)
- Ziyi Li
- Department of Biostatistics and Bioinformatics, Emory University, 1518 Clifton Road, NE, Atlanta, GA, USA
| | - Changgee Chang
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Drive, Philadelphia, PA, USA
| | - Suprateek Kundu
- Department of Biostatistics and Bioinformatics, Emory University, 1518 Clifton Road, NE, Atlanta, GA, USA
| | - Qi Long
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Drive, Philadelphia, PA, USA
|
10
|
Blatti C, Emad A, Berry MJ, Gatzke L, Epstein M, Lanier D, Rizal P, Ge J, Liao X, Sobh O, Lambert M, Post CS, Xiao J, Groves P, Epstein AT, Chen X, Srinivasan S, Lehnert E, Kalari KR, Wang L, Weinshilboum RM, Song JS, Jongeneel CV, Han J, Ravaioli U, Sobh N, Bushell CB, Sinha S. Knowledge-guided analysis of "omics" data using the KnowEnG cloud platform. PLoS Biol 2020; 18:e3000583. [PMID: 31971940 PMCID: PMC6977717 DOI: 10.1371/journal.pbio.3000583] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 05/29/2019] [Accepted: 12/19/2019] [Indexed: 12/19/2022] Open
Abstract
We present Knowledge Engine for Genomics (KnowEnG), a free-to-use computational system for analysis of genomics data sets, designed to accelerate biomedical discovery. It includes tools for popular bioinformatics tasks such as gene prioritization, sample clustering, gene set analysis, and expression signature analysis. The system specializes in "knowledge-guided" data mining and machine learning algorithms, in which user-provided data are analyzed in light of prior information about genes, aggregated from numerous knowledge bases and encoded in a massive "Knowledge Network." KnowEnG adheres to "FAIR" principles (findable, accessible, interoperable, and reusable): its tools are easily portable to diverse computing environments, run on the cloud for scalable and cost-effective execution, and are interoperable with other computing platforms. The analysis tools are made available through multiple access modes, including a web portal with specialized visualization modules. We demonstrate the KnowEnG system's potential value in democratization of advanced tools for the modern genomics era through several case studies that use its tools to recreate and expand upon the published analysis of cancer data sets.
Affiliation(s)
- Charles Blatti
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Amin Emad
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Department of Electrical and Computer Engineering, McGill University, Montreal, Canada
| | - Matthew J. Berry
- National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Lisa Gatzke
- National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Milt Epstein
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Daniel Lanier
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Pramod Rizal
- National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Jing Ge
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Xiaoxia Liao
- National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Omar Sobh
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Mike Lambert
- National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Corey S. Post
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Jinfeng Xiao
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Peter Groves
- National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Aidan T. Epstein
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Xi Chen
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Subhashini Srinivasan
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Erik Lehnert
- Seven Bridges Genomics, Charlestown, Massachusetts, United States of America
| | - Krishna R. Kalari
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, United States of America
| | - Liewei Wang
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, Minnesota, United States of America
| | - Richard M. Weinshilboum
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, Minnesota, United States of America
| | - Jun S. Song
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Department of Physics, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Cancer Center at Illinois, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - C. Victor Jongeneel
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Jiawei Han
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Cancer Center at Illinois, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Umberto Ravaioli
- Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Nahil Sobh
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Colleen B. Bushell
- National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Saurabh Sinha
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Cancer Center at Illinois, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- * E-mail:
|
11
|
Chang C, Oh J, Min EJ, Long Q. Knowledge-Guided Biclustering via Sparse Variational EM Algorithm. 10TH IEEE INTERNATIONAL CONFERENCE ON BIG KNOWLEDGE : PROCEEDINGS : 10-11 NOVEMBER 2019, BEIJING, CHINA. IEEE INTERNATIONAL CONFERENCE ON BIG KNOWLEDGE (10TH : 2019 : BEIJING, CHINA) 2019; 2019:25-32. [PMID: 34290493 PMCID: PMC8291726 DOI: 10.1109/icbk.2019.00012] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 06/13/2023]
Abstract
A biclustering in the analysis of a gene expression data matrix, for example, is defined as a set of biclusters where each bicluster is a group of genes and a group of samples for which the genes are differentially expressed. Although many data mining approaches for biclustering exist in the literature, only a few are able to incorporate prior knowledge into the analysis, which can lead to great improvements in accuracy and interpretability, and all are limited in handling discrete data types. We propose a generalized biclustering approach that can be used for integrative analysis of multi-omics data with different data types. Our method is capable of utilizing biological information that can be represented by a graph, such as functional genomics and functional proteomics, and of accommodating a combination of continuous and discrete data types. The proposed method builds on a generalized Bayesian factor analysis framework, and a variational EM approach is used to obtain parameter estimates, where the latent quantities in the log-likelihood are iteratively imputed by their conditional expectations. The biclusters are retrieved via the sparse estimates of the factor loadings and the conditional expectation of the latent factors. In order to obtain the sparse conditional expectation of the latent factors, a novel sparse variational EM algorithm is used. We demonstrate the superiority of our method over several existing biclustering methods in extensive simulation experiments and in integrative analysis of multi-omics data.
Affiliation(s)
- Changgee Chang
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, USA
| | - Jihwan Oh
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, USA
| | - Eun Jeong Min
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, USA
| | - Qi Long
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, USA
|
12
|
Zhao Y, Chang C, Long Q. Knowledge-Guided Statistical Learning Methods for Analysis of High-Dimensional -Omics Data in Precision Oncology. JCO Precis Oncol 2019; 3:PO.19.00018. [PMID: 35100722 PMCID: PMC9797232 DOI: 10.1200/po.19.00018] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Accepted: 03/25/2019] [Indexed: 12/31/2022] Open
Abstract
High-dimensional -omics data such as genomic, transcriptomic, and metabolomic data offer great promise in advancing precision medicine. In particular, such data have enabled the investigation of complex diseases such as cancer at an unprecedented scale and in multiple dimensions. However, a number of analytical challenges complicate analysis of high-dimensional -omics data. One is the growing recognition that complex diseases such as cancer are multifactorial and may be attributed to harmful changes on multiple -omics levels and on the pathway level. When individual genes in an important pathway have relatively weak signals, it can be challenging to detect them on their own, but the aggregated signal in the pathway can be considerably stronger and hence easier to detect with the same sample size. To address these challenges, there is a growing body of literature on knowledge-guided statistical learning methods for analysis of high-dimensional -omics data that can incorporate biological knowledge such as functional genomics and functional proteomics. These methods have been shown to improve prediction and classification accuracy and yield biologically more interpretable results compared with statistical learning methods that do not use biological knowledge. In this review, we survey current knowledge-guided statistical learning methods, including both supervised learning and unsupervised learning, and their applications to precision oncology, and we discuss future research directions.
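The point that weak per-gene signals can add up to a strong pathway-level signal can be illustrated with a small calculation. The sketch below uses Stouffer's method for combining z-scores, which is an assumption chosen here for illustration and is not named in the abstract; it also assumes the per-gene z-scores are independent, and the numbers are hypothetical.

```python
import math


def stouffer_z(z_scores):
    """Combine independent z-scores: Z = sum(z_i) / sqrt(k)."""
    if not z_scores:
        raise ValueError("need at least one z-score")
    return sum(z_scores) / math.sqrt(len(z_scores))


# Hypothetical pathway of 16 genes, each with a weak signal (z = 1.0),
# well below the usual two-sided 5% cutoff of about 1.96.
gene_z = [1.0] * 16

# Combined pathway-level score: 16 / sqrt(16) = 4.0.
pathway_z = stouffer_z(gene_z)
```

No single gene would be flagged on its own here, while the combined score is far beyond the same cutoff. Correlated genes would weaken this gain, which is why the independence assumption matters.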
Affiliation(s)
- Yize Zhao
- Weill Cornell Medicine, New York, NY
| | - Changgee Chang
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
| | - Qi Long
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
|
13
|
Ozturk K, Dow M, Carlin DE, Bejar R, Carter H. The Emerging Potential for Network Analysis to Inform Precision Cancer Medicine. J Mol Biol 2018; 430:2875-2899. [PMID: 29908887 PMCID: PMC6097914 DOI: 10.1016/j.jmb.2018.06.016] [Citation(s) in RCA: 53] [Impact Index Per Article: 8.8] [Received: 03/12/2018] [Revised: 05/30/2018] [Accepted: 06/06/2018] [Indexed: 12/19/2022]
Abstract
Precision cancer medicine promises to tailor clinical decisions to patients using genomic information. Indeed, successes of drugs targeting genetic alterations in tumors, such as imatinib, which targets BCR-ABL in chronic myelogenous leukemia, have demonstrated the power of this approach. However, biological systems are complex, and patients may differ not only by the specific genetic alterations in their tumor, but also by more subtle interactions among such alterations. Systems biology, and more specifically network analysis, provides a framework for advancing precision medicine beyond clinical actionability of individual mutations. Here we discuss applications of network analysis to study tumor biology, early methods for N-of-1 tumor genome analysis, and the path for such tools to the clinic.
Affiliation(s)
- Kivilcim Ozturk
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA; Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA
| | - Michelle Dow
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA; Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA
| | - Daniel E Carlin
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA
| | - Rafael Bejar
- Moores Cancer Center, Division of Hematology and Oncology, University of California San Diego, La Jolla, CA 92093, USA
| | - Hannah Carter
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA; Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA; Moores Cancer Center and Institute for Genomic Medicine, University of California San Diego, La Jolla, CA 92093, USA; CIFAR, MaRS Centre, West Tower, 661 University Ave., Suite 505, Toronto, ON M5G 1M1, Canada.
|