101
Maj C, Azevedo T, Giansanti V, Borisov O, Dimitri GM, Spasov S, Lió P, Merelli I. Integration of Machine Learning Methods to Dissect Genetically Imputed Transcriptomic Profiles in Alzheimer's Disease. Front Genet 2019; 10:726. [PMID: 31552082 PMCID: PMC6735530 DOI: 10.3389/fgene.2019.00726] [Received: 04/18/2019] [Accepted: 07/10/2019]
Abstract
The genetic component of many common traits is associated with gene expression, and several variants act as expression quantitative trait loci (eQTLs), regulating gene expression in a tissue-specific manner. In this work, we applied tissue-specific cis-eQTL gene expression prediction models to the genotypes of 808 samples, including controls, subjects with mild cognitive impairment, and patients with Alzheimer's Disease. We then dissected the imputed transcriptomic profiles by means of different unsupervised and supervised machine learning approaches to identify potential biological associations. Our analysis suggests that unsupervised and supervised methods can provide complementary information, which can be integrated for a better characterization of the underlying biological system. In particular, a variational autoencoder representation of the transcriptomic profiles, followed by a support vector machine classification, was used for tissue-specific gene prioritization. Interestingly, the resulting gene prioritizations can be efficiently integrated as a feature selection step to improve the accuracy of deep learning classifier networks. The identified gene-tissue information suggests a potential role for inflammatory and regulatory processes in tissues related to the gut-brain axis. In line with the expected low heritability that can be apportioned to eQTL variants, we achieved only relatively low prediction performance with deep learning classification models. However, our analysis revealed that classification power strongly depends on the network structure, with recurrent neural networks being the best-performing network class. Cross-tissue analysis further suggests a potentially greater role for models trained on brain tissues, including when dementia-related endophenotypes are considered.
Overall, the present analysis suggests that the combination of supervised and unsupervised machine learning techniques can be used for the evaluation of high-dimensional omics data.
Affiliation(s)
- Carlo Maj
- Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Bonn, Germany
- Tiago Azevedo
- Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom
- Valentina Giansanti
- National Research Council, Institute for Biomedical Technologies, Milan, Italy
- Oleg Borisov
- Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Bonn, Germany
- Giovanna Maria Dimitri
- Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom
- Simeon Spasov
- Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom
- Pietro Lió
- Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom
- Ivan Merelli
- National Research Council, Institute for Biomedical Technologies, Milan, Italy

102
Siebert JC, Neff CP, Schneider JM, Regner EH, Ohri N, Kuhn KA, Palmer BE, Lozupone CA, Görg C. VOLARE: visual analysis of disease-associated microbiome-immune system interplay. BMC Bioinformatics 2019; 20:432. [PMID: 31429723 PMCID: PMC6701114 DOI: 10.1186/s12859-019-3021-0] [Received: 04/22/2019] [Accepted: 08/06/2019]
Abstract
Background Relationships between specific microbes and proper immune system development, composition, and function have been reported in a number of studies. However, researchers have discovered only a fraction of the likely relationships. “Omic” methodologies such as 16S ribosomal RNA (rRNA) sequencing and time-of-flight mass cytometry (CyTOF) immunophenotyping generate data that support hypothesis generation, with the potential to identify additional relationships at a level of granularity ripe for further experimentation. Pairwise linear regressions between microbial and host immune features provide one approach for quantifying relationships between “omes”, and the differences in these relationships across study cohorts or arms. This approach yields a top table of candidate results. However, the top table alone lacks the detail that domain experts such as microbiologists and immunologists need to vet candidate results for follow-up experiments. Results To support this vetting, we developed VOLARE (Visualization Of LineAr Regression Elements), a web application that integrates a searchable top table, small in-line graphs illustrating the fitted models, a network summarizing the top table, and on-demand detailed regression plots showing full sample-level detail. We applied VOLARE to three case studies: microbiome:cytokine data from fecal samples in human immunodeficiency virus (HIV), microbiome:cytokine data in inflammatory bowel disease and spondyloarthritis, and microbiome:immune cell data from gut biopsies in HIV. We present both patient-specific phenomena and relationships that differ by disease state. We also analyzed interaction data from system logs to characterize usage scenarios. This log analysis revealed that users frequently generated detailed regression plots, suggesting that this detail aids the vetting of results.
Conclusions Systematically integrating microbe:immune cell readouts through pairwise linear regressions and presenting the top table in an interactive environment supports the vetting of results for scientific relevance. VOLARE allows domain experts to control the analysis of their results, screening dozens of candidate relationships with ease. This interactive environment transcends the limitations of a static top table. Electronic supplementary material: The online version of this article (10.1186/s12859-019-3021-0) contains supplementary material, which is available to authorized users.
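The pairwise-regression step that produces a top table can be sketched in a few lines. This is a minimal sketch, not VOLARE's code: it assumes a single cohort, fits ordinary least squares for every microbe:immune pair, and ranks pairs by R² rather than by whatever statistics VOLARE actually reports. The helper names `fit_line` and `top_table`, the feature names, and the toy values are all hypothetical.

```python
from itertools import product


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, r2). A constant predictor gets slope 0 and r2 0.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0:
        return 0.0, my, 0.0
    slope = sxy / sxx
    intercept = my - slope * mx
    # For a one-predictor fit, R^2 equals the squared Pearson correlation.
    r2 = sxy * sxy / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r2


def top_table(microbes, immune, k=10):
    """Regress every immune feature on every microbial feature; rank pairs by R^2.

    microbes, immune: dicts mapping a (hypothetical) feature name to per-sample
    values, with samples in the same order in both dicts.
    Returns up to k rows of (microbe, immune_feature, slope, intercept, r2).
    """
    rows = []
    for (m_name, x), (i_name, y) in product(microbes.items(), immune.items()):
        slope, intercept, r2 = fit_line(x, y)
        rows.append((m_name, i_name, slope, intercept, r2))
    rows.sort(key=lambda row: row[4], reverse=True)
    return rows[:k]


# Toy, made-up abundances and cytokine levels for five samples.
microbes = {"taxon_A": [1, 2, 3, 4, 5], "taxon_B": [2, 1, 4, 3, 5]}
immune = {"IL_6": [3, 5, 7, 9, 11]}
for row in top_table(microbes, immune):
    print(row)
```

Comparing cohorts or study arms, as the abstract describes, could be layered on top by fitting each pair separately per cohort and ranking by the difference in slopes; that extension is an assumption and is not shown.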
Affiliation(s)
- Janet C Siebert
- Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- CytoAnalytics, Denver, CO, 80113, USA
- Charles Preston Neff
- Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Jennifer M Schneider
- Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Emilie H Regner
- Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Neha Ohri
- Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Kristine A Kuhn
- Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Brent E Palmer
- Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Catherine A Lozupone
- Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA

103
Lodise TP, Bonine NG, Ye JM, Folse HJ, Gillard P. Development of a bedside tool to predict the probability of drug-resistant pathogens among hospitalized adult patients with gram-negative infections. BMC Infect Dis 2019; 19:718. [PMID: 31412809 PMCID: PMC6694572 DOI: 10.1186/s12879-019-4363-y] [Received: 10/19/2018] [Accepted: 08/06/2019]
Abstract
Background We developed a clinical bedside tool to simultaneously estimate the probabilities of third-generation cephalosporin-resistant Enterobacteriaceae (3GC-R), carbapenem-resistant Enterobacteriaceae (CRE), and multidrug-resistant Pseudomonas aeruginosa (MDRP) among hospitalized adult patients with Gram-negative infections. Methods Data were obtained from a retrospective observational study of the Premier Hospital Database that included hospitalized adult patients with a complicated urinary tract infection (cUTI), complicated intra-abdominal infection (cIAI), hospital-acquired/ventilator-associated pneumonia (HAP/VAP), or bloodstream infection (BSI) due to Gram-negative bacteria between 2011 and 2015. Risk factors for 3GC-R, CRE, and MDRP were ascertained by multivariate logistic regression, and separate models were developed for patients with community-acquired versus hospital-acquired infections for each resistance phenotype (N = 6). The models were combined into a single user-friendly interface to estimate the probabilities of a patient having an infection due to 3GC-R, CRE, or MDRP when ≥ 1 risk factor was present. Results Overall, 124,068 patients contributed to the dataset. Percentages of patients admitted for cUTI, cIAI, HAP/VAP, and BSI were 61.6, 4.6, 16.5, and 26.4%, respectively (some patients contributed > 1 infection type). Resistant infection rates were 1.90% for CRE, 12.09% for 3GC-R, and 3.91% for MDRP. A greater percentage of the resistant infections were community-acquired relative to hospital-acquired (CRE, 1.30% vs 0.62% of 1.90%; 3GC-R, 9.27% vs 3.42% of 12.09%; MDRP, 2.39% vs 1.59% of 3.91%). The most important predictors of having a 3GC-R, CRE, or MDRP infection were prior number of antibiotics; infection site; infection during the previous 3 months; and hospital prevalence of 3GC-R, CRE, or MDRP.
To enable application of the six predictive multivariate logistic regression models to real-world clinical practice, we developed a user-friendly interface that simultaneously estimates the risk of 3GC-R, CRE, and MDRP in a given patient with a Gram-negative infection based on their risk factors (Additional file 1). Conclusions We developed a clinical prediction tool to estimate the probabilities of 3GC-R, CRE, and MDRP among hospitalized adult patients with confirmed community- and hospital-acquired Gram-negative infections. Our predictive model has been implemented as a user-friendly bedside tool for use by clinicians and other healthcare professionals to predict the probability of resistant infections in individual patients and to guide early appropriate therapy. Electronic supplementary material: The online version of this article (10.1186/s12879-019-4363-y) contains supplementary material, which is available to authorized users.
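Turning a fitted logistic regression into a bedside estimate amounts to evaluating the inverse logit of the model's linear predictor for one patient. The sketch below shows that arithmetic only. The risk-factor names and every coefficient in `HYPOTHETICAL_MODEL` are made-up placeholders, not estimates from any of the six published models, and the helper `predicted_probability` is hypothetical.

```python
import math

# HYPOTHETICAL coefficients (log-odds scale) for ONE illustrative model.
# They are placeholders, not estimates from the published models.
HYPOTHETICAL_MODEL = {
    "intercept": -4.0,
    "n_prior_antibiotics": 0.3,       # per prior antibiotic
    "infection_prev_3_months": 0.8,   # 1 = yes, 0 = no
    "hospital_prevalence_pct": 0.05,  # per percentage point of local prevalence
}


def predicted_probability(model, patient):
    """Inverse logit of the linear predictor: 1 / (1 + exp(-(b0 + sum_j b_j * x_j))).

    Risk factors missing from `patient` are treated as 0 (absent).
    """
    log_odds = model["intercept"]
    for name, coef in model.items():
        if name != "intercept":
            log_odds += coef * patient.get(name, 0.0)
    return 1.0 / (1.0 + math.exp(-log_odds))


# A made-up patient profile.
patient = {"n_prior_antibiotics": 2, "infection_prev_3_months": 1, "hospital_prevalence_pct": 10}
print(f"Estimated probability: {predicted_probability(HYPOTHETICAL_MODEL, patient):.3f}")
```

Since the abstract describes six models (three resistance phenotypes, each split by community- versus hospital-acquired infection), one plausible layout is a dict keyed by (phenotype, acquisition) holding one coefficient set per model, all evaluated for the same patient to produce the simultaneous estimates.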
Affiliation(s)
- Thomas P Lodise
- Albany College of Pharmacy and Health Sciences, Albany, NY, 12208-3492, USA.

104

105
Rappoport N, Shamir R. Multi-omic and multi-view clustering algorithms: review and cancer benchmark. Nucleic Acids Res 2018; 46:10546-10562. [PMID: 30295871 PMCID: PMC6237755 DOI: 10.1093/nar/gky889] [Received: 07/18/2018] [Accepted: 09/20/2018]
Abstract
Recent high-throughput experimental methods have been used to collect large biomedical omics datasets. Clustering of single-omic datasets has proven invaluable for biological and medical research. The decreasing cost and development of additional high-throughput methods now enable measurement of multi-omic data. Clustering multi-omic data has the potential to reveal further systems-level insights, but raises computational and biological challenges. Here, we review algorithms for multi-omics clustering and discuss key issues in applying these algorithms. Our review covers methods developed specifically for omic data as well as generic multi-view methods developed in the machine learning community for joint clustering of multiple data types. In addition, using cancer data from TCGA, we perform an extensive benchmark spanning ten different cancer types, providing the first systematic comparison of leading multi-omics and multi-view clustering algorithms. The results highlight key issues regarding the use of single- versus multi-omics, the choice of clustering strategy, the power of generic multi-view methods, and the use of approximated p-values for gauging solution quality. Given the growing use of multi-omics data, we expect these issues to be important for future progress in the field.
Affiliation(s)
- Nimrod Rappoport
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Ron Shamir
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel

106
Zampieri G, Vijayakumar S, Yaneske E, Angione C. Machine and deep learning meet genome-scale metabolic modeling. PLoS Comput Biol 2019; 15:e1007084. [PMID: 31295267 PMCID: PMC6622478 DOI: 10.1371/journal.pcbi.1007084]
Abstract
Omic data analysis is steadily growing as a driver of basic and applied molecular biology research. Core to the interpretation of complex and heterogeneous biological phenotypes are computational approaches in the fields of statistics and machine learning. In parallel, constraint-based metabolic modeling has established itself as the main tool to investigate large-scale relationships between genotype, phenotype, and environment. The development and application of these methodological frameworks have occurred largely independently, whereas the potential of their integration for biological, biomedical, and biotechnological research remains less well known. Here, we describe how machine learning and constraint-based modeling can be combined, reviewing recent works at the intersection of both domains and discussing the mathematical and practical aspects involved. We overlap systematic classifications from both frameworks, making them accessible to nonexperts. Finally, we delineate potential future scenarios, propose new joint theoretical frameworks, and suggest concrete points of investigation for this joint subfield. A multiview approach merging experimental and knowledge-driven omic data through machine learning methods can incorporate key mechanistic information in an otherwise biologically agnostic learning process.
Affiliation(s)
- Guido Zampieri
- Department of Computer Science and Information Systems, Teesside University, Middlesbrough, United Kingdom
- Supreeta Vijayakumar
- Department of Computer Science and Information Systems, Teesside University, Middlesbrough, United Kingdom
- Elisabeth Yaneske
- Department of Computer Science and Information Systems, Teesside University, Middlesbrough, United Kingdom
- Claudio Angione
- Department of Computer Science and Information Systems, Teesside University, Middlesbrough, United Kingdom
- Healthcare Innovation Centre, Teesside University, Middlesbrough, United Kingdom

107
Predicting the decision making chemicals used for bacterial growth. Sci Rep 2019; 9:7251. [PMID: 31076576 PMCID: PMC6510730 DOI: 10.1038/s41598-019-43587-8] [Received: 11/15/2018] [Accepted: 04/24/2019]
Abstract
Predicting the contribution of media components to bacterial growth was attempted for the first time by introducing machine learning to high-throughput growth assays. A total of 1336 temporal growth records corresponding to 225 different media, which were composed of 13 chemical components, were generated. The growth rate and saturated density of each growth curve were automatically calculated with a newly developed data processing program. To identify the decision-making factors related to growth among the 13 chemicals, large datasets linking the growth parameters to the chemical combinations were subjected to decision tree learning. The results showed that glucose, the only carbon source, determined bacterial growth, but it was not the first priority. Instead, the top decision-making chemicals in relation to the growth rate and saturated density were ammonium and ferric ions, respectively. Three chemical components (NH4+, Mg2+ and glucose) commonly appeared in the decision trees of both the growth rate and saturated density, but they exhibited different mechanisms. The concentration ranges for fast growth and high density overlapped for glucose but were distinct for NH4+ and Mg2+. The results suggested that these chemicals were crucial in determining growth speed and maximum growth, either in a universal manner or through a trade-off. This differentiation might reflect the diversity in the resource allocation mechanisms for growth priority depending on environmental restrictions. This study provides a representative example for clarifying the contribution of the environment to population dynamics through an innovative viewpoint of employing modern data science within traditional microbiology to obtain novel findings.
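The step at the heart of decision-tree learning on such data is choosing, at each node, the chemical and concentration threshold that best separate the growth values. Below is a minimal sketch of one regression-tree split chosen by variance reduction; it is an illustration under assumptions, since the study may have used a different tree algorithm or split criterion. The function names, chemical keys, and toy values are hypothetical.

```python
def variance(values):
    """Population variance; 0.0 for an empty list."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_split(rows, target):
    """Return (feature, threshold, variance_reduction) for the best single split.

    rows: list of dicts mapping a chemical name to its concentration in a medium.
    target: growth parameter (e.g. growth rate) for each medium, aligned with rows.
    Candidate thresholds are midpoints between consecutive distinct concentrations.
    """
    n = len(rows)
    parent = variance(target)
    best = (None, None, 0.0)
    for feature in rows[0]:
        levels = sorted({row[feature] for row in rows})
        for low, high in zip(levels, levels[1:]):
            threshold = (low + high) / 2
            left = [t for row, t in zip(rows, target) if row[feature] <= threshold]
            right = [t for row, t in zip(rows, target) if row[feature] > threshold]
            weighted = (len(left) * variance(left) + len(right) * variance(right)) / n
            gain = parent - weighted
            if gain > best[2]:
                best = (feature, threshold, gain)
    return best


# Made-up media: growth here depends on the ammonium level, not on glucose.
media = [
    {"NH4": 0.0, "glucose": 1.0},
    {"NH4": 0.0, "glucose": 5.0},
    {"NH4": 10.0, "glucose": 1.0},
    {"NH4": 10.0, "glucose": 5.0},
]
growth_rate = [0.1, 0.2, 0.9, 1.0]
print(best_split(media, growth_rate))
```

A full tree would apply this split recursively to each child node until a stopping rule (maximum depth, minimum node size) is met; the chemical chosen at the root plays the role of the "top decision-making chemical" described above.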

108
Luo L, Hudson LG, Lewis J, Lee JH. Two-step approach for assessing the health effects of environmental chemical mixtures: application to simulated datasets and real data from the Navajo Birth Cohort Study. Environ Health 2019; 18:46. [PMID: 31072361 PMCID: PMC6507239 DOI: 10.1186/s12940-019-0482-6] [Received: 11/19/2018] [Accepted: 04/16/2019]
Abstract
BACKGROUND There is increasing interest in examining the consequences of simultaneous exposures to chemical mixtures. However, no consensus or recommendations have been established on how to select the statistical approach for analyzing the health effects of mixture exposures that best aligns with study goals. We recognize the limitations that existing methods have in effectively reducing data dimension and detecting interaction effects when analyzing chemical mixture exposures collected in high-dimensional datasets with varying degrees of intercorrelation among variables. In this research, we aim to examine the performance of a two-step statistical approach in addressing the analytical challenges of chemical mixture exposures using two simulated datasets and an existing dataset from the Navajo Birth Cohort Study as a representative case study. METHODS We propose a two-step approach: a robust variable selection step using random forests, followed by adaptive lasso methods that incorporate both dimensionality reduction and quantification of the degree of association between the chemical exposures and the outcome of interest, including interaction terms. We compared the proposed method with other approaches, including (1) single-step adaptive lasso and (2) two-step classification and regression trees (CART) followed by adaptive lasso. RESULTS Using simulated datasets and applying the method to a real-life dataset from the Navajo Birth Cohort Study, we demonstrated good performance of the proposed two-step approach. Results from the simulated datasets indicated effective reduction of variable dimension and reliable identification of a parsimonious model compared with the other methods: single-step adaptive lasso or two-step CART followed by adaptive lasso.
CONCLUSIONS Our proposed two-step approach provides a robust way of analyzing the effects of high-throughput chemical mixture exposures on health outcomes by combining the strengths of variable selection and adaptive shrinkage strategies.
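The second step described above, adaptive lasso, is a lasso in which each coefficient's penalty is weighted by the inverse magnitude of an initial estimate, so weak predictors are shrunk harder. The objective minimized here is (1/2n)‖y − Xβ‖² + λ Σⱼ wⱼ|βⱼ| with wⱼ = 1/|β̃ⱼ|^γ. Below is a minimal coordinate-descent sketch in pure Python; it assumes centered inputs (no intercept) and initial estimates supplied from elsewhere. The random-forest screening step and interaction terms are not reproduced, and the λ and γ values and toy data are arbitrary illustrations.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly 0.0 inside [-t, t]."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def adaptive_lasso(X, y, initial, lam=0.1, gamma=1.0, n_iter=200, eps=1e-8):
    """Minimize (1/2n)||y - X b||^2 + lam * sum_j w_j |b_j|, w_j = 1/|initial_j|^gamma.

    X: list of n rows, each a list of p predictor values (assumed centered).
    y: list of n outcomes (assumed centered, so no intercept is fitted).
    initial: list of p initial coefficient estimates (e.g. from OLS or ridge).
    """
    n, p = len(X), len(X[0])
    weights = [1.0 / (abs(b) + eps) ** gamma for b in initial]
    beta = [0.0] * p
    residual = list(y)  # residual = y - X beta, with beta = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(c * c for c in col) / n
            if z == 0:
                continue
            # Correlation of column j with the partial residual (feature j added back).
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, residual)) / n
            new_bj = soft_threshold(rho, lam * weights[j]) / z
            delta = new_bj - beta[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                beta[j] = new_bj
    return beta


# Made-up, centered data: y depends only on x0; x1 is a nuisance exposure.
x0 = [1, 2, 3, 4, 5, -1, -2, -3, -4, -5]
x1 = [1, -1, 1, -1, 1, -1, 1, -1, 1, -1]
X = [[a, b] for a, b in zip(x0, x1)]
y = [3.0 * a for a in x0]
initial = [3.0, 0.01]  # hypothetical first-stage estimates
print(adaptive_lasso(X, y, initial, lam=0.1))
```

The tiny initial estimate for `x1` gives it a very large penalty weight, so its coefficient is set exactly to zero, while `x0` is only lightly shrunk; this is the mechanism that lets adaptive lasso return a parsimonious model.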
Affiliation(s)
- Li Luo
- Department of Internal Medicine, MSC10-5550, 1 University of New Mexico, Albuquerque, NM, 87131, USA.
- University of New Mexico Comprehensive Cancer Center, Albuquerque, NM, USA.
- Laurie G Hudson
- Department of Pharmaceutical Sciences, College of Pharmacy, University of New Mexico, Albuquerque, NM, USA
- Johnnye Lewis
- Community Environmental Health Program, College of Pharmacy, University of New Mexico, Albuquerque, NM, USA
- Ji-Hyun Lee
- Department of Internal Medicine, MSC10-5550, 1 University of New Mexico, Albuquerque, NM, 87131, USA
- University of New Mexico Comprehensive Cancer Center, Albuquerque, NM, USA
- Present Address: Division of Quantitative Sciences, University of Florida Health Cancer Center; Department of Biostatistics, University of Florida, Gainesville, Florida, USA

109
Azuaje F. Artificial intelligence for precision oncology: beyond patient stratification. NPJ Precis Oncol 2019; 3:6. [PMID: 30820462 PMCID: PMC6389974 DOI: 10.1038/s41698-019-0078-1] [Received: 11/05/2018] [Accepted: 01/22/2019]
Abstract
The data-driven identification of disease states and treatment options is a crucial challenge for precision oncology. Artificial intelligence (AI) offers unique opportunities for enhancing such predictive capabilities in the lab and the clinic. AI, including its best-known branch of research, machine learning, has significant potential to enable precision oncology well beyond relatively well-known pattern recognition applications, such as the supervised classification of single-source omics or imaging datasets. This perspective highlights key advances and challenges in that direction. Furthermore, it argues that AI's scope and depth of research need to be expanded to achieve ground-breaking progress in precision oncology.
Affiliation(s)
- Francisco Azuaje
- Bioinformatics and Modelling Research Group, Department of Oncology, Luxembourg Institute of Health (LIH), L-1445 Strassen, Luxembourg
- Present Address: Computational Biomedicine Research Group, Center for Quantitative Biology, Luxembourg Institute of Health (LIH), L-1445 Strassen, Luxembourg

110
Siebert JC, Görg C, Palmer B, Lozupone C. Visualizing microbiome-immune system interplay. Immunotherapy 2019; 11:63-67. [PMID: 30730269 PMCID: PMC6354219 DOI: 10.2217/imt-2018-0138] [Received: 09/07/2018] [Accepted: 10/23/2018]
Affiliation(s)
- Janet C Siebert
- School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- CytoAnalytics, Denver, CO 80113, USA
- Carsten Görg
- School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Brent Palmer
- School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Catherine Lozupone
- School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA

111
Mirza B, Wang W, Wang J, Choi H, Chung NC, Ping P. Machine Learning and Integrative Analysis of Biomedical Big Data. Genes (Basel) 2019; 10:E87. [PMID: 30696086 PMCID: PMC6410075 DOI: 10.3390/genes10020087] [Received: 12/02/2018] [Revised: 01/08/2019] [Accepted: 01/21/2019]
Abstract
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.
Affiliation(s)
- Bilal Mirza
- NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Physiology, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Wei Wang
- NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Computer Science, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Scalable Analytics Institute (ScAi), University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Bioinformatics, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Jie Wang
- NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Physiology, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Howard Choi
- NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Physiology, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Bioinformatics, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Neo Christopher Chung
- NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Banacha 2, 02-097 Warsaw, Poland.
- Peipei Ping
- NIH BD2K Center of Excellence for Biomedical Computing, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Physiology, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Scalable Analytics Institute (ScAi), University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Bioinformatics, University of California Los Angeles, Los Angeles, CA 90095, USA.
- Department of Medicine (Cardiology), University of California Los Angeles, Los Angeles, CA 90095, USA.

112
Wu C, Zhou F, Ren J, Li X, Jiang Y, Ma S. A Selective Review of Multi-Level Omics Data Integration Using Variable Selection. High Throughput 2019; 8:E4. [PMID: 30669303 PMCID: PMC6473252 DOI: 10.3390/ht8010004] [Received: 11/22/2018] [Revised: 12/24/2018] [Accepted: 01/10/2019]
Abstract
High-throughput technologies have been used to generate a large amount of omics data. In the past, single-level analysis has been extensively conducted, in which omics measurements at different levels, including mRNA, microRNA, CNV and DNA methylation, are analyzed separately. As the molecular complexity of disease etiology exists at all levels, integrative analysis offers an effective way to borrow strength across multi-level omics data and can be more powerful than single-level analysis. In this article, we review existing multi-omics integration studies, paying special attention to variable selection methods. We first summarize published reviews on integrating multi-level omics data. Next, after a brief overview of variable selection methods, we review existing supervised, semi-supervised and unsupervised integrative analyses within parallel and hierarchical integration studies, respectively. The strengths and limitations of the methods are discussed in detail. No existing integration method dominates the rest. Computational aspects are also investigated. The review concludes with possible limitations and future directions for multi-level omics data integration.
Affiliation(s)
- Cen Wu
- Department of Statistics, Kansas State University, Manhattan, KS 66506, USA.
- Fei Zhou
- Department of Statistics, Kansas State University, Manhattan, KS 66506, USA.
- Jie Ren
- Department of Statistics, Kansas State University, Manhattan, KS 66506, USA.
- Xiaoxi Li
- Department of Statistics, Kansas State University, Manhattan, KS 66506, USA.
- Yu Jiang
- Division of Epidemiology, Biostatistics and Environmental Health, School of Public Health, University of Memphis, Memphis, TN 38152, USA.
- Shuangge Ma
- Department of Biostatistics, School of Public Health, Yale University, New Haven, CT 06510, USA.

113
Zhang L, Yu G, Xia D, Wang J. Protein–protein interactions prediction based on ensemble deep neural networks. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2018.02.097]

114
Fürtauer L, Pschenitschnigg A, Scharkosi H, Weckwerth W, Nägele T. Combined multivariate analysis and machine learning reveals a predictive module of metabolic stress response in Arabidopsis thaliana. Mol Omics 2018; 14:437-449. [PMID: 30387490 PMCID: PMC6289107 DOI: 10.1039/c8mo00095f] [Received: 04/19/2018] [Accepted: 08/24/2018]
Abstract
Abiotic stress exposure of plants induces metabolic reprogramming, which is tightly regulated by signalling cascades connecting transcriptional with translational and metabolic regulation. The complexity of such interconnected metabolic networks impedes functional understanding of the molecular plant stress response, compromising the design of breeding strategies and biotechnological processes. Thus, defining a molecular network that enables prediction of a plant's stress mode will improve the understanding of stress-responsive biochemical regulation and will yield novel molecular targets for technological application. Arabidopsis wild type plants and two mutant lines with deficiency in sucrose or starch metabolism were grown under ambient and combined cold/high light stress conditions. Stress-induced dynamics of the primary metabolome and the proteome were quantified by mass spectrometry. Wild type data were used to train a machine learning algorithm to classify mutant lines under control and stress conditions. Multivariate analysis and classification identified a module consisting of 23 proteins enabling the reliable prediction of combined temperature/high light stress conditions. Eighteen of these 23 proteins displayed putative protein-protein interactions connecting transcriptional regulation with regulation of primary and secondary metabolism. The identified stress-responsive core module supports prediction of complex biochemical regulation under changing environmental conditions.
Affiliation(s)
- Lisa Fürtauer
- Ludwig-Maximilians-Universität München, Department Biology I, Plant Evolutionary Cell Biology, Großhadernerstr. 2-4, D-82152 Planegg-Martinsried, Germany. Fax: +49-89-2180-74661; Tel: +49-89-2180-74660
- Alice Pschenitschnigg
- Department of Ecogenomics and Systems Biology, University of Vienna, Vienna, Austria
- Helene Scharkosi
- Department of Ecogenomics and Systems Biology, University of Vienna, Vienna, Austria
- Wolfram Weckwerth
- Department of Ecogenomics and Systems Biology, University of Vienna, Vienna, Austria
- Vienna Metabolomics Center, University of Vienna, Vienna, Austria
- Thomas Nägele
- Ludwig-Maximilians-Universität München, Department Biology I, Plant Evolutionary Cell Biology, Großhadernerstr. 2-4, D-82152 Planegg-Martinsried, Germany. Fax: +49-89-2180-74661; Tel: +49-89-2180-74660

115
Stein-O'Brien GL, Arora R, Culhane AC, Favorov AV, Garmire LX, Greene CS, Goff LA, Li Y, Ngom A, Ochs MF, Xu Y, Fertig EJ. Enter the Matrix: Factorization Uncovers Knowledge from Omics. Trends Genet 2018; 34:790-805. [PMID: 30143323 PMCID: PMC6309559 DOI: 10.1016/j.tig.2018.07.003] [Citation(s) in RCA: 111] [Impact Index Per Article: 18.5] [Received: 03/23/2018] [Revised: 06/01/2018] [Accepted: 07/16/2018] [Indexed: 12/20/2022]
Abstract
Omics data contain signals from the molecular, physical, and kinetic inter- and intracellular interactions that control biological systems. Matrix factorization (MF) techniques can reveal low-dimensional structure in high-dimensional data that reflect these interactions. These techniques can uncover new biological knowledge from diverse high-throughput omics data in applications ranging from pathway discovery to timecourse analysis. We review exemplary applications of MF for systems-level analyses. We discuss appropriate applications of these methods and their limitations, and focus on the analysis of results to facilitate optimal biological interpretation. The inference of biologically relevant features with MF enables discovery from high-throughput data beyond the limits of current biological knowledge, answering questions from high-dimensional data that we have not yet thought to ask.
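One concrete MF variant is non-negative matrix factorization (NMF). The sketch below uses the Lee-Seung multiplicative updates and assumes a non-negative genes-by-samples matrix `X`. NMF is only one of several MF families, and the function name `nmf` and its parameters are illustrative choices rather than anything taken from the review.

```python
import numpy as np


def nmf(X, k, n_iter=500, eps=1e-10, seed=0):
    """Factor a non-negative matrix X (genes x samples) as X ~= W @ H.

    W (genes x k) holds gene loadings for k latent patterns;
    H (k x samples) holds the weight of each pattern in each sample.
    Lee-Seung multiplicative updates for the Frobenius-norm loss keep all
    entries non-negative and, up to the small eps, never increase ||X - WH||_F.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

The columns of `W` can then be read as candidate gene programs, and the rows of `H` show how strongly each program is active in each sample. Choosing `k` is left to the analyst.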
Affiliation(s)
- Genevieve L Stein-O'Brien
- Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD, USA; Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA; McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Raman Arora
- Department of Computer Science, Institute for Data Intensive Engineering and Science, Johns Hopkins University, Baltimore, MD, USA
- Aedin C Culhane
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA; Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, MA, USA
- Alexander V Favorov
- Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD, USA; Vavilov Institute of General Genetics, Moscow, Russia
- Casey S Greene
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, PA, USA; Childhood Cancer Data Lab, Alex's Lemonade Stand Foundation, PA, USA
- Loyal A Goff
- Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA; McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Yifeng Li
- Digital Technologies Research Centre, National Research Council of Canada, Ottawa, ON, Canada
- Aloune Ngom
- School of Computer Science, University of Windsor, Windsor, ON, Canada
- Michael F Ochs
- Department of Mathematics and Statistics, The College of New Jersey, Ewing, NJ, USA
- Yanxun Xu
- Department of Applied Mathematics and Statistics, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, USA
- Elana J Fertig
- Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD, USA.

116
Li Y, Bie R, Teran Hidalgo SJ, Qin Y, Wu M, Ma S. Assisted gene expression-based clustering with AWNCut. Stat Med 2018; 37:4386-4403. [PMID: 30094873 DOI: 10.1002/sim.7928] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 01/13/2018] [Revised: 05/15/2018] [Accepted: 07/05/2018] [Indexed: 01/06/2023]
Abstract
In research on complex diseases, gene expression (GE) data have been used extensively to cluster samples. The resulting clusters can serve as the basis for disease subtype identification, risk stratification, and many other purposes. Given the small sample sizes of genetic profiling studies and the noisy nature of GE data, clustering results are often unsatisfactory. A prominent trend in recent studies is multidimensional profiling, which collects data on GEs and their regulators (copy number alterations, microRNAs, methylation, etc.) from the same subjects. Because of these regulatory relationships, regulators carry important information about the properties of GEs. We develop a novel assisted clustering method that effectively uses regulator information to improve clustering analysis of GE data. To account for the fact that not all GEs are informative, we propose a weighting strategy in which the weights are determined in a data-dependent way and can discriminate informative GEs from noise. The proposed method is built on the NCut technique and is implemented using a simulated annealing algorithm. Simulations demonstrate that it can outperform multiple direct competitors. In the analysis of TCGA cutaneous melanoma and lung adenocarcinoma data, it yields biologically sensible findings that differ from those of the alternatives.
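AWNCut builds on the normalized cut (NCut) criterion, which scores a partition by how much edge weight leaves each cluster relative to that cluster's total connectivity. The sketch below only evaluates the k-way NCut of a given partition on a feature-weighted Gaussian affinity. The per-gene weights `w`, the kernel width `sigma`, and both function names are illustrative assumptions. The sketch omits the regulator-assisted term and the simulated-annealing search described in the abstract.

```python
import numpy as np


def weighted_affinity(X, w, sigma=1.0):
    """Gaussian affinity between samples (rows of X) with per-gene weights w.

    Hypothetical weighting: gene g contributes w[g] * (x_ig - x_jg)^2 to the
    squared distance, so genes with weight 0 are ignored entirely.
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    d2 = (diff ** 2 * w).sum(axis=2)
    A = np.exp(-d2 / sigma)
    np.fill_diagonal(A, 0.0)  # no self-loops
    return A


def ncut_value(A, labels):
    """k-way normalized cut: sum over clusters C of cut(C, V \\ C) / assoc(C, V)."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for c in np.unique(labels):
        in_c = labels == c
        assoc = A[in_c, :].sum()                 # all edge weight leaving nodes in C
        cut = A[np.ix_(in_c, ~in_c)].sum()       # edge weight crossing to other clusters
        total += cut / assoc
    return total
```

A search procedure such as simulated annealing would repeatedly perturb `labels` (and, in a weighted scheme, `w`) and keep changes that lower this objective. Lower values mean tighter, better-separated clusters.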
Affiliation(s)
- Yang Li
- Center for Applied Statistics, Renmin University of China, Beijing, China; School of Statistics, Renmin University of China, Beijing, China
- Ruofan Bie
- School of Statistics, Renmin University of China, Beijing, China
- Yichen Qin
- Department of Operations, Business Analytics, and Information Systems, University of Cincinnati, Cincinnati, Ohio
- Mengyun Wu
- School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China; Department of Biostatistics, Yale University, New Haven, Connecticut
- Shuangge Ma
- School of Statistics, Renmin University of China, Beijing, China; Department of Biostatistics, Yale University, New Haven, Connecticut

117
Glaab E. Computational systems biology approaches for Parkinson's disease. Cell Tissue Res 2018; 373:91-109. [PMID: 29185073 PMCID: PMC6015628 DOI: 10.1007/s00441-017-2734-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 07/24/2017] [Accepted: 11/06/2017] [Indexed: 12/26/2022]
Abstract
Parkinson's disease (PD) is a prime example of a complex and heterogeneous disorder, characterized by multifaceted and varied motor and non-motor symptoms and different possible interplays of genetic and environmental risk factors. While investigations of individual PD-causing mutations and risk factors in isolation are providing important insights into the molecular mechanisms behind PD, there is a growing consensus that a more complete understanding of these mechanisms will require integrative modeling of multifactorial disease-associated perturbations in molecular networks. Identifying and interpreting the combinatorial effects of multiple PD-associated molecular changes may pave the way towards earlier and more reliable diagnosis and more effective therapeutic interventions. This review provides an overview of computational systems biology approaches developed in recent years to study multifactorial molecular alterations in complex disorders, with a focus on PD research applications. Strengths and weaknesses of different cellular pathway and network analyses, and of multivariate machine learning techniques for investigating PD-related omics data, are discussed, and strategies are proposed to exploit the synergies of multiple biological knowledge and data sources. A final outlook provides an overview of specific challenges and possible next steps for translating systems biology findings in PD into new omics-based diagnostic tools and targeted, drug-based therapeutic approaches.
Affiliation(s)
- Enrico Glaab
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 7 avenue des Hauts Fourneaux, L-4362, Esch-sur-Alzette, Luxembourg.

118
Choy G, Khalilzadeh O, Michalski M, Do S, Samir AE, Pianykh OS, Geis JR, Pandharipande PV, Brink JA, Dreyer KJ. Current Applications and Future Impact of Machine Learning in Radiology. Radiology 2018; 288:318-328. [PMID: 29944078 DOI: 10.1148/radiol.2018171820] [Citation(s) in RCA: 434] [Impact Index Per Article: 72.3] [Indexed: 12/14/2022]
Abstract
Recent advances and future perspectives of machine learning techniques offer promising applications in medical imaging. Machine learning has the potential to improve different steps of the radiology workflow including order scheduling and triage, clinical decision support systems, detection and interpretation of findings, postprocessing and dose estimation, examination quality control, and radiology reporting. In this article, the authors review examples of current applications of machine learning and artificial intelligence techniques in diagnostic radiology. In addition, the future impact and natural extension of these techniques in radiology practice are discussed.
Affiliation(s)
- Garry Choy
- From the Department of Radiology, Massachusetts General Hospital, Harvard Medical School, 55 Fruit St, Boston, Mass 02114 (G.C., O.K., M.M., S.D., A.E.S., O.S.P., P.V.P., J.A.B., K.J.D.); and Department of Radiology, University of Colorado School of Medicine, Aurora, Colo (J.R.G.)
- Omid Khalilzadeh
- From the Department of Radiology, Massachusetts General Hospital, Harvard Medical School, 55 Fruit St, Boston, Mass 02114 (G.C., O.K., M.M., S.D., A.E.S., O.S.P., P.V.P., J.A.B., K.J.D.); and Department of Radiology, University of Colorado School of Medicine, Aurora, Colo (J.R.G.)
- Mark Michalski
- From the Department of Radiology, Massachusetts General Hospital, Harvard Medical School, 55 Fruit St, Boston, Mass 02114 (G.C., O.K., M.M., S.D., A.E.S., O.S.P., P.V.P., J.A.B., K.J.D.); and Department of Radiology, University of Colorado School of Medicine, Aurora, Colo (J.R.G.)
- Synho Do
- From the Department of Radiology, Massachusetts General Hospital, Harvard Medical School, 55 Fruit St, Boston, Mass 02114 (G.C., O.K., M.M., S.D., A.E.S., O.S.P., P.V.P., J.A.B., K.J.D.); and Department of Radiology, University of Colorado School of Medicine, Aurora, Colo (J.R.G.)
- Anthony E Samir
- From the Department of Radiology, Massachusetts General Hospital, Harvard Medical School, 55 Fruit St, Boston, Mass 02114 (G.C., O.K., M.M., S.D., A.E.S., O.S.P., P.V.P., J.A.B., K.J.D.); and Department of Radiology, University of Colorado School of Medicine, Aurora, Colo (J.R.G.)
- Oleg S Pianykh
- From the Department of Radiology, Massachusetts General Hospital, Harvard Medical School, 55 Fruit St, Boston, Mass 02114 (G.C., O.K., M.M., S.D., A.E.S., O.S.P., P.V.P., J.A.B., K.J.D.); and Department of Radiology, University of Colorado School of Medicine, Aurora, Colo (J.R.G.)
- J Raymond Geis
- From the Department of Radiology, Massachusetts General Hospital, Harvard Medical School, 55 Fruit St, Boston, Mass 02114 (G.C., O.K., M.M., S.D., A.E.S., O.S.P., P.V.P., J.A.B., K.J.D.); and Department of Radiology, University of Colorado School of Medicine, Aurora, Colo (J.R.G.)
- Pari V Pandharipande
- From the Department of Radiology, Massachusetts General Hospital, Harvard Medical School, 55 Fruit St, Boston, Mass 02114 (G.C., O.K., M.M., S.D., A.E.S., O.S.P., P.V.P., J.A.B., K.J.D.); and Department of Radiology, University of Colorado School of Medicine, Aurora, Colo (J.R.G.)
- James A Brink
- From the Department of Radiology, Massachusetts General Hospital, Harvard Medical School, 55 Fruit St, Boston, Mass 02114 (G.C., O.K., M.M., S.D., A.E.S., O.S.P., P.V.P., J.A.B., K.J.D.); and Department of Radiology, University of Colorado School of Medicine, Aurora, Colo (J.R.G.)
- Keith J Dreyer
- From the Department of Radiology, Massachusetts General Hospital, Harvard Medical School, 55 Fruit St, Boston, Mass 02114 (G.C., O.K., M.M., S.D., A.E.S., O.S.P., P.V.P., J.A.B., K.J.D.); and Department of Radiology, University of Colorado School of Medicine, Aurora, Colo (J.R.G.)

119
Li Y, Shi W, Wasserman WW. Genome-wide prediction of cis-regulatory regions using supervised deep learning methods. BMC Bioinformatics 2018; 19:202. [PMID: 29855387 PMCID: PMC5984344 DOI: 10.1186/s12859-018-2187-1] [Citation(s) in RCA: 55] [Impact Index Per Article: 9.2] [Received: 04/13/2017] [Accepted: 05/04/2018] [Indexed: 01/07/2023] Open
Abstract
Background: In the human genome, 98% of DNA sequences are non-protein-coding regions that were previously disregarded as junk DNA. In fact, non-coding regions host a variety of cis-regulatory regions that precisely control the expression of genes. Thus, identifying active cis-regulatory regions in the human genome is critical for understanding gene regulation and assessing the impact of genetic variation on phenotype. Developments in high-throughput sequencing and machine learning technologies make it possible to predict cis-regulatory regions genome wide. Results: Based on rich data resources such as the Encyclopedia of DNA Elements (ENCODE) and the Functional Annotation of the Mammalian Genome (FANTOM) projects, we introduce DECRES, which uses supervised deep learning approaches to identify enhancer and promoter regions in the human genome. Because deep learning methods can discover patterns in large and complex data, their introduction enables a significant advance in our knowledge of the genomic locations of cis-regulatory regions. Using models for well-characterized cell lines, we identify key experimental features that contribute to predictive performance. Applying DECRES, we delineate the locations of 300,000 candidate enhancers genome wide (6.8% of the genome, of which 40,000 are supported by bidirectional transcription data) and 26,000 candidate promoters (0.6% of the genome). Conclusion: The predicted annotations of cis-regulatory regions will provide broad utility for genome interpretation, from functional genomics to clinical applications. The DECRES model demonstrates the potential of deep learning technologies when combined with high-throughput sequencing data, and inspires the development of other advanced neural network models for further improvement of genome annotations.
Electronic supplementary material: The online version of this article (10.1186/s12859-018-2187-1) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Yifeng Li
- Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, Department of Medical Genetics, University of British Columbia, Rm 3109, 950 West 28th Avenue, Vancouver, V5Z 4H4, Canada; Digital Technologies Research Centre, National Research Council Canada, Building M-50, 1200 Montreal Road, Ottawa, K1A 0R6, Canada
- Wenqiang Shi
- Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, Department of Medical Genetics, University of British Columbia, Rm 3109, 950 West 28th Avenue, Vancouver, V5Z 4H4, Canada
- Wyeth W Wasserman
- Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, Department of Medical Genetics, University of British Columbia, Rm 3109, 950 West 28th Avenue, Vancouver, V5Z 4H4, Canada.

120
Hameed PN, Verspoor K, Kusljic S, Halgamuge S. A two-tiered unsupervised clustering approach for drug repositioning through heterogeneous data integration. BMC Bioinformatics 2018; 19:129. [PMID: 29642848 PMCID: PMC5896044 DOI: 10.1186/s12859-018-2123-4] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 09/28/2017] [Accepted: 03/21/2018] [Indexed: 01/02/2023] Open
Abstract
Background: Drug repositioning is the process of identifying new uses for existing drugs. Computational drug repositioning methods can reduce the time, costs and risks of drug development by automating the analysis of the relationships in pharmacology networks. Pharmacology networks are large and heterogeneous. Clustering drugs into small groups can simplify large pharmacology networks, and these subgroups can also be used as a starting point for repositioning drugs. In this paper, we propose a two-tiered, drug-centric unsupervised clustering approach for drug repositioning that integrates heterogeneous drug data profiles: drug-chemical, drug-disease, drug-gene, drug-protein and drug-side effect relationships. Results: The proposed drug repositioning approach has three steps: (i) clustering drugs based on their homogeneous profiles using the Growing Self Organizing Map (GSOM); (ii) clustering drugs from the drug-drug relation matrices derived in the previous step, using three state-of-the-art graph clustering methods; and (iii) inferring drug repositioning candidates and assigning a confidence value to each identified candidate. We compare our two-tiered clustering approach against two existing heterogeneous data integration approaches with reference to the Anatomical Therapeutic Chemical (ATC) classification, using GSOM. Our approach yields a Normalized Mutual Information (NMI) of 0.66 and a Standardized Mutual Information (SMI) of 36.11, while the two existing methods yield NMI of 0.60 and 0.64 and SMI of 22.26 and 33.59, respectively. Moreover, the two existing approaches failed to produce useful cluster separations when using graph clustering algorithms, whereas our approach identifies useful clusters for drug repositioning. Furthermore, we provide clinical evidence for four predicted results (Chlorthalidone, Indomethacin, Metformin and Thioridazine), supporting the claim that the proposed approach can be reliably used to infer ATC codes and drug repositioning candidates.
Conclusion: The proposed two-tiered unsupervised clustering approach is suitable for drug clustering and enables heterogeneous data integration. It also enables the identification of reliable drug repositioning candidates with reference to the ATC therapeutic classification. Repositioning candidates identified consistently by multiple clustering algorithms and with high confidence are more likely to be effective. Electronic supplementary material: The online version of this article (10.1186/s12859-018-2123-4) contains supplementary material, which is available to authorized users.
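NMI measures how well a clustering agrees with a reference labelling such as ATC classes. It equals 1 for identical partitions up to relabelling and 0 for independent ones. The minimal pure-Python sketch below assumes the geometric-mean normalization NMI = I(U;V) / sqrt(H(U)H(V)). Other normalizations are in use, the exact variant used in the paper is not stated here, and SMI is not shown. The handling of single-cluster partitions is a convention chosen for this sketch.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (natural log) of a partition given as a list of labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(u, v):
    """I(U;V) = sum_ab p(a,b) * log(p(a,b) / (p(a) p(b))) from label co-occurrence counts."""
    n = len(u)
    pu, pv = Counter(u), Counter(v)
    joint = Counter(zip(u, v))
    return sum((c / n) * log(c * n / (pu[a] * pv[b])) for (a, b), c in joint.items())


def nmi(u, v):
    """Normalized mutual information with geometric-mean normalization (assumed variant)."""
    hu, hv = entropy(u), entropy(v)
    if hu == 0 or hv == 0:
        # Convention for degenerate single-cluster partitions.
        return 1.0 if hu == hv else 0.0
    return mutual_information(u, v) / sqrt(hu * hv)
```

For example, `nmi(cluster_ids, atc_classes)` compares cluster assignments against reference classes, where both names are hypothetical lists of equal length with one entry per drug.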
Affiliation(s)
- Pathima Nusrath Hameed
- Department of Mechanical Engineering, University of Melbourne, Parkville, Melbourne, 3010, Australia; Data61, Victoria Research Lab, West Melbourne, 3003, Australia; Department of Computer Science, University of Ruhuna, Matara, 81000, Sri Lanka.
- Karin Verspoor
- Department of Computing and Information Systems, University of Melbourne, Parkville, Melbourne, 3010, Australia
- Snezana Kusljic
- Department of Nursing, University of Melbourne, Parkville, Melbourne, 3010, Australia; The Florey Institute of Neuroscience and Mental Health, University of Melbourne, Parkville, Melbourne, 3010, Australia
- Saman Halgamuge
- Research School of Engineering, College of Engineering & Computer Science, The Australian National University, Canberra, ACT, 2601, Australia

121

122
Ezzat A, Wu M, Li XL, Kwoh CK. Computational prediction of drug–target interactions using chemogenomic approaches: an empirical survey. Brief Bioinform 2018; 20:1337-1357. [DOI: 10.1093/bib/bby002] [Citation(s) in RCA: 117] [Impact Index Per Article: 19.5] [Received: 10/06/2017] [Revised: 12/21/2017] [Indexed: 01/18/2023] Open
Abstract
Computational prediction of drug–target interactions (DTIs) has become an essential task in the drug discovery process. It narrows down the search space by suggesting potential interaction candidates for validation via wet-lab experiments, which are well known to be expensive and time-consuming. In this article, we aim to provide a comprehensive overview and empirical evaluation of computational DTI prediction techniques that can serve as a guide and reference for fellow researchers. Specifically, we first describe the data used in computational DTI prediction efforts. We then categorize and elaborate on the state-of-the-art methods for predicting DTIs. Next, we perform an empirical comparison to demonstrate the prediction performance of some representative methods under different scenarios. We also present interesting findings from our evaluation study, discussing the advantages and disadvantages of each method. Finally, we highlight potential avenues for further improving DTI prediction performance, as well as related research directions.

123
Chen J, Schwarz E. The role of blood-based biomarkers in advancing personalized therapy of schizophrenia. Expert Review of Precision Medicine and Drug Development 2017. [DOI: 10.1080/23808993.2017.1400906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Affiliation(s)
- Junfang Chen
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Emanuel Schwarz
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany

124
Identification of candidate drugs using tensor-decomposition-based unsupervised feature extraction in integrated analysis of gene expression between diseases and DrugMatrix datasets. Sci Rep 2017; 7:13733. [PMID: 29062063 PMCID: PMC5653784 DOI: 10.1038/s41598-017-13003-0] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 04/24/2017] [Accepted: 09/13/2017] [Indexed: 01/28/2023] Open
Abstract
Identifying drug target genes in gene expression profiles is not straightforward. Because a drug targets proteins and not mRNAs, the mRNA expression of drug target genes is not always altered. In addition, the interaction between a drug and a protein can be context dependent, which means that simple drug incubation experiments on cell lines do not always reflect the real situation during active disease. In this paper, I applied tensor-decomposition-based unsupervised feature extraction to an integrated analysis using a mathematical product of gene expression in various diseases and gene expression in the DrugMatrix dataset, which reports comprehensive gene expression data for rats under various drug treatments. I found that this strategy, in a fully unsupervised manner, enables researchers to identify a combined set of genes and compounds that significantly overlaps with previously identified gene and drug interactions. As an example illustrating the usefulness of this strategy in drug discovery experiments, I considered cirrhosis, for which no effective drugs have ever been proposed. The present strategy identified two promising therapeutic-target genes, CYPOR and HNF4A; for their protein products, bezafibrate was identified as a promising candidate drug, supported by in silico docking analysis.

125
Taguchi YH. Tensor decomposition-based unsupervised feature extraction applied to matrix products for multi-view data processing. PLoS One 2017; 12:e0183933. [PMID: 28841719 PMCID: PMC5571984 DOI: 10.1371/journal.pone.0183933] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 02/20/2017] [Accepted: 08/04/2017] [Indexed: 01/17/2023] Open
Abstract
In the current era of big data, the amount of available data is continuously increasing. Both the number and the types of samples, or features, are on the rise. Mixing distinct features often makes interpretation more difficult, whereas analysing each type separately requires subsequent integration. A tensor is a useful framework for handling distinct types of features in an integrated manner without mixing them. On the other hand, tensor data are not easy to obtain, since they require measurements of huge numbers of combinations of distinct features: if there are m kinds of features, each of which has N dimensions, as many as N^m measurements are needed, which is often too many to measure. In this paper, I propose a new method in which a tensor is generated from individual features without combinatorial measurements, and the generated tensor is decomposed back into matrices, from which unsupervised feature extraction is performed. To demonstrate the usefulness of the proposed strategy, it was applied to synthetic data as well as three omics datasets. It outperformed other matrix-based methodologies.
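The NumPy sketch below illustrates how a tensor can be generated from individual feature matrices without combinatorial measurements. It assumes two matrices `X1` (N1 x M) and `X2` (N2 x M) measured on the same M samples. The tensor is formed as T[i, j, k] = X1[i, j] * X2[k, j]. Summing over the shared sample index j reduces it to the matrix product X1 X2^T, whose singular vectors rank the features of each type. The construction, the use of the leading singular component, and the top-n selection rule are all assumptions made for illustration. They are not a transcription of the paper's exact procedure.

```python
import numpy as np


def product_tensor(X1, X2):
    """Build T[i, j, k] = X1[i, j] * X2[k, j] from two views sharing sample index j.

    No measurement of feature combinations is needed: the tensor is generated
    from the individual feature matrices.
    """
    return np.einsum("ij,kj->ijk", X1, X2)


def extract_features(X1, X2, n_top=5, component=0):
    """Rank features of both views by their loadings on one singular component."""
    T = product_tensor(X1, X2)
    # Contracting the shared sample index gives the N1 x N2 matrix product X1 @ X2.T.
    P = T.sum(axis=1)
    U, s, Vt = np.linalg.svd(P, full_matrices=False)
    u = U[:, component]    # loadings for features of the first type
    v = Vt[component, :]   # loadings for features of the second type
    top1 = np.argsort(-np.abs(u))[:n_top]
    top2 = np.argsort(-np.abs(v))[:n_top]
    return top1, top2, s
```

In practice the N1 x M x N2 tensor never needs to be stored. Computing `X1 @ X2.T` directly gives the same matrix, and the explicit tensor is built here only to make the construction visible.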
Affiliation(s)
- Y-h. Taguchi
- Department of Physics, Chuo University, 1-13-27 Kasuga, Bunkyo-ku, Tokyo 112-8551, Japan

126
Sato M, Kawana K, Adachi K, Fujimoto A, Yoshida M, Nakamura H, Nishida H, Inoue T, Taguchi A, Ogishima J, Eguchi S, Yamashita A, Tomio K, Wada-Hiraike O, Oda K, Nagamatsu T, Osuga Y, Fujii T. Intracellular signaling entropy can be a biomarker for predicting the development of cervical intraepithelial neoplasia. PLoS One 2017; 12:e0176353. [PMID: 28453530 PMCID: PMC5409150 DOI: 10.1371/journal.pone.0176353] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 01/06/2017] [Accepted: 04/10/2017] [Indexed: 01/06/2023] Open
Abstract
While mortality rates for cervical cancer have been drastically reduced since the introduction of the Pap smear test, it is still one of the leading causes of death in women worldwide. Additionally, studies that appropriately evaluate the risk of developing cervical lesions are needed. Therefore, we investigated whether intracellular signaling entropy, computed from microarray data, could be useful for predicting the risk of developing cervical lesions. We used three datasets: GSE63514 (histology), GSE27678 (cytology) and GSE75132 (cytology, a prospective study). In GSE63514, the entropy rate increased significantly with disease progression (normal < cervical intraepithelial neoplasia, CIN < cancer) (Kruskal-Wallis test, p < 0.0001). In GSE27678, similar results were obtained (normal < low-grade squamous intraepithelial lesions, LSILs < high-grade squamous intraepithelial lesions, HSILs ≤ cancer) (Kruskal-Wallis test, p < 0.001). In GSE75132, the entropy rate tended to be higher in the HPV-persistent groups than in the HPV-negative group. The group that was destined to progress to CIN 3 or higher tended to have a higher entropy rate than the HPV16-positive group without progression. In conclusion, signaling entropy appeared to differ between lesion statuses and could be a useful biomarker for predicting the development of cervical intraepithelial neoplasia.
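Signaling entropy is commonly defined as the entropy rate of a random walk on a protein interaction network, with transition probabilities weighted by the expression of each sample. The NumPy sketch below assumes that definition:

- p_ij = A_ij x_j / sum_k A_ik x_k;
- a stationary distribution pi_i proportional to x_i sum_j A_ij x_j;
- normalization by log of the largest adjacency eigenvalue, which is the maximal entropy rate on the network.

The network, the expression preprocessing and the function name are assumptions, and the paper's exact pipeline is not reproduced.

```python
import numpy as np


def signaling_entropy_rate(A, x, normalize=True):
    """Entropy rate of an expression-weighted random walk on a network.

    A: symmetric 0/1 adjacency matrix (e.g. protein interactions); every node
       is assumed to have at least one neighbour.
    x: positive expression values of the nodes in one sample.
    normalize: divide by log(lambda_max(A)); requires lambda_max > 1.
    """
    A = np.asarray(A, dtype=float)
    x = np.asarray(x, dtype=float)
    W = A * x[None, :]                      # W[i, j] = A[i, j] * x[j]
    P = W / W.sum(axis=1, keepdims=True)    # transition probabilities p_ij
    # Edge weights x_i * x_j are symmetric, so the walk is reversible and its
    # stationary distribution is proportional to x_i * sum_j A_ij x_j.
    pi = x * W.sum(axis=1)
    pi /= pi.sum()
    with np.errstate(divide="ignore"):
        logP = np.where(P > 0, np.log(P), 0.0)
    S = -(P * logP).sum(axis=1)             # local entropy of each node's outgoing walk
    sr = float(pi @ S)
    if normalize:
        lam_max = np.linalg.eigvalsh(A).max()
        sr /= np.log(lam_max)
    return sr
```

Per-sample values computed this way could then be compared across lesion groups with a Kruskal-Wallis test, for instance via `scipy.stats.kruskal(*groups)`.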
Affiliation(s)
- Masakazu Sato
- Department of Obstetrics and Gynecology, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Kei Kawana
- Department of Obstetrics and Gynecology, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Department of Obstetrics and Gynecology, School of Medicine, Nihon University, Itabashi-ku, Tokyo, Japan
- Katsuyuki Adachi
- Department of Obstetrics and Gynecology, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Asaha Fujimoto
- Department of Obstetrics and Gynecology, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Mitsuyo Yoshida
- Department of Obstetrics and Gynecology, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Hiroe Nakamura
- Department of Obstetrics and Gynecology, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Haruka Nishida
- Department of Obstetrics and Gynecology, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Tomoko Inoue
- Department of Obstetrics and Gynecology, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Ayumi Taguchi
- Department of Obstetrics and Gynecology, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Juri Ogishima
- Department of Obstetrics and Gynecology, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Satoko Eguchi
- Department of Obstetrics and Gynecology, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Aki Yamashita
- Department of Obstetrics and Gynecology, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Kensuke Tomio
- Department of Obstetrics and Gynecology, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Osamu Wada-Hiraike
- Department of Obstetrics and Gynecology, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Katsutoshi Oda
- Department of Obstetrics and Gynecology, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Takeshi Nagamatsu
- Department of Obstetrics and Gynecology, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Yutaka Osuga
- Department of Obstetrics and Gynecology, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Tomoyuki Fujii
- Department of Obstetrics and Gynecology, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan