Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Ray B, Henaff M, Ma S, Efstathiadis E, Peskin ER, Picone M, Poli T, Aliferis CF, Statnikov A. Information content and analysis methods for multi-modal high-throughput biomedical data. Sci Rep 2014;4:4411. [PMID: 24651673 PMCID: PMC3961740 DOI: 10.1038/srep04411] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/26/2013] [Accepted: 02/27/2014] [Indexed: 01/30/2023] Open

For:	Ray B, Henaff M, Ma S, Efstathiadis E, Peskin ER, Picone M, Poli T, Aliferis CF, Statnikov A. Information content and analysis methods for multi-modal high-throughput biomedical data. Sci Rep 2014;4:4411. [PMID: 24651673 PMCID: PMC3961740 DOI: 10.1038/srep04411] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/26/2013] [Accepted: 02/27/2014] [Indexed: 01/30/2023] Open

Number

Cited by Other Article(s)

Qin X, Lock TR, Kallenbach RL. DA: Population structure inference using discriminant analysis. Methods Ecol Evol 2021. [DOI: 10.1111/2041-210x.13748] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]

A mechanism-aware and multiomic machine-learning pipeline characterizes yeast cell growth. Proc Natl Acad Sci U S A 2020;117:18869-18879. [PMID: 32675233 DOI: 10.1073/pnas.2002959117] [Citation(s) in RCA: 54] [Impact Index Per Article: 13.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open

Abstract

Metabolic modeling and machine learning are key components in the emerging next generation of systems and synthetic biology tools, targeting the genotype-phenotype-environment relationship. Rather than being used in isolation, it is becoming clear that their value is maximized when they are combined. However, the potential of integrating these two frameworks for omic data augmentation and integration is largely unexplored. We propose, rigorously assess, and compare machine-learning-based data integration techniques, combining gene expression profiles with computationally generated metabolic flux data to predict yeast cell growth. To this end, we create strain-specific metabolic models for 1,143 Saccharomyces cerevisiae mutants and we test 27 machine-learning methods, incorporating state-of-the-art feature selection and multiview learning approaches. We propose a multiview neural network using fluxomic and transcriptomic data, showing that the former increases the predictive accuracy of the latter and reveals functional patterns that are not directly deducible from gene expression alone. We test the proposed neural network on a further 86 strains generated in a different experiment, therefore verifying its robustness to an additional independent dataset. Finally, we show that introducing mechanistic flux features improves the predictions also for knockout strains whose genes were not modeled in the metabolic reconstruction. Our results thus demonstrate that fusing experimental cues with in silico models, based on known biochemistry, can contribute with disjoint information toward biologically informed and interpretable machine learning. Overall, this study provides tools for understanding and manipulating complex phenotypes, increasing both the prediction accuracy and the extent of discernible mechanistic biological insights.

Collapse

Nguyen LH, Holmes S. Ten quick tips for effective dimensionality reduction. PLoS Comput Biol 2019;15:e1006907. [PMID: 31220072 PMCID: PMC6586259 DOI: 10.1371/journal.pcbi.1006907] [Citation(s) in RCA: 97] [Impact Index Per Article: 19.4] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open

Mitra S, Saha S. A multiobjective multi-view cluster ensemble technique: Application in patient subclassification. PLoS One 2019;14:e0216904. [PMID: 31120942 PMCID: PMC6533037 DOI: 10.1371/journal.pone.0216904] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2019] [Accepted: 04/30/2019] [Indexed: 11/21/2022] Open

Krahe MA, Toohey J, Wolski M, Scuffham PA, Reilly S. Research data management in practice: Results from a cross-sectional survey of health and medical researchers from an academic institution in Australia. HEALTH INF MANAG J 2019;49:108-116. [DOI: 10.1177/1833358319831318] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]

Athreya A, Iyer R, Neavin D, Wang L, Weinshilboum R, Kaddurah-Daouk R, Rush J, Frye M, Bobo W. Augmentation of Physician Assessments with Multi-Omics Enhances Predictability of Drug Response: A Case Study of Major Depressive Disorder. IEEE COMPUT INTELL M 2018;13:20-31. [PMID: 30467458 DOI: 10.1109/mci.2018.2840660] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/05/2023]

Galatzer-Levy IR, Ruggles K, Chen Z. Data Science in the Research Domain Criteria Era: Relevance of Machine Learning to the Study of Stress Pathology, Recovery, and Resilience. CHRONIC STRESS (THOUSAND OAKS, CALIF.) 2018;2:247054701774755. [PMID: 29527592 PMCID: PMC5841258 DOI: 10.1177/2470547017747553] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]

Ray B, Liu W, Fenyö D. Adaptive Multiview Nonnegative Matrix Factorization Algorithm for Integration of Multimodal Biomedical Data. Cancer Inform 2017;16:1176935117725727. [PMID: 28835735 PMCID: PMC5564898 DOI: 10.1177/1176935117725727] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2017] [Accepted: 07/08/2017] [Indexed: 11/16/2022] Open

Abstract

The amounts and types of available multimodal tumor data are rapidly increasing, and their integration is critical for fully understanding the underlying cancer biology and personalizing treatment. However, the development of methods for effectively integrating multimodal data in a principled manner is lagging behind our ability to generate the data. In this article, we introduce an extension to a multiview nonnegative matrix factorization algorithm (NNMF) for dimensionality reduction and integration of heterogeneous data types and compare the predictive modeling performance of the method on unimodal and multimodal data. We also present a comparative evaluation of our novel multiview approach and current data integration methods. Our work provides an efficient method to extend an existing dimensionality reduction method. We report rigorous evaluation of the method on large-scale quantitative protein and phosphoprotein tumor data from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) acquired using state-of-the-art liquid chromatography mass spectrometry. Exome sequencing and RNA-Seq data were also available from The Cancer Genome Atlas for the same tumors. For unimodal data, in case of breast cancer, transcript levels were most predictive of estrogen and progesterone receptor status and copy number variation of human epidermal growth factor receptor 2 status. For ovarian and colon cancers, phosphoprotein and protein levels were most predictive of tumor grade and stage and residual tumor, respectively. When multiview NNMF was applied to multimodal data to predict outcomes, the improvement in performance is not overall statistically significant beyond unimodal data, suggesting that proteomics data may contain more predictive information regarding tumor phenotypes than transcript levels, probably due to the fact that proteins are the functional gene products and therefore a more direct measurement of the functional state of the tumor. Here, we have applied our proposed approach to multimodal molecular data for tumors, but it is generally applicable to dimensionality reduction and joint analysis of any type of multimodal data.

Collapse

Saxe GN, Ma S, Ren J, Aliferis C. Machine learning methods to predict child posttraumatic stress: a proof of concept study. BMC Psychiatry 2017;17:223. [PMID: 28689495 PMCID: PMC5502325 DOI: 10.1186/s12888-017-1384-1] [Citation(s) in RCA: 43] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/16/2016] [Accepted: 06/09/2017] [Indexed: 02/08/2023] Open

Abstract

BACKGROUND

The care of traumatized children would benefit significantly from accurate predictive models for Posttraumatic Stress Disorder (PTSD), using information available around the time of trauma. Machine Learning (ML) computational methods have yielded strong results in recent applications across many diseases and data types, yet they have not been previously applied to childhood PTSD. Since these methods have not been applied to this complex and debilitating disorder, there is a great deal that remains to be learned about their application. The first step is to prove the concept: Can ML methods - as applied in other fields - produce predictive classification models for childhood PTSD? Additionally, we seek to determine if specific variables can be identified - from the aforementioned predictive classification models - with putative causal relations to PTSD.

METHODS

ML predictive classification methods - with causal discovery feature selection - were applied to a data set of 163 children hospitalized with an injury and PTSD was determined three months after hospital discharge. At the time of hospitalization, 105 risk factor variables were collected spanning a range of biopsychosocial domains.

RESULTS

Seven percent of subjects had a high level of PTSD symptoms. A predictive classification model was discovered with significant predictive accuracy. A predictive model constructed based on subsets of potentially causally relevant features achieves similar predictivity compared to the best predictive model constructed with all variables. Causal Discovery feature selection methods identified 58 variables of which 10 were identified as most stable.

CONCLUSIONS

In this first proof-of-concept application of ML methods to predict childhood Posttraumatic Stress we were able to determine both predictive classification models for childhood PTSD and identify several causal variables. This set of techniques has great potential for enhancing the methodological toolkit in the field and future studies should seek to replicate, refine, and extend the results produced in this study.

Collapse

Ruggles KV, Krug K, Wang X, Clauser KR, Wang J, Payne SH, Fenyö D, Zhang B, Mani DR. Methods, Tools and Current Perspectives in Proteogenomics. Mol Cell Proteomics 2017;16:959-981. [PMID: 28456751 DOI: 10.1074/mcp.mr117.000024] [Citation(s) in RCA: 95] [Impact Index Per Article: 13.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/13/2017] [Indexed: 12/20/2022] Open

Utilization of machine learning for prediction of post-traumatic stress: a re-examination of cortisol in the prediction and pathways to non-remitting PTSD. Transl Psychiatry 2017;7:e0. [PMID: 28323285 PMCID: PMC5416681 DOI: 10.1038/tp.2017.38] [Citation(s) in RCA: 79] [Impact Index Per Article: 11.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/09/2016] [Revised: 12/01/2016] [Accepted: 12/15/2016] [Indexed: 01/24/2023] Open

Ray B, Ghedin E, Chunara R. Network inference from multimodal data: A review of approaches from infectious disease transmission. J Biomed Inform 2016;64:44-54. [PMID: 27612975 PMCID: PMC7106161 DOI: 10.1016/j.jbi.2016.09.004] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2016] [Revised: 07/10/2016] [Accepted: 09/03/2016] [Indexed: 02/02/2023]

Abstract

Networks inference problems are commonly found in multiple biomedical subfields such as genomics, metagenomics, neuroscience, and epidemiology. Networks are useful for representing a wide range of complex interactions ranging from those between molecular biomarkers, neurons, and microbial communities, to those found in human or animal populations. Recent technological advances have resulted in an increasing amount of healthcare data in multiple modalities, increasing the preponderance of network inference problems. Multi-domain data can now be used to improve the robustness and reliability of recovered networks from unimodal data. For infectious diseases in particular, there is a body of knowledge that has been focused on combining multiple pieces of linked information. Combining or analyzing disparate modalities in concert has demonstrated greater insight into disease transmission than could be obtained from any single modality in isolation. This has been particularly helpful in understanding incidence and transmission at early stages of infections that have pandemic potential. Novel pieces of linked information in the form of spatial, temporal, and other covariates including high-throughput sequence data, clinical visits, social network information, pharmaceutical prescriptions, and clinical symptoms (reported as free-text data) also encourage further investigation of these methods. The purpose of this review is to provide an in-depth analysis of multimodal infectious disease transmission network inference methods with a specific focus on Bayesian inference. We focus on analytical Bayesian inference-based methods as this enables recovering multiple parameters simultaneously, for example, not just the disease transmission network, but also parameters of epidemic dynamics. Our review studies their assumptions, key inference parameters and limitations, and ultimately provides insights about improving future network inference methods in multiple applications.

Collapse

Ma S, Ren J, Fenyö D. Breast Cancer Prognostics Using Multi-Omics Data. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2016;2016:52-9. [PMID: 27570650 PMCID: PMC5001766] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]

Goff DC, Romero K, Paul J, Mercedes Perez-Rodriguez M, Crandall D, Potkin SG. Biomarkers for drug development in early psychosis: Current issues and promising directions. Eur Neuropsychopharmacol 2016;26:923-37. [PMID: 27005595 DOI: 10.1016/j.euroneuro.2016.01.009] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 09/15/2015] [Revised: 01/20/2016] [Accepted: 01/23/2016] [Indexed: 12/14/2022]

Świtnicki MP, Juul M, Madsen T, Sørensen KD, Pedersen JS. PINCAGE: probabilistic integration of cancer genomics data for perturbed gene identification and sample classification. Bioinformatics 2016;32:1353-65. [PMID: 26740525 DOI: 10.1093/bioinformatics/btv758] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/11/2015] [Accepted: 12/17/2015] [Indexed: 02/02/2023] Open

Abstract

MOTIVATION

Cancer development and progression is driven by a complex pattern of genomic and epigenomic perturbations. Both types of perturbations can affect gene expression levels and disease outcome. Integrative analysis of cancer genomics data may therefore improve detection of perturbed genes and prediction of disease state. As different data types are usually dependent, analysis based on independence assumptions will make inefficient use of the data and potentially lead to false conclusions.

MODEL

Here, we present PINCAGE (Probabilistic INtegration of CAncer GEnomics data), a method that uses probabilistic integration of cancer genomics data for combined evaluation of RNA-seq gene expression and 450k array DNA methylation measurements of promoters as well as gene bodies. It models the dependence between expression and methylation using modular graphical models, which also allows future inclusion of additional data types.

RESULTS

We apply our approach to a Breast Invasive Carcinoma dataset from The Cancer Genome Atlas consortium, which includes 82 adjacent normal and 730 cancer samples. We identify new biomarker candidates of breast cancer development (PTF1A, RABIF, RAG1AP1, TIMM17A, LOC148145) and progression (SERPINE3, ZNF706). PINCAGE discriminates better between normal and tumour tissue and between progressing and non-progressing tumours in comparison with established methods that assume independence between tested data types, especially when using evidence from multiple genes. Our method can be applied to any type of cancer or, more generally, to any genomic disease for which sufficient amount of molecular data is available.

AVAILABILITY AND IMPLEMENTATION

R scripts available at http://moma.ki.au.dk/prj/pincage/

CONTACT

: michal.switnicki@clin.au.dk or jakob.skou@clin.au.dk

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Collapse

Serra A, Fratello M, Fortino V, Raiconi G, Tagliaferri R, Greco D. MVDA: a multi-view genomic data integration methodology. BMC Bioinformatics 2015;16:261. [PMID: 26283178 PMCID: PMC4539887 DOI: 10.1186/s12859-015-0680-3] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2015] [Accepted: 07/20/2015] [Indexed: 11/18/2022] Open