1
Qin X, Lock TR, Kallenbach RL. DA: Population structure inference using discriminant analysis. Methods Ecol Evol 2021. [DOI: 10.1111/2041-210x.13748] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]
Affiliation(s)
- Xinghu Qin
- Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- Thomas Ryan Lock
- Division of Plant Sciences, University of Missouri, Columbia, MO, USA
2
A mechanism-aware and multiomic machine-learning pipeline characterizes yeast cell growth. Proc Natl Acad Sci U S A 2020; 117:18869-18879. [PMID: 32675233 DOI: 10.1073/pnas.2002959117] [Citation(s) in RCA: 54] [Impact Index Per Article: 13.5] [Indexed: 11/18/2022] Open
Abstract
Metabolic modeling and machine learning are key components in the emerging next generation of systems and synthetic biology tools, targeting the genotype-phenotype-environment relationship. Rather than being used in isolation, it is becoming clear that their value is maximized when they are combined. However, the potential of integrating these two frameworks for omic data augmentation and integration is largely unexplored. We propose, rigorously assess, and compare machine-learning-based data integration techniques, combining gene expression profiles with computationally generated metabolic flux data to predict yeast cell growth. To this end, we create strain-specific metabolic models for 1,143 Saccharomyces cerevisiae mutants and we test 27 machine-learning methods, incorporating state-of-the-art feature selection and multiview learning approaches. We propose a multiview neural network using fluxomic and transcriptomic data, showing that the former increases the predictive accuracy of the latter and reveals functional patterns that are not directly deducible from gene expression alone. We test the proposed neural network on a further 86 strains generated in a different experiment, therefore verifying its robustness to an additional independent dataset. Finally, we show that introducing mechanistic flux features improves the predictions also for knockout strains whose genes were not modeled in the metabolic reconstruction. Our results thus demonstrate that fusing experimental cues with in silico models, based on known biochemistry, can contribute with disjoint information toward biologically informed and interpretable machine learning. Overall, this study provides tools for understanding and manipulating complex phenotypes, increasing both the prediction accuracy and the extent of discernible mechanistic biological insights.
3
Affiliation(s)
- Lan Huong Nguyen
- Institute for Mathematical and Computational Engineering, Stanford University, Stanford, California, United States of America
- Susan Holmes
- Department of Statistics, Stanford University, Stanford, California, United States of America
4
Mitra S, Saha S. A multiobjective multi-view cluster ensemble technique: Application in patient subclassification. PLoS One 2019; 14:e0216904. [PMID: 31120942 PMCID: PMC6533037 DOI: 10.1371/journal.pone.0216904] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 02/17/2019] [Accepted: 04/30/2019] [Indexed: 11/21/2022] Open
Abstract
Recent high-throughput omics technology has been used to assemble large biomedical omics datasets. Clustering of single omics data has proven invaluable in biomedical research. For the task of patient sub-classification, all the available omics data should be used jointly rather than treated individually. Clustering of multi-omics datasets has the potential to reveal deep insights. Here, we propose a late-integration-based multiobjective multi-view clustering algorithm which uses a special perturbation operator. Initially, a large number of diverse clustering solutions (called base partitionings) are generated for each omic dataset using four clustering algorithms, viz., k-means, complete linkage, spectral and fast search clustering. These base partitionings of multi-omic datasets are suitably combined using a special perturbation operator. The perturbation operator uses an ensemble technique to generate new solutions from the base partitionings. The optimal combination of multiple partitioning solutions across different views is determined after optimizing the objective functions, namely conn-XB, for checking the quality of partitionings for different views, and agreement index, for checking agreement between the views. The search capability of a multiobjective simulated annealing approach, namely AMOSA, is used for this purpose. Lastly, the non-dominated solutions of the different views are combined based on similarity to generate a single set of non-dominated solutions. The proposed algorithm is evaluated on 13 multi-view cancer datasets. A detailed comparative study with several baseline methods and five state-of-the-art models is performed to show the effectiveness of the algorithm.
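The abstract above scores how well the partitionings from different views agree with each other. As a rough illustration only, the sketch below computes a simple pair-counting agreement (the fraction of sample pairs that two partitionings treat the same way, i.e. a Rand-style score). This is an assumption chosen for clarity; the abstract does not define the paper's agreement index, which may differ, and the function name `pair_agreement` is hypothetical.

```python
from itertools import combinations


def pair_agreement(labels_a, labels_b):
    """Fraction of sample pairs on which two partitionings agree.

    A pair "agrees" when both partitionings put the two samples in the
    same cluster, or both put them in different clusters. Cluster label
    values themselves do not need to match between the two views.
    (Illustrative stand-in, not the paper's agreement index.)
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitionings must label the same samples")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two samples")
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Two hypothetical views clustering the same four patients.
view_expression = [0, 0, 1, 1]
view_methylation = [1, 1, 0, 0]  # same grouping, different label names
print(pair_agreement(view_expression, view_methylation))
```

Because the score depends only on co-membership, relabelling clusters in one view leaves it unchanged, which is the property a cross-view agreement measure needs.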
Affiliation(s)
- Sayantan Mitra
- Department of Computer Science and Engineering, Indian Institute of Technology Patna, India
- Sriparna Saha
- Department of Computer Science and Engineering, Indian Institute of Technology Patna, India
5
Krahe MA, Toohey J, Wolski M, Scuffham PA, Reilly S. Research data management in practice: Results from a cross-sectional survey of health and medical researchers from an academic institution in Australia. Health Inf Manag J 2019; 49:108-116. [DOI: 10.1177/1833358319831318] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/16/2022]
Abstract
Background: Building or acquiring research data management (RDM) capacity is a major challenge for health and medical researchers and academic institutes alike. Considering that RDM practices influence the integrity and longevity of data, targeting RDM services and support in recognition of needs is especially valuable in health and medical research. Objective: This project sought to examine the current RDM practices of health and medical researchers from an academic institution in Australia. Method: A cross-sectional survey was used to collect information from a convenience sample of 81 members of a research institute (68 academic staff and 13 postgraduate students). A survey was constructed to assess selected data management tasks associated with the earlier stages of the research data life cycle. Results: Our study indicates that RDM tasks associated with creating, processing and analysis of data vary greatly among researchers and are likely influenced by their level of research experience and RDM practices within their immediate teams. Conclusion: Evaluating the data management practices of health and medical researchers, contextualised by tasks associated with the research data life cycle, is an effective way of shaping RDM services and support in this group. Implications: This study recognises that institutional strategies targeted at tasks associated with the creation, processing and analysis of data will strengthen researcher capacity, instil good research practice and, over time, improve health informatics and research data quality.
Affiliation(s)
- Julie Toohey
- Library and Learning Services, Griffith University, Gold Coast, QLD, Australia
- Malcolm Wolski
- eResearch Services, Griffith University, Nathan, QLD, Australia
- Paul A Scuffham
- Centre for Applied Health Economics, Griffith University, Nathan, QLD, Australia
- Menzies Health Institute Queensland, Griffith University, Gold Coast, QLD, Australia
- Sheena Reilly
- Health Group, Griffith University, Gold Coast, QLD, Australia
6
Athreya A, Iyer R, Neavin D, Wang L, Weinshilboum R, Kaddurah-Daouk R, Rush J, Frye M, Bobo W. Augmentation of Physician Assessments with Multi-Omics Enhances Predictability of Drug Response: A Case Study of Major Depressive Disorder. IEEE Comput Intell M 2018; 13:20-31. [PMID: 30467458 DOI: 10.1109/mci.2018.2840660] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Indexed: 01/05/2023]
Abstract
This work proposes a "learning-augmented clinical assessment" workflow to sequentially augment physician assessments of patients' symptoms and their socio-demographic measures with heterogeneous biological measures to accurately predict treatment outcomes using machine learning. Across many psychiatric illnesses, ranging from major depressive disorder to schizophrenia, symptom severity assessments are subjective and do not include biological measures, making predictability in eventual treatment outcomes a challenge. Using data from the Mayo Clinic PGRN-AMPS SSRI trial as a case study, this work demonstrates a significant improvement in the prediction accuracy for antidepressant treatment outcomes in patients with major depressive disorder from 35% to 80% individualized by patient, compared to using only a physician's assessment as the predictors. This improvement is achieved through an iterative overlay of biological measures, starting with metabolites (blood measures modulated by drug action) associated with symptom severity, and then adding in genes associated with metabolomic concentrations. Hence, therapeutic efficacy for a new patient can be assessed prior to treatment, using prediction models that take as inputs, selected biological measures and physician's assessments of depression severity. Of broader significance extending beyond psychiatry, the approach presented in this work can potentially be applied to predicting treatment outcomes for other medical conditions, such as migraine headaches or rheumatoid arthritis, for which patients are treated according to subject-reported assessments of symptom severity.
Affiliation(s)
- Arjun Athreya
- Department of Electrical and Computer Engineering, Univ. of Illinois at Urbana-Champaign, IL, USA
- Ravishankar Iyer
- Department of Electrical and Computer Engineering, Univ. of Illinois at Urbana-Champaign, IL, USA
- Drew Neavin
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, MN, USA
- Liewei Wang
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, MN, USA
- Richard Weinshilboum
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, MN, USA
- John Rush
- Department of Psychiatry and Behavioral Sciences, Duke University, NC, USA
- Mark Frye
- Department of Psychiatry and Psychology, Mayo Clinic, MN, USA
- William Bobo
- Department of Psychiatry and Psychology, Mayo Clinic, FL, USA
7
Galatzer-Levy IR, Ruggles K, Chen Z. Data Science in the Research Domain Criteria Era: Relevance of Machine Learning to the Study of Stress Pathology, Recovery, and Resilience. Chronic Stress (Thousand Oaks, Calif.) 2018; 2:247054701774755. [PMID: 29527592 PMCID: PMC5841258 DOI: 10.1177/2470547017747553] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Indexed: 11/15/2022]
Abstract
Diverse environmental and biological systems interact to influence individual differences in response to environmental stress. Understanding the nature of these complex relationships can enhance the development of methods to: (1) identify risk, (2) classify individuals as healthy or ill, (3) understand mechanisms of change, and (4) develop effective treatments. The Research Domain Criteria (RDoC) initiative provides a theoretical framework to understand health and illness as the product of multiple inter-related systems but does not provide a framework to characterize or statistically evaluate such complex relationships. Characterizing and statistically evaluating models that integrate multiple levels (e.g. synapses, genes, environmental factors) as they relate to outcomes that are free from prior diagnostic benchmarks represents a challenge requiring new computational tools that are capable of capturing complex relationships and identifying clinically relevant populations. In the current review, we will summarize machine learning methods that can achieve these goals.
Affiliation(s)
- Zhe Chen
- NYU School of Medicine, Department of Psychiatry
8
Ray B, Liu W, Fenyö D. Adaptive Multiview Nonnegative Matrix Factorization Algorithm for Integration of Multimodal Biomedical Data. Cancer Inform 2017; 16:1176935117725727. [PMID: 28835735 PMCID: PMC5564898 DOI: 10.1177/1176935117725727] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 03/23/2017] [Accepted: 07/08/2017] [Indexed: 11/16/2022] Open
Abstract
The amounts and types of available multimodal tumor data are rapidly increasing, and their integration is critical for fully understanding the underlying cancer biology and personalizing treatment. However, the development of methods for effectively integrating multimodal data in a principled manner is lagging behind our ability to generate the data. In this article, we introduce an extension to a multiview nonnegative matrix factorization algorithm (NNMF) for dimensionality reduction and integration of heterogeneous data types and compare the predictive modeling performance of the method on unimodal and multimodal data. We also present a comparative evaluation of our novel multiview approach and current data integration methods. Our work provides an efficient method to extend an existing dimensionality reduction method. We report rigorous evaluation of the method on large-scale quantitative protein and phosphoprotein tumor data from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) acquired using state-of-the-art liquid chromatography mass spectrometry. Exome sequencing and RNA-Seq data were also available from The Cancer Genome Atlas for the same tumors. For unimodal data, in case of breast cancer, transcript levels were most predictive of estrogen and progesterone receptor status and copy number variation of human epidermal growth factor receptor 2 status. For ovarian and colon cancers, phosphoprotein and protein levels were most predictive of tumor grade and stage and residual tumor, respectively. When multiview NNMF was applied to multimodal data to predict outcomes, the improvement in performance is not overall statistically significant beyond unimodal data, suggesting that proteomics data may contain more predictive information regarding tumor phenotypes than transcript levels, probably due to the fact that proteins are the functional gene products and therefore a more direct measurement of the functional state of the tumor. 
Here, we have applied our proposed approach to multimodal molecular data for tumors, but it is generally applicable to dimensionality reduction and joint analysis of any type of multimodal data.
Affiliation(s)
- Bisakha Ray
- Institute for Systems Genetics and Department of Biochemistry and Molecular Pharmacology, NYU School of Medicine, New York, NY, USA
- Wenke Liu
- Institute for Systems Genetics and Department of Biochemistry and Molecular Pharmacology, NYU School of Medicine, New York, NY, USA
- David Fenyö
- Institute for Systems Genetics and Department of Biochemistry and Molecular Pharmacology, NYU School of Medicine, New York, NY, USA
9
Saxe GN, Ma S, Ren J, Aliferis C. Machine learning methods to predict child posttraumatic stress: a proof of concept study. BMC Psychiatry 2017; 17:223. [PMID: 28689495 PMCID: PMC5502325 DOI: 10.1186/s12888-017-1384-1] [Citation(s) in RCA: 43] [Impact Index Per Article: 6.1] [Received: 05/16/2016] [Accepted: 06/09/2017] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND The care of traumatized children would benefit significantly from accurate predictive models for Posttraumatic Stress Disorder (PTSD), using information available around the time of trauma. Machine Learning (ML) computational methods have yielded strong results in recent applications across many diseases and data types, yet they have not been previously applied to childhood PTSD. Since these methods have not been applied to this complex and debilitating disorder, there is a great deal that remains to be learned about their application. The first step is to prove the concept: Can ML methods - as applied in other fields - produce predictive classification models for childhood PTSD? Additionally, we seek to determine if specific variables can be identified - from the aforementioned predictive classification models - with putative causal relations to PTSD. METHODS ML predictive classification methods - with causal discovery feature selection - were applied to a data set of 163 children hospitalized with an injury and PTSD was determined three months after hospital discharge. At the time of hospitalization, 105 risk factor variables were collected spanning a range of biopsychosocial domains. RESULTS Seven percent of subjects had a high level of PTSD symptoms. A predictive classification model was discovered with significant predictive accuracy. A predictive model constructed based on subsets of potentially causally relevant features achieves similar predictivity compared to the best predictive model constructed with all variables. Causal Discovery feature selection methods identified 58 variables of which 10 were identified as most stable. CONCLUSIONS In this first proof-of-concept application of ML methods to predict childhood Posttraumatic Stress we were able to determine both predictive classification models for childhood PTSD and identify several causal variables. 
This set of techniques has great potential for enhancing the methodological toolkit in the field and future studies should seek to replicate, refine, and extend the results produced in this study.
Affiliation(s)
- Glenn N. Saxe
- Department of Child and Adolescent Psychiatry, New York University School of Medicine, One Park Avenue, New York, NY 10016 USA
- Sisi Ma
- Institute for Health Informatics and Department of Medicine, University of Minnesota, 330 Diehl Hall, MMC912, 420 Delaware Street S.E., Minneapolis, MN 55455 USA
- Jiwen Ren
- Department of Child and Adolescent Psychiatry and Center for Health Informatics and Bioinformatics, New York University School of Medicine, One Park Avenue, New York, NY 10016 USA
- Constantin Aliferis
- Institute for Health Informatics, Department of Medicine, and Data Science Program, University of Minnesota, Minneapolis, MN USA; Department of Biostatistics, Vanderbilt University, Nashville, TN USA
10
Ruggles KV, Krug K, Wang X, Clauser KR, Wang J, Payne SH, Fenyö D, Zhang B, Mani DR. Methods, Tools and Current Perspectives in Proteogenomics. Mol Cell Proteomics 2017; 16:959-981. [PMID: 28456751 DOI: 10.1074/mcp.mr117.000024] [Citation(s) in RCA: 95] [Impact Index Per Article: 13.6] [Received: 04/13/2017] [Indexed: 12/20/2022] Open
Abstract
With combined technological advancements in high-throughput next-generation sequencing and deep mass spectrometry-based proteomics, proteogenomics, i.e. the integrative analysis of proteomic and genomic data, has emerged as a new research field. Early efforts in the field were focused on improving protein identification using sample-specific genomic and transcriptomic sequencing data. More recently, integrative analysis of quantitative measurements from genomic and proteomic studies has yielded novel insights into gene expression regulation, cell signaling, and disease. Many methods and tools have been developed or adapted to enable an array of integrative proteogenomic approaches, and in this article we systematically classify published methods and tools into four major categories: (1) Sequence-centric proteogenomics; (2) Analysis of proteogenomic relationships; (3) Integrative modeling of proteogenomic data; and (4) Data sharing and visualization. We provide a comprehensive review of methods and available tools in each category and highlight their typical applications.
Affiliation(s)
- Kelly V Ruggles
- Department of Medicine, New York University School of Medicine, New York, New York 10016
- Karsten Krug
- The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142
- Xiaojing Wang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas 77030; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030
- Karl R Clauser
- The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142
- Jing Wang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas 77030; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030
- Samuel H Payne
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99354
- David Fenyö
- Department of Biochemistry and Molecular Pharmacology, New York University School of Medicine, New York, New York 10016; Institute for Systems Genetics, New York University School of Medicine, New York, New York 10016
- Bing Zhang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas 77030; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030
- D R Mani
- The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142
11
Utilization of machine learning for prediction of post-traumatic stress: a re-examination of cortisol in the prediction and pathways to non-remitting PTSD. Transl Psychiatry 2017; 7:e0. [PMID: 28323285 PMCID: PMC5416681 DOI: 10.1038/tp.2017.38] [Citation(s) in RCA: 79] [Impact Index Per Article: 11.3] [Received: 08/09/2016] [Revised: 12/01/2016] [Accepted: 12/15/2016] [Indexed: 01/24/2023] Open
Abstract
To date, studies of biological risk factors have revealed inconsistent relationships with subsequent post-traumatic stress disorder (PTSD). The inconsistent signal may reflect the use of data analytic tools that are ill-equipped for modeling the complex interactions between biological and environmental factors that underlie post-traumatic psychopathology. Further, using symptom-based diagnostic status as the group outcome overlooks the inherent heterogeneity of PTSD, potentially contributing to failures to replicate. To examine the potential yield of novel analytic tools, we reanalyzed data from a large longitudinal study of individuals identified following trauma in the general emergency room (ER) that failed to find a linear association between cortisol response to traumatic events and subsequent PTSD. First, latent growth mixture modeling empirically identified trajectories of post-traumatic symptoms, which then were used as the study outcome. Next, support vector machines with feature selection identified sets of features with stable predictive accuracy and built robust classifiers of trajectory membership (area under the receiver operating characteristic curve (AUC)=0.82 (95% confidence interval (CI)=0.80-0.85)) that combined clinical, neuroendocrine, psychophysiological and demographic information. Finally, graph induction algorithms revealed a unique path from childhood trauma, via lower cortisol during ER admission, to non-remitting PTSD. Traditional general linear modeling methods then confirmed the newly revealed association, thereby delineating a specific target population for early endocrine interventions. Advanced computational approaches offer innovative ways for uncovering clinically significant, non-shared biological signals in heterogeneous samples.
12
Ray B, Ghedin E, Chunara R. Network inference from multimodal data: A review of approaches from infectious disease transmission. J Biomed Inform 2016; 64:44-54. [PMID: 27612975 PMCID: PMC7106161 DOI: 10.1016/j.jbi.2016.09.004] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 03/28/2016] [Revised: 07/10/2016] [Accepted: 09/03/2016] [Indexed: 02/02/2023]
Abstract
Network inference problems are commonly found in multiple biomedical subfields such as genomics, metagenomics, neuroscience, and epidemiology. Networks are useful for representing a wide range of complex interactions ranging from those between molecular biomarkers, neurons, and microbial communities, to those found in human or animal populations. Recent technological advances have resulted in an increasing amount of healthcare data in multiple modalities, increasing the prevalence of network inference problems. Multi-domain data can now be used to improve the robustness and reliability of recovered networks from unimodal data. For infectious diseases in particular, there is a body of knowledge that has been focused on combining multiple pieces of linked information. Combining or analyzing disparate modalities in concert has demonstrated greater insight into disease transmission than could be obtained from any single modality in isolation. This has been particularly helpful in understanding incidence and transmission at early stages of infections that have pandemic potential. Novel pieces of linked information in the form of spatial, temporal, and other covariates including high-throughput sequence data, clinical visits, social network information, pharmaceutical prescriptions, and clinical symptoms (reported as free-text data) also encourage further investigation of these methods. The purpose of this review is to provide an in-depth analysis of multimodal infectious disease transmission network inference methods with a specific focus on Bayesian inference. We focus on analytical Bayesian inference-based methods as this enables recovering multiple parameters simultaneously, for example, not just the disease transmission network, but also parameters of epidemic dynamics. Our review studies their assumptions, key inference parameters and limitations, and ultimately provides insights about improving future network inference methods in multiple applications.
Affiliation(s)
- Bisakha Ray
- Center for Health Informatics and Bioinformatics, New York University School of Medicine, USA
- Elodie Ghedin
- Department of Biology, Center for Genomics & Systems Biology, USA; College of Global Public Health, New York University, USA
- Rumi Chunara
- Dept. of Computer Science and Engineering, Tandon School of Engineering, USA; College of Global Public Health, New York University, USA
13
Ma S, Ren J, Fenyö D. Breast Cancer Prognostics Using Multi-Omics Data. AMIA Joint Summits on Translational Science Proceedings 2016; 2016:52-9. [PMID: 27570650 PMCID: PMC5001766] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Breast cancer affects one in eight women in America and is a leading cause of death from cancer worldwide. In the current study, four types of Omics data, including copy number variation, gene expression, proteome and phosphoproteome, were collected from seventy-seven breast cancer patients. Individual types of Omics data were used to separately construct predictive models to predict ten-year survival, an important clinical hallmark. The predictive models constructed with proteome data achieved decent predictivity (mean AUC = 0.725) and outperformed the models constructed with other types of Omics data. This indicates that high-quality, large-scale protein data is more effective for survival prediction compared to other types of Omics data. Further, we experimented with ten different data fusion techniques (generic and multi-kernel-learning-based) to test whether combining multi-Omics data can result in improved predictive performance. None of the data fusion techniques tested in the current study outperformed the predictive models built with the proteome data.
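The abstract above reports performance as a mean AUC. As a reading aid, the sketch below shows one standard way to compute AUC from predicted scores (the probability that a randomly chosen positive case is scored above a randomly chosen negative case, with ties counted as one half) and to average it over evaluation splits. The function names and the toy data are hypothetical; the abstract does not state how the study's splits or scores were produced.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the positive class (e.g. survived ten years), 0 otherwise.
    scores: model outputs where higher means "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


def mean_auc(folds):
    """Average AUC over (labels, scores) pairs, one pair per evaluation split."""
    return sum(roc_auc(y, s) for y, s in folds) / len(folds)


# Toy example with made-up labels and scores for two splits.
folds = [
    ([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]),
    ([1, 0], [0.8, 0.2]),
]
print(mean_auc(folds))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.725 should be read.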
Affiliation(s)
- Sisi Ma
- Center for Health Informatics and Bioinformatics, New York University Langone Medical Center, New York, NY
- Jiwen Ren
- Center for Health Informatics and Bioinformatics, New York University Langone Medical Center, New York, NY
- David Fenyö
- Center for Health Informatics and Bioinformatics, New York University Langone Medical Center, New York, NY
14
Goff DC, Romero K, Paul J, Mercedes Perez-Rodriguez M, Crandall D, Potkin SG. Biomarkers for drug development in early psychosis: Current issues and promising directions. Eur Neuropsychopharmacol 2016; 26:923-37. [PMID: 27005595 DOI: 10.1016/j.euroneuro.2016.01.009] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 09/15/2015] [Revised: 01/20/2016] [Accepted: 01/23/2016] [Indexed: 12/14/2022]
Abstract
A major goal of current research in schizophrenia is to understand the biology underlying onset and early progression and to develop interventions that modify these processes. Biomarkers can play a critical role in identifying disease state, factors contributing to underlying progression, as well as predicting and monitoring response to treatment. Once biomarker-based therapeutics are established, biomarkers can guide treatment selection. It is increasingly clear that a wide range of potential biomarkers should be examined in schizophrenia, given the large number of genetic and environmental factors that have been identified as risk factors. New models for analysis of biomarkers are needed that represent the central nervous system as a highly complex, dynamic, and interactive system. Many tools are available with which to study relevant brain chemistry, but most are indirect measures and represent only a small fraction of the potential etiologic factors contributing to the molecular, structural and functional components of schizophrenia. This review represents the work of the International Society for CNS Clinical Trials and Methodology (ISCTM) Biomarkers Working Group. It discusses advantages and disadvantages of different categories of biomarkers and provides a summary of evidence that biomarkers representing inflammation, oxidative stress, endocannabinoids, glucocorticoid, and biogenic amines systems are dysregulated and potentially interactive in early phase schizophrenia. As has been recently demonstrated in several neurodevelopmental and neurodegenerative disorders, a multi-modal, longitudinal strategy involving a diverse array of biomarkers and new approaches to statistical modeling are needed to improve early interventions based on the fuller understanding.
Affiliation(s)
- Jeffrey Paul
- Astellas Pharma Global Development, Northbrook, IL, USA
|
15
|
Świtnicki MP, Juul M, Madsen T, Sørensen KD, Pedersen JS. PINCAGE: probabilistic integration of cancer genomics data for perturbed gene identification and sample classification. Bioinformatics 2016; 32:1353-65. [PMID: 26740525 DOI: 10.1093/bioinformatics/btv758] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 06/11/2015] [Accepted: 12/17/2015] [Indexed: 02/02/2023] Open
Abstract
MOTIVATION: Cancer development and progression is driven by a complex pattern of genomic and epigenomic perturbations. Both types of perturbations can affect gene expression levels and disease outcome. Integrative analysis of cancer genomics data may therefore improve detection of perturbed genes and prediction of disease state. As different data types are usually dependent, analysis based on independence assumptions will make inefficient use of the data and potentially lead to false conclusions.
MODEL: Here, we present PINCAGE (Probabilistic INtegration of CAncer GEnomics data), a method that uses probabilistic integration of cancer genomics data for combined evaluation of RNA-seq gene expression and 450k array DNA methylation measurements of promoters as well as gene bodies. It models the dependence between expression and methylation using modular graphical models, which also allows future inclusion of additional data types.
RESULTS: We apply our approach to a Breast Invasive Carcinoma dataset from The Cancer Genome Atlas consortium, which includes 82 adjacent normal and 730 cancer samples. We identify new biomarker candidates of breast cancer development (PTF1A, RABIF, RAG1AP1, TIMM17A, LOC148145) and progression (SERPINE3, ZNF706). PINCAGE discriminates better between normal and tumour tissue and between progressing and non-progressing tumours in comparison with established methods that assume independence between tested data types, especially when using evidence from multiple genes. Our method can be applied to any type of cancer or, more generally, to any genomic disease for which sufficient amount of molecular data is available.
AVAILABILITY AND IMPLEMENTATION: R scripts available at http://moma.ki.au.dk/prj/pincage/
CONTACT: michal.switnicki@clin.au.dk or jakob.skou@clin.au.dk
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jakob S Pedersen
- Department of Molecular Medicine (MOMA) Bioinformatics Research Centre (BiRC), Aarhus University, Aarhus, 8000, Denmark
|
16
|
Serra A, Fratello M, Fortino V, Raiconi G, Tagliaferri R, Greco D. MVDA: a multi-view genomic data integration methodology. BMC Bioinformatics 2015; 16:261. [PMID: 26283178 PMCID: PMC4539887 DOI: 10.1186/s12859-015-0680-3] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Received: 02/02/2015] [Accepted: 07/20/2015] [Indexed: 11/18/2022] Open
Abstract
Background: Multiple high-throughput molecular profiling by omics technologies can be collected for the same individuals. Combining these data, rather than exploiting them separately, can significantly increase the power of clinically relevant patients subclassifications.
Results: We propose a multi-view approach in which the information from different data layers (views) is integrated at the levels of the results of each single view clustering iterations. It works by factorizing the membership matrices in a late integration manner. We evaluated the effectiveness and the performance of our method on six multi-view cancer datasets. In all the cases, we found patient sub-classes with statistical significance, identifying novel sub-groups previously not emphasized in literature. Our method performed better as compared to other multi-view clustering algorithms and, unlike other existing methods, it is able to quantify the contribution of single views on the final results.
Conclusion: Our observations suggest that integration of prior information with genomic features in the subtyping analysis is an effective strategy in identifying disease subgroups. The methodology is implemented in R and the source code is available online at http://neuronelab.unisa.it/a-multi-view-genomic-data-integration-methodology/.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0680-3) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Angela Serra
- NeuRoNe Lab, Department of Computer Science, University of Salerno, Fisciano, Italy.
- Michele Fratello
- Department of Medical, Surgical, Neurological, Metabolic and Ageing Sciences, Second University of Napoli, Napoli, Italy.
- Vittorio Fortino
- Unit of Systems Toxicology and Nanosafety Research Centre, Finnish Institute of Occupational Health, FIOH, Helsinki, Finland.
- Giancarlo Raiconi
- NeuRoNe Lab, Department of Computer Science, University of Salerno, Fisciano, Italy.
- Roberto Tagliaferri
- NeuRoNe Lab, Department of Computer Science, University of Salerno, Fisciano, Italy.
- Dario Greco
- Unit of Systems Toxicology and Nanosafety Research Centre, Finnish Institute of Occupational Health, FIOH, Helsinki, Finland.
|