1
Dutta D, Sen A, Satagopan JM. Identifying genes associated with disease outcomes using joint sparse canonical correlation analysis-An application in renal clear cell carcinoma. Genet Epidemiol 2024; 48:414-432. [PMID: 38751238 PMCID: PMC11589067 DOI: 10.1002/gepi.22566] [Received: 09/20/2023] [Revised: 04/04/2024] [Accepted: 04/22/2024] [Indexed: 11/27/2024]
Abstract
Somatic changes like copy number aberrations (CNAs) and epigenetic alterations like methylation have pivotal effects on disease outcomes and prognosis in cancer, by regulating gene expressions, that drive critical biological processes. To identify potential biomarkers and molecular targets and understand how they impact disease outcomes, it is important to identify key groups of CNAs, the associated methylation, and the gene expressions they impact, through a joint integrative analysis. Here, we propose a novel analysis pipeline, the joint sparse canonical correlation analysis (jsCCA), an extension of sCCA, to effectively identify an ensemble of CNAs, methylation sites and gene (expression) components in the context of disease endpoints, especially tumor characteristics. Our approach detects potentially orthogonal gene components that are highly correlated with sets of methylation sites which in turn are correlated with sets of CNA sites. It then identifies the genes within these components that are associated with the outcome. Further, we aggregate the effect of each gene expression set on tumor stage by constructing "gene component scores" and test its interaction with traditional risk factors. Analyzing clinical and genomic data on 515 renal clear cell carcinoma (ccRCC) patients from the TCGA-KIRC, we found eight gene components to be associated with methylation sites, regulated by groups of proximally located CNA sites. Association analysis with tumor stage at diagnosis identified a novel association of expression of ASAH1 gene trans-regulated by methylation of several genes including SIX5 and by CNAs in the 10q25 region including TCF7L2. Further analysis to quantify the overall effect of gene sets on tumor stage, revealed that two of the eight gene components have significant interaction with smoking in relation to tumor stage. 
These gene components represent distinct biological functions including immune function, inflammatory responses, and hypoxia-regulated pathways. Our findings suggest that jsCCA analysis can identify interpretable and important genes, regulatory structures, and clinically consequential pathways. Such methods are warranted for comprehensive analysis of multimodal data especially in cancer genomics.
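The Python sketch below illustrates the "gene component score" and its interaction test. It rests on illustrative assumptions that do not come from the paper. The score is taken to be a weighted sum of standardized expression values, using a component's sparse canonical weights. Tumor stage is dichotomized into early versus late. The score-by-smoking interaction is tested with a one-degree-of-freedom likelihood-ratio test that compares logistic models fitted by Newton-Raphson. The helper names and the simulated data produced by `simulate` are hypothetical.

```python
import math
import random


def component_score(expr_row, weights):
    """Weighted sum of standardized expression values (zero weights drop out)."""
    return sum(w * x for w, x in zip(weights, expr_row))


def sigmoid(eta):
    """Numerically stable logistic function."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(X, y, iters=25):
    """Newton-Raphson logistic regression; returns (beta, log-likelihood)."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]  # X^T W X (Fisher information)
        for xi, yi in zip(X, y):
            mu = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += w * xi[j] * xi[k]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    ll = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        # log(1 + e^eta) written in an overflow-safe form
        ll += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return beta, ll


def interaction_lrt(score, smoke, stage):
    """1-df likelihood-ratio test for a score x smoking interaction."""
    base = [[1.0, s, z] for s, z in zip(score, smoke)]
    full = [[1.0, s, z, s * z] for s, z in zip(score, smoke)]
    _, ll0 = fit_logistic(base, stage)
    _, ll1 = fit_logistic(full, stage)
    stat = max(0.0, 2.0 * (ll1 - ll0))
    # Chi-square(1) upper tail: P(X > t) = erfc(sqrt(t / 2))
    return stat, math.erfc(math.sqrt(stat / 2.0))


def simulate(n, interaction, seed=0):
    """Hypothetical data: 3 genes with sparse weights, binary smoking and stage."""
    rng = random.Random(seed)
    weights = [0.8, 0.0, -0.6]
    score, smoke, stage = [], [], []
    for _ in range(n):
        s = component_score([rng.gauss(0.0, 1.0) for _ in weights], weights)
        z = 1.0 if rng.random() < 0.4 else 0.0
        eta = -0.5 + 0.3 * s + 0.2 * z + interaction * s * z
        score.append(s)
        smoke.append(z)
        stage.append(1.0 if rng.random() < sigmoid(eta) else 0.0)
    return score, smoke, stage
```

With a strong simulated interaction, for example `interaction_lrt(*simulate(1500, 1.5))`, the p-value should be very small. The authors' actual outcome coding, covariates and test may differ.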
Affiliation(s)
- Diptavo Dutta
- Integrative Tumor Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, USA
- Ananda Sen
- Department of Biostatistics, University of Michigan, Ann Arbor, USA
- Department of Family Medicine, University of Michigan, Ann Arbor, USA
- Jaya M. Satagopan
- Department of Biostatistics and Epidemiology, Rutgers School of Public Health, Piscataway, USA
2
Karanth S, Pradhan AK. Advanced data analytics and "omics" techniques to control enteric foodborne pathogens. Adv Food Nutr Res 2024; 113:383-422. [PMID: 40023564 DOI: 10.1016/bs.afnr.2024.09.013] [Indexed: 03/04/2025]
Abstract
Enteric pathogens, particularly bacterial pathogens, are associated with millions of cases of foodborne illness in the U.S. and worldwide, necessitating the identification and development of methods to control and minimize their impact on public health. Predictive modeling and quantitative microbial risk assessment are two such methods that analyze data on microbial behavior, particularly as a response to changes in the food matrix, to predict and control the presence and prevalence of these pathogens in food. However, a number of these bacterial enteric pathogens, including Escherichia coli, Listeria monocytogenes, and Salmonella enterica, have inherent genetic and phenotypic differences among their subtypes and variants. This has led to an increasing reliance on "omics" technologies, including genomics, proteomics, transcriptomics, and metabolomics, to identify and characterize pathogenic microorganisms and their behavior in food systems. With this exponential increase in available data on these enteric pathogens, comes a need for the development of novel strategies to analyze this data. Advanced data analysis/analytics is a means to extract value from these large data sources, and is considered the core of data processing. In the past few years, advanced data analytics methods such as machine learning and artificial intelligence have been increasingly used to extract meaningful, actionable knowledge from these data sources to help mitigate food safety issues caused by enteric pathogens. This chapter reviews the latest in research into the use of advanced data analytics, particularly machine learning, to analyze "omics" data of enteric bacterial pathogens, and identifies potential future uses of these techniques in mitigating the risk of these pathogens on public health.
Affiliation(s)
- Shraddha Karanth
- Department of Nutrition and Food Science, University of Maryland, College Park, MD, United States
- Abani K Pradhan
- Department of Nutrition and Food Science, University of Maryland, College Park, MD, United States; Center for Food Safety and Security Systems, University of Maryland, College Park, MD, United States
3
Dutta D, Sen A, Satagopan J. Sparse canonical correlation to identify breast cancer related genes regulated by copy number aberrations. PLoS One 2022; 17:e0276886. [PMID: 36584096 PMCID: PMC9803132 DOI: 10.1371/journal.pone.0276886] [Received: 06/16/2022] [Accepted: 10/16/2022] [Indexed: 12/31/2022]
Abstract
BACKGROUND Copy number aberrations (CNAs) in cancer affect disease outcomes by regulating molecular phenotypes, such as gene expressions, that drive important biological processes. To gain comprehensive insights into molecular biomarkers for cancer, it is critical to identify key groups of CNAs, the associated gene modules, regulatory modules, and their downstream effect on outcomes. METHODS In this paper, we demonstrate an innovative use of sparse canonical correlation analysis (sCCA) to effectively identify the ensemble of CNAs, and gene modules in the context of binary and censored disease endpoints. Our approach detects potentially orthogonal gene expression modules which are highly correlated with sets of CNA and then identifies the genes within these modules that are associated with the outcome. RESULTS Analyzing clinical and genomic data on 1,904 breast cancer patients from the METABRIC study, we found 14 gene modules to be regulated by groups of proximally located CNA sites. We validated this finding using an independent set of 1,077 breast invasive carcinoma samples from The Cancer Genome Atlas (TCGA). Our analysis of 7 clinical endpoints identified several novel and interpretable regulatory associations, highlighting the role of CNAs in key biological pathways and processes for breast cancer. Genes significantly associated with the outcomes were enriched for early estrogen response pathway, DNA repair pathways as well as targets of transcription factors such as E2F4, MYC, and ETS1 that have recognized roles in tumor characteristics and survival. Subsequent meta-analysis across the endpoints further identified several genes through the aggregation of weaker associations. CONCLUSIONS Our findings suggest that sCCA analysis can aggregate weaker associations to identify interpretable and important genes, modules, and clinically consequential pathways.
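The core sCCA step can be sketched in Python. This simplified, single-component version follows the spirit of the penalized matrix decomposition approach to sparse CCA (Witten, Tibshirani and Hastie, 2009). Columns are standardized and within-block covariances are treated as identity. The weight vectors are then updated by alternating soft-thresholded power iterations on the cross-correlation matrix. Fixed thresholds `lam_u` and `lam_v` replace that method's L1-bound search. The toy data stand in for the CNA matrix (X) and the expression matrix (Y). None of this is the authors' code.

```python
import math
import random


def standardize(rows):
    """Center each column and scale it to unit sample variance."""
    n, p = len(rows), len(rows[0])
    out = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [row[j] for row in rows]
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1)) or 1.0
        for i in range(n):
            out[i][j] = (col[i] - mean) / sd
    return out


def soft(vec, lam):
    """Soft-thresholding: shrink each entry toward zero by lam."""
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in vec]


def unit(vec):
    """Scale to unit L2 norm (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def sparse_cca(X, Y, lam_u, lam_v, iters=100):
    """First pair of sparse canonical weights (u for X columns, v for Y columns).

    Alternates u <- unit(soft(K v)) and v <- unit(soft(K^T u)) on the
    cross-correlation matrix K = Xs^T Ys / (n - 1). Thresholds that are too
    large can zero out every weight.
    """
    Xs, Ys = standardize(X), standardize(Y)
    n, p, q = len(Xs), len(Xs[0]), len(Ys[0])
    K = [[sum(Xs[i][a] * Ys[i][b] for i in range(n)) / (n - 1) for b in range(q)]
         for a in range(p)]
    v = unit([1.0] * q)
    u = [0.0] * p
    for _ in range(iters):
        u = unit(soft([sum(K[a][b] * v[b] for b in range(q)) for a in range(p)], lam_u))
        v = unit(soft([sum(K[a][b] * u[a] for a in range(p)) for b in range(q)], lam_v))
    return u, v


def canonical_corr(X, Y, u, v):
    """Pearson correlation between the canonical variates Xs u and Ys v."""
    Xs, Ys = standardize(X), standardize(Y)
    a = [sum(w * x for w, x in zip(u, row)) for row in Xs]
    b = [sum(w * y for w, y in zip(v, row)) for row in Ys]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def make_toy_data(n=400, seed=42):
    """Hypothetical data: a latent factor drives X columns 0-1 and Y column 0;
    all other columns are pure noise."""
    rng = random.Random(seed)
    X, Y = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        X.append([z + 0.3 * rng.gauss(0.0, 1.0), z + 0.3 * rng.gauss(0.0, 1.0)]
                 + [rng.gauss(0.0, 1.0) for _ in range(3)])
        Y.append([z + 0.3 * rng.gauss(0.0, 1.0)] + [rng.gauss(0.0, 1.0) for _ in range(3)])
    return X, Y
```

This computes only the first canonical pair. Additional, potentially orthogonal modules would require deflation or explicit orthogonality constraints. In practice the thresholds would also be tuned, for example by permutation, rather than fixed.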
Affiliation(s)
- Diptavo Dutta
- Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland, United States of America
- Integrative Tumor Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, United States of America
- Ananda Sen
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, United States of America
- Department of Family Medicine, University of Michigan, Ann Arbor, MI, United States of America
- Jaya Satagopan
- Department of Biostatistics and Epidemiology, Rutgers University, New Brunswick, NJ, United States of America
4
Harel T, Peshes-Yaloz N, Bacharach E, Gat-Viks I. Predicting Phenotypic Diversity from Molecular and Genetic Data. Genetics 2019; 213:297-311. [PMID: 31352366 PMCID: PMC6727812 DOI: 10.1534/genetics.119.302463] [Received: 12/20/2018] [Accepted: 07/04/2019] [Indexed: 01/03/2023]
Abstract
Despite the importance of complex phenotypes, an in-depth understanding of the combined molecular and genetic effects on a phenotype has yet to be achieved. Here, we introduce InPhenotype, a novel computational approach for complex phenotype prediction, where gene-expression data and genotyping data are integrated to yield quantitative predictions of complex physiological traits. Unlike existing computational methods, InPhenotype makes it possible to model potential regulatory interactions between gene expression and genomic loci without compromising the continuous nature of the molecular data. We applied InPhenotype to synthetic data, exemplifying its utility for different data parameters, as well as its superiority compared to current methods in both prediction quality and the ability to detect regulatory interactions of genes and genomic loci. Finally, we show that InPhenotype can provide biological insights into both mouse and yeast datasets.
Affiliation(s)
- Tom Harel
- School of Molecular Cell Biology and Biotechnology, The George S. Wise Faculty of Life Sciences, Tel Aviv University, 6997801 Israel
- Naama Peshes-Yaloz
- School of Molecular Cell Biology and Biotechnology, The George S. Wise Faculty of Life Sciences, Tel Aviv University, 6997801 Israel
- Eran Bacharach
- School of Molecular Cell Biology and Biotechnology, The George S. Wise Faculty of Life Sciences, Tel Aviv University, 6997801 Israel
- Irit Gat-Viks
- School of Molecular Cell Biology and Biotechnology, The George S. Wise Faculty of Life Sciences, Tel Aviv University, 6997801 Israel
5
Liu L, Wang J, Yang JR, Wang F, He X. The expression tractability of biological traits shaped by natural selection. J Genet Genomics 2019; 46:397-404. [PMID: 31471211 DOI: 10.1016/j.jgg.2019.08.001] [Received: 05/29/2019] [Revised: 07/31/2019] [Accepted: 08/01/2019] [Indexed: 10/26/2022]
Abstract
Understanding how gene expression is translated to phenotype is central to modern molecular biology, and the success is contingent on the intrinsic tractability of the specific traits under examination. However, an a priori estimate of trait tractability from the perspective of gene expression is unavailable. Motivated by the concept of entropy in a thermodynamic system, we here propose such an estimate (ST) by gauging the number (N) of expression states that underlie the same trait abnormality, with large ST corresponding to large N. By analyzing over 200 yeast morphological traits, we show that ST predicts the tractability of an expression-trait relationship. We further show that ST is ultimately determined by natural selection, which builds co-regulated gene modules to minimize possible expression states.
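A toy count in Python can illustrate the entropy analogy. The setup here is an assumption made for illustration. Each sample with a given trait abnormality is summarized by discretizing every gene's expression z-score into down, unchanged or up. N is the number of distinct states that result, and the score is a Boltzmann-style S = ln N. The paper's actual definition of ST may differ. `discretize`, `trait_entropy` and the cutoff are hypothetical.

```python
import math


def discretize(profile, cutoff=1.0):
    """Map each gene's expression z-score to -1 (down), 0 (unchanged) or +1 (up)."""
    return tuple(0 if abs(z) < cutoff else (1 if z > 0 else -1) for z in profile)


def trait_entropy(profiles, cutoff=1.0):
    """Boltzmann-style S = ln N, where N is the number of distinct discretized
    expression states among samples that share the same trait abnormality."""
    states = {discretize(p, cutoff) for p in profiles}
    if not states:
        raise ValueError("need at least one expression profile")
    return math.log(len(states))
```

Under this toy definition, three samples that show the same abnormality through three different expression states give S = ln 3. Samples that all share one state give S = 0. This matches the abstract's point that a large ST corresponds to a large N.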
Affiliation(s)
- Li Liu
- State Key Laboratory of Bio-control, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510275, China
- Jianguo Wang
- State Key Laboratory of Bio-control, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510275, China
- Jian-Rong Yang
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, China
- Feng Wang
- State Key Laboratory of Bio-control, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510275, China
- Xionglei He
- State Key Laboratory of Bio-control, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510275, China
6
Kagohara LT, Stein-O’Brien GL, Kelley D, Flam E, Wick HC, Danilova LV, Easwaran H, Favorov AV, Qian J, Gaykalova DA, Fertig EJ. Epigenetic regulation of gene expression in cancer: techniques, resources and analysis. Brief Funct Genomics 2019; 17:49-63. [PMID: 28968850 PMCID: PMC5860551 DOI: 10.1093/bfgp/elx018] [Indexed: 12/23/2022]
Abstract
Cancer is a complex disease, driven by aberrant activity in numerous signaling pathways in even individual malignant cells. Epigenetic changes are critical mediators of these functional changes that drive and maintain the malignant phenotype. Changes in DNA methylation, histone acetylation and methylation, noncoding RNAs, posttranslational modifications are all epigenetic drivers in cancer, independent of changes in the DNA sequence. These epigenetic alterations were once thought to be crucial only for the malignant phenotype maintenance. Now, epigenetic alterations are also recognized as critical for disrupting essential pathways that protect the cells from uncontrolled growth, longer survival and establishment in distant sites from the original tissue. In this review, we focus on DNA methylation and chromatin structure in cancer. The precise functional role of these alterations is an area of active research using emerging high-throughput approaches and bioinformatics analysis tools. Therefore, this review also describes these high-throughput measurement technologies, public domain databases for high-throughput epigenetic data in tumors and model systems and bioinformatics algorithms for their analysis. Advances in bioinformatics data that combine these epigenetic data with genomics data are essential to infer the function of specific epigenetic alterations in cancer. These integrative algorithms are also a focus of this review. Future studies using these emerging technologies will elucidate how alterations in the cancer epigenome cooperate with genetic aberrations during tumor initiation and progression. This deeper understanding is essential to future studies with epigenetics biomarkers and precision medicine using emerging epigenetic therapies.
Affiliation(s)
- Daria A Gaykalova
- Corresponding authors: Daria A. Gaykalova, Otolaryngology - Head and Neck Surgery, The Johns Hopkins University School of Medicine, 1550 Orleans Street, Rm 574, CRBII, Baltimore, MD 21231, USA. Tel.: +1 410 614 2745; Fax: +1 410 614 1411. Elana J. Fertig, Assistant Professor of Oncology, Division of Biostatistics and Bioinformatics, Johns Hopkins University, 550 N Broadway, 1101 E, Baltimore, MD 21205, USA. Tel.: +1 410 955 4268; Fax: +1 410 955 0859.
7
Misra BB, Langefeld CD, Olivier M, Cox LA. Integrated Omics: Tools, Advances, and Future Approaches. J Mol Endocrinol 2018; 62:JME-18-0055. [PMID: 30006342 DOI: 10.1530/jme-18-0055] [Received: 02/24/2018] [Revised: 07/02/2018] [Accepted: 07/12/2018] [Indexed: 12/13/2022]
Abstract
With the rapid adoption of high-throughput omic approaches to analyze biological samples such as genomics, transcriptomics, proteomics, and metabolomics, each analysis can generate tera- to peta-byte sized data files on a daily basis. These data file sizes, together with differences in nomenclature among these data types, make the integration of these multi-dimensional omics data into biologically meaningful context challenging. Variously named as integrated omics, multi-omics, poly-omics, trans-omics, pan-omics, or shortened to just 'omics', the challenges include differences in data cleaning, normalization, biomolecule identification, data dimensionality reduction, biological contextualization, statistical validation, data storage and handling, sharing, and data archiving. The ultimate goal is towards the holistic realization of a 'systems biology' understanding of the biological question in hand. Commonly used approaches in these efforts are currently limited by the 3 i's - integration, interpretation, and insights. Post integration, these very large datasets aim to yield unprecedented views of cellular systems at exquisite resolution for transformative insights into processes, events, and diseases through various computational and informatics frameworks. With the continued reduction in costs and processing time for sample analyses, and increasing types of omics datasets generated such as glycomics, lipidomics, microbiomics, and phenomics, an increasing number of scientists in this interdisciplinary domain of bioinformatics face these challenges. We discuss recent approaches, existing tools, and potential caveats in the integration of omics datasets for development of standardized analytical pipelines that could be adopted by the global omics research community.
Affiliation(s)
- Biswapriya B Misra
- Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, United States
- Carl D Langefeld
- Biostatistical Sciences, Wake Forest University School of Medicine, Winston-Salem, United States
- Michael Olivier
- Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, United States
- Laura A Cox
- Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, United States
8
Kagohara LT, Stein-O'Brien GL, Kelley D, Flam E, Wick HC, Danilova LV, Easwaran H, Favorov AV, Qian J, Gaykalova DA, Fertig EJ. Epigenetic regulation of gene expression in cancer: techniques, resources and analysis. Brief Funct Genomics 2018. [PMID: 28968850 DOI: 10.1101/114025] [Indexed: 05/01/2023]
9
Loo LH, Bougen-Zhukov NM, Tan WLC. Early spatiotemporal-specific changes in intermediate signals are predictive of cytotoxic sensitivity to TNFα and co-treatments. Sci Rep 2017; 7:43541. [PMID: 28272488 PMCID: PMC5341104 DOI: 10.1038/srep43541] [Received: 09/14/2016] [Accepted: 01/27/2017] [Indexed: 12/18/2022]
Abstract
Signaling pathways can generate different cellular responses to the same cytotoxic agents. Current quantitative models for predicting these differential responses are usually based on large numbers of intracellular gene products or signals at different levels of signaling cascades. Here, we report a study to predict cellular sensitivity to tumor necrosis factor alpha (TNFα) using high-throughput cellular imaging and machine-learning methods. We measured and compared 1170 protein phosphorylation events in a panel of human lung cancer cell lines based on different signals, subcellular regions, and time points within one hour of TNFα treatment. We found that two spatiotemporal-specific changes in an intermediate signaling protein, p90 ribosomal S6 kinase (RSK), are sufficient to predict the TNFα sensitivity of these cell lines. Our models could also predict the combined effects of TNFα and other kinase inhibitors, many of which are not known to target RSK directly. Therefore, early spatiotemporal-specific changes in intermediate signals are sufficient to represent the complex cellular responses to these perturbations. Our study provides a general framework for the development of rapid, signaling-based cytotoxicity screens that may be used to predict cellular sensitivity to a cytotoxic agent, or identify co-treatments that may sensitize or desensitize cells to the agent.
Affiliation(s)
- Lit-Hsin Loo
- Bioinformatics Institute, Agency for Science, Technology and Research, 30 Biopolis Street, #07-01 Matrix, Singapore 138671, Singapore
- Nicola Michelle Bougen-Zhukov
- Bioinformatics Institute, Agency for Science, Technology and Research, 30 Biopolis Street, #07-01 Matrix, Singapore 138671, Singapore
- Wei-Ling Cecilia Tan
- Bioinformatics Institute, Agency for Science, Technology and Research, 30 Biopolis Street, #07-01 Matrix, Singapore 138671, Singapore
10
Arneson D, Shu L, Tsai B, Barrere-Cain R, Sun C, Yang X. Multidimensional Integrative Genomics Approaches to Dissecting Cardiovascular Disease. Front Cardiovasc Med 2017; 4:8. [PMID: 28289683 PMCID: PMC5327355 DOI: 10.3389/fcvm.2017.00008] [Received: 12/19/2016] [Accepted: 02/09/2017] [Indexed: 12/19/2022]
Abstract
Elucidating the mechanisms of complex diseases such as cardiovascular disease (CVD) remains a significant challenge due to multidimensional alterations at molecular, cellular, tissue, and organ levels. To better understand CVD and offer insights into the underlying mechanisms and potential therapeutic strategies, data from multiple omics types (genomics, epigenomics, transcriptomics, metabolomics, proteomics, microbiomics) from both humans and model organisms have become available. However, individual omics data types capture only a fraction of the molecular mechanisms. To address this challenge, there have been numerous efforts to develop integrative genomics methods that can leverage multidimensional information from diverse data types to derive comprehensive molecular insights. In this review, we summarize recent methodological advances in multidimensional omics integration, exemplify their applications in cardiovascular research, and pinpoint challenges and future directions in this incipient field.
Affiliation(s)
- Douglas Arneson
- Department of Integrative Biology and Physiology, University of California Los Angeles, Los Angeles, CA, USA; Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
- Le Shu
- Department of Integrative Biology and Physiology, University of California Los Angeles, Los Angeles, CA, USA; Molecular, Cellular, and Integrative Physiology Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
- Brandon Tsai
- Department of Integrative Biology and Physiology, University of California Los Angeles, Los Angeles, CA, USA
- Rio Barrere-Cain
- Department of Integrative Biology and Physiology, University of California Los Angeles, Los Angeles, CA, USA
- Christine Sun
- Department of Integrative Biology and Physiology, University of California Los Angeles, Los Angeles, CA, USA
- Xia Yang
- Department of Integrative Biology and Physiology, University of California Los Angeles, Los Angeles, CA, USA; Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA; Molecular, Cellular, and Integrative Physiology Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA; Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA; Molecular Biology Institute, University of California Los Angeles, Los Angeles, CA, USA
11
Castellani GC, Menichetti G, Garagnani P, Giulia Bacalini M, Pirazzini C, Franceschi C, Collino S, Sala C, Remondini D, Giampieri E, Mosca E, Bersanelli M, Vitali S, Valle IFD, Liò P, Milanesi L. Systems medicine of inflammaging. Brief Bioinform 2016; 17:527-40. [PMID: 26307062 PMCID: PMC4870395 DOI: 10.1093/bib/bbv062] [Received: 04/23/2015] [Revised: 06/29/2015] [Indexed: 12/30/2022]
Abstract
Systems Medicine (SM) can be defined as an extension of Systems Biology (SB) to Clinical-Epidemiological disciplines through a shifting paradigm, starting from a cellular, toward a patient centered framework. According to this vision, the three pillars of SM are Biomedical hypotheses, experimental data, mainly achieved by Omics technologies and tailored computational, statistical and modeling tools. The three SM pillars are highly interconnected, and their balancing is crucial. Despite the great technological progresses producing huge amount of data (Big Data) and impressive computational facilities, the Bio-Medical hypotheses are still of primary importance. A paradigmatic example of unifying Bio-Medical theory is the concept of Inflammaging. This complex phenotype is involved in a large number of pathologies and patho-physiological processes such as aging, age-related diseases and cancer, all sharing a common inflammatory pathogenesis. This Biomedical hypothesis can be mapped into an ecological perspective capable to describe by quantitative and predictive models some experimentally observed features, such as microenvironment, niche partitioning and phenotype propagation. In this article we show how this idea can be supported by computational methods useful to successfully integrate, analyze and model large data sets, combining cross-sectional and longitudinal information on clinical, environmental and omics data of healthy subjects and patients to provide new multidimensional biomarkers capable of distinguishing between different pathological conditions, e.g. healthy versus unhealthy state, physiological versus pathological aging.
12
Identifying genetic modulators of the connectivity between transcription factors and their transcriptional targets. Proc Natl Acad Sci U S A 2016; 113:E1835-43. [PMID: 26966232 DOI: 10.1073/pnas.1517140113] [Indexed: 11/18/2022]
Abstract
Regulation of gene expression by transcription factors (TFs) is highly dependent on genetic background and interactions with cofactors. Identifying specific context factors is a major challenge that requires new approaches. Here we show that exploiting natural variation is a potent strategy for probing functional interactions within gene regulatory networks. We developed an algorithm to identify genetic polymorphisms that modulate the regulatory connectivity between specific transcription factors and their target genes in vivo. As a proof of principle, we mapped connectivity quantitative trait loci (cQTLs) using parallel genotype and gene expression data for segregants from a cross between two strains of the yeast Saccharomyces cerevisiae. We identified a nonsynonymous mutation in the DIG2 gene as a cQTL for the transcription factor Ste12p and confirmed this prediction empirically. We also identified three polymorphisms in TAF13 as putative modulators of regulation by Gcn4p. Our method has potential for revealing how genetic differences among individuals influence gene regulatory networks in any organism for which gene expression and genotype data are available along with information on binding preferences for transcription factors.
13
Gietzelt M, Löpprich M, Karmen C, Knaup P, Ganzinger M. Models and Data Sources Used in Systems Medicine. A Systematic Literature Review. Methods Inf Med 2016; 55:107-13. [PMID: 26846174 DOI: 10.3414/me15-01-0151] [Received: 11/17/2015] [Accepted: 01/18/2016] [Indexed: 12/06/2024]
Abstract
BACKGROUND Systems medicine is a new approach for the development and selection of treatment strategies for patients with complex diseases. It is often described as the application of systems biology methods to decision making in patient care. For systems medicine computer applications, many different data sources have to be integrated and included in models. This is a challenging task for Medical Informatics, since the approach goes beyond traditional systems such as Electronic Health Records. To prioritize research activities for systems medicine applications, it is necessary to get an overview of the modelling methods and data sources already used in this field. OBJECTIVES We performed a systematic literature review with the objective of capturing the current use of 1) modelling methods and 2) data sources in systems medicine related research projects. METHODS We queried the MEDLINE and ScienceDirect databases for papers associated with the search term systems medicine and related terms. Papers were screened and assessed in full text in a two-step process according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement guidelines. RESULTS The queries returned 698 articles, of which 34 papers were finally included in the study. A multitude of modelling approaches, such as machine learning and network analysis, was identified and classified. Since these approaches are also used in other domains, no methods specific to systems medicine could be identified. Omics data are the most widely used data types, followed by clinical data. Most studies include only a rather limited number of data sources. CONCLUSIONS Currently, many different modelling approaches are used in systems medicine. Thus, highly flexible modular solutions are necessary for systems medicine clinical applications. However, the number of data sources included in the models is limited, and most projects currently focus on prognosis. To leverage the potential of systems medicine further, it will be necessary to focus on treatment strategies for patients and consider a broader range of data.
Affiliation(s)
- Matthias Ganzinger
- Heidelberg University, Institute of Medical Biometry and Informatics, Im Neuenheimer Feld 130.3, 69120 Heidelberg, Germany
14
Bersanelli M, Mosca E, Remondini D, Giampieri E, Sala C, Castellani G, Milanesi L. Methods for the integration of multi-omics data: mathematical aspects. BMC Bioinformatics 2016; 17 Suppl 2:15. [PMID: 26821531 PMCID: PMC4959355 DOI: 10.1186/s12859-015-0857-9] [Citation(s) in RCA: 246] [Impact Index Per Article: 27.3] [Indexed: 11/10/2022]
Abstract
BACKGROUND Methods for the integrative analysis of multi-omics data are required to draw a more complete and accurate picture of the dynamics of molecular systems. The complexity of biological systems, the technological limits, the large number of biological variables and the relatively low number of biological samples make the analysis of multi-omics datasets a non-trivial problem. RESULTS AND CONCLUSIONS We review the most advanced strategies for integrating multi-omics datasets, focusing on mathematical and methodological aspects.
Affiliation(s)
- Matteo Bersanelli
- Department of Physics and Astronomy, Università di Bologna, Via B. Pichat 6/2, Bologna, 40127, Italy.
- Institute of Biomedical Technologies - CNR, Via Fratelli Cervi 93, Segrate MI, 20090, Italy.
- Ettore Mosca
- Institute of Biomedical Technologies - CNR, Via Fratelli Cervi 93, Segrate MI, 20090, Italy.
- Daniel Remondini
- Department of Physics and Astronomy, Università di Bologna, Via B. Pichat 6/2, Bologna, 40127, Italy.
- Enrico Giampieri
- Department of Physics and Astronomy, Università di Bologna, Via B. Pichat 6/2, Bologna, 40127, Italy.
- Claudia Sala
- Department of Physics and Astronomy, Università di Bologna, Via B. Pichat 6/2, Bologna, 40127, Italy.
- Gastone Castellani
- Department of Physics and Astronomy, Università di Bologna, Via B. Pichat 6/2, Bologna, 40127, Italy.
- Luciano Milanesi
- Institute of Biomedical Technologies - CNR, Via Fratelli Cervi 93, Segrate MI, 20090, Italy.
15
Chen BJ, Litvin O, Ungar L, Pe’er D. Context Sensitive Modeling of Cancer Drug Sensitivity. PLoS One 2015; 10:e0133850. [PMID: 26274927 PMCID: PMC4537214 DOI: 10.1371/journal.pone.0133850] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 02/04/2015] [Accepted: 07/03/2015] [Indexed: 12/19/2022]
Abstract
Recent screening of drug sensitivity in large panels of cancer cell lines provides a valuable resource for developing algorithms that predict drug response. Since more samples provide increased statistical power, most approaches to predicting drug sensitivity pool multiple cancer types together without distinction. However, pan-cancer results can be misleading due to the confounding effects of tissues or cancer subtypes. On the other hand, independent analysis of each cancer type is hampered by small sample sizes. To balance this trade-off, we present CHER (Contextual Heterogeneity Enabled Regression), an algorithm that builds predictive models for drug sensitivity by selecting predictive genomic features and deciding which ones should, and should not, be shared across different cancers, tissues and drugs. CHER provides significantly more accurate models of drug sensitivity than comparable elastic-net-based models. Moreover, CHER provides better insight into the underlying biological processes by finding a sparse set of shared and type-specific genomic features.
Affiliation(s)
- Bo-Juen Chen
- Department of Biomedical Informatics, Columbia University, New York, New York, 10032, United States of America
- Department of Biological Sciences, Department of Systems Biology, Columbia University, New York, New York, 10027, United States of America
- Oren Litvin
- Department of Biological Sciences, Department of Systems Biology, Columbia University, New York, New York, 10027, United States of America
- Lyle Ungar
- Computer and Information Science, University of Pennsylvania, Philadelphia, Pennsylvania, 19104, United States of America
- Dana Pe’er
- Department of Biological Sciences, Department of Systems Biology, Columbia University, New York, New York, 10027, United States of America
16
Predicting the phenotypic values of physiological traits using SNP genotype and gene expression data in mice. PLoS One 2014; 9:e115532. [PMID: 25541966 PMCID: PMC4277360 DOI: 10.1371/journal.pone.0115532] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 06/09/2014] [Accepted: 11/25/2014] [Indexed: 01/22/2023]
Abstract
Predicting phenotypes using genome-wide genetic variation and gene expression data is useful in several fields, such as human biology and medicine, as well as in crop and livestock breeding. However, studies of phenotype prediction using gene expression data in mammals remain scarce, as the available gene expression profiling data are currently limited. By integrating a few sources of relevant data that are available in mice, this study investigated the accuracy of phenotype prediction for several physiological traits. Gene expression data from two tissues as well as single nucleotide polymorphisms (SNPs) were used. For the studied traits, the variance of the effects of the expression levels was more likely to differ among the genes than were the effects of SNPs. For the glucose concentration, the total cholesterol amount, and the total tidal volume, the accuracy by cross validation tended to be higher when the gene expression data rather than the SNP genotype data were used, and a statistically significant increase in the accuracy was obtained when the gene expression data from the liver were used alone or jointly with the SNP genotype data. For these traits, there were no additional gains in accuracy from using the gene expression data of both the liver and lung compared with using either tissue alone. The prediction accuracy obtained with different gene-selection strategies was also examined; the use of genes with higher tissue specificity tended to result in an accuracy similar to or greater than that obtained using all of the available genes for traits such as the glucose concentration and total cholesterol amount. Although relatively few animals were evaluated, the current results suggest that gene expression levels could be used as explanatory variables. However, further studies are essential to confirm our findings using additional animal samples.
17
Prediction of dynamical drug sensitivity and resistance by module network rewiring-analysis based on transcriptional profiling. Drug Resist Updat 2014; 17:64-76. [PMID: 25156319 DOI: 10.1016/j.drup.2014.08.002] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Indexed: 01/01/2023]
Abstract
Revealing functional reorganization, or rewiring between modules at the network level, during drug treatment is important for systematically understanding therapies and drug responses. The present article proposes a novel model of module network rewiring to characterize the functional reorganization of a complex biological system, and describes a new framework, module network rewiring-analysis (MNR), for systematically studying dynamical drug sensitivity and resistance during drug treatment. MNR was used to investigate functional reorganization or rewiring at the level of the module network, rather than the molecular network or individual molecules. Our experiments on expression data from patients with Hepatitis C virus infection receiving Interferon therapy demonstrated that, unlike other differential analyses of gene expression, consistent module genes derived by MNR could be directly used to reveal new genotypes relevant to drug sensitivity. Our results showed that functional connections and reconnections among consistent modules bridged by biological paths were necessary for achieving an effective drug response. The hierarchical structures of the temporal module network can be considered as spatio-temporal biomarkers to monitor the efficacy, efficiency, toxicity, and resistance of the therapy. Our study indicates that MNR is a useful tool to identify module biomarkers and further predict dynamical drug sensitivity and resistance, characterize complex dynamic processes of therapy response, and provide biologically systematic clues for pharmacogenomic applications.
18
Harnessing natural sequence variation to dissect posttranscriptional regulatory networks in yeast. G3 (Bethesda) 2014; 4:1539-53. [PMID: 24938291 PMCID: PMC4132183 DOI: 10.1534/g3.114.012039] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 12/15/2022]
Abstract
Understanding how genomic variation influences phenotypic variation through the molecular networks of the cell is one of the central challenges of biology. Transcriptional regulation has received much attention, but equally important is the posttranscriptional regulation of mRNA stability. Here we applied a systems genetics approach to dissect posttranscriptional regulatory networks in the budding yeast Saccharomyces cerevisiae. Quantitative sequence-to-affinity models were built from high-throughput in vivo RNA binding protein (RBP) binding data for 15 yeast RBPs. Integration of these models with genome-wide mRNA expression data allowed us to estimate protein-level RBP regulatory activity for individual segregants from a genetic cross between two yeast strains. Treating these activities as a quantitative trait, we mapped trans-acting loci (activity quantitative trait loci, or aQTLs) that act via posttranscriptional regulation of transcript stability. We predicted and experimentally confirmed that a coding polymorphism at the IRA2 locus modulates Puf4p activity. Our results also indicate that Puf3p activity is modulated by distinct loci, depending on whether it acts via the 5′ or the 3′ untranslated region of its target mRNAs. Together, our results validate a general strategy for dissecting the connectivity between posttranscriptional regulators and their upstream signaling pathways.
19
Kristensen VN, Lingjærde OC, Russnes HG, Vollan HKM, Frigessi A, Børresen-Dale AL. Principles and methods of integrative genomic analyses in cancer. Nat Rev Cancer 2014; 14:299-313. [PMID: 24759209 DOI: 10.1038/nrc3721] [Citation(s) in RCA: 249] [Impact Index Per Article: 22.6] [Indexed: 12/15/2022]
Abstract
Combined analyses of molecular data, such as DNA copy-number alteration, mRNA and protein expression, point to biological functions and molecular pathways being deregulated in multiple cancers. Genomic, metabolomic and clinical data from various solid cancers and model systems are emerging and can be used to identify novel patient subgroups for tailored therapy and monitoring. The integrative genomics methodologies that are used to interpret these data require expertise in different disciplines, such as biology, medicine, mathematics, statistics and bioinformatics, and they can seem daunting. The objectives, methods and computational tools of integrative genomics that are available to date are reviewed here, as is their implementation in cancer research.
Affiliation(s)
- Vessela N Kristensen
- [1] Department of Genetics, Institute for Cancer Research, Oslo University Hospital, The Norwegian Radium Hospital, Montebello, 0310 Oslo, Norway. [2] K.G. Jebsen Centre for Breast Cancer Research, Institute for Clinical Medicine, Faculty of Medicine, University of Oslo, 0313 Oslo, Norway. [3] Department of Clinical Molecular Oncology, Division of Medicine, Akershus University Hospital, 1478 Ahus, Norway
- Ole Christian Lingjærde
- [1] K.G. Jebsen Centre for Breast Cancer Research, Institute for Clinical Medicine, Faculty of Medicine, University of Oslo, 0313 Oslo, Norway. [2] Division for Biomedical Informatics, Department of Computer Science, University of Oslo, 0316 Oslo, Norway
- Hege G Russnes
- [1] Department of Genetics, Institute for Cancer Research, Oslo University Hospital, The Norwegian Radium Hospital, Montebello, 0310 Oslo, Norway. [2] K.G. Jebsen Centre for Breast Cancer Research, Institute for Clinical Medicine, Faculty of Medicine, University of Oslo, 0313 Oslo, Norway. [3] Department of Pathology, Oslo University Hospital, 0450 Oslo, Norway
- Hans Kristian M Vollan
- [1] Department of Genetics, Institute for Cancer Research, Oslo University Hospital, The Norwegian Radium Hospital, Montebello, 0310 Oslo, Norway. [2] K.G. Jebsen Centre for Breast Cancer Research, Institute for Clinical Medicine, Faculty of Medicine, University of Oslo, 0313 Oslo, Norway. [3] Department of Oncology, Division of Cancer, Surgery and Transplantation, Oslo University Hospital, 0450 Oslo, Norway
- Arnoldo Frigessi
- [1] Statistics for Innovation, Norwegian Computing Center, 0314 Oslo, Norway. [2] Department of Biostatistics, Institute of Basic Medical Sciences, University of Oslo, PO Box 1122 Blindern, 0317 Oslo, Norway
- Anne-Lise Børresen-Dale
- [1] Department of Genetics, Institute for Cancer Research, Oslo University Hospital, The Norwegian Radium Hospital, Montebello, 0310 Oslo, Norway. [2] K.G. Jebsen Centre for Breast Cancer Research, Institute for Clinical Medicine, Faculty of Medicine, University of Oslo, 0313 Oslo, Norway
20
Designing of promiscuous inhibitors against pancreatic cancer cell lines. Sci Rep 2014; 4:4668. [PMID: 24728108 PMCID: PMC3985076 DOI: 10.1038/srep04668] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 01/22/2014] [Accepted: 03/17/2014] [Indexed: 01/02/2023]
Abstract
Pancreatic cancer remains the most devastating disease, with the worst prognosis. There is a pressing need to accelerate the drug discovery process to identify new effective drug candidates against pancreatic cancer. We have developed QSAR models for predicting promiscuous inhibitors using pharmacological data. Our models achieved a maximum Pearson correlation coefficient of 0.86 when evaluated with 10-fold cross-validation. Our models also successfully validated the drug-to-oncogene relationship, and we further used these models to screen FDA-approved drugs and tested them in vitro. We have integrated these models into a webserver named DiPCell, which will be useful for screening and designing novel promiscuous drug molecules. We have also identified the most and least effective drugs for pancreatic cancer cell lines. In addition, we have identified resistant pancreatic cancer cell lines, which warrant further investigation to shed light on the mechanisms of resistance in pancreatic cancer.
21
Seoane JA, Day INM, Gaunt TR, Campbell C. A pathway-based data integration framework for prediction of disease progression. Bioinformatics 2013; 30:838-45. [PMID: 24162466 PMCID: PMC3957070 DOI: 10.1093/bioinformatics/btt610] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.7] [Indexed: 11/17/2022]
Abstract
Motivation: Within medical research there is an increasing trend toward deriving multiple types of data from the same individual. The most effective prognostic prediction methods should use all available data, as this maximizes the amount of information used. In this article, we consider a variety of learning strategies to boost prediction performance based on the use of all available data. Implementation: We consider data integration via supervised multiple kernel learning methods. We propose a scheme in which feature selection by statistical score is performed separately per data type and by pathway membership. We further consider the introduction of a confidence measure for the class assignment, both to remove some ambiguously labeled data points from the training data and to implement a cautious classifier that only makes predictions when the associated confidence is high. Results: We use the METABRIC breast cancer dataset, with prediction of survival at 2000 days from diagnosis. Predictive accuracy is improved by using kernels whose features are restricted to genes that are known members of particular pathways. We show that further improvements can be made by using a range of additional kernels based on clinical covariates such as Estrogen Receptor (ER) status. Using this range of measures to improve prediction performance, we show that the test accuracy on new instances is nearly 80%, though predictions are only made on 69.2% of the patient cohort. Availability: https://github.com/jseoane/FSMKL Contact: J.Seoane@bristol.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- José A Seoane
- MRC Centre for Causal Analyses in Translational Epidemiology, MRC Integrative Epidemiology Unit, School of Social and Community Medicine, University of Bristol, Clifton BS8 2BN, UK and Intelligent Systems Laboratory, University of Bristol, Bristol BS8 1UB, UK
22
Gagneur J, Stegle O, Zhu C, Jakob P, Tekkedil MM, Aiyar RS, Schuon AK, Pe'er D, Steinmetz LM. Genotype-environment interactions reveal causal pathways that mediate genetic effects on phenotype. PLoS Genet 2013; 9:e1003803. [PMID: 24068968 PMCID: PMC3778020 DOI: 10.1371/journal.pgen.1003803] [Citation(s) in RCA: 66] [Impact Index Per Article: 5.5] [Received: 05/02/2013] [Accepted: 07/30/2013] [Indexed: 01/28/2023]
Abstract
Unraveling the molecular processes that lead from genotype to phenotype is crucial for the understanding and effective treatment of genetic diseases. Knowledge of the causative genetic defect most often does not enable treatment; therefore, causal intermediates between genotype and phenotype constitute valuable candidates for molecular intervention points that can be therapeutically targeted. Mapping genetic determinants of gene expression levels (also known as expression quantitative trait loci or eQTL studies) is frequently used for this purpose, yet distinguishing causation from correlation remains a significant challenge. Here, we address this challenge using extensive, multi-environment gene expression and fitness profiling of hundreds of genetically diverse yeast strains, in order to identify truly causal intermediate genes that condition fitness in a given environment. Using functional genomics assays, we show that the predictive power of eQTL studies for inferring causal intermediate genes is poor unless performed across multiple environments. Surprisingly, although the effects of genotype on fitness depended strongly on environment, causal intermediates could be most reliably predicted from genetic effects on expression present in all environments. Our results indicate a mechanism explaining this apparent paradox, whereby immediate molecular consequences of genetic variation are shared across environments, and environment-dependent phenotypic effects result from downstream integration of environmental signals. We developed a statistical model to predict causal intermediates that leverages this insight, yielding over 400 transcripts, most of which we experimentally validated as having a role in conditioning fitness. Our findings have implications for the design and analysis of clinical omics studies aimed at discovering personalized targets for molecular intervention, suggesting that inferring causation in a single cellular context can benefit from molecular profiling in multiple contexts.
Affiliation(s)
- Julien Gagneur
- Gene Center, Ludwig-Maximilians-Universität München, Munich, Germany
- Oliver Stegle
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Chenchen Zhu
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- Petra Jakob
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- Manu M. Tekkedil
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- Raeka S. Aiyar
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- Ann-Kathrin Schuon
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- Dana Pe'er
- Department of Biological Sciences, Columbia University, New York, New York, United States of America
- Lars M. Steinmetz
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- Stanford Genome Technology Center, Palo Alto, California, United States of America
23
Kim YA, Przytycka TM. Bridging the Gap between Genotype and Phenotype via Network Approaches. Front Genet 2013; 3:227. [PMID: 23755063 PMCID: PMC3668153 DOI: 10.3389/fgene.2012.00227] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Received: 06/19/2012] [Accepted: 10/10/2012] [Indexed: 11/15/2022]
Abstract
In the last few years we have witnessed tremendous progress in detecting associations between genetic variations and complex traits. While genome-wide association studies have been able to discover genomic regions that may influence many common human diseases, these discoveries created an urgent need for methods that extend the knowledge of genotype-phenotype relationships to the level of the molecular mechanisms behind them. To address this emerging need, computational approaches increasingly utilize a pathway-centric perspective. These new methods often utilize known or predicted interactions between genes and/or gene products. In this review, we survey recently developed network based methods that attempt to bridge the genotype-phenotype gap. We note that although these methods help narrow the gap between genotype and phenotype relationships, these approaches alone cannot provide the precise details of underlying mechanisms and current research is still far from closing the gap.
Affiliation(s)
- Yoo-Ah Kim
- National Center for Biotechnology Information, National Institutes of Health, National Library of Medicine, Bethesda, MD, USA
24
Xie L, Ng C, Ali T, Valencia R, Ferreira BL, Xue V, Tanweer M, Zhou D, Haddad GG, Bourne PE, Xie L. Multiscale modeling of the causal functional roles of nsSNPs in a genome-wide association study: application to hypoxia. BMC Genomics 2013; 14 Suppl 3:S9. [PMID: 23819581 PMCID: PMC3665574 DOI: 10.1186/1471-2164-14-s3-s9] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 02/04/2023]
Abstract
BACKGROUND It is a great challenge of modern biology to determine the functional roles of non-synonymous Single Nucleotide Polymorphisms (nsSNPs) on complex phenotypes. Statistical and machine learning techniques establish correlations between genotype and phenotype, but may fail to infer the biologically relevant mechanisms. The emerging paradigm of Network-based Association Studies aims to address this problem of statistical analysis. However, a mechanistic understanding of how individual molecular components work together in a system requires knowledge of molecular structures, and their interactions. RESULTS To address the challenge of understanding the genetic, molecular, and cellular basis of complex phenotypes, we have, for the first time, developed a structural systems biology approach for genome-wide multiscale modeling of nsSNPs--from the atomic details of molecular interactions to the emergent properties of biological networks. We apply our approach to determine the functional roles of nsSNPs associated with hypoxia tolerance in Drosophila melanogaster. The integrated view of the functional roles of nsSNP at both molecular and network levels allows us to identify driver mutations and their interactions (epistasis) in H, Rad51D, Ulp1, Wnt5, HDAC4, Sol, Dys, GalNAc-T2, and CG33714 genes, all of which are involved in the up-regulation of Notch and Gurken/EGFR signaling pathways. Moreover, we find that a large fraction of the driver mutations are neither located in conserved functional sites, nor responsible for structural stability, but rather regulate protein activity through allosteric transitions, protein-protein interactions, or protein-nucleic acid interactions. This finding should impact future Genome-Wide Association Studies. 
CONCLUSIONS Our studies demonstrate that the consolidation of statistical, structural, and network views of biomolecules and their interactions can provide new insight into the functional role of nsSNPs in Genome-Wide Association Studies, in a way that neither the knowledge of molecular structures nor biological networks alone could achieve. Thus, multiscale modeling of nsSNPs may prove to be a powerful tool for establishing the functional roles of sequence variants in a wide array of applications.
Affiliation(s)
- Li Xie
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093, USA
25
Menden MP, Iorio F, Garnett M, McDermott U, Benes CH, Ballester PJ, Saez-Rodriguez J. Machine learning prediction of cancer cell sensitivity to drugs based on genomic and chemical properties. PLoS One 2013; 8:e61318. [PMID: 23646105 PMCID: PMC3640019 DOI: 10.1371/journal.pone.0061318] [Citation(s) in RCA: 290] [Impact Index Per Article: 24.2] [Received: 10/26/2012] [Accepted: 03/07/2013] [Indexed: 12/24/2022]
Abstract
Predicting the response of a specific cancer to a therapy is a major goal in modern oncology that should ultimately lead to personalised treatment. High-throughput screenings of potentially active compounds against a panel of genomically heterogeneous cancer cell lines have unveiled multiple relationships between genomic alterations and drug responses. Various computational approaches have been proposed to predict sensitivity based on genomic features, while others have used the chemical properties of the drugs to ascertain their effect. In an effort to integrate these complementary approaches, we developed machine learning models to predict the response of cancer cell lines to drug treatment, quantified through IC50 values, based on both the genomic features of the cell lines and the chemical properties of the considered drugs. Models predicted IC50 values in an 8-fold cross-validation and an independent blind test with coefficients of determination R² of 0.72 and 0.64, respectively. Furthermore, models were able to predict with comparable accuracy (R² of 0.61) the IC50s of cell lines from a tissue not used in the training stage. Our in silico models can be used to optimise the experimental design of drug-cell screenings by estimating a large proportion of missing IC50 values rather than experimentally measuring them. The implications of our results go beyond virtual drug screening design: potentially thousands of drugs could be probed in silico to systematically test their potential efficacy as anti-tumour agents based on their structure, thus providing a computational framework to identify new drug repositioning opportunities and, ultimately, supporting personalized medicine by linking the genomic traits of patients to drug sensitivity.
Affiliation(s)
- Michael P. Menden
- European Bioinformatics Institute, Wellcome Trust Genome Campus–Cambridge, Cambridge, United Kingdom
- Francesco Iorio
- European Bioinformatics Institute, Wellcome Trust Genome Campus–Cambridge, Cambridge, United Kingdom
- Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus-Cambridge, Cambridge, United Kingdom
- Mathew Garnett
- Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus-Cambridge, Cambridge, United Kingdom
- Ultan McDermott
- Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus-Cambridge, Cambridge, United Kingdom
- Cyril H. Benes
- Center for Molecular Therapeutics, Massachusetts General Hospital Cancer Center and Harvard Medical School, Charlestown, Massachusetts, United States of America
- Pedro J. Ballester
- European Bioinformatics Institute, Wellcome Trust Genome Campus–Cambridge, Cambridge, United Kingdom
- Julio Saez-Rodriguez
- European Bioinformatics Institute, Wellcome Trust Genome Campus–Cambridge, Cambridge, United Kingdom
26
Chen Z, Zhang W. Integrative analysis using module-guided random forests reveals correlated genetic factors related to mouse weight. PLoS Comput Biol 2013; 9:e1002956. [PMID: 23505362 PMCID: PMC3591263 DOI: 10.1371/journal.pcbi.1002956] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 08/26/2012] [Accepted: 01/14/2013] [Indexed: 01/07/2023]
Abstract
Complex traits such as obesity are manifestations of intricate interactions of multiple genetic factors. However, such relationships are difficult to identify. Thanks to recent advances in high-throughput technology, a large amount of data has been collected for various complex traits, including obesity. These data often measure different biological aspects of the traits of interest, including genotypic variations at the DNA level and gene expression alterations at the RNA level. Integration of such heterogeneous data provides promising opportunities to understand the genetic components and possibly the genetic architecture of complex traits. In this paper, we propose a machine learning based method, module-guided Random Forests (mgRF), to integrate genotypic and gene expression data to investigate genetic factors and molecular mechanisms underlying complex traits. mgRF is an augmented Random Forests method enhanced by a network analysis for identifying multiple correlated variables of different types. We applied mgRF to genetic marker and gene expression data from a cohort of female F2 mice from an intercross. mgRF outperformed several existing methods in our extensive comparison. Our new approach has improved performance when combining both genotypic and gene expression data compared to using either one of the two types of data alone. The resulting predictive variables identified by mgRF provide information on perturbed pathways that are related to body weight. More importantly, the results uncovered intricate interactions among genetic markers and genes that would have been overlooked if only one type of data had been examined. Our results shed light on genetic mechanisms of obesity, and our approach provides a promising complementary framework to the “genetics of gene expression” analysis for integrating genotypic and gene expression information for analyzing complex traits.
Obesity has become a perilous global epidemic that can lead to complex diseases such as diabetes and cardiovascular disease. Much effort has been devoted to studying the genetic mechanisms that underlie obesity. Although a large quantity of experimental data has recently been accumulated using high-throughput techniques, our understanding of the genetic mechanisms of obesity is still limited. The proposed method is designed to address three critical issues that have impeded existing methods. The first is the curse of dimensionality in selecting a subset of genetic elements related to the traits of interest from a large number of candidates. The second is the genetic multiplicity underlying non-Mendelian traits, in which multiple genes interact. The third is the integration of data from multiple sources in light of genetic multiplicity and the curse of dimensionality. Here, we propose a new method that augments the Random Forests method with a network-based analysis to integrate genotypic and gene expression information and identify multiple correlated genetic elements underlying mouse weight. Our results shed light on complex genetic interactions underlying obesity and can yield viable hypotheses worthy of further investigation.
Affiliation(s)
- Zheng Chen
- Department of Computer Science and Engineering, Washington University, St. Louis, Missouri, United States of America
- Weixiong Zhang
- Department of Computer Science and Engineering, Washington University, St. Louis, Missouri, United States of America
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
27
Leveraging models of cell regulation and GWAS data in integrative network-based association studies. Nat Genet 2012; 44:841-7. [PMID: 22836096 DOI: 10.1038/ng.2355] [Citation(s) in RCA: 190] [Impact Index Per Article: 14.6] [Indexed: 12/16/2022]
28
Safi M, Lilien RH. Efficient a Priori Identification of Drug Resistant Mutations Using Dead-End Elimination and MM-PBSA. J Chem Inf Model 2012; 52:1529-41. [DOI: 10.1021/ci200626m] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 01/15/2023]
Affiliation(s)
- Maria Safi
- Department of Computer Science, University of Toronto, Toronto, Ontario M5S 3G4, Canada
- Ryan H. Lilien
- Department of Computer Science, University of Toronto, Toronto, Ontario M5S 3G4, Canada
29
Ma H, Zhao H. iFad: an integrative factor analysis model for drug-pathway association inference. Bioinformatics 2012; 28:1911-8. [PMID: 22581178 DOI: 10.1093/bioinformatics/bts285] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Indexed: 02/07/2023]
Abstract
MOTIVATION Pathway-based drug discovery considers the therapeutic effects of compounds in the global physiological environment. This approach has been gaining popularity in recent years because the target pathways and mechanism of action for many compounds are still unknown, and there are also some unexpected off-target effects. Therefore, the inference of drug-pathway associations is a crucial step to fully realize the potential of system-based pharmacological research. Transcriptome data offer valuable information on drug-pathway targets because the pathway activities may be reflected through gene expression levels. Hence, it is of great interest to jointly analyze the drug sensitivity and gene expression data from the same set of samples to investigate the gene-pathway-drug-pathway associations. RESULTS We have developed iFad, a Bayesian sparse factor analysis model to jointly analyze the paired gene expression and drug sensitivity datasets measured across the same panel of samples. The model enables direct incorporation of prior knowledge regarding gene-pathway and/or drug-pathway associations to aid the discovery of new association relationships. We use a collapsed Gibbs sampling algorithm for inference. Satisfactory performance of the proposed model was found for both simulated datasets and real data collected on the NCI-60 cell lines. Our results suggest that iFad is a promising approach for the identification of drug targets. This model also provides a general statistical framework for pathway-based integrative analysis of other types of -omics data. AVAILABILITY The R package 'iFad' and real NCI-60 dataset used are available at http://bioinformatics.med.yale.edu/group.
Affiliation(s)
- Haisu Ma
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511, USA
30
Dos Santos SC, Teixeira MC, Cabrito TR, Sá-Correia I. Yeast toxicogenomics: genome-wide responses to chemical stresses with impact in environmental health, pharmacology, and biotechnology. Front Genet 2012; 3:63. [PMID: 22529852 PMCID: PMC3329712 DOI: 10.3389/fgene.2012.00063] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.4] [Received: 02/16/2012] [Accepted: 04/03/2012] [Indexed: 01/20/2023]
Abstract
The emerging transdisciplinary field of Toxicogenomics aims to study the cell response to a given toxicant at the genome, transcriptome, proteome, and metabolome levels. This approach is expected to provide earlier and more sensitive biomarkers of toxicological responses and help in the delineation of regulatory risk assessment. The use of model organisms to gather such genomic information through Omics and Bioinformatics approaches and tools, together with more focused molecular and cellular biology studies, is rapidly increasing our understanding and providing an integrative view of how cells interact with their environment. The use of the model eukaryote Saccharomyces cerevisiae in the field of Toxicogenomics is discussed in this review. Despite the limitations intrinsic to such a simple single-cell experimental model, S. cerevisiae appears to be very useful as a first screening tool, limiting the use of animal models. Moreover, it is also one of the most interesting systems for obtaining a truly global understanding of toxicological response and resistance mechanisms, being at the frontline of systems biology research and development. The impact of the knowledge gathered in the yeast model through Toxicogenomics approaches is highlighted here by its use in predicting the toxicological outcomes of exposure to pesticides and pharmaceutical drugs, and also by its impact in biotechnology, namely in the development of more robust crops and the improvement of yeast strains as cell factories.
Affiliation(s)
- Sandra C Dos Santos
- Institute for Biotechnology and Bioengineering, Centre for Biological and Chemical Engineering, Instituto Superior Técnico, Technical University of Lisbon, Lisbon, Portugal
31
Loh PR, Tucker G, Berger B. Phenotype prediction using regularized regression on genetic data in the DREAM5 Systems Genetics B Challenge. PLoS One 2011; 6:e29095. [PMID: 22216175 PMCID: PMC3247233 DOI: 10.1371/journal.pone.0029095] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 04/21/2011] [Accepted: 11/21/2011] [Indexed: 01/12/2023]
Abstract
A major goal of large-scale genomics projects is to enable the use of data from high-throughput experimental methods to predict complex phenotypes such as disease susceptibility. The DREAM5 Systems Genetics B Challenge solicited algorithms to predict soybean plant resistance to the pathogen Phytophthora sojae from training sets including phenotype, genotype, and gene expression data. The challenge test set was divided into three subcategories, one requiring prediction based on only genotype data, another on only gene expression data, and the third on both genotype and gene expression data. Here we present our approach, primarily using regularized regression, which received the best-performer award for subchallenge B2 (gene expression only). We found that despite the availability of 941 genotype markers and 28,395 gene expression features, optimal models determined by cross-validation experiments typically used fewer than ten predictors, underscoring the importance of strong regularization in noisy datasets with far more features than samples. We also present substantial analysis of the training and test setup of the challenge, identifying high variance in performance on the gold standard test sets.
Affiliation(s)
- Po-Ru Loh
- Department of Mathematics and Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- George Tucker
- Department of Mathematics and Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Bonnie Berger
- Department of Mathematics and Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
32
Floratos A, Honig B, Pe'er D, Califano A. Using systems and structure biology tools to dissect cellular phenotypes. J Am Med Inform Assoc 2011; 19:171-5. [PMID: 22081223 DOI: 10.1136/amiajnl-2011-000490] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 01/31/2023]
Abstract
The Center for the Multiscale Analysis of Genetic Networks (MAGNet, http://magnet.c2b2.columbia.edu) was established in 2005, with the mission of providing the biomedical research community with Structural and Systems Biology algorithms and software tools for the dissection of molecular interactions and for the interaction-based elucidation of cellular phenotypes. Over the last 7 years, MAGNet investigators have developed many novel analysis methodologies, which have led to important biological discoveries, including understanding the role of the DNA shape in protein-DNA binding specificity and the discovery of genes causally related to the presentation of malignant phenotypes, including lymphoma, glioma, and melanoma. Software tools implementing these methodologies have been broadly adopted by the research community and are made freely available through geWorkbench, the Center's integrated analysis platform. Additionally, MAGNet has been instrumental in organizing and developing key conferences and meetings focused on the emerging field of systems biology and regulatory genomics, with special focus on cancer-related research.
Affiliation(s)
- Aris Floratos
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
33
Xie L, Xie L, Kinnings SL, Bourne PE. Novel computational approaches to polypharmacology as a means to define responses to individual drugs. Annu Rev Pharmacol Toxicol 2011; 52:361-79. [PMID: 22017683 DOI: 10.1146/annurev-pharmtox-010611-134630] [Citation(s) in RCA: 152] [Impact Index Per Article: 10.9] [Indexed: 11/09/2022]
Abstract
Polypharmacology, which focuses on designing therapeutics to target multiple receptors, has emerged as a new paradigm in drug discovery. Polypharmacological effects are an attribute of most, if not all, drug molecules. The efficacy and toxicity of drugs, whether designed as single- or multitarget therapeutics, result from complex interactions among pharmacodynamic, pharmacokinetic, genetic, epigenetic, and environmental factors. Ultimately, predicting a drug response phenotype requires understanding how dynamic drug-target interactions change the flow of information through cellular networks and how this change affects the complete biological system. Although this remains a future objective, we review recent progress and challenges in computational techniques that enable the prediction and analysis of in vitro and in vivo drug-response phenotypes.
Affiliation(s)
- Lei Xie
- Department of Computer Science, Hunter College, The City University of New York, New York, New York 10065, USA.
34
Principles and strategies for developing network models in cancer. Cell 2011; 144:864-73. [PMID: 21414479 DOI: 10.1016/j.cell.2011.03.001] [Citation(s) in RCA: 150] [Impact Index Per Article: 10.7] [Received: 01/19/2011] [Revised: 02/28/2011] [Accepted: 02/28/2011] [Indexed: 12/13/2022]
Abstract
The flood of genome-wide data generated by high-throughput technologies currently provides biologists with an unprecedented opportunity: to manipulate, query, and reconstruct functional molecular networks of cells. Here, we outline three underlying principles and six strategies to infer network models from genomic data. Then, using cancer as an example, we describe experimental and computational approaches to infer "differential" networks that can identify genes and processes driving disease phenotypes. In conclusion, we discuss how a network-level understanding of cancer can be used to predict drug response and guide therapeutics.
35
Chipman KC, Singh AK. Using stochastic causal trees to augment Bayesian networks for modeling eQTL datasets. BMC Bioinformatics 2011; 12:7. [PMID: 21211042 PMCID: PMC3032670 DOI: 10.1186/1471-2105-12-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 04/08/2010] [Accepted: 01/06/2011] [Indexed: 11/10/2022]
Abstract
Background: The combination of genotypic and genome-wide expression data arising from segregating populations offers an unprecedented opportunity to model and dissect complex phenotypes. The immense potential offered by these data derives from the fact that genotypic variation is the sole source of perturbation and can therefore be used to reconcile changes in gene expression programs with the parental genotypes. To date, several methodologies have been developed for modeling eQTL data. These methods generally leverage genotypic data to resolve causal relationships among gene pairs implicated as associates in the expression data. In particular, leading studies have augmented Bayesian networks with genotypic data, providing a powerful framework for learning and modeling causal relationships. While these initial efforts have provided promising results, one major drawback of these methods is that they are generally limited to resolving causal orderings for transcripts most proximal to the genomic loci. In this manuscript, we present a probabilistic method capable of learning the causal relationships between transcripts at all levels in the network. We use the information provided by our method as a prior for Bayesian network structure learning, resulting in enhanced performance for gene network reconstruction.
Results: Using established protocols to synthesize eQTL networks and corresponding data, we show that our method achieves improved performance over existing leading methods. For the goal of gene network reconstruction, our method achieves improvements in recall ranging from 20% to 90% across a broad range of precision levels and for datasets of varying sample sizes. Additionally, we show that the learned networks can be used for expression quantitative trait loci mapping, resulting in up to 10-fold increases in recall over traditional univariate mapping.
Conclusions: Using the information from our method as a prior for Bayesian network structure learning yields large improvements in accuracy for the tasks of gene network reconstruction and expression quantitative trait loci mapping. In particular, our method is effective for establishing causal relationships between transcripts located both proximally and distally from genomic loci.
Affiliation(s)
- Kyle C Chipman
- Biomolecular Science and Engineering Program, UC Santa Barbara, Santa Barbara, CA, USA.
36
Konopka G. Functional genomics of the brain: uncovering networks in the CNS using a systems approach. Wiley Interdiscip Rev Syst Biol Med 2010; 3:628-48. [PMID: 21197665 DOI: 10.1002/wsbm.139] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 12/15/2022]
Abstract
The central nervous system (CNS) is undoubtedly the most complex human organ system in terms of its diverse functions, cellular composition, and connections. Attempts to capture this diversity experimentally were the foundation on which the field of neurobiology was built. Until recently, however, techniques were either painstakingly slow or insufficient to capture this heterogeneity. In addition, combining the multiple layers of information needed for a complete picture of neuronal diversity, from the epigenome to the proteome, requires an even more complex compilation of data. In this era of high-throughput genomics, the ability to isolate and profile neurons and brain tissue has increased tremendously and now requires less effort. Both microarrays and next-generation sequencing have identified neuronal transcriptomes and signaling networks involved in normal brain development, as well as in disease. However, the expertise needed to organize and prioritize the resultant data remains substantial. A combination of supervised organization and unsupervised analyses is needed to fully appreciate the underlying structure in these datasets. When used effectively, these analyses have yielded striking insights into a number of fundamental questions in neuroscience on topics ranging from the evolution of the human brain to neuropsychiatric and neurodegenerative disorders. Future studies will incorporate these analyses with behavioral and physiological data from patients to move more efficiently toward personalized therapeutics.
Affiliation(s)
- Genevieve Konopka
- Department of Neurology, University of California, Los Angeles, CA, USA.
37
Visscher PM, Goddard ME. Systems genetics: the added value of gene expression. HFSP J 2010; 4:6-10. [PMID: 20676303 DOI: 10.2976/1.3292182] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 12/21/2009] [Indexed: 11/19/2022]
Abstract
Understanding causal relationships between genotypes and phenotypes is a long-standing aim in genetics. In addition to high-throughput technologies that allow the measurement of many DNA variants, it is possible to measure gene expression in specific tissues using array technology. "Systems genetics" is an emerging discipline that combines dense data on genotypes, gene expression, and outcome phenotypes to answer fundamental questions about causal pathways from genotype to phenotype. A recent paper by Chen et al. [Mol. Syst. Biol. 5, 310 (2009)] addressed the question of whether relative levels of mRNA expression help to elucidate causal paths from genotype to phenotype, using drug resistance in yeast as a model. The authors show that data on genetic markers and on gene expression, measured in a drug-free environment, can be combined to predict the growth of a yeast strain in the presence of a drug. They argue that their prediction can be used to identify causal pathways, and for a subset of the genes used in prediction, they demonstrate that these genes affect drug sensitivity by deleting or overexpressing the gene or by swapping alleles between yeast strains. This approach can also be applied to other species, including humans, and may become a tool in the study of personalized medicine.