1
Zhang L, Woltering I, Holzner M, Brandhofer M, Schaefer CC, Bushati G, Ebert S, Yang B, Muenchhoff M, Hellmuth JC, Scherer C, Wichmann C, Effinger D, Hübner M, El Bounkari O, Scheiermann P, Bernhagen J, Hoffmann A. CD74 is a functional MIF receptor on activated CD4+ T cells. Cell Mol Life Sci 2024; 81:296. [PMID: 38992165 PMCID: PMC11335222 DOI: 10.1007/s00018-024-05338-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2024] [Revised: 06/04/2024] [Accepted: 06/27/2024] [Indexed: 07/13/2024]
Abstract
Next to its classical role in MHC II-mediated antigen presentation, CD74 was identified as a high-affinity receptor for macrophage migration inhibitory factor (MIF), a pleiotropic cytokine and major determinant of various acute and chronic inflammatory conditions, cardiovascular diseases and cancer. Recent evidence suggests that CD74 is expressed in T cells, but the functional relevance of this observation is poorly understood. Here, we characterized the regulation of CD74 expression and that of the MIF chemokine receptors during activation of human CD4+ T cells and studied links to MIF-induced T-cell migration, function, and COVID-19 disease stage. MIF receptor profiling of resting primary human CD4+ T cells via flow cytometry revealed high surface expression of CXCR4, while CD74, CXCR2 and ACKR3/CXCR7 were not measurably expressed. However, CD4+ T cells constitutively expressed CD74 intracellularly, which upon T-cell activation was significantly upregulated, post-translationally modified by chondroitin sulfate and could be detected on the cell surface, as determined by flow cytometry, Western blot, immunohistochemistry, and re-analysis of available RNA-sequencing and proteomic data sets. Applying 3D-matrix-based live cell-imaging and receptor pathway-specific inhibitors, we determined a causal involvement of CD74 and CXCR4 in MIF-induced CD4+ T-cell migration. Mechanistically, proximity ligation assay visualized CD74/CXCR4 heterocomplexes on activated CD4+ T cells, which were significantly diminished after MIF treatment, pointing towards a MIF-mediated internalization process. Lastly, in a cohort of 30 COVID-19 patients, CD74 surface expression was found to be significantly upregulated on CD4+ and CD8+ T cells in patients with severe compared to patients with only mild disease course. 
Together, our study characterizes the MIF receptor network in the course of T-cell activation and reveals CD74 as a novel functional MIF receptor and MHC II-independent activation marker of primary human CD4+ T cells.
Affiliation(s)
- Lin Zhang
- Division of Vascular Biology, Institute for Stroke and Dementia Research (ISD), LMU University Hospital (LMU Klinikum), Ludwig-Maximilians-Universität (LMU) München, Feodor-Lynen-Straße 17, 81377, Munich, Germany
- Iris Woltering
- Division of Vascular Biology, Institute for Stroke and Dementia Research (ISD), LMU University Hospital (LMU Klinikum), Ludwig-Maximilians-Universität (LMU) München, Feodor-Lynen-Straße 17, 81377, Munich, Germany
- Mathias Holzner
- Division of Vascular Biology, Institute for Stroke and Dementia Research (ISD), LMU University Hospital (LMU Klinikum), Ludwig-Maximilians-Universität (LMU) München, Feodor-Lynen-Straße 17, 81377, Munich, Germany
- Markus Brandhofer
- Division of Vascular Biology, Institute for Stroke and Dementia Research (ISD), LMU University Hospital (LMU Klinikum), Ludwig-Maximilians-Universität (LMU) München, Feodor-Lynen-Straße 17, 81377, Munich, Germany
- Carl-Christian Schaefer
- Division of Vascular Biology, Institute for Stroke and Dementia Research (ISD), LMU University Hospital (LMU Klinikum), Ludwig-Maximilians-Universität (LMU) München, Feodor-Lynen-Straße 17, 81377, Munich, Germany
- Genta Bushati
- Division of Vascular Biology, Institute for Stroke and Dementia Research (ISD), LMU University Hospital (LMU Klinikum), Ludwig-Maximilians-Universität (LMU) München, Feodor-Lynen-Straße 17, 81377, Munich, Germany
- Simon Ebert
- Division of Vascular Biology, Institute for Stroke and Dementia Research (ISD), LMU University Hospital (LMU Klinikum), Ludwig-Maximilians-Universität (LMU) München, Feodor-Lynen-Straße 17, 81377, Munich, Germany
- Bishan Yang
- Division of Vascular Biology, Institute for Stroke and Dementia Research (ISD), LMU University Hospital (LMU Klinikum), Ludwig-Maximilians-Universität (LMU) München, Feodor-Lynen-Straße 17, 81377, Munich, Germany
- Maximilian Muenchhoff
- Max von Pettenkofer Institute and Gene Center, Virology, National Reference Center for Retroviruses, Ludwig-Maximilians-Universität (LMU) Munich, Munich, Germany
- German Center for Infection Research (DZIF), Partner Site Munich, Munich, Germany
- COVID-19 Registry of the LMU Munich (CORKUM), LMU University Hospital, Ludwig-Maximilians-Universität (LMU) Munich, Munich, Germany
- Johannes C Hellmuth
- COVID-19 Registry of the LMU Munich (CORKUM), LMU University Hospital, Ludwig-Maximilians-Universität (LMU) Munich, Munich, Germany
- Department of Medicine III, LMU University Hospital, Ludwig-Maximilians-Universität (LMU) Munich, Munich, Germany
- Clemens Scherer
- COVID-19 Registry of the LMU Munich (CORKUM), LMU University Hospital, Ludwig-Maximilians-Universität (LMU) Munich, Munich, Germany
- Department of Medicine I, LMU University Hospital, Ludwig-Maximilians-Universität (LMU) Munich, Munich, Germany
- Christian Wichmann
- Division of Transfusion Medicine, Cell Therapeutics and Haemostaseology, LMU University Hospital, Ludwig-Maximilians-Universität (LMU) Munich, Munich, Germany
- David Effinger
- Department of Anaesthesiology, LMU University Hospital, Ludwig-Maximilians-Universität (LMU) Munich, Marchioninistraße 15, 81377, Munich, Germany
- Walter Brendel Centre of Experimental Medicine, Ludwig-Maximilians-Universität (LMU) Munich, Munich, Germany
- Max Hübner
- Department of Anaesthesiology, LMU University Hospital, Ludwig-Maximilians-Universität (LMU) Munich, Marchioninistraße 15, 81377, Munich, Germany
- Walter Brendel Centre of Experimental Medicine, Ludwig-Maximilians-Universität (LMU) Munich, Munich, Germany
- Omar El Bounkari
- Division of Vascular Biology, Institute for Stroke and Dementia Research (ISD), LMU University Hospital (LMU Klinikum), Ludwig-Maximilians-Universität (LMU) München, Feodor-Lynen-Straße 17, 81377, Munich, Germany
- Patrick Scheiermann
- Department of Anaesthesiology, LMU University Hospital, Ludwig-Maximilians-Universität (LMU) Munich, Marchioninistraße 15, 81377, Munich, Germany
- Jürgen Bernhagen
- Division of Vascular Biology, Institute for Stroke and Dementia Research (ISD), LMU University Hospital (LMU Klinikum), Ludwig-Maximilians-Universität (LMU) München, Feodor-Lynen-Straße 17, 81377, Munich, Germany.
- German Centre of Cardiovascular Research (DZHK), Partner Site Munich Heart Alliance, Munich, Germany.
- Adrian Hoffmann
- Division of Vascular Biology, Institute for Stroke and Dementia Research (ISD), LMU University Hospital (LMU Klinikum), Ludwig-Maximilians-Universität (LMU) München, Feodor-Lynen-Straße 17, 81377, Munich, Germany.
- Department of Anaesthesiology, LMU University Hospital, Ludwig-Maximilians-Universität (LMU) Munich, Marchioninistraße 15, 81377, Munich, Germany.
- German Centre of Cardiovascular Research (DZHK), Partner Site Munich Heart Alliance, Munich, Germany.
2
Moro G, Masseroli M. Gene function finding through cross-organism ensemble learning. BioData Min 2021; 14:14. [PMID: 33579334 PMCID: PMC7879670 DOI: 10.1186/s13040-021-00239-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2020] [Accepted: 01/10/2021] [Indexed: 11/12/2022] Open
Abstract
Background: Structured biological information about genes and proteins is a valuable resource to improve discovery and understanding of complex biological processes via machine learning algorithms. Gene Ontology (GO) controlled annotations describe, in a structured form, features and functions of genes and proteins of many organisms. However, such valuable annotations are not always reliable and sometimes are incomplete, especially for rarely studied organisms. Here, we present GeFF (Gene Function Finder), a novel cross-organism ensemble learning method able to reliably predict new GO annotations of a target organism from GO annotations of another source organism evolutionarily related and better studied. Results: Using a supervised method, GeFF predicts unknown annotations from random perturbations of existing annotations. The perturbation consists in randomly deleting a fraction of known annotations in order to produce a reduced annotation set. The key idea is to train a supervised machine learning algorithm with the reduced annotation set to predict, namely to rebuild, the original annotations. The resulting prediction model, in addition to accurately rebuilding the original known annotations for an organism from their perturbed version, also effectively predicts new unknown annotations for the organism. Moreover, the prediction model is also able to discover new unknown annotations in different target organisms without retraining. We combined our novel method with different ensemble learning approaches and compared them to each other and to an equivalent single model technique. We tested the method with five different organisms using their GO annotations: Homo sapiens, Mus musculus, Bos taurus, Gallus gallus and Dictyostelium discoideum. The outcomes demonstrate the effectiveness of the cross-organism ensemble approach, which can be customized with a trade-off between the desired number of predicted new annotations and their precision. A Web application to browse both input annotations used and predicted ones, choosing the ensemble prediction method to use, is publicly available at http://tiny.cc/geff/. Conclusions: Our novel cross-organism ensemble learning method provides reliable predicted novel gene annotations, i.e., functions, ranked according to an associated likelihood value. They are very valuable both to speed the annotation curation, focusing it on the prioritized new annotations predicted, and to complement known annotations available.
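The perturbation step described in this abstract (randomly deleting a fraction of known annotations to obtain a reduced set) can be illustrated with a minimal Python sketch. This is not the GeFF implementation: the function name `perturb_annotations`, the representation of annotations as (gene, GO term) pairs, and the toy identifiers are assumptions made purely for illustration.

```python
import random

def perturb_annotations(annotations, fraction, seed=0):
    """Randomly delete a fraction of known (gene, GO term) pairs.

    Returns (reduced, deleted). In a setup like the one the abstract
    describes, the reduced set would be the model input and the original
    set the target to rebuild.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)        # seeded so the perturbation is reproducible
    pairs = sorted(annotations)      # fixed order so the seed is meaningful
    n_delete = int(len(pairs) * fraction)
    deleted = set(rng.sample(pairs, n_delete))
    reduced = set(pairs) - deleted
    return reduced, deleted

# Hypothetical toy annotations; gene names and term IDs are placeholders.
original = {
    ("geneA", "GO:0000001"), ("geneA", "GO:0000002"),
    ("geneB", "GO:0000001"), ("geneB", "GO:0000003"),
    ("geneC", "GO:0000002"), ("geneC", "GO:0000003"),
    ("geneD", "GO:0000001"), ("geneD", "GO:0000004"),
}
reduced, deleted = perturb_annotations(original, fraction=0.25, seed=42)
```

Repeating the call with different seeds would give several perturbed copies of the same annotation set, which is one plausible way to produce many training examples from a single organism.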
Affiliation(s)
- Gianluca Moro
- DISI - University of Bologna, Via dell'Università, Cesena (FC), Italy.
- Marco Masseroli
- DEIB, Politecnico di Milano, Piazza L. Da Vinci 32, Milan, 20133, Italy
3
Choe EK, Lee S, Kim SY, Shivakumar M, Park KJ, Chai YJ, Kim D. Prognostic Effect of Inflammatory Genes on Stage I-III Colorectal Cancer-Integrative Analysis of TCGA Data. Cancers (Basel) 2021; 13:751. [PMID: 33670198 PMCID: PMC7916934 DOI: 10.3390/cancers13040751] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/14/2020] [Revised: 02/05/2021] [Accepted: 02/07/2021] [Indexed: 12/24/2022] Open
Abstract
Simple Summary: Research interest in the role of inflammation in the progression and prognosis of colorectal cancer (CRC) is growing. In this study, we evaluated the expression and DNA methylation levels of inflammation-related genes in CRC tissues using the TCGA-COREAD dataset by integratively combining multi-omics features using machine learning. Statistical analysis was additionally performed to allow for interpretable, understandable, and clinically practical results. An integrative model combining expression, methylation, and clinical features had the highest performance. In multivariate analysis, the methylation levels of CEP250, RAB21, and TNPO3 were significantly associated with overall survival. Our study results implicate the importance of integrating expression and methylation information along with clinical information in the prediction of survival. CEP250, RAB21, and TNPO3 in the prediction model might have a crucial role in CRC prognosis and further improve our understanding of potential mechanisms linking inflammatory reactions and CRC progression.
Background: Inflammatory status indicators have been reported as prognostic biomarkers of colorectal cancer (CRC). However, since inflammatory interactions with the colon involve various modes of action, the biological mechanism linking inflammation and CRC prognosis has not been fully elucidated. We comprehensively evaluated the predictive roles of the expression and methylation levels of inflammation-related genes for CRC prognosis and their pathophysiological associations. Method: An integrative analysis of 247 patients with stage I-III CRC from The Cancer Genome Atlas was conducted. Lasso-penalized Cox proportional hazards regression (Lasso-Cox) and statistical Cox proportional hazard regression (CPH) were used for the analysis. Results: Models to predict overall survival were designed with respective combinations of clinical variables, including age, sex, stage, gene expression, and methylation. An integrative model combining expression, methylation, and clinical features performed better (median C-index = 0.756) than the model with clinical features alone (median C-index = 0.726). Based on multivariate CPH with features from the best model, the methylation levels of CEP250, RAB21, and TNPO3 were significantly associated with overall survival. They did not share any biological process in functional networks. The 5-year survival rate was 29.8% in the low methylation group of CEP250 and 79.1% in the high methylation group (p < 0.001). Conclusion: Our study results implicate the importance of integrating expression and methylation information along with clinical information in the prediction of survival. CEP250, RAB21, and TNPO3 in the prediction model might have a crucial role in CRC prognosis and further improve our understanding of potential mechanisms linking inflammatory reactions and CRC progression.
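The C-index values quoted in this abstract summarize how well a model's predicted risk orders patients by survival time. The sketch below computes Harrell's concordance index, a common definition; the abstract does not state which variant the authors used, and the function name `concordance_index` and the toy data are hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell's C: share of comparable patient pairs that are ordered correctly.

    times  -- observed follow-up times
    events -- 1 if the event (death) was observed, 0 if censored
    risks  -- predicted risk scores (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy data: risk decreases as survival time increases, so ordering is perfect.
c_perfect = concordance_index([1, 2, 3], [1, 1, 1], [3.0, 2.0, 1.0])
# Reversed risk scores order every comparable pair incorrectly.
c_reversed = concordance_index([1, 2, 3], [1, 1, 1], [1.0, 2.0, 3.0])
```

A value of 0.5 corresponds to random ordering, so the reported medians of 0.756 and 0.726 both indicate better-than-chance ranking.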
Affiliation(s)
- Eun Kyung Choe
- Department of Surgery, Seoul National University Hospital Healthcare System Gangnam Center, Seoul 06236, Korea
- Department of Biostatistics, Epidemiology & Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104-6116, USA
- Department of Surgery, Seoul National University College of Medicine, Seoul 03080, Korea
- Sangwoo Lee
- Department of Future Convergence, Cyber University of Korea, Seoul 03051, Korea
- So Yeon Kim
- Department of Biostatistics, Epidemiology & Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104-6116, USA
- Department of Software and Computer Engineering, Ajou University, Suwon 16499, Korea
- Manu Shivakumar
- Department of Biostatistics, Epidemiology & Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104-6116, USA
- Kyu Joo Park
- Department of Surgery, Seoul National University College of Medicine, Seoul 03080, Korea
- Young Jun Chai
- Department of Surgery, Seoul Metropolitan Government-Seoul National University Boramae Medical Center, Seoul 07061, Korea
- Dokyoon Kim
- Department of Biostatistics, Epidemiology & Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104-6116, USA
- Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA 19104-6116, USA
- Correspondence: Tel.: +1-215-573-5336
4
Parvandeh S, McKinney BA. EpistasisRank and EpistasisKatz: interaction network centrality methods that integrate prior knowledge networks. Bioinformatics 2020; 35:2329-2331. [PMID: 30481259 PMCID: PMC7963083 DOI: 10.1093/bioinformatics/bty965] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 03/05/2018] [Revised: 11/09/2018] [Accepted: 11/26/2018] [Indexed: 12/30/2022] Open
Abstract
Motivation: An important challenge in gene expression analysis is to improve hub gene selection to enrich for biological relevance or improve classification accuracy for a given phenotype. In order to incorporate phenotypic context into co-expression, we recently developed an epistasis-expression network centrality method that blends the importance of gene-gene interactions (epistasis) and main effects of genes. Further blending of prior knowledge from functional interactions has the potential to enrich for relevant genes and stabilize classification. Results: We develop two new expression-epistasis centrality methods that incorporate interaction prior knowledge. The first extends our SNPrank (EpistasisRank) method by incorporating a gene-wise prior knowledge vector. This prior knowledge vector informs the centrality algorithm of the inclination of a gene to be involved in interactions by incorporating functional interaction information from the Integrative Multi-species Prediction database. The second method extends Katz centrality to expression-epistasis networks (EpistasisKatz), extends the Katz bias to be a gene-wise vector of main effects and extends the Katz attenuation constant prefactor to be a prior-knowledge vector for interactions. Using independent microarray studies of major depressive disorder, we find that including prior knowledge in network centrality feature selection stabilizes the training classification and reduces over-fitting. Availability and implementation: Methods and examples provided at https://github.com/insilico/Rinbix and https://github.com/insilico/PriorKnowledgeEpistasisRank. Supplementary information: Supplementary data are available at Bioinformatics online.
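Standard Katz centrality solves x = αAx + β with a scalar attenuation α and a scalar bias β. The abstract describes replacing both with gene-wise vectors. The sketch below is a generic fixed-point iteration with vector-valued parameters; it is not the Rinbix code, and how exactly the vectors enter the authors' formulation is not given in the abstract. Here `alpha[i]` scales the summed contributions arriving at gene `i`, which is an assumption, as are the function name and toy network.

```python
def katz_with_vector_parameters(adjacency, alpha, beta, max_iter=1000, tol=1e-10):
    """Iterate x[i] = alpha[i] * sum_j adjacency[i][j] * x[j] + beta[i].

    adjacency -- square matrix of interaction weights (list of lists)
    alpha     -- per-gene attenuation (stand-in for interaction prior knowledge)
    beta      -- per-gene bias (stand-in for main effects)
    """
    n = len(adjacency)
    scores = list(beta)  # start from the bias vector
    for _ in range(max_iter):
        updated = [
            alpha[i] * sum(adjacency[i][j] * scores[j] for j in range(n)) + beta[i]
            for i in range(n)
        ]
        # Stop once no score changes by more than the tolerance.
        if max(abs(u - s) for u, s in zip(updated, scores)) < tol:
            return updated
        scores = updated
    raise RuntimeError("did not converge; try smaller alpha values")

# Hypothetical two-gene network with one symmetric interaction.
toy_scores = katz_with_vector_parameters(
    adjacency=[[0.0, 1.0], [1.0, 0.0]],
    alpha=[0.5, 0.5],
    beta=[1.0, 1.0],
)
```

The iteration only converges when the attenuation values are small relative to the network's largest eigenvalue, which is why the sketch raises an error rather than returning unconverged scores.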
Affiliation(s)
- Brett A McKinney
- Tandy School of Computer Science
- Department of Mathematics, University of Tulsa, Tulsa, OK, USA
5
Shen L, Thompson PM. Brain Imaging Genomics: Integrated Analysis and Machine Learning. Proceedings of the IEEE. Institute of Electrical and Electronics Engineers 2020; 108:125-162. [PMID: 31902950 PMCID: PMC6941751 DOI: 10.1109/jproc.2019.2947272] [Citation(s) in RCA: 76] [Impact Index Per Article: 19.0] [Indexed: 06/01/2023]
Abstract
Brain imaging genomics is an emerging data science field, where integrated analysis of brain imaging and genomics data, often combined with other biomarker, clinical and environmental data, is performed to gain new insights into the phenotypic, genetic and molecular characteristics of the brain as well as their impact on normal and disordered brain function and behavior. It has enormous potential to contribute significantly to biomedical discoveries in brain science. Given the increasingly important role of statistical and machine learning in biomedicine and rapidly growing literature in brain imaging genomics, we provide an up-to-date and comprehensive review of statistical and machine learning methods for brain imaging genomics, as well as a practical discussion on method selection for various biomedical applications.
Affiliation(s)
- Li Shen
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
- Paul M Thompson
- Imaging Genetics Center, Mark & Mary Stevens Institute for Neuroimaging & Informatics, Keck School of Medicine, University of Southern California, Los Angeles, CA 90232, USA
6
Ubanako P, Xelwa N, Ntwasa M. LPS induces inflammatory chemokines via TLR-4 signalling and enhances the Warburg Effect in THP-1 cells. PLoS One 2019; 14:e0222614. [PMID: 31560702 PMCID: PMC6764657 DOI: 10.1371/journal.pone.0222614] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 06/19/2019] [Accepted: 09/03/2019] [Indexed: 01/04/2023] Open
Abstract
The Warburg Effect has emerged as a potential drug target because, in some cancer cell lines, it is sufficient to subvert it in order to kill cancer cells. It has also been shown that the Warburg Effect occurs in innate immune cells upon infection. Innate immune cells play critical roles in the tumour microenvironment but the Warburg Effect is not fully understood in monocytes. Furthermore, it is important to understand the impact of infections on key players in the tumour microenvironment because inflammatory conditions often precede carcinogenesis and mutated oncogenes induce inflammation. We investigated the metabolic programme in the acute monocytic leukaemia cell line, THP-1 in the presence and absence of lipopolysaccharide, mimicking bacterial infections. We found that stimulation of THP-1 cells by LPS induces a subset of pro-inflammatory chemokines and enhances the Warburg Effect. Surprisingly, perturbation of the Warburg Effect in these cells does not lead to cell death in contrast to what was observed in non-myeloid cancer cell lines in a previous study. These findings indicate that the Warburg Effect and inflammation are activated by bacterial lipopolysaccharide and may have a profound influence on the microenvironment.
Affiliation(s)
- Philemon Ubanako
- School of Molecular & Cell Biology, University of the Witwatersrand, Johannesburg, Republic of South Africa
- Ntombikayise Xelwa
- School of Molecular & Cell Biology, University of the Witwatersrand, Johannesburg, Republic of South Africa
- Monde Ntwasa
- Department of Life & Consumer Sciences, University of South Africa, Florida, Johannesburg, Republic of South Africa
7
Guala D, Ogris C, Müller N, Sonnhammer ELL. Genome-wide functional association networks: background, data & state-of-the-art resources. Brief Bioinform 2019; 21:1224-1237. [PMID: 31281921 PMCID: PMC7373183 DOI: 10.1093/bib/bbz064] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 02/05/2019] [Revised: 04/29/2019] [Accepted: 05/04/2019] [Indexed: 02/06/2023] Open
Abstract
The vast amount of experimental data from recent advances in the field of high-throughput biology begs for integration into more complex data structures such as genome-wide functional association networks. Such networks have been used for elucidation of the interplay of intra-cellular molecules to make advances ranging from the basic science understanding of evolutionary processes to the more translational field of precision medicine. The allure of the field has resulted in rapid growth of the number of available network resources, each with unique attributes exploitable to answer different biological questions. Unfortunately, the high volume of network resources makes it impossible for the intended user to select an appropriate tool for their particular research question. The aim of this paper is to provide an overview of the underlying data and representative network resources as well as to mention methods of integration, allowing a customized approach to resource selection. Additionally, this report will provide a primer for researchers venturing into the field of network integration.
Affiliation(s)
- Dimitri Guala
- Science for Life Laboratory, Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Stockholm University, Box 1031, 17121 Solna, Sweden
- Christoph Ogris
- Computational Cell Maps, Institute of Computational Biology, Helmholtz Center Munich, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
- Nikola Müller
- Computational Cell Maps, Institute of Computational Biology, Helmholtz Center Munich, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
- Erik L L Sonnhammer
- Science for Life Laboratory, Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Stockholm University, Box 1031, 17121 Solna, Sweden
8
Ashtiani M, Nickchi P, Jahangiri-Tazehkand S, Safari A, Mirzaie M, Jafari M. IMMAN: an R/Bioconductor package for Interolog protein network reconstruction, mapping and mining analysis. BMC Bioinformatics 2019; 20:73. [PMID: 30755155 PMCID: PMC6373071 DOI: 10.1186/s12859-019-2659-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 08/06/2018] [Accepted: 01/28/2019] [Indexed: 12/15/2022] Open
Abstract
Background: Reconstruction of protein-protein interaction networks (PPIN) has been riddled with controversy for decades. In particular, false-negative and false-positive interactions complicate this process even further. Also, the lack of a standard PPIN limits comparison studies and leads to incompatible outcomes. Using an evolution-based concept, i.e. the interolog, which refers to interacting orthologous protein sets, paves the way toward an optimal benchmark. Results: Here, we provide an R package, IMMAN, as a tool for reconstructing an Interolog Protein Network (IPN) by integrating several Protein-protein Interaction Networks (PPINs). Users can unify different PPINs to mine conserved common networks among species. IMMAN is designed to retrieve IPNs with different degrees of conservation to support prediction of protein functions according to their networks. Conclusions: An IPN consists of evolutionarily conserved nodes and their related edges with low false-positive rates, and can be considered a gold-standard network for biological network analysis with respect to the PPINs from which it is derived. Electronic supplementary material: The online version of this article (10.1186/s12859-019-2659-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Minoo Ashtiani
- School of Biological Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
- Payman Nickchi
- School of Biological Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
- Soheil Jahangiri-Tazehkand
- School of Biological Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
- Department of Computer Science, Shahid Beheshti University, Tehran, Iran
- Abdollah Safari
- Department of Statistics and Actuarial Science, Simon Fraser University, 8888 University Drive, Burnaby, BC, V5A 1S6, Canada
- Mehdi Mirzaie
- Department of Applied Mathematics, Faculty of Mathematical Sciences, Tarbiat Modares University, Tehran, Iran
- Mohieddin Jafari
- School of Biological Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
- Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science, University of Helsinki, Helsinki, Finland
9
Kacsoh BZ, Barton S, Jiang Y, Zhou N, Mooney SD, Friedberg I, Radivojac P, Greene CS, Bosco G. New Drosophila Long-Term Memory Genes Revealed by Assessing Computational Function Prediction Methods. G3 (Bethesda, Md.) 2019; 9:251-267. [PMID: 30463884 PMCID: PMC6325913 DOI: 10.1534/g3.118.200867] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 09/11/2018] [Accepted: 11/20/2018] [Indexed: 01/26/2023]
Abstract
A major bottleneck to our understanding of the genetic and molecular foundation of life lies in the ability to assign function to a gene and, subsequently, a protein. Traditional molecular and genetic experiments can provide the most reliable forms of identification, but are generally low-throughput, making such discovery and assignment a daunting task. The bottleneck has led to an increasing role for computational approaches. The Critical Assessment of Functional Annotation (CAFA) effort seeks to measure the performance of computational methods. In CAFA3, we performed selected screens, including an effort focused on long-term memory. We used homology and previous CAFA predictions to identify 29 key Drosophila genes, which we tested via a long-term memory screen. We identify 11 novel genes that are involved in long-term memory formation and show a high level of connectivity with previously identified learning and memory genes. Our study provides the first higher-order behavioral assay and organism screen used for CAFA assessments and reveals previously uncharacterized roles of multiple genes as possible regulators of neuronal plasticity at the boundary of information acquisition and memory formation.
Affiliation(s)
- Balint Z Kacsoh
- Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, NH, 03755, USA
- Stephen Barton
- Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, NH, 03755, USA
- Yuxiang Jiang
- Department of Computer Science, Indiana University, Bloomington, IN
- Naihui Zhou
- Department of Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, Iowa 50011
- Sean D Mooney
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA
- Iddo Friedberg
- Department of Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, Iowa 50011
- Predrag Radivojac
- College of Computer and Information Science, Northeastern University, Boston, MA
- Casey S Greene
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, PA, 19104
- Giovanni Bosco
- Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, NH, 03755, USA
10
Insulin Signaling Regulates Oocyte Quality Maintenance with Age via Cathepsin B Activity. Curr Biol 2018; 28:753-760.e4. [PMID: 29478855 DOI: 10.1016/j.cub.2018.01.052] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 06/13/2017] [Revised: 12/15/2017] [Accepted: 01/15/2018] [Indexed: 01/02/2023]
Abstract
A decline in female reproduction is one of the earliest hallmarks of aging in many animals, including invertebrates and mammals [1-4]. The insulin/insulin-like growth factor-1 signaling (IIS) pathway has a conserved role in regulating longevity [5] and also controls reproductive aging [2, 6]. Although IIS transcriptional targets that regulate somatic aging have been characterized [7, 8], it was not known whether the same mechanisms influence reproductive aging. We previously showed that Caenorhabditis elegans daf-2 IIS receptor mutants extend reproductive span by maintaining oocyte quality with age [6], but IIS targets in oocytes had not been identified. Here, we compared the transcriptomes of aged daf-2(-) and wild-type oocytes, and distinguished IIS targets in oocytes from soma-specific targets. Remarkably, IIS appears to regulate reproductive and somatic aging through largely distinct mechanisms, although the binding motif for longevity factor PQM-1 [8] was also overrepresented in oocyte targets. Reduction of oocyte-specific IIS targets decreased reproductive span extension and oocyte viability of daf-2(-) worms, and pqm-1 is required for daf-2(-)'s long reproductive span. Cathepsin-B-like gene expression and activity levels were reduced in aged daf-2(-) oocytes, and RNAi against cathepsin-B-like W07B8.4 improved oocyte quality maintenance and extended reproductive span. Importantly, adult-only pharmacological inhibition of cathepsin B proteases reduced age-dependent deterioration in oocyte quality, even when treatment was initiated in mid-reproduction. This suggests that it is possible to pharmacologically slow age-related reproductive decline through mid-life intervention. Oocyte-specific IIS target genes thereby revealed potential therapeutic targets for maintaining reproductive health with age.
|
11
|
Harrington LX, Way GP, Doherty JA, Greene CS. Functional network community detection can disaggregate and filter multiple underlying pathways in enrichment analyses. Pacific Symposium on Biocomputing 2018; 23:157-167. [PMID: 29218878 PMCID: PMC5760988] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
Differential expression experiments or other analyses often end in a list of genes. Pathway enrichment analysis is one method to discern important biological signals and patterns from noisy expression data. However, pathway enrichment analysis may perform suboptimally in situations where there are multiple implicated pathways - such as in the case of genes that define subtypes of complex diseases. Our simulation study shows that in this setting, standard overrepresentation analysis identifies many false positive pathways along with the true positives. These false positives hamper investigators' attempts to glean biological insights from enrichment analysis. We develop and evaluate an approach that combines community detection over functional networks with pathway enrichment to reduce false positives. Our simulation study demonstrates that a large reduction in false positives can be obtained with a small decrease in power. Though we hypothesized that multiple communities might underlie previously described subtypes of high-grade serous ovarian cancer and applied this approach, our results do not support this hypothesis. In summary, applying community detection before enrichment analysis may ease interpretation for complex gene sets that represent multiple distinct pathways.
Affiliation(s)
- Lia X Harrington
- Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth College, Hanover 03784, USA
|
12
|
Kacsoh BZ, Greene CS, Bosco G. Machine Learning Analysis Identifies Drosophila Grunge/Atrophin as an Important Learning and Memory Gene Required for Memory Retention and Social Learning. G3 (Bethesda, Md.) 2017; 7:3705-3718. [PMID: 28889104 PMCID: PMC5677163 DOI: 10.1534/g3.117.300172] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 06/29/2017] [Accepted: 08/07/2017] [Indexed: 12/12/2022]
Abstract
High-throughput experiments are becoming increasingly common, and scientists must balance hypothesis-driven experiments with genome-wide data acquisition. We sought to predict novel genes involved in Drosophila learning and long-term memory from existing public high-throughput data. We performed an analysis using PILGRM, which analyzes public gene expression compendia using machine learning. We evaluated the top prediction alongside genes involved in learning and memory in IMP, an interface for functional relationship networks. We identified Grunge/Atrophin (Gug/Atro), a transcriptional repressor, histone deacetylase, as our top candidate. We find, through multiple, distinct assays, that Gug has an active role as a modulator of memory retention in the fly and its function is required in the adult mushroom body. Depletion of Gug specifically in neurons of the adult mushroom body, after cell division and neuronal development is complete, suggests that Gug function is important for memory retention through regulation of neuronal activity, and not by altering neurodevelopment. Our study provides a previously uncharacterized role for Gug as a possible regulator of neuronal plasticity at the interface of memory retention and memory extinction.
Affiliation(s)
- Balint Z Kacsoh
- Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire 03755
- Casey S Greene
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104
- Giovanni Bosco
- Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire 03755
|
13
|
Taroni JN, Mahoney JM, Whitfield ML. The mechanistic implications of gene expression studies in SSc: Insights from Systems Biology. Current Treatment Options in Rheumatology 2017. [PMID: 29520335 DOI: 10.1007/s40674-017-0072-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 10/19/2022]
Affiliation(s)
- Jaclyn N Taroni
- Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover NH 03755
- J Matthew Mahoney
- Department of Neurological Sciences, University of Vermont College of Medicine, Burlington VT
- Michael L Whitfield
- Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover NH 03755; Program in Quantitative Biomedical Sciences, Geisel School of Medicine at Dartmouth, Hanover NH 03755
|
14
|
Kim D, Basile AO, Bang L, Horgusluoglu E, Lee S, Ritchie MD, Saykin AJ, Nho K. Knowledge-driven binning approach for rare variant association analysis: application to neuroimaging biomarkers in Alzheimer's disease. BMC Med Inform Decis Mak 2017; 17:61. [PMID: 28539126 PMCID: PMC5444041 DOI: 10.1186/s12911-017-0454-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 12/21/2022] Open
Abstract
Background Rapid advancement of next generation sequencing technologies such as whole genome sequencing (WGS) has facilitated the search for genetic factors that influence disease risk in the field of human genetics. To identify rare variants associated with human diseases or traits, an efficient genome-wide binning approach is needed. In this study we developed a novel biological knowledge-based binning approach for rare-variant association analysis and then applied the approach to structural neuroimaging endophenotypes related to late-onset Alzheimer’s disease (LOAD). Methods For rare-variant analysis, we used the knowledge-driven binning approach implemented in Bin-KAT, an automated tool, that provides 1) binning/collapsing methods for multi-level variant aggregation with a flexible, biologically informed binning strategy and 2) an option of performing unified collapsing and statistical rare variant analyses in one tool. A total of 750 non-Hispanic Caucasian participants from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) cohort who had both WGS data and magnetic resonance imaging (MRI) scans were used in this study. Mean bilateral cortical thickness of the entorhinal cortex extracted from MRI scans was used as an AD-related neuroimaging endophenotype. SKAT was used for a genome-wide gene- and region-based association analysis of rare variants (MAF (minor allele frequency) < 0.05) and potential confounding factors (age, gender, years of education, intracranial volume (ICV) and MRI field strength) for entorhinal cortex thickness were used as covariates. Significant associations were determined using FDR adjustment for multiple comparisons. Results Our knowledge-driven binning approach identified 16 functional exonic rare variants in FANCC significantly associated with entorhinal cortex thickness (FDR-corrected p-value < 0.05). 
In addition, the approach identified 7 evolutionary conserved regions, which were mapped to FAF1, RFX7, LYPLAL1 and GOLGA3, significantly associated with entorhinal cortex thickness (FDR-corrected p-value < 0.05). In further analysis, the functional exonic rare variants in FANCC were also significantly associated with hippocampal volume and cerebrospinal fluid (CSF) Aβ1–42 (p-value < 0.05). Conclusions Our novel binning approach identified rare variants in FANCC as well as 7 evolutionary conserved regions significantly associated with a LOAD-related neuroimaging endophenotype. FANCC (fanconi anemia complementation group C) has been shown to modulate TLR and p38 MAPK-dependent expression of IL-1β in macrophages. Our results warrant further investigation in a larger independent cohort and demonstrate that the biological knowledge-driven binning approach is a powerful strategy to identify rare variants associated with AD and other complex disease.
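The abstract reports significance after "FDR adjustment for multiple comparisons" without naming the procedure. As a sketch only, and assuming the widely used Benjamini-Hochberg step-up method (an assumption, not confirmed by the source), adjusted p-values could be computed as follows; the function name and example values are illustrative.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption of this sketch: the tests are independent or positively
    dependent, which is the setting the procedure is designed for.
    """
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Illustrative values only, not taken from the study.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [a < 0.05 for a in adj]
```

A bin or region would then be called significant when its adjusted value falls below the chosen threshold (0.05 in the abstract).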
Affiliation(s)
- Dokyoon Kim
- Biomedical & Translational Informatics Institute, Geisinger Health System, Danville, PA, USA; The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, USA
- Anna O Basile
- The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, USA
- Lisa Bang
- Biomedical & Translational Informatics Institute, Geisinger Health System, Danville, PA, USA
- Emrin Horgusluoglu
- Center for Neuroimaging, Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA
- Seunggeun Lee
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
- Marylyn D Ritchie
- Biomedical & Translational Informatics Institute, Geisinger Health System, Danville, PA, USA; The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, USA
- Andrew J Saykin
- Center for Neuroimaging, Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA
- Kwangsik Nho
- Center for Neuroimaging, Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA
|
15
|
Greene CS, Himmelstein DS. Genetic Association-Guided Analysis of Gene Networks for the Study of Complex Traits. ACTA ACUST UNITED AC 2017; 9:179-84. [PMID: 27094199 DOI: 10.1161/circgenetics.115.001181] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 09/11/2015] [Accepted: 03/08/2016] [Indexed: 12/29/2022]
Affiliation(s)
- Casey S Greene
- From the Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia (C.S.G.); and Biological and Medical Informatics, University of California, San Francisco (D.S.H.)
- Daniel S Himmelstein
- From the Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia (C.S.G.); and Biological and Medical Informatics, University of California, San Francisco (D.S.H.)
|
16
|
D'Souza M, Sulakhe D, Wang S, Xie B, Hashemifar S, Taylor A, Dubchak I, Conrad Gilliam T, Maltsev N. Strategic Integration of Multiple Bioinformatics Resources for System Level Analysis of Biological Networks. Methods Mol Biol 2017; 1613:85-99. [PMID: 28849559 DOI: 10.1007/978-1-4939-7027-8_5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/07/2023]
Abstract
Recent technological advances in genomics allow the production of biological data at unprecedented tera- and petabyte scales. Efficient mining of these vast and complex datasets for the needs of biomedical research critically depends on a seamless integration of the clinical, genomic, and experimental information with prior knowledge about genotype-phenotype relationships. Such experimental data accumulated in publicly available databases should be accessible to a variety of algorithms and analytical pipelines that drive computational analysis and data mining. We present an integrated computational platform, Lynx (Sulakhe et al., Nucleic Acids Res 44:D882-D887, 2016) (http://lynx.cri.uchicago.edu), a web-based database and knowledge extraction engine. It provides advanced search capabilities and a variety of algorithms for enrichment analysis and network-based gene prioritization. It gives public access to the Lynx integrated knowledge base (LynxKB) and its analytical tools via user-friendly web services and interfaces. The Lynx service-oriented architecture supports annotation and analysis of high-throughput experimental data. Lynx tools assist the user in extracting meaningful knowledge from LynxKB and experimental data, and in the generation of weighted hypotheses regarding the genes and molecular mechanisms contributing to human phenotypes or conditions of interest. The goal of this integrated platform is to support the end-to-end analytical needs of various translational projects.
Affiliation(s)
- Mark D'Souza
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, 60637, USA
- Argonne National Laboratory, Building 221, Room: A142, 9700 South Cass Avenue, Argonne, IL, 60439, USA
- Dinanath Sulakhe
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, 60637, USA
- Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL, 60637, USA
- Sheng Wang
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, 60637, USA
- Toyota Technological Institute at Chicago, 6045 S. Kenwood Avenue, Chicago, IL, 60637, USA
- Bing Xie
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, 60637, USA
- Department of Computer Science, Illinois Institute of Technology, Chicago, IL, 60616, USA
- Somaye Hashemifar
- Toyota Technological Institute at Chicago, 6045 S. Kenwood Avenue, Chicago, IL, 60637, USA
- Andrew Taylor
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, 60637, USA
- Inna Dubchak
- Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America; Department of Energy Joint Genome Institute, Walnut Creek, CA, USA
- T Conrad Gilliam
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, 60637, USA
- Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL, 60637, USA
- Natalia Maltsev
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, 60637, USA
- Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL, 60637, USA
|
17
|
Identifying gene-gene interactions that are highly associated with four quantitative lipid traits across multiple cohorts. Hum Genet 2016; 136:165-178. [PMID: 27848076 DOI: 10.1007/s00439-016-1738-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 05/25/2016] [Accepted: 10/07/2016] [Indexed: 10/20/2022]
Abstract
Genetic loci explain only 25-30 % of the heritability observed in plasma lipid traits. Epistasis, or gene-gene interactions may contribute to a portion of this missing heritability. Using the genetic data from five NHLBI cohorts of 24,837 individuals, we combined the use of the quantitative multifactor dimensionality reduction (QMDR) algorithm with two SNP-filtering methods to exhaustively search for SNP-SNP interactions that are associated with HDL cholesterol (HDL-C), LDL cholesterol (LDL-C), total cholesterol (TC) and triglycerides (TG). SNPs were filtered either on the strength of their independent effects (main effect filter) or the prior knowledge supporting a given interaction (Biofilter). After the main effect filter, QMDR identified 20 SNP-SNP models associated with HDL-C, 6 associated with LDL-C, 3 associated with TC, and 10 associated with TG (permutation P value <0.05). With the use of Biofilter, we identified 2 SNP-SNP models associated with HDL-C, 3 associated with LDL-C, 1 associated with TC and 8 associated with TG (permutation P value <0.05). In an independent dataset of 7502 individuals from the eMERGE network, we replicated 14 of the interactions identified after main effect filtering: 11 for HDL-C, 1 for LDL-C and 2 for TG. We also replicated 23 of the interactions found to be associated with TG after applying Biofilter. Prior knowledge supports the possible role of these interactions in the genetic etiology of lipid traits. This study also presents a computationally efficient pipeline for analyzing data from large genotyping arrays and detecting SNP-SNP interactions that are not primarily driven by strong main effects.
|
18
|
Krishnan A, Taroni JN, Greene CS. Integrative Networks Illuminate Biological Factors Underlying Gene–Disease Associations. Current Genetic Medicine Reports 2016. [DOI: 10.1007/s40142-016-0102-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 12/17/2022]
|
19
|
McKinney BA, Lareau C, Oberg AL, Kennedy RB, Ovsyannikova IG, Poland GA. The Integration of Epistasis Network and Functional Interactions in a GWAS Implicates RXR Pathway Genes in the Immune Response to Smallpox Vaccine. PLoS One 2016; 11:e0158016. [PMID: 27513748 PMCID: PMC4981436 DOI: 10.1371/journal.pone.0158016] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 01/20/2016] [Accepted: 06/08/2016] [Indexed: 11/24/2022] Open
Abstract
Although many diseases and traits show large heritability, few genetic variants have been found to strongly separate phenotype groups by genotype. Complex regulatory networks of variants and expression of multiple genes lead to small individual-variant effects and difficulty replicating the effect of any single variant in an affected pathway. Interaction network modeling of GWAS identifies effects ignored by univariate models, but population differences may still cause specific genes to not replicate. Integrative network models may help detect indirect effects of variants in the underlying biological pathway. In this study, we used gene-level functional interaction information from the Integrative Multi-species Prediction (IMP) tool to reveal important genes associated with a complex phenotype through evidence from epistasis networks and pathway enrichment. We test this method for augmenting variant-based network analyses with functional interactions by applying it to a smallpox vaccine immune response GWAS. The integrative analysis spotlights the role of genes related to retinoid X receptor alpha (RXRA), which has been implicated in a previous epistasis network analysis of smallpox vaccine.
Affiliation(s)
- Brett A. McKinney
- Tandy School of Computer Science and Department of Mathematics, University of Tulsa, Tulsa, OK, United States of America
- Caleb Lareau
- Tandy School of Computer Science and Department of Mathematics, University of Tulsa, Tulsa, OK, United States of America
- Ann L. Oberg
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States of America
- Richard B. Kennedy
- Mayo Clinic Vaccine Research Group, Mayo Clinic, Rochester, MN, United States of America
- Inna G. Ovsyannikova
- Mayo Clinic Vaccine Research Group, Mayo Clinic, Rochester, MN, United States of America
- Gregory A. Poland
- Mayo Clinic Vaccine Research Group, Mayo Clinic, Rochester, MN, United States of America
|
20
|
Molecular stratification and precision medicine in systemic sclerosis from genomic and proteomic data. Curr Opin Rheumatol 2016; 28:83-8. [PMID: 26555452 DOI: 10.1097/bor.0000000000000237] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Indexed: 01/04/2023]
Abstract
PURPOSE OF REVIEW The goal of this review is to summarize recent advances into the pathogenesis and treatment of systemic sclerosis (SSc) from genomic and proteomic studies. RECENT FINDINGS Intrinsic gene expression-driven molecular subtypes of SSc are reproducible across three independent datasets. These subsets are a consistent feature of SSc and are found in multiple end-target tissues, such as skin and esophagus. Intrinsic subsets as well as baseline levels of molecular target pathways are potentially predictive of clinical response to specific therapeutics, based on three recent clinical trials. A gene expression-based biomarker of modified Rodnan skin score, a measure of SSc skin severity, can be used as a surrogate outcome metric and has been validated in a recent trial. Proteome analyses have identified novel biomarkers of SSc that correlate with SSc clinical phenotypes. SUMMARY Integrating intrinsic gene expression subset data, baseline molecular pathway information, and serum biomarkers along with surrogate measures of modified Rodnan skin score provides molecular context in SSc clinical trials. With validation, these approaches could be used to match patients with the therapies from which they are most likely to benefit and thus increase the likelihood of clinical improvement.
|
21
|
Abstract
“Big Data” has surpassed “systems biology” and “omics” as the hottest buzzword in the biological sciences, but is there any substance behind the hype? Certainly, we have learned about various aspects of cell and molecular biology from the many individual high-throughput data sets that have been published in the past 15–20 years. These data, although useful as individual data sets, can provide much more knowledge when interrogated with Big Data approaches, such as applying integrative methods that leverage the heterogeneous data compendia in their entirety. Here we discuss the benefits and challenges of such Big Data approaches in biology and how cell and molecular biologists can best take advantage of them.
Affiliation(s)
- Kara Dolinski
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540
- Olga G Troyanskaya
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540; Department of Computer Science, Princeton University, Princeton, NJ 08540; Simons Center for Data Analysis, Simons Foundation, New York, NY 10010
|
22
|
Guan Y, Martini S, Mariani LH. Genes Caught In Flagranti: Integrating Renal Transcriptional Profiles With Genotypes and Phenotypes. Semin Nephrol 2016. [PMID: 26215861 DOI: 10.1016/j.semnephrol.2015.04.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/27/2022]
Abstract
In the past decade, population genetics has gained tremendous success in identifying genetic variations that are statistically relevant to renal diseases and kidney function. However, it is challenging to interpret the functional relevance of the genetic variations found by population genetics studies. In this review, we discuss studies that integrate multiple levels of data, especially transcriptome profiles and phenotype data, to assign functional roles of genetic variations involved in kidney function. Furthermore, we introduce state-of-the-art machine learning algorithms, Bayesian networks, support vector machines, and Gaussian process regression, which have been applied successfully to integrating genetic, regulatory, and clinical information to predict clinical outcomes. These methods are likely to be deployed successfully in the nephrology field in the near future.
Affiliation(s)
- Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI; Department of Internal Medicine, University of Michigan, Ann Arbor, MI; Department of Computer Science and Engineering, University of Michigan, Ann Arbor, MI
- Sebastian Martini
- Department of Internal Medicine, University of Michigan, Ann Arbor, MI; Nephrologisches Zentrum, Medizinische Klinik und Poliklinik IV, Klinikum der Universität München, Ludwig-Maximilians-University Munich, Munich, Germany
- Laura H Mariani
- Department of Internal Medicine, University of Michigan, Ann Arbor, MI
|
23
|
Abstract
The laboratory mouse is the primary mammalian species used for studying alternative splicing events. Recent studies have generated computational models to predict functions for splice isoforms in the mouse. However, the functional relationship network, describing the probability of splice isoforms participating in the same biological process or pathway, has not yet been studied in the mouse. Here we describe a rich genome-wide resource of mouse networks at the isoform level, which was generated using a unique framework that was originally developed to infer isoform functions. This network was built through integrating heterogeneous genomic and protein data, including RNA-seq, exon array, protein docking and pseudo-amino acid composition. Through simulation and cross-validation studies, we demonstrated the accuracy of the algorithm in predicting isoform-level functional relationships. We showed that this network enables the users to reveal functional differences of the isoforms of the same gene, as illustrated by literature evidence with Anxa6 (annexin a6) as an example. We expect this work will become a useful resource for the mouse genetics community to understand gene functions. The network is publicly available at: http://guanlab.ccmb.med.umich.edu/isoformnetwork.
|
24
|
An interaction quantitative trait loci tool implicates epistatic functional variants in an apoptosis pathway in smallpox vaccine eQTL data. Genes Immun 2016; 17:244-50. [PMID: 27052692 DOI: 10.1038/gene.2016.15] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 11/05/2015] [Revised: 12/06/2015] [Accepted: 01/04/2016] [Indexed: 12/17/2022]
Abstract
Expression quantitative trait loci (eQTL) studies have functionalized nucleic acid variants through the regulation of gene expression. Although most eQTL studies only examine the effects of single variants on transcription, a more complex process of variant-variant interaction (epistasis) may regulate transcription. Herein, we describe a tool called interaction QTL (iQTL) designed to efficiently detect epistatic interactions that regulate gene expression. To maximize biological relevance and minimize the computational and hypothesis testing burden, iQTL restricts interactions such that one variant is within a user-defined proximity of the transcript (cis-regulatory). We apply iQTL to a data set of 183 smallpox vaccine study participants with genome-wide association study and gene expression data from unstimulated samples and samples stimulated by inactivated vaccinia virus. While computing only 0.15% of possible interactions, we identify 11 probe sets whose expression is regulated through a variant-variant interaction. We highlight the functional epistatic interactions among apoptosis-related genes, DIABLO, TRAPPC4 and FADD, in the context of smallpox vaccination. We also use an integrative network approach to characterize these iQTL interactions in a posterior network of known prior functional interactions. iQTL is an efficient, open-source tool to analyze variant interactions in eQTL studies, providing better understanding of the function of epistasis in immune response and other complex phenotypes.
|
25
|
Tyler AL, Donahue LR, Churchill GA, Carter GW. Weak Epistasis Generally Stabilizes Phenotypes in a Mouse Intercross. PLoS Genet 2016; 12:e1005805. [PMID: 26828925 PMCID: PMC4734753 DOI: 10.1371/journal.pgen.1005805] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 06/29/2015] [Accepted: 12/21/2015] [Indexed: 01/11/2023] Open
Abstract
The extent and strength of epistasis is commonly unresolved in genetic studies, and observed epistasis is often difficult to interpret in terms of biological consequences or overall genetic architecture. We investigated the prevalence and consequences of epistasis by analyzing four body composition phenotypes—body weight, body fat percentage, femoral density, and femoral circumference—in a large F2 intercross of B6-lit/lit and C3.B6-lit/lit mice. We used Combined Analysis of Pleiotropy and Epistasis (CAPE) to examine interactions for the four phenotypes simultaneously, which revealed an extensive directed network of genetic loci interacting with each other, circulating IGF1, and sex to influence these phenotypes. The majority of epistatic interactions had small effects relative to additive effects of individual loci, and tended to stabilize phenotypes towards the mean of the population rather than extremes. Interactive effects of two alleles inherited from one parental strain commonly resulted in phenotypes closer to the population mean than the additive effects from the two loci, and often much closer to the mean than either single-locus model. Alternatively, combinations of alleles inherited from different parent strains contribute to more extreme phenotypes not observed in either parental strain. This class of phenotype-stabilizing interactions has effects that are close to additive and are thus difficult to detect except in very large intercrosses. Nevertheless, we found these interactions to be useful in generating hypotheses for functional relationships between genetic loci. Our findings suggest that while epistasis is often weak and unlikely to account for a large proportion of heritable variance, even small-effect genetic interactions can facilitate hypotheses of underlying biology in well-powered studies. 
The role of statistical epistasis in the genetic architecture of complex traits has been of great interest to the genetics community since Fisher introduced the concept in 1918. However, assessing epistasis in human and model organism populations has been impeded by limited statistical power. To mitigate this limitation, we analyzed bone and body composition traits in an unusually large mouse intercross population of over 2000 mice, paired with a recently-developed computational approach that leverages information to detect interactions across multiple phenotypes. We discovered a large network of highly significant genetic interactions between variants that influence complex body composition traits. Although epistasis was abundant, the interaction network was dominated by epistasis that stabilizes phenotypes by reducing phenotypic deviation from the parent strains. Nevertheless, the observed network provides an overview of genetic architecture and specific hypotheses of how QTL combine to affect phenotypes. These findings suggest that epistatic effects are generally of lesser magnitude than main QTL effects, and therefore are unlikely to account for major components of variance, but also reinforce genetic interaction analysis as a potent tool for dissecting the biology of complex traits.
Affiliation(s)
- Anna L. Tyler
- The Jackson Laboratory, Bar Harbor, Maine, United States of America
- Leah Rae Donahue
- The Jackson Laboratory, Bar Harbor, Maine, United States of America
- Gregory W. Carter
- The Jackson Laboratory, Bar Harbor, Maine, United States of America
|
26
|
Li HD, Omenn GS, Guan Y. A proteogenomic approach to understand splice isoform functions through sequence and expression-based computational modeling. Brief Bioinform 2016; 17:1024-1031. [PMID: 26740460 DOI: 10.1093/bib/bbv109] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 08/27/2015] [Revised: 11/03/2015] [Indexed: 01/23/2023]
Abstract
The products of multi-exon genes are a mixture of alternatively spliced isoforms, from which the translated proteins can have similar, different or even opposing functions. It is therefore essential to differentiate and annotate functions for individual isoforms. Computational approaches provide an efficient complement to expensive and time-consuming experimental studies. The input data of these methods range from DNA sequence, to RNA selection pressure, to expressed sequence tags, to full-length complementary DNA, to exon array, to RNA-seq expression, to proteomic data. Notably, RNA-seq technology generates quantitative profiling of transcript expression at the genome scale, with an unprecedented amount of expression data available for developing isoform function prediction methods. Integrative analysis of these data at different molecular levels enables a proteogenomic approach to systematically interrogate isoform functions. Here, we briefly review the state-of-the-art methods according to their input data sources, discuss their advantages and limitations and point out potential ways to improve prediction accuracies.
|
27
|
Greene CS, Foster JA, Stanton BA, Hogan DA, Bromberg Y. Computational approaches to study microbes and microbiomes. Pac Symp Biocomput 2016; 21:557-567. [PMID: 26776218 PMCID: PMC4832978] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
Technological advances are making large-scale measurements of microbial communities commonplace. These newly acquired datasets are allowing researchers to ask and answer questions about the composition of microbial communities, the roles of members in these communities, and how genes and molecular pathways are regulated in individual community members and communities as a whole to effectively respond to diverse and changing environments. In addition to providing a more comprehensive survey of the microbial world, this new information allows for the development of computational approaches to model the processes underlying microbial systems. We anticipate that the field of computational microbiology will continue to grow rapidly in the coming years. In this manuscript we highlight both areas of particular interest in microbiology as well as computational approaches that begin to address these challenges.
Affiliation(s)
- James A. Foster
- Institute of Bioinformatics and Evolutionary Studies, University of Idaho, Moscow, ID 83844 USA
- Bruce A. Stanton
- Department of Microbiology and Immunology, The Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
- Deborah A. Hogan
- Department of Microbiology and Immunology, The Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
- Yana Bromberg
- Biochemistry and Microbiology, School of Environmental and Biological Sciences, Rutgers University, New Brunswick, NJ 08901, USA; Institute for Advanced Study, Technische Universität München, Garching, Germany
|
28
|
De R, Hu T, Moore JH, Gilbert-Diamond D. Characterizing gene-gene interactions in a statistical epistasis network of twelve candidate genes for obesity. BioData Min 2015; 8:45. [PMID: 26715945 PMCID: PMC4693412 DOI: 10.1186/s13040-015-0077-x] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 06/26/2015] [Accepted: 12/15/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Recent findings have reemphasized the importance of epistasis, or gene-gene interactions, as a contributing factor to the unexplained heritability of obesity. Network-based methods such as statistical epistasis networks (SEN) present an intuitive framework to address the computational challenge of studying pairwise interactions between thousands of genetic variants. In this study, we aimed to analyze pairwise interactions that are associated with Body Mass Index (BMI) between SNPs from twelve genes robustly associated with obesity (BDNF, ETV5, FAIM2, FTO, GNPDA2, KCTD15, MC4R, MTCH2, NEGR1, SEC16B, SH2B1, and TMEM18). METHODS: We used information gain measures to identify all SNP-SNP interactions among and between these genes that were related to obesity (BMI > 30 kg/m²) within the Framingham Heart Study Cohort; interactions exceeding a certain threshold were used to build an SEN. We also quantified whether interactions tend to occur more between SNPs from the same gene (dyadicity) or between SNPs from different genes (heterophilicity). RESULTS: We identified a highly connected SEN of 709 SNPs and 1241 SNP-SNP interactions. Combining the SEN framework with dyadicity and heterophilicity analyses, we found 1 dyadic gene (TMEM18, P-value = 0.047) and 3 heterophilic genes (KCTD15, P-value = 0.045; SH2B1, P-value = 0.003; and TMEM18, P-value = 0.001). We also identified a lncRNA SNP (rs4358154) as a key node within the SEN using multiple network measures. CONCLUSION: This study presents an analytical framework to characterize the global landscape of genetic interactions from genome-wide arrays and also to discover nodes of potential biological significance within the identified network.
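The information gain measure mentioned in the methods can be illustrated with a small sketch. The abstract does not state the exact estimator or threshold, so the code below assumes a common formulation: the information gain of a SNP pair is the mutual information between the joint genotype and the phenotype, minus the mutual information each SNP carries on its own. The function names are hypothetical and not taken from the cited paper.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def interaction_information(snp_a, snp_b, phenotype):
    """Assumed pairwise information gain: I(A,B;C) - I(A;C) - I(B;C).

    Positive values suggest the pair is more informative about the
    phenotype jointly than the two SNPs are separately (synergy).
    """
    joint = list(zip(snp_a, snp_b))
    return (mutual_information(joint, phenotype)
            - mutual_information(snp_a, phenotype)
            - mutual_information(snp_b, phenotype))


# Toy data: the phenotype is the XOR of two binary genotypes, so neither
# SNP is informative alone but the pair determines the phenotype.
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
xor_trait = [0, 1, 1, 0]
gain = interaction_information(a, b, xor_trait)
```

In a network setting, pairs whose gain exceeds a chosen cutoff would become edges between SNP nodes; the cutoff itself is a study-specific choice that this sketch leaves open.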
Affiliation(s)
- Rishika De
- Department of Genetics, Geisel School of Medicine at Dartmouth, Hanover, NH USA
- Ting Hu
- Department of Computer Science, Memorial University, St. John's, NL Canada
- Jason H Moore
- Institute for Biomedical Informatics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA USA
|
29
|
De R, Verma SS, Drenos F, Holzinger ER, Holmes MV, Hall MA, Crosslin DR, Carrell DS, Hakonarson H, Jarvik G, Larson E, Pacheco JA, Rasmussen-Torvik LJ, Moore CB, Asselbergs FW, Moore JH, Ritchie MD, Keating BJ, Gilbert-Diamond D. Identifying gene-gene interactions that are highly associated with Body Mass Index using Quantitative Multifactor Dimensionality Reduction (QMDR). BioData Min 2015; 8:41. [PMID: 26674805 PMCID: PMC4678717 DOI: 10.1186/s13040-015-0074-0] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 06/12/2015] [Accepted: 12/04/2015] [Indexed: 11/22/2022]
Abstract
Background: Despite heritability estimates of 40–70% for obesity, less than 2% of its variation is explained by the Body Mass Index (BMI) associated loci identified so far. Epistasis, or gene-gene interaction, is a plausible source of part of the missing heritability of BMI. Methods: Using genotypic data from 18,686 individuals across five study cohorts (ARIC, CARDIA, FHS, CHS, MESA), we filtered SNPs (Single Nucleotide Polymorphisms) using two parallel approaches. SNPs were filtered either on the strength of their main effects of association with BMI, or on the number of knowledge sources supporting a specific SNP-SNP interaction in the context of BMI. Filtered SNPs were specifically analyzed for interactions that are highly associated with BMI using QMDR (Quantitative Multifactor Dimensionality Reduction). QMDR is a nonparametric, genetic model-free method that detects non-linear interactions associated with a quantitative trait. Results: We identified seven novel epistatic models with a Bonferroni-corrected p-value of association < 0.1. Prior experimental evidence helps explain the plausible biological interactions highlighted within our results and their relationship with obesity. We identified interactions between genes involved in mitochondrial dysfunction (POLG2), cholesterol metabolism (SOAT2), lipid metabolism (CYP11B2), cell adhesion (EZR), cell proliferation (MAP2K5), and insulin resistance (IGF1R). Moreover, we found an 8.8% increase in the variance in BMI explained by these seven SNP-SNP interactions, beyond what is explained by the main effects of an index FTO SNP and the SNPs within these interactions. We also replicated one of these interactions and 58 proxy SNP-SNP models representing it in an independent dataset from the eMERGE study. Conclusion: This study highlights a novel approach for discovering gene-gene interactions by combining methods such as QMDR with traditional statistics.
Electronic supplementary material: The online version of this article (doi:10.1186/s13040-015-0074-0) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Rishika De
- Computational Genetics Laboratory, Department of Genetics, Geisel School of Medicine at Dartmouth, Dartmouth-Hitchcock Medical Center, 706 Rubin Building, HB7937, One Medical Center Dr, Lebanon, NH 03756 USA
- Shefali S Verma
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, 512 Wartik Laboratory, The Pennsylvania State University, University Park, PA 16802 USA
- Fotios Drenos
- Centre for Cardiovascular Genetics, Institute of Cardiovascular Science, Faculty of Population Health Sciences, University College London, 5 University Street, London, WC1E 6JF UK
- Emily R Holzinger
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, 512 Wartik Laboratory, The Pennsylvania State University, University Park, PA 16802 USA
- Michael V Holmes
- Division of Transplant Surgery, Perelman School of Medicine, University of Pennsylvania, 3400 Spruce Street, 2 Dulles Pvln, Philadelphia, PA 19104 USA
- Molly A Hall
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, 512 Wartik Laboratory, The Pennsylvania State University, University Park, PA 16802 USA
- David R Crosslin
- Department of Genome Sciences, University of Washington, 3720 15th Ave NE, Seattle, WA 98195-5065 USA
- David S Carrell
- Group Health Research Institute, Metropolitan Park East, 1730 Minor Avenue, Suite 1600, Seattle, WA 98101-1448 USA
- Hakon Hakonarson
- The Joseph Stokes Jr. Research Institute, The Children's Hospital of Philadelphia, Office 1016 Abramson Building, Room 1216E, 3615 Civic Center Blvd, Philadelphia, PA 19104 USA
- Gail Jarvik
- Department of Genome Sciences, University of Washington, 3720 15th Ave NE, Seattle, WA 98195-5065 USA; Division of Medical Genetics, Department of Medicine, University of Washington, Health Sciences Building, K-253B, Medical Genetics, Box 357720, Seattle, WA 98195-7720 USA
- Eric Larson
- Group Health Research Institute, Metropolitan Park East, 1730 Minor Avenue, Suite 1600, Seattle, WA 98101-1448 USA
- Jennifer A Pacheco
- Center for Genetic Medicine, Northwestern University Feinberg School of Medicine, 303 E. Superior Street, Lurie 7-125, Chicago, IL 60611 USA
- Laura J Rasmussen-Torvik
- Department of Preventive Medicine, Northwestern University, Feinberg School of Medicine, 680 N Lake Shore Drive, Suite 1400, Chicago, IL 60611 USA
- Carrie B Moore
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, 512 Wartik Laboratory, The Pennsylvania State University, University Park, PA 16802 USA; Center for Human Genetics Research, Vanderbilt University School of Medicine, 519 Light Hall, Nashville, TN 37232 USA
- Folkert W Asselbergs
- Department of Cardiology, Division Heart and Lungs, University Medical Center Utrecht, Room E03.511, P.O. Box 85500, 3508 GA Utrecht, The Netherlands; Institute of Cardiovascular Science, University College London, London, UK; Durrer Center for Cardiogenetic Research, ICIN-Netherlands Heart Institute, Utrecht, The Netherlands
- Jason H Moore
- Institute for Biomedical Informatics, The Perelman School of Medicine, University of Pennsylvania, 1418 Blockley Hall, 423 Guardian Drive, Philadelphia, PA 19104-6021 USA
- Marylyn D Ritchie
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, 512 Wartik Laboratory, The Pennsylvania State University, University Park, PA 16802 USA
- Brendan J Keating
- The Joseph Stokes Jr. Research Institute, The Children's Hospital of Philadelphia, Office 1016 Abramson Building, Room 1216E, 3615 Civic Center Blvd, Philadelphia, PA 19104 USA; University Medical Center Utrecht, Utrecht, The Netherlands
- Diane Gilbert-Diamond
- Institute for Quantitative Biomedical Sciences at Dartmouth, Hanover, NH USA; Department of Epidemiology, Geisel School of Medicine at Dartmouth, One Medical Center Drive, 7927 Rubin Building, Lebanon, NH 03756 USA
|
30
|
Lareau CA, White BC, Montgomery CG, McKinney BA. dcVar: a method for identifying common variants that modulate differential correlation structures in gene expression data. Front Genet 2015; 6:312. [PMID: 26539209 PMCID: PMC4609883 DOI: 10.3389/fgene.2015.00312] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 05/15/2015] [Accepted: 10/02/2015] [Indexed: 11/26/2022]
Abstract
Recent studies have implicated differential co-expression, or correlation structure, in gene expression data as a way to help explain phenotypic differences. However, few attempts have been made to characterize the function of variants based on their role in regulating differential co-expression. Here, we describe a statistical methodology that identifies pairs of transcripts that display differential correlation structure conditioned on genotypes of variants that regulate co-expression. Additionally, we present a user-friendly, computationally efficient tool, dcVar, that can be applied to expression quantitative trait loci (eQTL) or RNA-Seq datasets to infer differential co-expression variants (dcVars). We apply dcVar to the HapMap3 eQTL dataset and demonstrate the utility of this methodology at uncovering novel functions of variants of interest, with examples from a height genome-wide association study and cancer drug resistance. We provide evidence that differential correlation structure is a valuable intermediate molecular phenotype for further characterizing the function of variants identified in GWAS and related studies.
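The idea of differential correlation conditioned on genotype can be sketched with a standard two-sample comparison of Pearson correlations using the Fisher z-transform. This is a generic textbook test chosen for illustration; the abstract does not specify the statistic used by dcVar, and the function names below are hypothetical.

```python
from math import atanh, erfc, sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def differential_correlation(x1, y1, x2, y2):
    """Compare the correlation of two transcripts between two genotype groups.

    (x1, y1) are expression values of the transcript pair in genotype
    group 1, (x2, y2) in group 2. Returns the z statistic and a two-sided
    p-value under the normal approximation. Each group needs at least
    four samples, and neither correlation may be exactly +1 or -1.
    """
    n1, n2 = len(x1), len(x2)
    if min(n1, n2) < 4:
        raise ValueError("each group needs at least 4 samples")
    z1, z2 = atanh(pearson(x1, y1)), atanh(pearson(x2, y2))
    z = (z1 - z2) / sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    p_value = erfc(abs(z) / sqrt(2.0))  # two-sided tail of the standard normal
    return z, p_value


# Toy data: the two transcripts are positively correlated in one genotype
# group and negatively correlated in the other.
x = [1, 2, 3, 4, 5, 6]
z, p = differential_correlation(
    x, [1.1, 1.9, 3.2, 3.8, 5.1, 6.2],
    x, [6.0, 5.2, 3.9, 3.1, 2.2, 0.8],
)
```

Applied genome-wide, a test like this would be repeated for every variant and transcript pair, so some form of multiple-testing correction would be needed on the resulting p-values.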
Affiliation(s)
- Caleb A Lareau
- Tandy School of Computer Science - Department of Mathematics, University of Tulsa, Tulsa, OK, USA; Arthritis and Clinical Immunology Research Program, Oklahoma Medical Research Foundation, Oklahoma City, OK, USA
- Bill C White
- Tandy School of Computer Science - Department of Mathematics, University of Tulsa, Tulsa, OK, USA
- Courtney G Montgomery
- Arthritis and Clinical Immunology Research Program, Oklahoma Medical Research Foundation, Oklahoma City, OK, USA
- Brett A McKinney
- Tandy School of Computer Science - Department of Mathematics, University of Tulsa, Tulsa, OK, USA; Laureate Institute for Brain Research, Tulsa, OK, USA
|
31
|
Gonzalez GH, Tahsin T, Goodale BC, Greene AC, Greene CS. Recent Advances and Emerging Applications in Text and Data Mining for Biomedical Discovery. Brief Bioinform 2015; 17:33-42. [PMID: 26420781 PMCID: PMC4719073 DOI: 10.1093/bib/bbv087] [Citation(s) in RCA: 103] [Impact Index Per Article: 11.4] [Received: 02/17/2015] [Indexed: 02/06/2023]
Abstract
Precision medicine will revolutionize the way we treat and prevent disease. A major barrier to the implementation of precision medicine that clinicians and translational scientists face is understanding the underlying mechanisms of disease. We are starting to address this challenge through automatic approaches for information extraction, representation and analysis. Recent advances in text and data mining have been applied to a broad spectrum of key biomedical questions in genomics, pharmacogenomics and other fields. We present an overview of the fundamental methods for text and data mining, as well as recent advances and emerging applications toward precision medicine.
|
32
|
Frasca M, Bertoni A, Valentini G. UNIPred: Unbalance-Aware Network Integration and Prediction of Protein Functions. J Comput Biol 2015; 22:1057-74. [PMID: 26402488 DOI: 10.1089/cmb.2014.0110] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 12/29/2022]
Abstract
The proper integration of multiple sources of data and the unbalance between annotated and unannotated proteins are two of the main issues in the automated function prediction (AFP) problem. Most supervised and semisupervised learning algorithms for AFP proposed in the literature do not consider these issues jointly. This harms both sensitivity and precision, because most functional classes have far fewer annotated than unannotated proteins and because each available data source carries specific and complementary information. We propose UNIPred (unbalance-aware network integration and prediction of protein functions), an algorithm that properly combines different biomolecular networks and predicts protein functions using parametric semisupervised neural models. The algorithm explicitly takes into account the unbalance between unannotated and annotated proteins both to construct the integrated network and to predict protein annotations for each functional class. Full-genome and ontology-wide experiments with three eukaryotic model organisms show that the proposed method compares favorably with state-of-the-art learning algorithms for AFP.
Affiliation(s)
- Marco Frasca
- DI - Department of Computer Science, University of Milan, Milan, Italy
- Alberto Bertoni
- DI - Department of Computer Science, University of Milan, Milan, Italy
- Giorgio Valentini
- DI - Department of Computer Science, University of Milan, Milan, Italy
|
33
|
Gorenshteyn D, Zaslavsky E, Fribourg M, Park CY, Wong AK, Tadych A, Hartmann BM, Albrecht RA, García-Sastre A, Kleinstein SH, Troyanskaya OG, Sealfon SC. Interactive Big Data Resource to Elucidate Human Immune Pathways and Diseases. Immunity 2015; 43:605-14. [PMID: 26362267 DOI: 10.1016/j.immuni.2015.08.014] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Received: 10/30/2014] [Revised: 04/24/2015] [Accepted: 06/25/2015] [Indexed: 12/21/2022]
Abstract
Many functionally important interactions between genes and proteins involved in immunological diseases and processes are unknown. The exponential growth in public high-throughput data offers an opportunity to expand this knowledge. To unlock human-immunology-relevant insight contained in the global biomedical research effort, including all public high-throughput datasets, we performed immunological-pathway-focused Bayesian integration of a comprehensive, heterogeneous compendium comprising 38,088 genome-scale experiments. The distillation of this knowledge into immunological networks of functional relationships between molecular entities (ImmuNet), and tools to mine this resource, are accessible to the public at http://immunet.princeton.edu. The predictive capacity of ImmuNet, established by rigorous statistical validation, is easily accessed by experimentalists to generate data-driven hypotheses. We demonstrate the power of this approach through the identification of unique host-virus interaction responses, and we show how ImmuNet complements genetic studies by predicting disease-associated genes. ImmuNet should be widely beneficial for investigating the mechanisms of the human immune system and immunological diseases.
Affiliation(s)
- Dmitriy Gorenshteyn
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
- Elena Zaslavsky
- Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Miguel Fribourg
- Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Christopher Y Park
- New York Genome Center, 101 Avenue of the Americas, New York, NY 10013, USA
- Aaron K Wong
- Simons Center for Data Analysis, Simons Foundation, New York, NY 10010, USA
- Alicja Tadych
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
- Boris M Hartmann
- Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Randy A Albrecht
- Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Global Health and Emerging Pathogens Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Adolfo García-Sastre
- Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Global Health and Emerging Pathogens Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Medicine, Division of Infectious Diseases, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Steven H Kleinstein
- Departments of Pathology and Immunobiology, Yale School of Medicine, New Haven, CT 06520, USA; Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511, USA
- Olga G Troyanskaya
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA; Simons Center for Data Analysis, Simons Foundation, New York, NY 10010, USA; Department of Computer Science, Princeton University, Princeton, NJ 08540, USA
- Stuart C Sealfon
- Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
|
34
|
Johnson ME, Pioli PA, Whitfield ML. Gene expression profiling offers insights into the role of innate immune signaling in SSc. Semin Immunopathol 2015; 37:501-9. [PMID: 26223504 PMCID: PMC4722533 DOI: 10.1007/s00281-015-0512-6] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 06/23/2015] [Accepted: 07/02/2015] [Indexed: 12/22/2022]
Abstract
Systemic sclerosis (SSc) is characterized by inflammation, vascular dysfunction, and ultimately fibrosis. Progress in understanding disease pathogenesis and developing effective disease treatments has been hampered by an incomplete understanding of SSc heterogeneity. To clarify this, we have used genomic approaches to identify distinct patient subsets based on gene expression patterns in SSc skin and other end-target organs. Here, we review what is known about the gene expression-based subsets in SSc, currently defined as the inflammatory, fibroproliferative, limited, and normal-like subsets. The inflammatory subset of patients is characterized by infiltrating immune cells that include T cells, macrophages, and possibly dendritic cells, although little is known about the mediators these cells secrete and the pathways that govern cell activation. Prior studies have suggested a role for pathogens as a trigger of immune responses in SSc, and recent data have identified viral and mycobiome components as potential environmental triggers. We present a model based on analyses of gene expression data and a review of the literature, which suggests that the gene expression subsets observed in patients possibly represent distinct, interconnected molecular states of disease, to which an innate immune response is central that results in the generation of clinical disease.
Affiliation(s)
- Michael E. Johnson
- Department of Genetics, Geisel School of Medicine at Dartmouth, 7400 Remsen, Hanover, NH 03755, USA
- Patricia A. Pioli
- Department of Obstetrics and Gynecology, Geisel School of Medicine at Dartmouth, One Medical Center Drive, Lebanon, NH 03756, USA
- Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, One Medical Center Drive, Lebanon, NH 03756, USA
- Michael L. Whitfield
- Department of Genetics, Geisel School of Medicine at Dartmouth, 7400 Remsen, Hanover, NH 03755, USA
|
35
|
Zhu F, Panwar B, Guan Y. Algorithms for modeling global and context-specific functional relationship networks. Brief Bioinform 2015; 17:686-95. [PMID: 26254431 DOI: 10.1093/bib/bbv065] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/26/2015] [Indexed: 02/07/2023]
Abstract
Functional genomics has enormous potential to facilitate our understanding of normal and disease-specific physiology. In the past decade, intensive research efforts have been focused on modeling functional relationship networks, which summarize the probability of gene co-functionality relationships. Such modeling can be based on either expression data only or heterogeneous data integration. Numerous methods have been deployed to infer functional relationship networks, but most of them target global (non-context-specific) networks. However, functional relationships are expected to be consistently reprogrammed across different tissues or biological processes. Thus, advanced methods have been developed targeting tissue-specific or developmental stage-specific networks. This article brings together the state-of-the-art functional relationship network modeling methods, emphasizes the need for heterogeneous genomic data integration and context-specific network modeling and outlines future directions for functional relationship networks.
|
36
|
Weiss TL, Zieselman A, Hill DP, Diamond SG, Shen L, Saykin AJ, Moore JH. The role of visualization and 3-D printing in biological data mining. BioData Min 2015; 8:22. [PMID: 26246856 PMCID: PMC4526295 DOI: 10.1186/s13040-015-0056-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 01/26/2015] [Accepted: 07/30/2015] [Indexed: 11/14/2022]
Abstract
Background: Biological data mining is a powerful tool that can provide a wealth of information about patterns of genetic and genomic biomarkers of health and disease. A potential disadvantage of data mining is that the volume and complexity of the results can often be overwhelming. It is our working hypothesis that visualization methods can greatly enhance our ability to make sense of data mining results. More specifically, we propose that 3-D printing has an important role to play as a visualization technology in biological data mining. We provide here a brief review of 3-D printing along with a case study to illustrate how it might be used in a research setting. Results: We present as a case study a genetic interaction network associated with grey matter density, an endophenotype for late onset Alzheimer’s disease, as a physical model constructed with a 3-D printer. The synergy or interaction effects of multiple genetic variants were represented through a color gradient of the physical connections between nodes. The digital gene-gene interaction network was then 3-D printed to generate a physical network model. Conclusions: The physical 3-D gene-gene interaction network provided an easily manipulated, intuitive and creative way to visualize the synergistic relationships between the genetic variants and grey matter density in patients with late onset Alzheimer’s disease. We discuss the advantages and disadvantages of this novel method of biological data mining visualization.
Affiliation(s)
- Talia L Weiss
- Department of Genetics, Institute for Quantitative Biomedical Sciences, Geisel School of Medicine, Dartmouth College, Hanover, NH 03755 USA
- Amanda Zieselman
- Department of Genetics, Institute for Quantitative Biomedical Sciences, Geisel School of Medicine, Dartmouth College, Hanover, NH 03755 USA
- Douglas P Hill
- Department of Genetics, Institute for Quantitative Biomedical Sciences, Geisel School of Medicine, Dartmouth College, Hanover, NH 03755 USA
- Li Shen
- Center for Neuroimaging and Indiana Alzheimer's Disease Center, Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN 46202 USA
- Andrew J Saykin
- Center for Neuroimaging and Indiana Alzheimer's Disease Center, Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN 46202 USA
- Jason H Moore
- Department of Genetics, Institute for Quantitative Biomedical Sciences, Geisel School of Medicine, Dartmouth College, Hanover, NH 03755 USA; Division of Informatics, Department of Biostatistics and Epidemiology, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104-6021 USA
|
37
|
Gui J, Greene CS, Sullivan C, Taylor W, Moore JH, Kim C. Testing multiple hypotheses through IMP weighted FDR based on a genetic functional network with application to a new zebrafish transcriptome study. BioData Min 2015; 8:17. [PMID: 26097506 PMCID: PMC4474579 DOI: 10.1186/s13040-015-0050-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 12/03/2014] [Accepted: 06/08/2015] [Indexed: 11/10/2022]
Abstract
In genome-wide studies, hundreds of thousands of hypothesis tests are performed simultaneously. Bonferroni correction and False Discovery Rate (FDR) can effectively control type I error but often yield a high false negative rate. We aim to develop a more powerful method to detect differentially expressed genes. We present a Weighted False Discovery Rate (WFDR) method that incorporates biological knowledge from genetic networks. We first identify weights using Integrative Multi-species Prediction (IMP) and then apply the weights in WFDR to identify differentially expressed genes through an IMP-WFDR algorithm. We performed a gene expression experiment to identify zebrafish genes that change expression in the presence of arsenic during a systemic Pseudomonas aeruginosa infection. Zebrafish were exposed to arsenic at 10 parts per billion and/or infected with P. aeruginosa. Appropriate controls were included. We then applied IMP-WFDR during the analysis of differentially expressed genes. We compared the mRNA expression for each group and found over 200 differentially expressed genes and several enriched pathways including defense response pathways, arsenic response pathways, and the Notch signaling pathway.
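A weighted FDR procedure of the general kind described here can be sketched as a weighted Benjamini-Hochberg step-up test: each p-value is divided by a prior weight (normalised so the weights average to 1) before the usual BH thresholds are applied. This is a generic formulation offered as an assumption; the abstract does not give the details of IMP-WFDR or of how IMP scores become weights, and the function name is hypothetical.

```python
def weighted_bh(pvalues, weights, alpha=0.05):
    """Weighted Benjamini-Hochberg step-up procedure (illustrative sketch).

    pvalues: raw p-values, one per hypothesis.
    weights: positive prior weights; larger means more prior support.
    Returns a list of booleans, True where the hypothesis is rejected.
    """
    if len(pvalues) != len(weights):
        raise ValueError("pvalues and weights must have the same length")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    m = len(pvalues)
    if m == 0:
        return []
    mean_w = sum(weights) / m
    # Divide each p-value by its normalised weight, capping at 1.
    q = [min(p * mean_w / w, 1.0) for p, w in zip(pvalues, weights)]
    order = sorted(range(m), key=lambda i: q[i])
    # Find the largest rank k whose weighted p-value is <= alpha * k / m.
    last = 0
    for rank, i in enumerate(order, start=1):
        if q[i] <= alpha * rank / m:
            last = rank
    rejected = [False] * m
    for i in order[:last]:
        rejected[i] = True
    return rejected


# Toy example: the third gene has a borderline p-value but strong prior
# support, so up-weighting it moves it below its BH threshold.
pvals = [0.001, 0.02, 0.04, 0.5]
unweighted = weighted_bh(pvals, [1, 1, 1, 1])
weighted = weighted_bh(pvals, [1, 1, 1.6, 0.4])
```

With equal weights the procedure reduces to ordinary BH, which makes it easy to compare the two on the same p-values.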
Affiliation(s)
- Jiang Gui
- Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA; Dartmouth-Hitchcock Medical Center, 883 Rubin Bldg, HB7927, One Medical Center Dr., Lebanon, NH, USA
| | - Casey S Greene
- Department of Genetics, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
| | - Con Sullivan
- Department of Molecular and Biomedical Sciences, University of Maine, Orono, ME, USA; Graduate School of Biomedical Science and Engineering, University of Maine, Orono, ME, USA
| | - Walter Taylor
- Department of Genetics, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
| | - Jason H Moore
- Department of Biostatistics and Epidemiology, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Carol Kim
- Department of Molecular and Biomedical Sciences, University of Maine, Orono, ME, USA; Graduate School of Biomedical Science and Engineering, University of Maine, Orono, ME, USA
|
38
|
Frasca M, Bassis S, Valentini G. Learning node labels with multi-category Hopfield networks. Neural Comput Appl 2015. [DOI: 10.1007/s00521-015-1965-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/02/2023]
|
39
|
Wong AK, Krishnan A, Yao V, Tadych A, Troyanskaya OG. IMP 2.0: a multi-species functional genomics portal for integration, visualization and prediction of protein functions and networks. Nucleic Acids Res 2015; 43:W128-33. [PMID: 25969450 PMCID: PMC4489318 DOI: 10.1093/nar/gkv486] [Citation(s) in RCA: 65] [Impact Index Per Article: 7.2] [Received: 02/08/2015] [Accepted: 05/02/2015] [Indexed: 01/08/2023] Open
Abstract
IMP (Integrative Multi-species Prediction), originally released in 2012, is an interactive web server that enables molecular biologists to interpret experimental results and to generate hypotheses in the context of a large cross-organism compendium of functional predictions and networks. The system provides biologists with a framework to analyze their candidate gene sets in the context of functional networks, expanding or refining their sets using functional relationships predicted from integrated high-throughput data. IMP 2.0 integrates updated prior knowledge and data collections from the last three years in the seven supported organisms (Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster, Danio rerio, Caenorhabditis elegans, and Saccharomyces cerevisiae) and extends function prediction coverage to include human disease. IMP identifies homologs with conserved functional roles for disease knowledge transfer, allowing biologists to analyze disease contexts and predictions across all organisms. Additionally, IMP 2.0 implements a new flexible platform for experts to generate custom hypotheses about biological processes or diseases, making sophisticated data-driven methods easily accessible to researchers. IMP does not require any registration or installation and is freely available for use at http://imp.princeton.edu.
Affiliation(s)
- Aaron K Wong
- Department of Computer Science, Princeton University, Princeton, NJ 08540, USA; Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA; Simons Center for Data Analysis, Simons Foundation, NY 10010, USA
| | - Arjun Krishnan
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA
| | - Victoria Yao
- Department of Computer Science, Princeton University, Princeton, NJ 08540, USA; Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA
| | - Alicja Tadych
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA
| | - Olga G Troyanskaya
- Department of Computer Science, Princeton University, Princeton, NJ 08540, USA; Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA; Simons Center for Data Analysis, Simons Foundation, NY 10010, USA
|
40
|
Li HD, Omenn GS, Guan Y. MIsoMine: a genome-scale high-resolution data portal of expression, function and networks at the splice isoform level in the mouse. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2015; 2015:bav045. [PMID: 25953081 PMCID: PMC4423410 DOI: 10.1093/database/bav045] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 01/19/2015] [Accepted: 04/15/2015] [Indexed: 12/22/2022]
Abstract
Products of multiexon genes, especially in higher organisms, are a mixture of isoforms with different or even opposing functions, and therefore need to be treated separately. However, most studies and available resources such as Gene Ontology provide only gene-level function annotations, and therefore lose the differential information at the isoform level. Here we report MIsoMine, a high-resolution portal to multiple levels of functional information of alternatively spliced isoforms in the mouse. This data portal provides tissue-specific expression patterns and co-expression networks, along with such previously published functional genomic data as protein domains, predicted isoform-level functions and functional relationships. The core utility of MIsoMine is allowing users to explore a preprocessed, quality-controlled set of RNA-seq data encompassing diverse tissues and cell lineages. Tissue-specific co-expression networks were established, allowing a 2D ranking of isoforms and tissues by co-expression patterns. The results of the multiple isoforms of the same gene are presented in parallel to facilitate direct comparison, with cross-talking to prioritized functions at the isoform level. MIsoMine provides the first isoform-level resolution effort at genome-scale. We envision that this data portal will be a valuable resource for exploring functional genomic data, and will complement the existing functionalities of the mouse genome informatics database and the gene expression database for the laboratory mouse. Database URL: http://guanlab.ccmb.med.umich.edu/misomine/
Affiliation(s)
- Hong-Dong Li
- Department of Computational Medicine and Bioinformatics, Department of Internal Medicine and Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109, USA
| | - Gilbert S Omenn
- Department of Computational Medicine and Bioinformatics, Department of Internal Medicine and Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109, USA
| | - Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, Department of Internal Medicine and Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109, USA
|
41
|
Goya J, Wong AK, Yao V, Krishnan A, Homilius M, Troyanskaya OG. FNTM: a server for predicting functional networks of tissues in mouse. Nucleic Acids Res 2015; 43:W182-7. [PMID: 25940632 PMCID: PMC4489275 DOI: 10.1093/nar/gkv443] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 03/02/2015] [Accepted: 04/24/2015] [Indexed: 12/11/2022] Open
Abstract
Functional Networks of Tissues in Mouse (FNTM) provides biomedical researchers with tissue-specific predictions of functional relationships between proteins in the most widely used model organism for human disease, the laboratory mouse. Users can explore FNTM-predicted functional relationships for their tissues and genes of interest or examine gene function and interaction predictions across multiple tissues, all through an interactive, multi-tissue network browser. FNTM makes predictions based on integration of a variety of functional genomic data, including over 13 000 gene expression experiments, and prior knowledge of gene function. FNTM is an ideal starting point for clinical and translational researchers considering a mouse model for their disease of interest, researchers already working with mouse models who are interested in discovering new genes related to their pathways or phenotypes of interest, and biologists working with other organisms to explore the functional relationships of their genes of interest in specific mouse tissue contexts. FNTM predicts tissue-specific functional relationships in 200 tissues, does not require any registration or installation and is freely available for use at http://fntm.princeton.edu.
Affiliation(s)
- Jonathan Goya
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA
| | - Aaron K Wong
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA; Simons Center for Data Analysis, Simons Foundation, NY 10010, USA; Department of Computer Science, Princeton University, Princeton, NJ 08540, USA
| | - Victoria Yao
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA; Department of Computer Science, Princeton University, Princeton, NJ 08540, USA
| | - Arjun Krishnan
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA
| | - Max Homilius
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA; Department of Computer Science, Princeton University, Princeton, NJ 08540, USA
| | - Olga G Troyanskaya
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA; Simons Center for Data Analysis, Simons Foundation, NY 10010, USA; Department of Computer Science, Princeton University, Princeton, NJ 08540, USA
|
42
|
Zhu F, Shi L, Engel JD, Guan Y. Regulatory network inferred using expression data of small sample size: application and validation in erythroid system. Bioinformatics 2015; 31:2537-44. [PMID: 25840044 DOI: 10.1093/bioinformatics/btv186] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 11/07/2014] [Accepted: 03/27/2015] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Modeling regulatory networks using expression data observed in a differentiation process may help identify context-specific interactions. The outcome of the current algorithms highly depends on the quality and quantity of a single time-course dataset, and the performance may be compromised for datasets with a limited number of samples. RESULTS: In this work, we report a multi-layer graphical model that is capable of leveraging many publicly available time-course datasets, as well as a cell lineage-specific dataset with small sample size, to model regulatory networks specific to a differentiation process. First, a collection of network inference methods is used to predict the regulatory relationships in individual public datasets. Then, the inferred directional relationships are weighted and integrated together by evaluating against the cell lineage-specific dataset. To test the accuracy of this algorithm, we collected a time-course RNA-Seq dataset during human erythropoiesis to infer regulatory relationships specific to this differentiation process. The resulting erythroid-specific regulatory network reveals novel regulatory relationships activated in erythropoiesis, which were further validated by genome-wide TR4 binding studies using ChIP-seq. These erythropoiesis-specific regulatory relationships were not identifiable by single dataset-based methods or context-independent integrations. Analysis of the predicted targets reveals that they are all closely associated with hematopoietic lineage differentiation.
Affiliation(s)
- Fan Zhu
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Lihong Shi
- State Key Laboratory of Experimental Hematology, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
| | | | - Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA, Department of Internal Medicine, and Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109, USA
|
43
|
Abstract
Background: We develop a new concept that reflects how genes are connected based on microarray data using the coefficient of determination (the squared Pearson correlation coefficient). Our measure, called the microarray enriched gene rank, or simply gene rank (GR), combines a priori knowledge about gene connectivity, say, from the Gene Ontology (GO) database, with the microarray expression data at hand. GR, similarly to Google PageRank, is defined in a recursive fashion and is computed as the left maximum eigenvector of a stochastic matrix derived from microarray expression data. An efficient algorithm is devised that allows computation of GR for 50 thousand genes with 500 samples within minutes on a personal computer using the public domain statistical package R. Results: Computation of GR is illustrated with several microarray data sets. In particular, we apply GR (1) to answer whether bad genes are more connected than good genes in relation to cancer patient survival, (2) to associate gene connectivity with clusters/subtypes in ovarian cancer tumors, and to determine whether gene connectivity changes (3) from organ to organ within the same organism and (4) between organisms. Conclusions: We have shown by examples that findings based on GR confirm biological expectations. GR may be used for hypothesis generation on gene pathways. It may be used for a homogeneous sample or for comparison of gene connectivity among cases and controls, or in a longitudinal setting.
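The eigenvector computation described in this abstract can be sketched in a few lines of Python. The version below is a simplified reading, not the published algorithm: it builds gene-gene weights from the squared Pearson correlation, sets the diagonal to zero (an assumption), row-normalizes to obtain a stochastic matrix, and finds the left eigenvector for eigenvalue 1 by power iteration. It leaves out the a priori connectivity term (for example from GO) entirely, and the function names and toy expression values are hypothetical.

```python
def pearson_r2(x, y):
    """Squared Pearson correlation (coefficient of determination)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: treat as unconnected
    return (sxy * sxy) / (sxx * syy)


def gene_rank(expr, iters=1000, tol=1e-12):
    """expr[g] is the expression profile of gene g across samples.
    Returns a vector of non-negative ranks that sums to 1."""
    g = len(expr)
    # Pairwise R^2 weights; zero diagonal is an assumption of this sketch.
    w = [[pearson_r2(expr[i], expr[j]) if i != j else 0.0 for j in range(g)]
         for i in range(g)]
    # Row-normalize to a stochastic matrix (uniform row if a gene is isolated).
    p = []
    for row in w:
        s = sum(row)
        p.append([v / s for v in row] if s > 0 else [1.0 / g] * g)
    # Power iteration for the left eigenvector: pi = pi P.
    pi = [1.0 / g] * g
    for _ in range(iters):
        new = [sum(pi[i] * p[i][j] for i in range(g)) for j in range(g)]
        done = max(abs(a - b) for a, b in zip(new, pi)) < tol
        pi = new
        if done:
            break
    return pi


# Toy data: genes 0 and 1 are perfectly correlated, gene 2 less so.
toy = [[1, 2, 3, 4], [2, 4, 6, 8], [1, 3, 2, 4]]
print(gene_rank(toy))
```

Because the R² weights are symmetric, the resulting vector is proportional to each gene's total weight, which gives a quick sanity check on the iteration; a prior-knowledge term would generally break that symmetry.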
Affiliation(s)
- Eugene Demidenko
- Department of Biomedical Data Science, Institute for Quantitative Biomedical Sciences, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
|
44
|
Mahoney JM, Taroni J, Martyanov V, Wood TA, Greene CS, Pioli PA, Hinchcliff ME, Whitfield ML. Systems level analysis of systemic sclerosis shows a network of immune and profibrotic pathways connected with genetic polymorphisms. PLoS Comput Biol 2015; 11:e1004005. [PMID: 25569146 PMCID: PMC4288710 DOI: 10.1371/journal.pcbi.1004005] [Citation(s) in RCA: 100] [Impact Index Per Article: 11.1] [Received: 02/13/2014] [Accepted: 10/27/2014] [Indexed: 12/15/2022] Open
Abstract
Systemic sclerosis (SSc) is a rare systemic autoimmune disease characterized by skin and organ fibrosis. The pathogenesis of SSc and its progression are poorly understood. The SSc intrinsic gene expression subsets (inflammatory, fibroproliferative, normal-like, and limited) are observed in multiple clinical cohorts of patients with SSc. Analysis of longitudinal skin biopsies suggests that a patient's subset assignment is stable over 6-12 months. Genetically, SSc is multi-factorial with many genetic risk loci for SSc generally and for specific clinical manifestations. Here we identify the genes consistently associated with the intrinsic subsets across three independent cohorts, show the relationship between these genes using a gene-gene interaction network, and place the genetic risk loci in the context of the intrinsic subsets. To identify gene expression modules common to three independent datasets from three different clinical centers, we developed a consensus clustering procedure based on mutual information of partitions, an information theory concept, and performed a meta-analysis of these genome-wide gene expression datasets. We created a gene-gene interaction network of the conserved molecular features across the intrinsic subsets and analyzed their connections with SSc-associated genetic polymorphisms. The network is composed of distinct, but interconnected, components related to interferon activation, M2 macrophages, adaptive immunity, extracellular matrix remodeling, and cell proliferation. The network shows extensive connections between the inflammatory- and fibroproliferative-specific genes. The network also shows connections between these subset-specific genes and 30 SSc-associated polymorphic genes including STAT4, BLK, IRF7, NOTCH4, PLAUR, CSK, IRAK1, and several human leukocyte antigen (HLA) genes. 
Our analyses suggest that the gene expression changes underlying the SSc subsets may be long-lived, but mechanistically interconnected and related to a patient's underlying genetic risk.
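The "mutual information of partitions" mentioned in this abstract is a standard information-theoretic quantity, and a small Python sketch may help make it concrete. The function below computes the mutual information (in nats) between two clusterings of the same items; it illustrates only the quantity itself, not the authors' consensus clustering procedure, and the function name and toy labels are hypothetical.

```python
from collections import Counter
from math import log


def partition_mutual_information(labels_a, labels_b):
    """Mutual information (nats) between two partitions of the same items,
    each given as a list of cluster labels in item order."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label lists must be non-empty and of equal length")
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = 0.0
    for (a, b), n_ab in joint.items():
        # p(a,b) * log( p(a,b) / (p(a) * p(b)) ), written with counts.
        mi += (n_ab / n) * log(n_ab * n / (count_a[a] * count_b[b]))
    return mi


# Two labelings that describe the same split share log(2) nats;
# two labelings that cut across each other share none.
print(partition_mutual_information([0, 0, 1, 1], ["x", "x", "y", "y"]))
print(partition_mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))
```

A high value between the module assignments obtained in two cohorts would indicate that the cohorts group genes in a similar way, which is the kind of agreement a consensus procedure can exploit.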
Affiliation(s)
- J. Matthew Mahoney
- Department of Genetics, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, United States of America
| | - Jaclyn Taroni
- Department of Genetics, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, United States of America
| | - Viktor Martyanov
- Department of Genetics, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, United States of America
| | - Tammara A. Wood
- Department of Genetics, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, United States of America
| | - Casey S. Greene
- Department of Genetics, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, United States of America
| | - Patricia A. Pioli
- Department of Obstetrics and Gynecology, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, United States of America
| | - Monique E. Hinchcliff
- Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States of America
| | - Michael L. Whitfield
- Department of Genetics, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, United States of America
|
45
|
Functional Splicing Network Reveals Extensive Regulatory Potential of the Core Spliceosomal Machinery. Mol Cell 2015; 57:7-22. [DOI: 10.1016/j.molcel.2014.10.030] [Citation(s) in RCA: 126] [Impact Index Per Article: 14.0] [Received: 03/10/2014] [Revised: 09/24/2014] [Accepted: 10/31/2014] [Indexed: 12/12/2022]
|
46
|
Tan J, Ung M, Cheng C, Greene CS. Unsupervised feature construction and knowledge extraction from genome-wide assays of breast cancer with denoising autoencoders. Pac Symp Biocomput 2015; 20:132-143. [PMID: 25592575 DOI: 10.1142/9789814644730_0014] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Indexed: 05/23/2023]
Abstract
Big data bring new opportunities for methods that efficiently summarize and automatically extract knowledge from such compendia. While both supervised learning algorithms and unsupervised clustering algorithms have been successfully applied to biological data, they are either dependent on known biology or limited to discerning the most significant signals in the data. Here we present denoising autoencoders (DAs), which employ a data-defined learning objective independent of known biology, as a method to identify and extract complex patterns from genomic data. We evaluate the performance of DAs by applying them to a large collection of breast cancer gene expression data. Results show that DAs successfully construct features that contain both clinical and molecular information. There are features that represent tumor or normal samples, estrogen receptor (ER) status, and molecular subtypes. Features constructed by the autoencoder generalize to an independent dataset collected using a distinct experimental platform. By integrating data from ENCODE for feature interpretation, we discover a feature representing ER status through association with key transcription factors in breast cancer. We also identify a feature that is highly predictive of patient survival and is enriched for the FOXM1 signaling pathway. The features constructed by DAs are often bimodally distributed with one peak near zero and another near one, which facilitates discretization. In summary, we demonstrate that DAs effectively extract key biological principles from gene expression data and summarize them into constructed features with convenient properties.
Affiliation(s)
- Jie Tan
- Department of Genetics, Institute for Quantitative Biomedical Sciences, Norris Cotton Cancer Center, The Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
|
47
|
Abstract
One of the challenges of understanding the genetic basis of complex phenotypes is explaining variability not attributable to individual genes. While most existing methods that investigate variant mutations or differential gene expression focus on individual effects, a complex system of gene interactions (epistasis) and pathways is likely needed to explain phenotypic variation. Herein, we examine methods for treating the interactions in these biological data sets as edges in a network model of the phenotype and review relevant network theory methods for analyzing network structure and identifying important genes. In particular, we review methods for detecting community structure, describing the statistical properties of networks, and computing network centrality of genes that may reveal insights missed by individual genetic effects. We also discuss available tools to facilitate the construction and visualization of epistasis networks of GWAS data.
Affiliation(s)
- Caleb A Lareau
- Department of Mathematics, University of Tulsa, 800 S. Tucker Drive, Rayzor Hall 2145, Tulsa, OK, 74104, USA
|
48
|
Tan J, Ung M, Cheng C, Greene CS. Unsupervised feature construction and knowledge extraction from genome-wide assays of breast cancer with denoising autoencoders. Pac Symp Biocomput 2015; 20:132-43. [PMID: 25592575 PMCID: PMC4299935] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/23/2024]
Affiliation(s)
- Jie Tan
- Department of Genetics, Institute for Quantitative Biomedical Sciences, Norris Cotton Cancer Center, The Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
| | - Matthew Ung
- Department of Genetics, Institute for Quantitative Biomedical Sciences, Norris Cotton Cancer Center, The Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
| | - Chao Cheng
- Department of Genetics, Institute for Quantitative Biomedical Sciences, Norris Cotton Cancer Center, The Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
| | - Casey S Greene
- Department of Genetics, Institute for Quantitative Biomedical Sciences, Norris Cotton Cancer Center, The Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
|
49
|
Zigo M, Dorosh A, Pohlová A, Jonáková V, Šulc M, Maňásková-Postlerová P. Panel of monoclonal antibodies to sperm surface proteins as a tool for monitoring localization and identification of sperm–zona pellucida receptors. Cell Tissue Res 2014; 359:895-908. [DOI: 10.1007/s00441-014-2072-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/03/2014] [Accepted: 11/17/2014] [Indexed: 02/01/2023]
|
50
|
Park CY, Krishnan A, Zhu Q, Wong AK, Lee YS, Troyanskaya OG. Tissue-aware data integration approach for the inference of pathway interactions in metazoan organisms. Bioinformatics 2014; 31:1093-101. [PMID: 25431329 DOI: 10.1093/bioinformatics/btu786] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.9] [Received: 05/07/2014] [Accepted: 11/20/2014] [Indexed: 11/12/2022]
Abstract
MOTIVATION: Leveraging the large compendium of genomic data to predict biomedical pathways and specific mechanisms of protein interactions genome-wide in metazoan organisms has been challenging. In contrast to unicellular organisms, biological and technical variation originating from diverse tissues and cell-lineages is often the largest source of variation in metazoan data compendia. Therefore, a new computational strategy accounting for the tissue heterogeneity in the functional genomic data is needed to accurately translate the vast amount of human genomic data into specific interaction-level hypotheses. RESULTS: We developed an integrated, scalable strategy for inferring multiple human gene interaction types that takes advantage of data from diverse tissue and cell-lineage origins. Our approach specifically predicts both the presence of a functional association and the most likely interaction type among human genes or their protein products on a whole-genome scale. We demonstrate that directly incorporating tissue contextual information improves the accuracy of our predictions, and further, that such genome-wide results can be used to significantly refine regulatory interactions from primary experimental datasets (e.g. ChIP-Seq, mass spectrometry). AVAILABILITY AND IMPLEMENTATION: An interactive website hosting all of our interaction predictions is publicly available at http://pathwaynet.princeton.edu. Software was implemented using the open-source Sleipnir library, which is available for download at https://bitbucket.org/libsleipnir/libsleipnir.bitbucket.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Christopher Y Park
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA, Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA and Simons Center for Data Analysis, Simons Foundation, New York, NY, 10010, USA
| | - Arjun Krishnan
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA, Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA and Simons Center for Data Analysis, Simons Foundation, New York, NY, 10010, USA
| | - Qian Zhu
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA, Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA and Simons Center for Data Analysis, Simons Foundation, New York, NY, 10010, USA
| | - Aaron K Wong
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA, Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA and Simons Center for Data Analysis, Simons Foundation, New York, NY, 10010, USA
| | - Young-Suk Lee
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA, Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA and Simons Center for Data Analysis, Simons Foundation, New York, NY, 10010, USA
| | - Olga G Troyanskaya
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA, Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA and Simons Center for Data Analysis, Simons Foundation, New York, NY, 10010, USA
|