1
|
Henao JD, Lauber M, Azevedo M, Grekova A, Theis F, List M, Ogris C, Schubert B. Multi-omics regulatory network inference in the presence of missing data. Brief Bioinform 2023; 24:bbad309. [PMID: 37670505 PMCID: PMC10516394 DOI: 10.1093/bib/bbad309] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/08/2022] [Revised: 05/06/2023] [Accepted: 05/29/2023] [Indexed: 09/07/2023]
Abstract
A key problem in systems biology is the discovery of regulatory mechanisms that drive phenotypic behaviour of complex biological systems in the form of multi-level networks. Modern multi-omics profiling techniques probe these fundamental regulatory networks but are often hampered by experimental restrictions leading to missing data or partially measured omics types for subsets of individuals due to cost restrictions. In such scenarios, in which missing data is present, classical computational approaches to infer regulatory networks are limited. In recent years, approaches have been proposed to infer sparse regression models in the presence of missing information. Nevertheless, these methods have not been adopted for regulatory network inference yet. In this study, we integrated regression-based methods that can handle missingness into KiMONo, a Knowledge guided Multi-Omics Network inference approach, and benchmarked their performance on commonly encountered missing data scenarios in single- and multi-omics studies. Overall, two-step approaches that explicitly handle missingness performed best for a wide range of random- and block-missingness scenarios on imbalanced omics-layers dimensions, while methods implicitly handling missingness performed best on balanced omics-layers dimensions. Our results show that robust multi-omics network inference in the presence of missing data with KiMONo is feasible and thus allows users to leverage available multi-omics data to its full extent.
Affiliation(s)
- Juan D Henao
  - Helmholtz Zentrum München, Computational Health Department, Ingolstädter Landstraße 1, 85764 Munich, Germany, Member of the German Center for Lung Research (DZL)
- Michael Lauber
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Maximus-von-Imhof-Forum 3, 85354 Freising
- Manuel Azevedo
  - Helmholtz Zentrum München, Computational Health Department, Ingolstädter Landstraße 1, 85764 Munich, Germany, Member of the German Center for Lung Research (DZL)
- Anastasiia Grekova
  - Helmholtz Zentrum München, Computational Health Department, Ingolstädter Landstraße 1, 85764 Munich, Germany, Member of the German Center for Lung Research (DZL)
- Fabian Theis
  - Helmholtz Zentrum München, Computational Health Department, Ingolstädter Landstraße 1, 85764 Munich, Germany, Member of the German Center for Lung Research (DZL)
  - Department of Mathematics, Technical University of Munich, 85748 Garching bei München, Germany
- Markus List
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Maximus-von-Imhof-Forum 3, 85354 Freising
- Christoph Ogris
  - Helmholtz Zentrum München, Computational Health Department, Ingolstädter Landstraße 1, 85764 Munich, Germany, Member of the German Center for Lung Research (DZL)
- Benjamin Schubert
  - Helmholtz Zentrum München, Computational Health Department, Ingolstädter Landstraße 1, 85764 Munich, Germany, Member of the German Center for Lung Research (DZL)
  - Department of Mathematics, Technical University of Munich, 85748 Garching bei München, Germany
|
2
|
Complexities of JC Polyomavirus Receptor-Dependent and -Independent Mechanisms of Infection. Viruses 2022; 14:v14061130. [PMID: 35746603 PMCID: PMC9228512 DOI: 10.3390/v14061130] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 04/26/2022] [Revised: 05/19/2022] [Accepted: 05/20/2022] [Indexed: 02/05/2023]
Abstract
JC polyomavirus (JCPyV) is a small non-enveloped virus that establishes lifelong, persistent infection in most of the adult population. Immune-competent patients are generally asymptomatic, but immune-compromised and immune-suppressed patients are at risk for the neurodegenerative disease progressive multifocal leukoencephalopathy (PML). Studies with purified JCPyV found it undergoes receptor-dependent infectious entry requiring both lactoseries tetrasaccharide C (LSTc) attachment and 5-hydroxytryptamine type 2 entry receptors. Subsequent work discovered that the major targets of JCPyV infection in the central nervous system (oligodendrocytes and astrocytes) do not express the required attachment receptor at detectable levels, that virus could not bind these cells in tissue sections, and that viral quasi-species harbor recurrent mutations in the binding pocket for attachment. While several research groups found evidence that JCPyV can use novel receptors for infection, it was also discovered that extracellular vesicles (EVs) can mediate receptor-independent JCPyV infection. Recent work also found that JCPyV-associated EVs include both exosomes and secretory autophagosomes. EVs effectively present a means of immune evasion and increased tissue tropism that complicates viral studies and anti-viral therapeutics. This review focuses on JCPyV infection mechanisms, including EV-associated infection, and outlines key areas of study necessary to understand the interplay between virus and extracellular vesicles.
|
3
|
Martínez-García M, Hernández-Lemus E. Data Integration Challenges for Machine Learning in Precision Medicine. Front Med (Lausanne) 2022; 8:784455. [PMID: 35145977 PMCID: PMC8821900 DOI: 10.3389/fmed.2021.784455] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 09/27/2021] [Accepted: 12/28/2021] [Indexed: 12/19/2022]
Abstract
A main goal of Precision Medicine is to incorporate and integrate the vast corpora held in different databases about the molecular and environmental origins of disease into analytic frameworks, allowing the development of individualized, context-dependent diagnostics and therapeutic approaches. In this regard, artificial intelligence and machine learning approaches can be used to build analytical models of complex disease aimed at prediction of personalized health conditions and outcomes. Such models must handle the wide heterogeneity of individuals in both their genetic predisposition and their social and environmental determinants. Computational approaches to medicine need to be able to efficiently manage, visualize, and integrate large datasets combining structured and unstructured formats. This needs to be done while constrained by different levels of confidentiality, ideally within a unified analytical architecture. Efficient data integration and management is key to the successful application of computational intelligence approaches to medicine. A number of challenges arise in the design of successful approaches to medical data analytics under the currently demanding performance conditions of personalized medicine, while also subject to time, computational power, and bioethical constraints. Here, we will review some of these constraints and discuss possible avenues to overcome current challenges.
Affiliation(s)
- Mireya Martínez-García
  - Clinical Research Division, National Institute of Cardiology ‘Ignacio Chávez’, Mexico City, Mexico
- Enrique Hernández-Lemus
  - Computational Genomics Division, National Institute of Genomic Medicine (INMEGEN), Mexico City, Mexico
  - Center for Complexity Sciences, Universidad Nacional Autónoma de México, Mexico City, Mexico
|
4
|
Fabris F, Palmer D, de Magalhães JP, Freitas AA. Comparing enrichment analysis and machine learning for identifying gene properties that discriminate between gene classes. Brief Bioinform 2021; 21:803-814. [PMID: 30895300 DOI: 10.1093/bib/bbz028] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 11/15/2018] [Revised: 02/18/2019] [Accepted: 02/19/2019] [Indexed: 01/08/2023]
Abstract
Biologists very often use enrichment methods based on statistical hypothesis tests to identify gene properties that are significantly over-represented in a given set of genes of interest, by comparison with a 'background' set of genes. These enrichment methods, although based on rigorous statistical foundations, are not always the best single option to identify patterns in biological data. In many cases, one can also use classification algorithms from the machine-learning field. Unlike enrichment methods, classification algorithms are designed to maximize measures of predictive performance and are capable of analysing combinations of gene properties, instead of one property at a time. In practice, however, the majority of studies use either enrichment or classification methods (rather than both), and there is a lack of literature discussing the pros and cons of both types of method. The goal of this paper is to compare and contrast enrichment and classification methods, offering two contributions. First, we discuss the (to some extent complementary) advantages and disadvantages of both types of methods for identifying gene properties that discriminate between gene classes. Second, we provide a set of high-level recommendations for using enrichment and classification methods. Overall, by highlighting the strengths and the weaknesses of both types of methods we argue that both should be used in bioinformatics analyses.
Affiliation(s)
- Fabio Fabris
  - School of Computing, University of Kent, Kent, CT2 7NF, UK
- Daniel Palmer
  - Integrative Genomics of Ageing Group, Institute of Ageing and Chronic Disease, University of Liverpool, Liverpool, UK
- João Pedro de Magalhães
  - Integrative Genomics of Ageing Group, Institute of Ageing and Chronic Disease, University of Liverpool, Liverpool, UK
- Alex A Freitas
  - School of Computing, University of Kent, Kent, CT2 7NF, UK
|
5
|
Chang SM, Yang M, Lu W, Huang YJ, Huang Y, Hung H, Miecznikowski JC, Lu TP, Tzeng JY. Gene-Set Integrative Analysis of Multi-Omics Data Using Tensor-based Association Test. Bioinformatics 2021; 37:2259-2265. [PMID: 33674827 PMCID: PMC8388036 DOI: 10.1093/bioinformatics/btab125] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/15/2020] [Revised: 12/30/2020] [Accepted: 02/24/2021] [Indexed: 11/12/2022]
Abstract
MOTIVATION Facilitated by technological advances and the decrease in costs, it is feasible to gather subject data from several omics platforms. Each platform assesses different molecular events, and the challenge lies in efficiently analyzing these data to discover novel disease genes or mechanisms. A common strategy is to regress the outcomes on all omics variables in a gene set. However, this approach suffers from problems associated with high-dimensional inference. RESULTS We introduce a tensor-based framework for variable-wise inference in multi-omics analysis. By accounting for the matrix structure of an individual's multi-omics data, the proposed tensor methods incorporate the relationship among omics effects, reduce the number of parameters, and boost the modeling efficiency. We derive the variable-specific tensor test and enhance computational efficiency of tensor modeling. Using simulations and data applications on the Cancer Cell Line Encyclopedia (CCLE), we demonstrate our method performs favorably over baseline methods and will be useful for gaining biological insights in multi-omics analysis. AVAILABILITY AND IMPLEMENTATION R function and instruction are available from the authors' website: https://www4.stat.ncsu.edu/~jytzeng/Software/TR.omics/TRinstruction.pdf. SUPPLEMENTARY INFORMATION Supplementary materials are available at Bioinformatics online.
Affiliation(s)
- Sheng-Mao Chang
  - Department of Statistics, National Cheng Kung University, Tainan, Taiwan
- Meng Yang
  - Department of Statistics, North Carolina State University, Raleigh NC, 27695, USA
- Wenbin Lu
  - Department of Statistics, North Carolina State University, Raleigh NC, 27695, USA
- Yu-Jyun Huang
  - Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
- Yueyang Huang
  - Bioinformatics Research Center, North Carolina State University, Raleigh NC, 27695, USA
- Hung Hung
  - Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
- Tzu-Pin Lu
  - Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
- Jung-Ying Tzeng
  - Department of Statistics, National Cheng Kung University, Tainan, Taiwan
  - Department of Statistics, North Carolina State University, Raleigh NC, 27695, USA
  - Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
  - Bioinformatics Research Center, North Carolina State University, Raleigh NC, 27695, USA
|
6
|
Chitoiu L, Dobranici A, Gherghiceanu M, Dinescu S, Costache M. Multi-Omics Data Integration in Extracellular Vesicle Biology-Utopia or Future Reality? Int J Mol Sci 2020; 21:ijms21228550. [PMID: 33202771 PMCID: PMC7697477 DOI: 10.3390/ijms21228550] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 10/15/2020] [Revised: 11/10/2020] [Accepted: 11/11/2020] [Indexed: 12/15/2022]
Abstract
Extracellular vesicles (EVs) are membranous structures derived from the endosomal system or generated by plasma membrane shedding. Due to their composition of DNA, RNA, proteins, and lipids, EVs have garnered a lot of attention as an essential mechanism of cell-to-cell communication, with various implications in physiological and pathological processes. EVs are not only a highly heterogeneous population by means of size and biogenesis, but they are also a source of diverse, functionally rich biomolecules. Recent advances in high-throughput processing of biological samples have facilitated the development of databases comprised of characteristic genomic, transcriptomic, proteomic, metabolomic, and lipidomic profiles for EV cargo. Despite the in-depth approach used to map functional molecules in EV-mediated cellular cross-talk, few integrative methods have been applied to analyze the molecular interplay in these targeted delivery systems. New perspectives arise from the field of systems biology, where accounting for heterogeneity may lead to finding patterns in an apparently random pool of data. In this review, we map the biological and methodological causes of heterogeneity in EV multi-omics data and present current applications or possible statistical methods for integrating such data while keeping track of the current bottlenecks in the field.
Affiliation(s)
- Leona Chitoiu
  - Ultrastructural Pathology and Bioimaging Laboratory, ‘Victor Babeș’ National Institute of Pathology, Bucharest 050096, Romania
- Alexandra Dobranici
  - Department of Biochemistry and Molecular Biology, University of Bucharest, Bucharest 050095, Romania
- Mihaela Gherghiceanu
  - Ultrastructural Pathology and Bioimaging Laboratory, ‘Victor Babeș’ National Institute of Pathology, Bucharest 050096, Romania
  - Department of Cellular, Molecular Biology and Histology, ‘Carol Davila’ University of Medicine and Pharmacy, Bucharest 050474, Romania
- Sorina Dinescu
  - Department of Biochemistry and Molecular Biology, University of Bucharest, Bucharest 050095, Romania
  - Research Institute of the University of Bucharest, University of Bucharest, Bucharest 050663, Romania
- Marieta Costache
  - Department of Biochemistry and Molecular Biology, University of Bucharest, Bucharest 050095, Romania
  - Research Institute of the University of Bucharest, University of Bucharest, Bucharest 050663, Romania
|
7
|
Abstract
Over the last several years, next-generation sequencing and its recent push toward single-cell resolution have transformed the landscape of immunology research by revealing novel complexities about all components of the immune system. With the vast amounts of diverse data currently being generated, and with the methods of analyzing and combining diverse data improving as well, integrative systems approaches are becoming more powerful. Previous integrative approaches have combined multiple data types and revealed ways that the immune system, both as a whole and as individual parts, is affected by genetics, the microbiome, and other factors. In this review, we explore the data types that are available for studying immunology with an integrative systems approach, as well as the current strategies and challenges for conducting such analyses.
Affiliation(s)
- Silvia Pineda
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, California 94158, USA
  - Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Centre, 28029 Madrid, Spain
- Daniel G. Bunis
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, California 94158, USA
- Idit Kosti
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, California 94158, USA
  - Department of Pediatrics, University of California, San Francisco, California 94143, USA
- Marina Sirota
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, California 94158, USA
  - Department of Pediatrics, University of California, San Francisco, California 94143, USA
|
8
|
Tiong KL, Yeang CH. MGSEA - a multivariate Gene set enrichment analysis. BMC Bioinformatics 2019; 20:145. [PMID: 30885118 PMCID: PMC6421703 DOI: 10.1186/s12859-019-2716-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 05/04/2018] [Accepted: 03/06/2019] [Indexed: 11/10/2022]
Abstract
BACKGROUND Gene Set Enrichment Analysis (GSEA) is a powerful tool to identify enriched functional categories of informative biomarkers. Canonical GSEA takes one-dimensional feature scores derived from the data of one platform as inputs. Numerous extensions of GSEA handling multimodal OMIC data have been proposed, yet none of them explicitly captures combinatorial relations of feature scores from multiple platforms. RESULTS We propose multivariate GSEA (MGSEA) to capture combinatorial relations of gene set enrichment among multiple platform features. MGSEA successfully captures designed feature relations from simulated data. By applying it to the scores of delineating breast cancer and glioblastoma multiforme (GBM) subtypes from The Cancer Genome Atlas (TCGA) datasets of CNV, DNA methylation and mRNA expressions, we find that breast cancer and GBM data yield both similar and distinct outcomes. Among the enriched functional categories, subtype-specific biomarkers are dominated by mRNA expression in many functional categories in both cancer types and also by CNV in many functional categories in breast cancer. The enriched functional categories belonging to distinct combinatorial patterns are involved in different oncogenic processes: cell proliferation (such as cell cycle control, estrogen responses, MYC and E2F targets) for mRNA expression in breast cancer, invasion and metastasis (such as cell adhesion and epithelial-mesenchymal transition (EMT)) for CNV in breast cancer, and diverse processes (such as immune and inflammatory responses, cell adhesion, angiogenesis, and EMT) for mRNA expression in GBM. These observations persist in two external datasets (Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) for breast cancer and Repository for Molecular Brain Neoplasia Data (REMBRANDT) for GBM) and are consistent with knowledge of cancer subtypes. We further compare the characteristics of MGSEA with several extensions of GSEA and point out the pros and cons of each method. CONCLUSIONS We demonstrated the utility of MGSEA by inferring the combinatorial relations of multiple platforms for cancer subtype delineation in three multi-OMIC datasets: TCGA, METABRIC and REMBRANDT. The inferred combinatorial patterns are consistent with the current knowledge and also reveal novel insights about cancer subtypes. MGSEA can be further applied to any genotype-phenotype association problems with multimodal OMIC data.
Affiliation(s)
- Khong-Loon Tiong
  - Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
|
9
|
Dihazi H, Asif AR, Beißbarth T, Bohrer R, Feussner K, Feussner I, Jahn O, Lenz C, Majcherczyk A, Schmidt B, Schmitt K, Urlaub H, Valerius O. Integrative omics - from data to biology. Expert Rev Proteomics 2018; 15:463-466. [DOI: 10.1080/14789450.2018.1476143] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 01/01/2023]
Affiliation(s)
- Hassan Dihazi
  - Göttingen Proteomics Forum (GPF), Göttingen, Germany
  - Nephrology and Rheumatology, University Medical Center Göttingen, University of Göttingen, Göttingen, Germany
- Abdul R. Asif
  - Göttingen Proteomics Forum (GPF), Göttingen, Germany
  - Institute for Clinical Chemistry/UMG-Laboratories, University Medical Center Göttingen, University of Göttingen, Göttingen, Germany
- Tim Beißbarth
  - Göttingen Proteomics Forum (GPF), Göttingen, Germany
  - Department of Medical Statistics, University Medical Center Göttingen, University of Göttingen, Göttingen, Germany
- Rainer Bohrer
  - Göttingen Proteomics Forum (GPF), Göttingen, Germany
  - Gesellschaft für Wissenschaftliche Datenverarbeitung mbH, Göttingen, Germany
- Kirstin Feussner
  - Göttingen Metabolomics and Lipidomics Platform (GMLP), Göttingen, Germany
  - Department of Plant Biochemistry, Albrecht-von-Haller-Institute for Plant Sciences, University of Göttingen, Göttingen, Germany
- Ivo Feussner
  - Göttingen Metabolomics and Lipidomics Platform (GMLP), Göttingen, Germany
  - Department of Plant Biochemistry, Albrecht-von-Haller-Institute for Plant Sciences, University of Göttingen, Göttingen, Germany
- Olaf Jahn
  - Göttingen Proteomics Forum (GPF), Göttingen, Germany
  - Proteomics Group, Max Planck Institute of Experimental Medicine, Göttingen, Germany
- Christof Lenz
  - Göttingen Proteomics Forum (GPF), Göttingen, Germany
  - Institute for Clinical Chemistry/UMG-Laboratories, University Medical Center Göttingen, University of Göttingen, Göttingen, Germany
  - Bioanalytical Mass Spectrometry, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
- Andrzej Majcherczyk
  - Göttingen Proteomics Forum (GPF), Göttingen, Germany
  - Büsgen-Institute, Section Molecular Wood Biotechnology and Technical Mycology, University of Göttingen, Göttingen, Germany
- Bernhard Schmidt
  - Göttingen Proteomics Forum (GPF), Göttingen, Germany
  - Department of Cellular Biochemistry, University Medical Center Göttingen, Göttingen, Germany
- Kerstin Schmitt
  - Göttingen Proteomics Forum (GPF), Göttingen, Germany
  - Institute for Microbiology and Genetics, University of Göttingen, Göttingen, Germany
- Henning Urlaub
  - Göttingen Proteomics Forum (GPF), Göttingen, Germany
  - Institute for Clinical Chemistry/UMG-Laboratories, University Medical Center Göttingen, University of Göttingen, Göttingen, Germany
  - Bioanalytical Mass Spectrometry, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
- Oliver Valerius
  - Göttingen Proteomics Forum (GPF), Göttingen, Germany
  - Institute for Microbiology and Genetics, University of Göttingen, Göttingen, Germany
|
10
|
|
11
|
Preusse M, Marr C, Saunders S, Maticzka D, Lickert H, Backofen R, Theis F. SimiRa: A tool to identify coregulation between microRNAs and RNA-binding proteins. RNA Biol 2016; 12:998-1009. [PMID: 26383775 PMCID: PMC4615630 DOI: 10.1080/15476286.2015.1068496] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 12/21/2022]
Abstract
microRNAs and microRNA-independent RNA-binding proteins (RBPs) are 2 classes of post-transcriptional regulators that have been shown to cooperate in gene-expression regulation. We compared the genome-wide target sets of microRNAs and RBPs identified by recent CLIP-Seq technologies, finding that RBPs have distinct target sets and favor gene interaction network hubs. To identify microRNAs and RBPs with a similar functional context, we developed simiRa, a tool that compares enriched functional categories such as pathways and GO terms. We applied simiRa to the known functional cooperation between Pumilio family proteins and miR-221/222 in the regulation of tumor suppressor gene p27 and show that the cooperation is reflected by similar enriched categories but not by target genes. SimiRa also predicts possible cooperation of microRNAs and RBPs beyond direct interaction on the target mRNA for the nuclear RBP TAF15. To further facilitate research into cooperation of microRNAs and RBPs, we made simiRa available as a web tool that displays the functional neighborhood and similarity of microRNAs and RBPs: http://vsicb-simira.helmholtz-muenchen.de.
Affiliation(s)
- Martin Preusse
  - Helmholtz Zentrum München - German Research Center for Environmental Health; Institute of Computational Biology; Neuherberg, Germany
  - Helmholtz Zentrum München - German Research Center for Environmental Health; Institute of Diabetes and Regeneration Research; Neuherberg, Germany
- Carsten Marr
  - Helmholtz Zentrum München - German Research Center for Environmental Health; Institute of Computational Biology; Neuherberg, Germany
- Sita Saunders
  - Bioinformatics; Department of Computer Science; University of Freiburg; Freiburg, Germany
- Daniel Maticzka
  - Bioinformatics; Department of Computer Science; University of Freiburg; Freiburg, Germany
- Heiko Lickert
  - Helmholtz Zentrum München - German Research Center for Environmental Health; Institute of Diabetes and Regeneration Research; Neuherberg, Germany
  - Medical Faculty; Technische Universität München; Munich, Germany
- Rolf Backofen
  - Bioinformatics; Department of Computer Science; University of Freiburg; Freiburg, Germany
  - BIOSS Center for Biological Signaling Studies; Cluster of Excellence; University of Freiburg; Freiburg, Germany
- Fabian Theis
  - Helmholtz Zentrum München - German Research Center for Environmental Health; Institute of Diabetes and Regeneration Research; Neuherberg, Germany
  - Technische Universität München; Center for Mathematics; Chair of Mathematical Modeling of Biological Systems; Garching, Germany
|
12
|
Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szcześniak MW, Gaffney DJ, Elo LL, Zhang X, Mortazavi A. A survey of best practices for RNA-seq data analysis. Genome Biol 2016; 17:13. [PMID: 26813401 PMCID: PMC4728800 DOI: 10.1186/s13059-016-0881-8] [Citation(s) in RCA: 1379] [Impact Index Per Article: 172.4] [Indexed: 12/27/2022]
Abstract
RNA-sequencing (RNA-seq) has a wide variety of applications, but no single analysis pipeline can be used in all cases. We review all of the major steps in RNA-seq data analysis, including experimental design, quality control, read alignment, quantification of gene and transcript levels, visualization, differential gene expression, alternative splicing, functional analysis, gene fusion detection and eQTL mapping. We highlight the challenges associated with each step. We discuss the analysis of small RNAs and the integration of RNA-seq with other functional genomics techniques. Finally, we discuss the outlook for novel technologies that are changing the state of the art in transcriptomics.
Affiliation(s)
- Ana Conesa
  - Institute for Food and Agricultural Sciences, Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, 32603, USA
  - Centro de Investigación Príncipe Felipe, Genomics of Gene Expression Laboratory, 46012, Valencia, Spain
- Pedro Madrigal
  - Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
  - Wellcome Trust-Medical Research Council Cambridge Stem Cell Institute, Anne McLaren Laboratory for Regenerative Medicine, Department of Surgery, University of Cambridge, Cambridge, CB2 0SZ, UK
- Sonia Tarazona
  - Centro de Investigación Príncipe Felipe, Genomics of Gene Expression Laboratory, 46012, Valencia, Spain
  - Department of Applied Statistics, Operations Research and Quality, Universidad Politécnica de Valencia, 46020, Valencia, Spain
- David Gomez-Cabrero
  - Unit of Computational Medicine, Department of Medicine, Karolinska Institutet, Karolinska University Hospital, 171 77, Stockholm, Sweden
  - Center for Molecular Medicine, Karolinska Institutet, 17177, Stockholm, Sweden
  - Unit of Clinical Epidemiology, Department of Medicine, Karolinska University Hospital, L8, 17176, Stockholm, Sweden
  - Science for Life Laboratory, 17121, Solna, Sweden
- Alejandra Cervera
  - Systems Biology Laboratory, Institute of Biomedicine and Genome-Scale Biology Research Program, University of Helsinki, 00014, Helsinki, Finland
- Andrew McPherson
  - School of Computing Science, Simon Fraser University, Burnaby, V5A 1S6, BC, Canada
- Michał Wojciech Szcześniak
  - Department of Bioinformatics, Institute of Molecular Biology and Biotechnology, Adam Mickiewicz University in Poznań, 61-614, Poznań, Poland
- Daniel J Gaffney
  - Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Laura L Elo
  - Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, FI-20520, Turku, Finland
- Xuegong Zhang
  - Key Lab of Bioinformatics/Bioinformatics Division, TNLIST and Department of Automation, Tsinghua University, Beijing, 100084, China
  - School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Ali Mortazavi
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, 92697-2300, USA
  - Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, 92697, USA
|
13
|
He H, Lin D, Zhang J, Wang Y, Deng HW. Biostatistics, Data Mining and Computational Modeling. TRANSLATIONAL BIOINFORMATICS 2016. [DOI: 10.1007/978-94-017-7543-4_2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/21/2022]
|
14
|
Alyass A, Turcotte M, Meyre D. From big data analysis to personalized medicine for all: challenges and opportunities. BMC Med Genomics 2015; 8:33. [PMID: 26112054 PMCID: PMC4482045 DOI: 10.1186/s12920-015-0108-y] [Citation(s) in RCA: 224] [Impact Index Per Article: 24.9] [Received: 01/21/2015] [Accepted: 06/15/2015] [Indexed: 02/07/2023]
Abstract
Recent advances in high-throughput technologies have led to the emergence of systems biology as a holistic science to achieve more precise modeling of complex diseases. Many predict the emergence of personalized medicine in the near future. We are, however, moving from two-tiered health systems to a two-tiered personalized medicine. Omics facilities are restricted to affluent regions, and personalized medicine is likely to widen the growing gap in health systems between high- and low-income countries. This is mirrored by an increasing lag between our ability to generate and analyze big data. Several bottlenecks slow down the transition from conventional to personalized medicine: generation of cost-effective high-throughput data; hybrid education and multidisciplinary teams; data storage and processing; data integration and interpretation; and individual and global economic relevance. This review provides an update of important developments in the analysis of big data and forward strategies to accelerate the global transition to personalized medicine.
Affiliation(s)
- Akram Alyass
- Department of Clinical Epidemiology and Biostatistics, McMaster University, 1280 Main Street West, Hamilton, ON, Canada.
| | - Michelle Turcotte
- Department of Clinical Epidemiology and Biostatistics, McMaster University, 1280 Main Street West, Hamilton, ON, Canada.
| | - David Meyre
- Department of Clinical Epidemiology and Biostatistics, McMaster University, 1280 Main Street West, Hamilton, ON, Canada.
- Department of Pathology and Molecular Medicine, McMaster University, 1280 Main Street West, Hamilton, ON, Canada.
|
15
|
Anděl M, Kléma J, Krejčík Z. Network-constrained forest for regularized classification of omics data. Methods 2015; 83:88-97. [PMID: 25872185 DOI: 10.1016/j.ymeth.2015.04.006] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 01/30/2015] [Revised: 04/01/2015] [Accepted: 04/02/2015] [Indexed: 12/28/2022]
Abstract
Contemporary molecular biology deals with wide and heterogeneous sets of measurements to model and understand underlying biological processes including complex diseases. Machine learning provides a frequent approach to build such models. However, the models built solely from measured data often suffer from overfitting, as the sample size is typically much smaller than the number of measured features. In this paper, we propose a random forest-based classifier that reduces this overfitting with the aid of prior knowledge in the form of a feature interaction network. We illustrate the proposed method in the task of disease classification based on measured mRNA and miRNA profiles complemented by the interaction network composed of the miRNA-mRNA target relations and mRNA-mRNA interactions corresponding to the interactions between their encoded proteins. We demonstrate that the proposed network-constrained forest employs prior knowledge to increase learning bias and consequently to improve classification accuracy, stability and comprehensibility of the resulting model. The experiments are carried out in the domain of myelodysplastic syndrome that we are concerned about in the long term. We validate our approach in the public domain of ovarian carcinoma, with the same data form. We believe that the idea of a network-constrained forest can straightforwardly be generalized towards arbitrary omics data with an available and non-trivial feature interaction network. The proposed method is publicly available through the miXGENE system (http://mixgene.felk.cvut.cz); the workflow that implements the myelodysplastic syndrome experiments is presented as a dedicated case study.
Affiliation(s)
- Michael Anděl
- Department of Computer Science, Czech Technical University, Technická 2, Prague, Czech Republic.
| | - Jiří Kléma
- Department of Computer Science, Czech Technical University, Technická 2, Prague, Czech Republic.
| | - Zdeněk Krejčík
- Department of Molecular Genetics, Institute of Hematology and Blood Transfusion, U Nemocnice 1, Prague, Czech Republic.
|
16
|
Fondi M, Liò P. Multi-omics and metabolic modelling pipelines: challenges and tools for systems microbiology. Microbiol Res 2015; 171:52-64. [PMID: 25644953 DOI: 10.1016/j.micres.2015.01.003] [Citation(s) in RCA: 86] [Impact Index Per Article: 9.6] [Received: 10/03/2014] [Revised: 01/02/2015] [Accepted: 01/03/2015] [Indexed: 12/27/2022]
Abstract
Integrated -omics approaches are quickly spreading across microbiology research labs, leading to (i) the possibility of detecting previously hidden features of microbial cells like multi-scale spatial organization and (ii) tracing molecular components across multiple cellular functional states. This promises to reduce the knowledge gap between genotype and phenotype and poses new challenges for computational microbiologists. We underline how the capability to unravel the complexity of microbial life will strongly depend on the integration of the huge and diverse amount of information that can be derived today from -omics experiments. In this work, we present opportunities and challenges of multi-omics data integration in current systems biology pipelines. We here discuss which layers of biological information are important for biotechnological and clinical purposes, with a special focus on bacterial metabolism and modelling procedures. A general review of the most recent computational tools for performing large-scale datasets integration is also presented, together with a possible framework to guide the design of systems biology experiments by microbiologists.
Affiliation(s)
- Marco Fondi
- Florence Computational Biology Group (ComBo), University of Florence, Via Madonna del Piano 6, Sesto Fiorentino, Florence 50019, Italy; Laboratory of Microbial and Molecular Evolution, Department of Biology, University of Florence, Via Madonna del Piano 6, Sesto Fiorentino, Florence 50019, Italy.
| | - Pietro Liò
- University of Cambridge, Computer Laboratory, 15 JJ Thomson Avenue, CB3 0FD Cambridge, UK
|
17
|
Tsiliki G, Karacapilidis N, Christodoulou S, Tzagarakis M. Collaborative mining and interpretation of large-scale data for biomedical research insights. PLoS One 2014; 9:e108600. [PMID: 25268270 PMCID: PMC4182494 DOI: 10.1371/journal.pone.0108600] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 06/27/2014] [Accepted: 08/31/2014] [Indexed: 01/21/2023]
Abstract
Biomedical research becomes increasingly interdisciplinary and collaborative in nature. Researchers need to efficiently and effectively collaborate and make decisions by meaningfully assembling, mining and analyzing available large-scale volumes of complex multi-faceted data residing in different sources. In line with related research directives revealing that, in spite of the recent advances in data mining and computational analysis, humans can easily detect patterns which computer algorithms may have difficulty in finding, this paper reports on the practical use of an innovative web-based collaboration support platform in a biomedical research context. Arguing that dealing with data-intensive and cognitively complex settings is not a technical problem alone, the proposed platform adopts a hybrid approach that builds on the synergy between machine and human intelligence to facilitate the underlying sense-making and decision making processes. User experience shows that the platform enables more informed and quicker decisions, by displaying the aggregated information according to their needs, while also exploiting the associated human intelligence.
Affiliation(s)
- Georgia Tsiliki
- School of Chemical Engineering, National Technical University of Athens, Athens, Greece
| | - Nikos Karacapilidis
- University of Patras and Computer Technology Institute & Press ‘Diophantus’, Patras, Greece
| | - Spyros Christodoulou
- University of Patras and Computer Technology Institute & Press ‘Diophantus’, Patras, Greece
| | - Manolis Tzagarakis
- University of Patras and Computer Technology Institute & Press ‘Diophantus’, Patras, Greece
|
18
|
Sass S, Buettner F, Mueller NS, Theis FJ. RAMONA: a Web application for gene set analysis on multilevel omics data. Bioinformatics 2014; 31:128-30. [DOI: 10.1093/bioinformatics/btu610] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 11/13/2022]
|
19
|
Espín-Pérez A, Krauskopf J, de Kok TM, Kleinjans JC. ‘OMICS-based’ Biomarkers for Environmental Health Studies. Curr Environ Health Rep 2014. [DOI: 10.1007/s40572-014-0028-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/13/2023]
|
20
|
Alcaraz N, Pauling J, Batra R, Barbosa E, Junge A, Christensen AGL, Azevedo V, Ditzel HJ, Baumbach J. KeyPathwayMiner 4.0: condition-specific pathway analysis by combining multiple omics studies and networks with Cytoscape. BMC Syst Biol 2014; 8:99. [PMID: 25134827 PMCID: PMC4236746 DOI: 10.1186/s12918-014-0099-x] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Received: 05/20/2014] [Accepted: 08/13/2014] [Indexed: 12/17/2022]
Abstract
Background: Over the last decade, network enrichment analysis has become popular in computational systems biology to elucidate aberrant network modules. Traditionally, these approaches focus on combining gene expression data with protein-protein interaction (PPI) networks. Nowadays, the so-called omics technologies allow for inclusion of many more data sets, e.g. protein phosphorylation or epigenetic modifications. This creates a need for analysis methods that can combine these various sources of data to obtain a systems-level view on aberrant biological networks.
Results: We present a new release of KeyPathwayMiner (version 4.0) that is not limited to analyses of single omics data sets, e.g. gene expression, but is able to directly combine several different omics data types. Version 4.0 can further integrate existing knowledge by adding a search bias towards sub-networks that contain (avoid) genes provided in a positive (negative) list. Finally, the new release also provides a set of novel visualization features and has been implemented as an app for the standard bioinformatics network analysis tool Cytoscape.
Conclusion: With KeyPathwayMiner 4.0, we publish a Cytoscape app for multi-omics based sub-network extraction. It is available in Cytoscape’s app store at http://apps.cytoscape.org/apps/keypathwayminer or via http://keypathwayminer.mpi-inf.mpg.de.
|