1. Liu T, Salguero P, Petek M, Martinez-Mira C, Balzano-Nogueira L, Ramšak Ž, McIntyre L, Gruden K, Tarazona S, Conesa A. PaintOmics 4: new tools for the integrative analysis of multi-omics datasets supported by multiple pathway databases. Nucleic Acids Res 2022; 50:W551-W559. [PMID: 35609982 PMCID: PMC9252773 DOI: 10.1093/nar/gkac352] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 03/12/2022] [Revised: 04/22/2022] [Accepted: 04/25/2022] [Indexed: 01/02/2023]
Abstract
PaintOmics is a web server for the integrative analysis and visualisation of multi-omics datasets using biological pathway maps. PaintOmics 4 has several notable updates that improve and extend analyses. Three pathway databases are now supported: KEGG, Reactome and MapMan, providing more comprehensive pathway knowledge for animals and plants. New metabolite analysis methods fill gaps in traditional pathway-based enrichment methods. The metabolite hub analysis selects compounds with a high number of significant genes in their neighbouring network, suggesting regulation by gene expression changes. The metabolite class activity analysis tests the hypothesis that a metabolic class has a higher-than-expected proportion of significant elements, indicating that these compounds are regulated in the experiment. Finally, PaintOmics 4 includes a regulatory omics module to analyse the contribution of trans-regulatory layers (microRNA and transcription factors, RNA-binding proteins) to regulate pathways. We show the performance of PaintOmics 4 on both mouse and plant data to highlight how these new analysis features provide novel insights into regulatory biology. PaintOmics 4 is available at https://paintomics.org/.
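The metabolite class activity test described in this abstract can be illustrated with a generic over-representation calculation. The sketch below assumes a one-sided hypergeometric test; the abstract does not state which test PaintOmics 4 actually uses, and the function name `class_enrichment_pvalue` and its parameters are hypothetical names chosen for illustration only.

```python
from math import comb


def class_enrichment_pvalue(sig_in_class, class_size, total_sig, total_measured):
    """One-sided hypergeometric p-value, P(X >= sig_in_class).

    Assumption (not taken from the PaintOmics 4 abstract): significant
    compounds are treated as a draw of size `total_sig` without replacement
    from `total_measured` compounds, of which `class_size` belong to the
    metabolic class being tested.
    """
    upper = min(class_size, total_sig)
    denominator = comb(total_measured, total_sig)
    # Sum the probabilities of observing sig_in_class or more class members
    # among the significant compounds. math.comb returns 0 for impossible
    # combinations, so out-of-range terms contribute nothing.
    numerator = sum(
        comb(class_size, i) * comb(total_measured - class_size, total_sig - i)
        for i in range(sig_in_class, upper + 1)
    )
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical toy numbers: 10 measured compounds, 4 significant overall,
    # and a class of 3 compounds that are all significant.
    print(class_enrichment_pvalue(3, 3, 4, 10))
```

With these toy numbers the p-value is 7/210, about 0.033, so the class would be flagged at a 0.05 threshold. A real analysis would also need multiple-testing correction across classes.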
Affiliation(s)
- Tianyuan Liu
  - Department of Mechanical Engineering, School of Engineering, Cardiff University, Cardiff, UK
- Pedro Salguero
  - Department of Applied Statistics, Operations Research and Quality, Universitat Politècnica de València, Valencia, Spain
- Marko Petek
  - Department of Biotechnology and Systems Biology, National Institute of Biology, Ljubljana, Slovenia
- Živa Ramšak
  - Department of Biotechnology and Systems Biology, National Institute of Biology, Ljubljana, Slovenia
- Lauren McIntyre
  - Department of Molecular Genetics and Microbiology, Genetics Institute, University of Florida, Gainesville, USA
- Kristina Gruden
  - Department of Biotechnology and Systems Biology, National Institute of Biology, Ljubljana, Slovenia
- Sonia Tarazona
  - Department of Applied Statistics, Operations Research and Quality, Universitat Politècnica de València, Valencia, Spain
- Ana Conesa
  - Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain; Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, USA
2. McIntyre LM, Huertas F, Morse AM, Kaletsky R, Murphy CT, Kalia V, Miller GW, Moskalenko O, Conesa A, Mor DE. GAIT-GM integrative cross-omics analyses reveal cholinergic defects in a C. elegans model of Parkinson's disease. Sci Rep 2022; 12:3268. [PMID: 35228596 PMCID: PMC8885929 DOI: 10.1038/s41598-022-07238-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/12/2021] [Accepted: 01/24/2022] [Indexed: 11/09/2022]
Abstract
Parkinson’s disease (PD) is a disabling neurodegenerative disorder in which multiple cell types, including dopaminergic and cholinergic neurons, are affected. The mechanisms of neurodegeneration in PD are not fully understood, limiting the development of therapies directed at disease-relevant molecular targets. C. elegans is a genetically tractable model system that can be used to disentangle disease mechanisms in complex diseases such as PD. Such mechanisms can be studied combining high-throughput molecular profiling technologies such as transcriptomics and metabolomics. However, the integrative analysis of multi-omics data in order to unravel disease mechanisms is a challenging task without advanced bioinformatics training. Galaxy, a widely-used resource for enabling bioinformatics analysis by the broad scientific community, has poor representation of multi-omics integration pipelines. We present the integrative analysis of gene expression and metabolite levels of a C. elegans PD model using GAIT-GM, a new Galaxy tool for multi-omics data analysis. Using GAIT-GM, we discovered an association between branched-chain amino acid metabolism and cholinergic neurons in the C. elegans PD model. An independent follow-up experiment uncovered cholinergic neurodegeneration in the C. elegans model that is consistent with cholinergic cell loss observed in PD. GAIT-GM is an easy to use Galaxy-based tool for generating novel testable hypotheses of disease mechanisms involving gene-metabolite relationships.
Affiliation(s)
- Lauren M McIntyre
  - University of Florida Genetics Institute, Gainesville, FL, USA; Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL, USA
- Francisco Huertas
  - University of Florida Genetics Institute, Gainesville, FL, USA; Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, 32610, USA
- Alison M Morse
  - University of Florida Genetics Institute, Gainesville, FL, USA; Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL, USA
- Rachel Kaletsky
  - Department of Molecular Biology and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, 08544, USA
- Coleen T Murphy
  - Department of Molecular Biology and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, 08544, USA
- Vrinda Kalia
  - Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, NY, 10032, USA
- Gary W Miller
  - Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, NY, 10032, USA
- Olexander Moskalenko
  - University of Florida Research Computing, University of Florida, Gainesville, FL, 32610, USA
- Ana Conesa
  - Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, 32610, USA; Institute for Integrative Systems Biology, Spanish National Research Council, 46980, Paterna, Spain
- Danielle E Mor
  - Department of Neuroscience and Regenerative Medicine, Medical College of Georgia at Augusta University, Augusta, GA, 30912, USA
3. Janbain A, Reynès C, Assaghir Z, Zeineddine H, Sabatier R, Journot L. TopoFun: a machine learning method to improve the functional similarity of gene co-expression modules. NAR Genom Bioinform 2021; 3:lqab103. [PMID: 34761220 PMCID: PMC8573820 DOI: 10.1093/nargab/lqab103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2021] [Revised: 09/22/2021] [Accepted: 10/13/2021] [Indexed: 11/14/2022]
Abstract
A comprehensive, accurate functional annotation of genes is key to systems-level approaches. As functionally related genes tend to be co-expressed, one possible approach to identify functional modules or supplement existing gene annotations is to analyse gene co-expression. We describe TopoFun, a machine learning method that combines topological and functional information to improve the functional similarity of gene co-expression modules. Using LASSO, we selected topological descriptors that discriminated modules made of functionally related genes and random modules. Using the selected topological descriptors, we performed linear discriminant analysis to construct a topological score that predicted the type of a module, random-like or functional-like. We combined the topological score with a functional similarity score in a fitness function that we used in a genetic algorithm to explore the co-expression network. To illustrate the use of TopoFun, we started from a subset of the Gene Ontology Biological Processes (GO-BPs) and showed that TopoFun efficiently retrieved genes that we omitted, and aggregated a number of novel genes to the initial GO-BP while improving module topology and functional similarity. Using an independent protein-protein interaction database, we confirmed that the novel genes gathered by TopoFun were functionally related to the original gene set.
Affiliation(s)
- Ali Janbain
  - IGF, Univ Montpellier, CNRS, INSERM, Montpellier 34094, France
- Zainab Assaghir
  - Applied Mathematics Department, Lebanese University, Beirut 1003, Lebanon
- Hassan Zeineddine
  - Applied Mathematics Department, Lebanese University, Beirut 1003, Lebanon
- Robert Sabatier
  - IGF, Univ Montpellier, CNRS, INSERM, Montpellier 34094, France
- Laurent Journot
  - IGF, Univ Montpellier, CNRS, INSERM, Montpellier 34094, France
4. Mishra R, Li B. The Application of Artificial Intelligence in the Genetic Study of Alzheimer's Disease. Aging Dis 2020; 11:1567-1584. [PMID: 33269107 PMCID: PMC7673858 DOI: 10.14336/ad.2020.0312] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 01/02/2020] [Accepted: 03/12/2020] [Indexed: 12/13/2022]
Abstract
Alzheimer's disease (AD) is a neurodegenerative disease in which genetic factors contribute approximately 70% of etiological effects. Studies have found many significant genetic and environmental factors, but the pathogenesis of AD is still unclear. With the application of microarray and next-generation sequencing technologies, research using genetic data has shown explosive growth. In addition to conventional statistical methods for the processing of these data, artificial intelligence (AI) technology shows obvious advantages in analyzing such complex projects. This article first briefly reviews the application of AI technology in medicine and the current status of genetic research in AD. Then, a comprehensive review is focused on the application of AI in the genetic research of AD, including the diagnosis and prognosis of AD based on genetic data, the analysis of genetic variation, gene expression profile, gene-gene interaction in AD, and genetic analysis of AD based on a knowledge base. Although many studies have yielded some meaningful results, they are still in a preliminary stage. The main shortcomings include the limitations of the databases, failing to take advantage of AI to conduct a systematic biology analysis of multilevel databases, and lack of a theoretical framework for the analysis results. Finally, we outlook the direction of future development. It is crucial to develop high quality, comprehensive, large sample size, data sharing resources; a multi-level system biology AI analysis strategy is one of the development directions, and computational creativity may play a role in theory model building, verification, and designing new intervention protocols for AD.
Affiliation(s)
- Rohan Mishra
  - Washington Institute for Health Sciences, Arlington, VA 22203, USA
- Bin Li
  - Washington Institute for Health Sciences, Arlington, VA 22203, USA
  - Georgetown University Medical Center, Washington D.C. 20057, USA
5. Shan L, Qiao Z, Cheng L, Kim I. Joint Estimation of the Two-Level Gaussian Graphical Models Across Multiple Classes. J Comput Graph Stat 2020. [DOI: 10.1080/10618600.2019.1694522] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 10/25/2022]
Affiliation(s)
- Liang Shan
  - Division of Preventive Medicine, Department of Medicine, School of Medicine, The University of Alabama at Birmingham, Birmingham, AL
- Zhilei Qiao
  - Department of Management, Information Systems & Quantitative Methods (MISQ), Collat School of Business, The University of Alabama at Birmingham, Birmingham, AL
- Lulu Cheng
  - Biostatistics Department, Incyte Corporation, Wilmington, DE
- Inyoung Kim
  - Department of Statistics, Virginia Tech., Blacksburg, VA
6. Mora A. Gene set analysis methods for the functional interpretation of non-mRNA data—Genomic range and ncRNA data. Brief Bioinform 2019; 21:1495-1508. [DOI: 10.1093/bib/bbz090] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 03/20/2019] [Revised: 05/30/2019] [Accepted: 06/28/2019] [Indexed: 12/31/2022]
Abstract
Gene set analysis (GSA) is one of the methods of choice for analyzing the results of current omics studies; however, it has been mainly developed to analyze mRNA (microarray, RNA-Seq) data. The following review includes an update regarding general methods and resources for GSA and then emphasizes GSA methods and tools for non-mRNA omics datasets, specifically genomic range data (ChIP-Seq, SNP and methylation) and ncRNA data (miRNAs, lncRNAs and others). In the end, the state of the GSA field for non-mRNA datasets is discussed, and some current challenges and trends are highlighted, especially the use of network approaches to face complexity issues.
Affiliation(s)
- Antonio Mora
  - Joint School of Life Sciences, Guangzhou Medical University and Guangzhou Institutes of Biomedicine and Health - Chinese Academy of Sciences
7. Hernández-de-Diego R, Tarazona S, Martínez-Mira C, Balzano-Nogueira L, Furió-Tarí P, Pappas GJ, Conesa A. PaintOmics 3: a web resource for the pathway analysis and visualization of multi-omics data. Nucleic Acids Res 2019; 46:W503-W509. [PMID: 29800320 PMCID: PMC6030972 DOI: 10.1093/nar/gky466] [Citation(s) in RCA: 109] [Impact Index Per Article: 21.8] [Received: 02/22/2018] [Accepted: 05/16/2018] [Indexed: 12/27/2022]
Abstract
The increasing availability of multi-omic platforms poses new challenges to data analysis. Joint visualization of multi-omics data is instrumental in better understanding interconnections across molecular layers and in fully utilizing the multi-omic resources available to make biological discoveries. We present here PaintOmics 3, a web-based resource for the integrated visualization of multiple omic data types onto KEGG pathway diagrams. PaintOmics 3 combines server-end capabilities for data analysis with the potential of modern web resources for data visualization, providing researchers with a powerful framework for interactive exploration of their multi-omics information. Unlike other visualization tools, PaintOmics 3 covers a comprehensive pathway analysis workflow, including automatic feature name/identifier conversion, multi-layered feature matching, pathway enrichment, network analysis, interactive heatmaps, trend charts, and more. It accepts a wide variety of omic types, including transcriptomics, proteomics and metabolomics, as well as region-based approaches such as ATAC-seq or ChIP-seq data. The tool is freely available at www.paintomics.org.
Affiliation(s)
- Sonia Tarazona
  - Genomics of Gene Expression Lab, Centro de Investigación Príncipe Felipe, Valencia, Spain; Department of Applied Statistics, Operations Research and Quality, Universitat Politècnica de València, Spain
- Carlos Martínez-Mira
  - Genomics of Gene Expression Lab, Centro de Investigación Príncipe Felipe, Valencia, Spain
- Leandro Balzano-Nogueira
  - Microbiology and Cell Science Department, Institute for Food and Agricultural Sciences, University of Florida, Gainesville, FL, USA; Genetics Institute, University of Florida, Gainesville, FL, USA
- Pedro Furió-Tarí
  - Genomics of Gene Expression Lab, Centro de Investigación Príncipe Felipe, Valencia, Spain
- Georgios J Pappas
  - Department of Cellular Biology, University of Brasilia, Biological Sciences Institute, Brasília, Brazil
- Ana Conesa
  - Genomics of Gene Expression Lab, Centro de Investigación Príncipe Felipe, Valencia, Spain; Microbiology and Cell Science Department, Institute for Food and Agricultural Sciences, University of Florida, Gainesville, FL, USA; Genetics Institute, University of Florida, Gainesville, FL, USA
8. Wang ZT, Tan CC, Tan L, Yu JT. Systems biology and gene networks in Alzheimer’s disease. Neurosci Biobehav Rev 2019; 96:31-44. [DOI: 10.1016/j.neubiorev.2018.11.007] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 01/05/2017] [Revised: 11/18/2018] [Accepted: 11/18/2018] [Indexed: 12/25/2022]
9. Rubiolo M, Milone DH, Stegmayer G. Extreme learning machines for reverse engineering of gene regulatory networks from expression time series. Bioinformatics 2018; 34:1253-1260. [PMID: 29182723 DOI: 10.1093/bioinformatics/btx730] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 03/03/2017] [Accepted: 11/21/2017] [Indexed: 11/14/2022]
Abstract
Motivation: The reconstruction of gene regulatory networks (GRNs) from genes profiles has a growing interest in bioinformatics for understanding the complex regulatory mechanisms in cellular systems. GRNs explicitly represent the cause-effect of regulation among a group of genes and its reconstruction is today a challenging computational problem. Several methods were proposed, but most of them require different input sources to provide an acceptable prediction. Thus, it is a great challenge to reconstruct a GRN only from temporal gene expression data.
Results: Extreme Learning Machine (ELM) is a new supervised neural model that has gained interest in the last years because of its higher learning rate and better performance than existing supervised models in terms of predictive power. This work proposes a novel approach for GRNs reconstruction in which ELMs are used for modeling the relationships between gene expression time series. Artificial datasets generated with the well-known benchmark tool used in DREAM competitions were used. Real datasets were used for validation of this novel proposal with well-known GRNs underlying the time series. The impact of increasing the size of GRNs was analyzed in detail for the compared methods. The results obtained confirm the superiority of the ELM approach against very recent state-of-the-art methods in the same experimental conditions.
Availability and implementation: The web demo can be found at http://sinc.unl.edu.ar/web-demo/elm-grnnminer/. The source code is available at https://sourceforge.net/projects/sourcesinc/files/elm-grnnminer.
Contact: mrubiolo@santafe-conicet.gov.ar.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- M Rubiolo
  - Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH/UNL-CONICET, Ciudad Universitaria, 3000 Santa Fe, Argentina; Center of Research and Development of Information System Engineering, CIDISI, System Engineering Department, UTN-FRSF, 3000 Santa Fe, Argentina
- D H Milone
  - Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH/UNL-CONICET, Ciudad Universitaria, 3000 Santa Fe, Argentina
- G Stegmayer
  - Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH/UNL-CONICET, Ciudad Universitaria, 3000 Santa Fe, Argentina
10. Narvi E, Vaparanta K, Karrila A, Chakroborty D, Knuutila S, Pulliainen A, Sundvall M, Elenius K. Different responses of colorectal cancer cells to alternative sequences of cetuximab and oxaliplatin. Sci Rep 2018; 8:16579. [PMID: 30410004 PMCID: PMC6224565 DOI: 10.1038/s41598-018-34938-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 05/10/2018] [Accepted: 10/29/2018] [Indexed: 11/14/2022]
Abstract
Therapeutic protocols including EGFR antibodies in the context of oxaliplatin-based regimens have variable clinical effect in colorectal cancer. Here, we tested the effect of the EGFR antibody cetuximab in different sequential combinations with oxaliplatin on the growth of colorectal cancer cells in vitro and in vivo. Cetuximab reduced the efficacy of oxaliplatin when administered before oxaliplatin but provided additive effect when administered after oxaliplatin regardless of the KRAS or BRAF mutation status of the cells. Systemic gene expression and protein phosphorylation screens revealed alternatively activated pathways regulating apoptosis, cell cycle and DNA damage response. Functional assays indicated that cetuximab-induced arrest of the cells into the G1 phase of the cell cycle was associated with reduced responsiveness of the cells to subsequent treatment with oxaliplatin. In contrast, oxaliplatin-enhanced responsiveness to subsequent treatment with cetuximab was associated with increased apoptosis, inhibition of STAT3 activity and increased EGFR down-regulation. This preclinical study indicates that optimizing the sequence of administration may enhance the antitumor effect of combination therapy with EGFR antibodies and oxaliplatin.
Affiliation(s)
- Elli Narvi
  - Institute of Biomedicine and Medicity Research Laboratories, University of Turku, Turku, Finland
- Katri Vaparanta
  - Institute of Biomedicine and Medicity Research Laboratories, University of Turku, Turku, Finland; Turku Doctoral Programme of Molecular Medicine, Turku, Finland
- Anna Karrila
  - Institute of Biomedicine and Medicity Research Laboratories, University of Turku, Turku, Finland; Turku Doctoral Programme of Molecular Medicine, Turku, Finland
- Deepankar Chakroborty
  - Institute of Biomedicine and Medicity Research Laboratories, University of Turku, Turku, Finland; Turku Doctoral Programme of Molecular Medicine, Turku, Finland
- Sakari Knuutila
  - Department of Pathology, Haartman Institute, University of Helsinki, Helsinki, Finland
- Arto Pulliainen
  - Institute of Biomedicine and Medicity Research Laboratories, University of Turku, Turku, Finland
- Maria Sundvall
  - Institute of Biomedicine and Medicity Research Laboratories, University of Turku, Turku, Finland; Department of Oncology, Turku University Hospital, Turku, Finland
- Klaus Elenius
  - Institute of Biomedicine and Medicity Research Laboratories, University of Turku, Turku, Finland; Department of Oncology, Turku University Hospital, Turku, Finland
11. Igolkina AA, Armoskus C, Newman JRB, Evgrafov OV, McIntyre LM, Nuzhdin SV, Samsonova MG. Analysis of Gene Expression Variance in Schizophrenia Using Structural Equation Modeling. Front Mol Neurosci 2018; 11:192. [PMID: 29942251 PMCID: PMC6004421 DOI: 10.3389/fnmol.2018.00192] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 01/31/2018] [Accepted: 05/15/2018] [Indexed: 01/02/2023]
Abstract
Schizophrenia (SCZ) is a psychiatric disorder of unknown etiology. There is evidence suggesting that aberrations in neurodevelopment are a significant attribute of schizophrenia pathogenesis and progression. To identify biologically relevant molecular abnormalities affecting neurodevelopment in SCZ we used cultured neural progenitor cells derived from olfactory neuroepithelium (CNON cells). Here, we tested the hypothesis that variance in gene expression differs between individuals from SCZ and control groups. In CNON cells, variance in gene expression was significantly higher in SCZ samples in comparison with control samples. Variance in gene expression was enriched in five molecular pathways: serine biosynthesis, PI3K-Akt, MAPK, neurotrophin and focal adhesion. More than 14% of variance in disease status was explained within the logistic regression model (C-value = 0.70) by predictors accounting for gene expression in 69 genes from these five pathways. Structural equation modeling (SEM) was applied to explore how the structure of these five pathways was altered between SCZ patients and controls. Four out of five pathways showed differences in the estimated relationships among genes: between KRAS and NF1, and KRAS and SOS1 in the MAPK pathway; between PSPH and SHMT2 in serine biosynthesis; between AKT3 and TSC2 in the PI3K-Akt signaling pathway; and between CRK and RAPGEF1 in the focal adhesion pathway. Our analysis provides evidence that variance in gene expression is an important characteristic of SCZ, and SEM is a promising method for uncovering altered relationships between specific genes thus suggesting affected gene regulation associated with the disease. We identified altered gene-gene interactions in pathways enriched for genes with increased variance in expression in SCZ. These pathways and loci were previously implicated in SCZ, providing further support for the hypothesis that gene expression variance plays important role in the etiology of SCZ.
Affiliation(s)
- Anna A Igolkina
  - Institute of Applied Mathematics and Mechanics, Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russia
- Chris Armoskus
  - Zilkha Neurogenetic Institute, Keck School of Medicine, University of Southern California, Los Angeles, CA, United States
- Jeremy R B Newman
  - Department of Molecular Genetics & Microbiology, Genetics Institute, University of Florida, Gainesville, FL, United States
- Oleg V Evgrafov
  - Department of Cell Biology, SUNY Downstate Medical Center, Brooklyn, NY, United States
- Lauren M McIntyre
  - Department of Molecular Genetics & Microbiology, Genetics Institute, University of Florida, Gainesville, FL, United States
- Sergey V Nuzhdin
  - Institute of Applied Mathematics and Mechanics, Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russia; Molecular and Computation Biology, University of Southern California, Los Angeles, CA, United States
- Maria G Samsonova
  - Institute of Applied Mathematics and Mechanics, Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russia
12. Identification and characterization of some putative genes involved in arabinoxylan biosynthesis in Plantago ovata. 3 Biotech 2018; 8:266. [PMID: 29868304 DOI: 10.1007/s13205-018-1289-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 01/21/2018] [Accepted: 05/14/2018] [Indexed: 01/18/2023]
Abstract
Plantago ovata is an important source of Psyllium (Isabgol), which swells upon contact with water forming mucilaginous mass, largely composed of arabinoxylans. In this study, we analyzed the expression pattern of arabinoxylan biosynthetic pathway genes at different stages of seed development in P. ovata. Besides, arabinoxylans were quantified at different stages of seed development in water extractable and water unextractable fractions. The expression analysis revealed 5-8 fold increase in the levels of expression of some genes involved in arabinoxylan biosynthetic pathway such as UDP-arabinopyranose mutase, UDP-xylosyltransferase 2 and xylan glucuronosyltransferase at 15 days after pollination stage in seed. The xylose and arabinose units were analyzed at different stages of seed development and also in water-soluble (cold water and hot water), alkali and ethanolic fractions. The concentration of xylose and arabinose units increased steadily after pollination. Overall, alkali extract had high concentration of xylose (0.70 ± 0.022 mg/g) and arabinose units (0.10 ± 0.01 mg/g) at 15 days after pollination stage.
13. Hu YS, Xin J, Hu Y, Zhang L, Wang J. Analyzing the genes related to Alzheimer's disease via a network and pathway-based approach. Alzheimers Research & Therapy 2017; 9:29. [PMID: 28446202 PMCID: PMC5406904 DOI: 10.1186/s13195-017-0252-z] [Citation(s) in RCA: 68] [Impact Index Per Article: 9.7] [Received: 10/12/2016] [Accepted: 03/01/2017] [Indexed: 12/29/2022]
Abstract
Background: Our understanding of the molecular mechanisms underlying Alzheimer’s disease (AD) remains incomplete. Previous studies have revealed that genetic factors provide a significant contribution to the pathogenesis and development of AD. In the past years, numerous genes implicated in this disease have been identified via genetic association studies on candidate genes or at the genome-wide level. However, in many cases, the roles of these genes and their interactions in AD are still unclear. A comprehensive and systematic analysis focusing on the biological function and interactions of these genes in the context of AD will therefore provide valuable insights to understand the molecular features of the disease.
Method: In this study, we collected genes potentially associated with AD by screening publications on genetic association studies deposited in PubMed. The major biological themes linked with these genes were then revealed by function and biochemical pathway enrichment analysis, and the relation between the pathways was explored by pathway crosstalk analysis. Furthermore, the network features of these AD-related genes were analyzed in the context of human interactome and an AD-specific network was inferred using the Steiner minimal tree algorithm.
Results: We compiled 430 human genes reported to be associated with AD from 823 publications. Biological theme analysis indicated that the biological processes and biochemical pathways related to neurodevelopment, metabolism, cell growth and/or survival, and immunology were enriched in these genes. Pathway crosstalk analysis then revealed that the significantly enriched pathways could be grouped into three interlinked modules—neuronal and metabolic module, cell growth/survival and neuroendocrine pathway module, and immune response-related module—indicating an AD-specific immune-endocrine-neuronal regulatory network. Furthermore, an AD-specific protein network was inferred and novel genes potentially associated with AD were identified.
Conclusion: By means of network and pathway-based methodology, we explored the pathogenetic mechanism underlying AD at a systems biology level. Results from our work could provide valuable clues for understanding the molecular mechanism underlying AD. In addition, the framework proposed in this study could be used to investigate the pathological molecular network and genes relevant to other complex diseases or phenotypes.
Electronic supplementary material: The online version of this article (doi:10.1186/s13195-017-0252-z) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Yan-Shi Hu
  - School of Biomedical Engineering, Tianjin Medical University, Tianjin, 300070, China
- Juncai Xin
  - School of Biomedical Engineering, Tianjin Medical University, Tianjin, 300070, China
- Ying Hu
  - School of Biomedical Engineering, Tianjin Medical University, Tianjin, 300070, China
- Lei Zhang
  - School of Computer Science and Technology, Tianjin University, Tianjin, 300072, China
- Ju Wang
  - School of Biomedical Engineering, Tianjin Medical University, Tianjin, 300070, China
14
|
Dussaut JS, Gallo CA, Cecchini RL, Carballido JA, Ponzoni I. Crosstalk pathway inference using topological information and biclustering of gene expression data. Biosystems 2016; 150:1-12. [DOI: 10.1016/j.biosystems.2016.08.002] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 01/28/2016] [Revised: 06/03/2016] [Accepted: 08/04/2016] [Indexed: 11/30/2022]
|
15
|
Singh NK, Ernst M, Liebscher V, Fuellen G, Taher L. Revealing complex function, process and pathway interactions with high-throughput expression and biological annotation data. Mol Biosyst 2016; 12:3196-208. [PMID: 27507577 DOI: 10.1039/c6mb00280c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
The biological relationships both between and within the functions, processes and pathways that operate within complex biological systems are only poorly characterized, making the interpretation of large scale gene expression datasets extremely challenging. Here, we present an approach that integrates gene expression and biological annotation data to identify and describe the interactions between biological functions, processes and pathways that govern a phenotype of interest. The product is a global, interconnected network, not of genes but of functions, processes and pathways, that represents the biological relationships within the system. We validated our approach on two high-throughput expression datasets describing organismal and organ development. Our findings are well supported by the available literature, confirming that developmental processes and apoptosis play key roles in cell differentiation. Furthermore, our results suggest that processes related to pluripotency and lineage commitment, which are known to be critical for development, interact mainly indirectly, through genes implicated in more general biological processes. Moreover, we provide evidence that supports the relevance of cell spatial organization in the developing liver for proper liver function. Our strategy can be viewed as an abstraction that is useful to interpret high-throughput data and devise further experiments.
Affiliation(s)
- Nitesh Kumar Singh
- Institute for Biostatistics and Informatics in Medicine and Ageing Research, Rostock University Medical Center, Ernst-Heydemann-Str. 8, 18057 Rostock, Germany
|
16
|
Feng C, Zhang J, Li X, Ai B, Han J, Wang Q, Wei T, Xu Y, Li M, Li S, Song C, Li C. Subpathway-CorSP: Identification of metabolic subpathways via integrating expression correlations and topological features between metabolites and genes of interest within pathways. Sci Rep 2016; 6:33262. [PMID: 27625019 PMCID: PMC5021946 DOI: 10.1038/srep33262] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 04/05/2016] [Accepted: 08/24/2016] [Indexed: 12/23/2022] Open
Abstract
Metabolic pathway analysis is a popular strategy for comprehensively researching metabolites and genes of interest associated with specific diseases. However, traditional pathway identification methods do not accurately consider the combined effect of these molecules of interest and neglect expression correlations and topological features embedded in the pathways. In this study, we propose a powerful method, Subpathway-CorSP, for identifying metabolic subpathway regions. This method improves on original pathway identification methods by using a subpathway identification strategy and emphasizing expression correlations between metabolites and genes of interest based on topological features within the metabolic pathways. We analyzed a prostate cancer data set and its metastatic sub-group data set, comparing Subpathway-CorSP in detail with four traditional pathway identification methods. Subpathway-CorSP was able to identify multiple subpathway regions whose entire corresponding pathways were not detected by traditional pathway identification methods. Further evidence indicated that Subpathway-CorSP provided a robust and efficient way of reliably recalling cancer-related subpathways and locating novel subpathways through the combined effect of metabolites and genes. This is a novel subpathway strategy based on systematically considering expression correlations and topological features between metabolites and genes of interest within given pathways.
Affiliation(s)
- Chenchen Feng
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Jian Zhang
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Xuecang Li
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Bo Ai
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Junwei Han
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Qiuyu Wang
- School of Nursing, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Taiming Wei
- School of Pharmacy, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Yong Xu
- The Fifth Affiliated Hospital of Harbin Medical University, Daqing 163319, China
- Meng Li
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Shang Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Chao Song
- Department of Pharmacology, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Chunquan Li
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
|
17
|
Suravajhala P, Kogelman LJA, Kadarmideen HN. Multi-omic data integration and analysis using systems genomics approaches: methods and applications in animal production, health and welfare. Genet Sel Evol 2016; 48:38. [PMID: 27130220 PMCID: PMC4850674 DOI: 10.1186/s12711-016-0217-x] [Citation(s) in RCA: 100] [Impact Index Per Article: 12.5] [Received: 10/07/2015] [Accepted: 04/16/2016] [Indexed: 02/06/2023] Open
Abstract
In recent years, there has been a remarkable development of high-throughput omics (HTO) technologies such as genomics, epigenomics, transcriptomics, proteomics and metabolomics across all facets of biology. This has spearheaded the progress of the systems biology era, including applications to animal production and health traits. However, notwithstanding these new HTO technologies, there remains an emerging challenge in data analysis. On the one hand, different HTO technologies judged on their own merit are appropriate for the identification of disease-causing genes, biomarkers for prevention and drug targets for the treatment of diseases and for individualized genomic predictions of performance or disease risks. On the other hand, integration of multi-omic data and joint modelling and analyses are powerful and accurate means of understanding the systems biology of healthy and sustainable production of animals. We present an overview of current and emerging HTO technologies, each with a focus on their applications in animal and veterinary sciences, before introducing an integrative systems genomics framework for analysing and integrating multi-omic data towards improved animal production, health and welfare. We conclude that there are big challenges in multi-omic data integration, modelling and systems-level analyses, particularly with the fast emerging HTO technologies. We highlight existing and emerging systems genomics approaches and discuss how they contribute to our understanding of the biology of complex traits or diseases and holistic improvement of production performance, disease resistance and welfare.
Affiliation(s)
- Prashanth Suravajhala
- Department of Large Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Grønnegårdsvej 7, 1870, Frederiksberg C, Denmark
- Lisette J A Kogelman
- Department of Large Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Grønnegårdsvej 7, 1870, Frederiksberg C, Denmark
- Haja N Kadarmideen
- Department of Large Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Grønnegårdsvej 7, 1870, Frederiksberg C, Denmark
|
18
|
Hou J, Acharya L, Zhu D, Cheng J. An overview of bioinformatics methods for modeling biological pathways in yeast. Brief Funct Genomics 2016; 15:95-108. [PMID: 26476430 PMCID: PMC5065356 DOI: 10.1093/bfgp/elv040] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Indexed: 11/14/2022] Open
Abstract
The advent of high-throughput genomics techniques, along with the completion of genome sequencing projects, identification of protein-protein interactions and reconstruction of genome-scale pathways, has accelerated the development of systems biology research in the yeast organism Saccharomyces cerevisiae. In particular, discovery of biological pathways in yeast has become an important forefront in systems biology, which aims to understand the interactions among molecules within a cell leading to certain cellular processes in response to a specific environment. While the existing theoretical and experimental approaches enable the investigation of well-known pathways involved in metabolism, gene regulation and signal transduction, bioinformatics methods offer new insights into computational modeling of biological pathways. A wide range of computational approaches has been proposed in the past for reconstructing biological pathways from high-throughput datasets. Here we review selected bioinformatics approaches for modeling biological pathways in S. cerevisiae, including metabolic pathways, gene-regulatory pathways and signaling pathways. We start by reviewing the research on biological pathways and then discuss key biological databases. In addition, several representative computational approaches for modeling biological pathways in yeast are discussed.
|
19
|
Ghosh A, De RK. Fuzzy Correlated Association Mining: Selecting altered associations among the genes, and some possible marker genes mediating certain cancers. Appl Soft Comput 2016. [DOI: 10.1016/j.asoc.2015.09.057] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 12/11/2022]
|
20
|
Rudd J, Zelaya RA, Demidenko E, Goode EL, Greene CS, Doherty JA. Leveraging global gene expression patterns to predict expression of unmeasured genes. BMC Genomics 2015; 16:1065. [PMID: 26666289 PMCID: PMC4678722 DOI: 10.1186/s12864-015-2250-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/18/2015] [Accepted: 11/27/2015] [Indexed: 12/31/2022] Open
Abstract
Background: Large collections of paraffin-embedded tissue represent a rich resource to test hypotheses based on gene expression patterns; however, measurement of genome-wide expression is cost-prohibitive on a large scale. Using the known expression correlation structure within a given disease type (in this case, high grade serous ovarian cancer; HGSC), we sought to identify reduced sets of directly measured (DM) genes which could accurately predict the expression of a maximized number of unmeasured genes.
Results: We developed a greedy gene set selection (GGS) algorithm which returns a DM set of user specified size based on a specific correlation threshold (|rP|) and minimum number of DM genes that must be correlated to an unmeasured gene in order to infer the value of the unmeasured gene (redundancy). We evaluated GGS in the Cancer Genome Atlas (TCGA) HGSC data across 144 combinations of DM size, redundancy (1–3), and |rP| (0.60, 0.65, 0.70). Across the parameter sweep, GGS allows on average 9 times more gene expression information to be captured compared to the DM set alone. GGS successfully augments prognostic HGSC gene sets; the addition of 20 GGS selected genes more than doubles the number of genes whose expression is predictable. Moreover, the expression prediction is highly accurate. After training regression models for the predictable gene set using 2/3 of the TCGA data, the average accuracy (ranked correlation of true and predicted values) in the 1/3 testing partition and four independent populations is above 0.65 and approaches 0.8 for conservative parameter sets. We observe similar accuracies in the TCGA HGSC RNA-sequencing data. Specifically, the prediction accuracy increases with increasing redundancy and increasing |rP|.
Conclusions: GGS-selected genes, which maximize expression information about unmeasured genes, can be combined with candidate gene sets as a cost effective way to increase the amount of gene expression information obtained in large studies. This method can be applied to any organism, model system, disease, or tissue type for which whole genome gene expression data exists.
Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-2250-5) contains supplementary material, which is available to authorized users.
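The greedy selection idea described in this abstract can be illustrated with a small sketch. This is not the authors' GGS implementation: the function name `greedy_gene_selection`, the precomputed correlation matrix `corr`, and the gain rule (count the unmeasured neighbours that still need support) are assumptions chosen only to show how a correlation threshold and a redundancy requirement might interact.

```python
def greedy_gene_selection(corr, size, threshold=0.60, redundancy=1):
    """Illustrative greedy selection of directly measured (DM) genes.

    corr       -- symmetric matrix (list of lists) of pairwise correlations
    size       -- number of DM genes to select
    threshold  -- minimum |r| for two genes to count as correlated
    redundancy -- number of DM genes that must be correlated with an
                  unmeasured gene before it counts as predictable
    The gain rule below is an assumption, not the published algorithm.
    """
    n = len(corr)
    # Neighbours of each gene: all other genes with |r| >= threshold.
    neighbors = [
        {j for j in range(n) if j != i and abs(corr[i][j]) >= threshold}
        for i in range(n)
    ]
    selected = []
    chosen = set()
    support = [0] * n  # how many selected genes correlate with each gene

    for _ in range(min(size, n)):
        best, best_gain = None, -1
        for cand in range(n):
            if cand in chosen:
                continue
            # Gain: unmeasured neighbours that still lack enough support.
            gain = sum(
                1 for g in neighbors[cand]
                if g not in chosen and support[g] < redundancy
            )
            if gain > best_gain:  # ties go to the lowest gene index
                best, best_gain = cand, gain
        selected.append(best)
        chosen.add(best)
        for g in neighbors[best]:
            support[g] += 1

    predictable = [
        g for g in range(n)
        if g not in chosen and support[g] >= redundancy
    ]
    return selected, predictable
```

With a toy matrix in which gene 0 correlates strongly with genes 1 to 3 and gene 4 correlates with nothing, selecting a single DM gene picks gene 0 and marks genes 1 to 3 as predictable.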
Affiliation(s)
- James Rudd
- Department of Epidemiology, Geisel School of Medicine at Dartmouth College, One Medical Center Drive, 7927 Rubin Building, Lebanon, NH, 03756, USA
- René A Zelaya
- Department of Genetics, Geisel School of Medicine at Dartmouth College; Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania Perelman School of Medicine, 10-131 SCTR, 34th & Civic Center Boulevard, Philadelphia, PA, 19104-5158, USA
- Eugene Demidenko
- Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth College, One Medical Center Drive, 7927 Rubin Building, Lebanon, NH, 03756, USA
- Ellen L Goode
- Department of Health Sciences Research, Division of Epidemiology, Mayo Clinic, 200 First St. SW, Rochester, MN, 55905, USA
- Casey S Greene
- Department of Genetics, Geisel School of Medicine at Dartmouth College; Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania Perelman School of Medicine, 10-131 SCTR, 34th & Civic Center Boulevard, Philadelphia, PA, 19104-5158, USA
- Jennifer A Doherty
- Department of Epidemiology, Geisel School of Medicine at Dartmouth College, One Medical Center Drive, 7927 Rubin Building, Lebanon, NH, 03756, USA
|
21
|
Sundarrajan S, Lulu S, Arumugam M. Insights into protein interaction networks reveal non-receptor kinases as significant druggable targets for psoriasis. Gene 2015; 566:138-47. [PMID: 25881869 DOI: 10.1016/j.gene.2015.04.030] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 12/19/2014] [Revised: 03/18/2015] [Accepted: 04/06/2015] [Indexed: 10/23/2022]
Abstract
Psoriasis is a chronic disease of the skin characterized by hyperproliferation and inflammation of the epidermis and dermal components of the skin. A T-cell-dependent inflammatory process in the skin governs the pathogenesis of psoriasis. An in-silico search strategy was utilized to identify psoriatic therapeutic drug targets. Gene expression profiling of psoriatic skin identified a total of 427 differentially expressed genes (DEGs). Gene ontology investigation of the DEGs identified genes involved in calcium binding, apoptosis, keratinisation, lipid transportation and homeostasis, in addition to immune-mediated processes. The protein interaction networks identified proteins involved in various signaling mechanisms with a high degree of interconnection. The gene modules derived from the main network were enriched in members of the kinome. These sub-networks were dominated by non-receptor kinase family members, which are major signal transmitters in the immune response. The computational approach has aided in the identification of non-receptor kinases as potential targets for psoriasis drug development.
Affiliation(s)
- Sudharsana Sundarrajan
- Bioinformatics Division, School of Biosciences and Technology, Vellore Institute of Technology University, India
- Sajitha Lulu
- Bioinformatics Division, School of Biosciences and Technology, Vellore Institute of Technology University, India
- Mohanapriya Arumugam
- Bioinformatics Division, School of Biosciences and Technology, Vellore Institute of Technology University, India
|
22
|
Sharpnack MF, Huang K. Detecting Cancer Pathway Crosstalk with Distance Correlation. AMIA Jt Summits Transl Sci Proc 2015; 2015:41-5. [PMID: 26306231 PMCID: PMC4525273] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Abstract
Biological pathway regulation is complex, yet it underlies the functional coordination in a cell. Cancer is a disease that is characterized by unregulated growth, driven by underlying pathway deregulation. This pathway deregulation is both within pathways and between pathways. Here, we propose a method to detect inter-pathway coordination using distance correlation. Utilizing data generated from microarray experiments, we separate the genes into pathways and calculate the pairwise distance correlation between them. The result is intuitively viewed as a network of differentially dependent pathways. We find intuitive, yet surprising significant hub pathways, including glycophosphatidylinositol anchor synthesis in lung cancer.
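For readers unfamiliar with the statistic named in this abstract, the following is a minimal sketch of the sample distance correlation between two sets of observations. It follows the standard definition (double-centred pairwise distance matrices); the function names and the choice to represent each pathway as one row of gene values per sample are assumptions for illustration, not details taken from the paper.

```python
import math


def _centered_distances(rows):
    """Pairwise Euclidean distance matrix of rows, double-centred."""
    n = len(rows)
    d = [[math.dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]
    # The matrix is symmetric, so row means equal column means.
    row_mean = [sum(r) / n for r in d]
    grand_mean = sum(row_mean) / n
    return [
        [d[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Sample distance correlation between paired observations x and y.

    x, y -- lists of equal length; element k holds the values for sample k
            (e.g. the expression of one pathway's genes in that sample).
            The two sides may have different numbers of values per sample.
    Returns a value in [0, 1]; 0.0 if either side has no variation.
    """
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of samples")
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2
    dvar2_x = sum(a[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    dvar2_y = sum(b[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    denom = math.sqrt(dvar2_x * dvar2_y)
    if denom == 0:
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / denom)
```

Computing this value for every pair of pathways would give the kind of pairwise dependence matrix from which a pathway network can be drawn; how the paper thresholds or tests these values is not stated in the abstract.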
|
23
|
Gene network biological validity based on gene-gene interaction relevance. ScientificWorldJournal 2014; 2014:540679. [PMID: 25295303 PMCID: PMC4175387 DOI: 10.1155/2014/540679] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 04/25/2014] [Accepted: 07/11/2014] [Indexed: 01/17/2023] Open
Abstract
In recent years, gene networks have become one of the most useful tools for modeling biological processes. Many gene network inference algorithms have been developed as techniques for extracting knowledge from gene expression data. Ensuring the reliability of the inferred gene relationships is a crucial task in any study in order to prove that the algorithms used are precise. Usually, this validation process can be carried out using prior biological knowledge. The metabolic pathways stored in KEGG are one of the most widely used knowledge sources for analyzing relationships between genes. This paper introduces a new methodology, GeneNetVal, to assess the biological validity of gene networks based on the relevance of the gene-gene interactions stored in KEGG metabolic pathways. To this end, a complete conversion of KEGG pathways into a gene association network and a new matching distance based on gene-gene interaction relevance are proposed. The performance of GeneNetVal was established with three different experiments. Firstly, our proposal is tested in a comparative ROC analysis. Secondly, a randomness study is presented to show the behavior of GeneNetVal when the noise is increased in the input network. Finally, the ability of GeneNetVal to detect biological functionality of the network is shown.
|
24
|
Gomez-Cabrero D, Abugessaisa I, Maier D, Teschendorff A, Merkenschlager M, Gisel A, Ballestar E, Bongcam-Rudloff E, Conesa A, Tegnér J. Data integration in the era of omics: current and future challenges. BMC Syst Biol 2014; 8 Suppl 2:I1. [PMID: 25032990 PMCID: PMC4101704 DOI: 10.1186/1752-0509-8-s2-i1] [Citation(s) in RCA: 208] [Impact Index Per Article: 20.8] [Indexed: 12/13/2022]
Abstract
To integrate heterogeneous and large omics data constitutes not only a conceptual challenge but a practical hurdle in the daily analysis of omics data. With the rise of novel omics technologies and through large-scale consortia projects, biological systems are being further investigated at an unprecedented scale generating heterogeneous and often large data sets. These data-sets encourage researchers to develop novel data integration methodologies. In this introduction we review the definition and characterize current efforts on data integration in the life sciences. We have used a web-survey to assess current research projects on data-integration to tap into the views, needs and challenges as currently perceived by parts of the research community.
|