1. Zheng X, Lim PK, Mutwil M, Wang Y. A method for mining condition-specific co-expressed genes in Camellia sinensis based on k-means clustering. BMC PLANT BIOLOGY 2024; 24:373. [PMID: 38714965 PMCID: PMC11077725 DOI: 10.1186/s12870-024-05086-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2024] [Accepted: 04/30/2024] [Indexed: 05/12/2024]
Abstract
BACKGROUND As one of the world's most important beverage crops, tea plants (Camellia sinensis) are renowned for their unique flavors and numerous beneficial secondary metabolites, attracting researchers to investigate the formation of tea quality. With the increasing availability of transcriptome data on tea plants in public databases, conducting large-scale co-expression analyses has become feasible to meet the demand for functional characterization of tea plant genes. However, as the multidimensional noise increases, larger-scale co-expression analyses are not always effective. Analyzing a subset of samples generated by effectively downsampling and reorganizing the global sample set often leads to more accurate results in co-expression analysis. Meanwhile, global-based co-expression analyses are more likely to overlook condition-specific gene interactions, which may be more important and worthy of exploration and research. RESULTS Here, we employed the k-means clustering method to organize and classify the global samples of tea plants, resulting in clustered samples. Metadata annotations were then performed on these clustered samples to determine the "conditions" represented by each cluster. Subsequently, we conducted gene co-expression network analysis (WGCNA) separately on the global samples and the clustered samples, resulting in global modules and cluster-specific modules. Comparative analyses of global modules and cluster-specific modules have demonstrated that cluster-specific modules exhibit higher accuracy in co-expression analysis. To measure the degree of condition specificity of genes within condition-specific clusters, we introduced the correlation difference value (CDV). By incorporating the CDV into co-expression analyses, we can assess the condition specificity of genes. 
This approach proved instrumental in identifying a series of high CDV transcription factor encoding genes upregulated during sustained cold treatment in Camellia sinensis leaves and buds, and pinpointing a pair of genes that participate in the antioxidant defense system of tea plants under sustained cold stress. CONCLUSIONS To summarize, downsampling and reorganizing the sample set improved the accuracy of co-expression analysis. Cluster-specific modules were more accurate in capturing condition-specific gene interactions. The introduction of CDV allowed for the assessment of condition specificity in gene co-expression analyses. Using this approach, we identified a series of high CDV transcription factor encoding genes related to sustained cold stress in Camellia sinensis. This study highlights the importance of considering condition specificity in co-expression analysis and provides insights into the regulation of the cold stress in Camellia sinensis.
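The workflow summarized in this abstract (cluster the samples, then compare co-expression inside a cluster with co-expression across all samples) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' pipeline: the abstract does not give the CDV formula, so `correlation_difference` assumes it is the within-cluster Pearson correlation minus the global one, and the `kmeans` helper, the toy expression matrix and all parameter values are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means on the rows of X; returns one label per row."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance of every sample to every center, then nearest-center labels.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def correlation_difference(expr, labels, cluster, gene_a, gene_b):
    """ASSUMED CDV-like score: within-cluster Pearson r minus global Pearson r."""
    in_cluster = labels == cluster
    r_cluster = np.corrcoef(expr[in_cluster, gene_a], expr[in_cluster, gene_b])[0, 1]
    r_global = np.corrcoef(expr[:, gene_a], expr[:, gene_b])[0, 1]
    return float(r_cluster - r_global)

# Synthetic expression matrix: rows are samples, columns are genes.
rng = np.random.default_rng(1)
n = 40
cond_a = rng.normal(size=(n, 4))
cond_b = rng.normal(size=(n, 4))
cond_b[:, 1] = cond_b[:, 0] + 0.1 * rng.normal(size=n)  # genes 0 and 1 co-expressed in B only
cond_b[:, 2:] += 20.0                                   # genes 2 and 3 mark condition B
expr = np.vstack([cond_a, cond_b])

labels = kmeans(expr, k=2)
cluster_b = labels[-1]  # cluster containing the last condition-B sample
cdv = correlation_difference(expr, labels, cluster_b, 0, 1)
print(f"CDV-like score for genes 0 and 1 in cluster {cluster_b}: {cdv:.2f}")
```

A clearly positive score flags a gene pair whose co-expression is stronger inside the cluster than across the whole sample set, which is the kind of condition-specific signal the abstract describes.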
Affiliation(s)
- Xinghai Zheng
  - Tea Research Institute, Zhejiang University, Hangzhou, 310058, Zhejiang, China.
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Singapore.
- Peng Ken Lim
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Singapore
- Marek Mutwil
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Singapore.
- Yuefei Wang
  - Tea Research Institute, Zhejiang University, Hangzhou, 310058, Zhejiang, China.
2. Cai L, Huang X, Feng H, Fan G, Sun X. Antimicrobial mechanisms of g-C3N4@ZnO against oomycetes Phytophthora capsici: from its metabolism, membrane structures and growth. PEST MANAGEMENT SCIENCE 2024; 80:2096-2108. [PMID: 38135506 DOI: 10.1002/ps.7946] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Revised: 11/25/2023] [Accepted: 12/23/2023] [Indexed: 12/24/2023]
Abstract
BACKGROUND Phytophthora capsici, a refractory and model oomycete plant pathogen, particularly threatens multiple vegetable crops. A limited number of chemical pesticides play a vital role in controlling oomycete plant diseases. However, this approach often leads to excessive use of chemical agents, exacerbates environmental issues and gives rise to more and more drug-resistant oomycete strains. Therefore, it is imperative to devise innovative solutions that can effectively address oomycete infection while maintaining high levels of environmental sustainability and low toxicity. RESULTS In this study, a g-C3N4@ZnO heterostructure was synthesized and characterized. The g-C3N4@ZnO showed higher toxicity to Phytophthora capsici than graphitic carbon nitride (g-C3N4) nanosheets and zinc oxide (ZnO) nanoparticles in vitro and in vivo. Apart from the hyphal growth of Phytophthora capsici, its spore germination rate, sporangium formation and number of spores were all suppressed by the g-C3N4@ZnO heterostructure. Furthermore, we found that this g-C3N4@ZnO heterostructure has higher photocatalytic activity under visible light, which potentially enhanced the reactive oxygen species (ROS) mediated stress on Phytophthora capsici. Ultrastructural morphology, global changes in gene expression and weighted gene co-expression network analysis all supported that the anti-oomycete activity of g-C3N4@ZnO was manifested in the destruction of the membrane system and inhibition of multiple metabolisms of Phytophthora capsici under visible irradiation, which could also be attributed to the ROS and zinc ion (Zn2+) mediated stress. CONCLUSION This work offers a novel oomycete disease management strategy using g-C3N4@ZnO, whose activity was attributed to ROS stress, destruction of the membrane system and inhibition of multiple metabolisms. © 2023 Society of Chemical Industry.
Affiliation(s)
- Lin Cai
  - Guizhou Key Laboratory for Tobacco Quality, College of Tobacco Science of Guizhou University, Guiyang, China
  - National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for R&D of Fine Chemicals of Guizhou University, Guiyang, China
- Xunliang Huang
  - National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for R&D of Fine Chemicals of Guizhou University, Guiyang, China
- Hui Feng
  - Guizhou Key Laboratory for Tobacco Quality, College of Tobacco Science of Guizhou University, Guiyang, China
- Guangjin Fan
  - College of Plant Protection, Southwest University, Chongqing, China
- Xianchao Sun
  - College of Plant Protection, Southwest University, Chongqing, China
3. Russell M, Aqil A, Saitou M, Gokcumen O, Masuda N. Gene communities in co-expression networks across different tissues. ARXIV 2023:arXiv:2305.12963v2. [PMID: 37292479 PMCID: PMC10246089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
With the recent availability of tissue-specific gene expression data, e.g., provided by the GTEx Consortium, there is interest in comparing gene co-expression patterns across tissues. One promising approach to this problem is to use a multilayer network analysis framework and perform multilayer community detection. Communities in gene co-expression networks reveal groups of genes similarly expressed across individuals, potentially involved in related biological processes responding to specific environmental stimuli or sharing common regulatory variations. We construct a multilayer network in which each of the four layers is an exocrine gland tissue-specific gene co-expression network. We develop methods for multilayer community detection with correlation matrix input and an appropriate null model. Our correlation matrix input method identifies five groups of genes that are similarly co-expressed in multiple tissues (a community that spans multiple layers, which we call a generalist community) and two groups of genes that are co-expressed in just one tissue (a community that lies primarily within just one layer, which we call a specialist community). We further found gene co-expression communities where the genes physically cluster across the genome significantly more than expected by chance (on chromosomes 1 and 11). This clustering hints at underlying regulatory elements determining similar expression patterns across individuals and cell types. We suggest that KRTAP3-1, KRTAP3-3, and KRTAP3-5 share regulatory elements in skin and pancreas. Furthermore, we find that CELA3A and CELA3B share associated expression quantitative trait loci in the pancreas. The results indicate that our multilayer community detection method for correlation matrix input extracts biologically interesting communities of genes.
Affiliation(s)
- Alber Aqil
  - Department of Biological Sciences, University at Buffalo
- Marie Saitou
  - Faculty of Biosciences, Norwegian University of Life Sciences
- Omer Gokcumen
  - Department of Biological Sciences, University at Buffalo
- Naoki Masuda
  - Department of Mathematics, University at Buffalo
  - Institute for Artificial Intelligence and Data Science, University at Buffalo
4. Russell M, Aqil A, Saitou M, Gokcumen O, Masuda N. Gene communities in co-expression networks across different tissues. PLoS Comput Biol 2023; 19:e1011616. [PMID: 37976327 PMCID: PMC10691702 DOI: 10.1371/journal.pcbi.1011616] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2023] [Revised: 12/01/2023] [Accepted: 10/19/2023] [Indexed: 11/19/2023]
Abstract
With the recent availability of tissue-specific gene expression data, e.g., provided by the GTEx Consortium, there is interest in comparing gene co-expression patterns across tissues. One promising approach to this problem is to use a multilayer network analysis framework and perform multilayer community detection. Communities in gene co-expression networks reveal groups of genes similarly expressed across individuals, potentially involved in related biological processes responding to specific environmental stimuli or sharing common regulatory variations. We construct a multilayer network in which each of the four layers is an exocrine gland tissue-specific gene co-expression network. We develop methods for multilayer community detection with correlation matrix input and an appropriate null model. Our correlation matrix input method identifies five groups of genes that are similarly co-expressed in multiple tissues (a community that spans multiple layers, which we call a generalist community) and two groups of genes that are co-expressed in just one tissue (a community that lies primarily within just one layer, which we call a specialist community). We further found gene co-expression communities where the genes physically cluster across the genome significantly more than expected by chance (on chromosomes 1 and 11). This clustering hints at underlying regulatory elements determining similar expression patterns across individuals and cell types. We suggest that KRTAP3-1, KRTAP3-3, and KRTAP3-5 share regulatory elements in skin and pancreas. Furthermore, we find that CELA3A and CELA3B share associated expression quantitative trait loci in the pancreas. The results indicate that our multilayer community detection method for correlation matrix input extracts biologically interesting communities of genes.
Affiliation(s)
- Madison Russell
  - Department of Mathematics, State University of New York at Buffalo, Buffalo, New York, United States of America
- Alber Aqil
  - Department of Biological Sciences, State University of New York at Buffalo, Buffalo, New York, United States of America
- Marie Saitou
  - Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
- Omer Gokcumen
  - Department of Biological Sciences, State University of New York at Buffalo, Buffalo, New York, United States of America
- Naoki Masuda
  - Department of Mathematics, State University of New York at Buffalo, Buffalo, New York, United States of America
  - Institute for Artificial Intelligence and Data Science, State University of New York at Buffalo, Buffalo, New York, United States of America
5. Panditrao G, Bhowmick R, Meena C, Sarkar RR. Emerging landscape of molecular interaction networks: Opportunities, challenges and prospects. J Biosci 2022. [PMID: 36210749 PMCID: PMC9018971 DOI: 10.1007/s12038-022-00253-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 11/24/2022]
Abstract
Network biology finds application in interpreting molecular interaction networks and providing insightful inferences using graph theoretical analysis of biological systems. The integration of computational bio-modelling approaches with different hybrid network-based techniques provides additional information about the behaviour of complex systems. With increasing advances in high-throughput technologies in biological research, attempts have been made to incorporate this information into network structures, which has led to a continuous update of network biology approaches over time. The newly minted centrality measures accommodate the details of omics data and regulatory network structure information. The unification of graph network properties with classical mathematical and computational modelling approaches and technologically advanced approaches like machine-learning- and artificial intelligence-based algorithms leverages the potential application of these techniques. These computational advances prove beneficial and serve various applications such as essential gene prediction, identification of drug–disease interaction and gene prioritization. Hence, in this review, we have provided a comprehensive overview of the emerging landscape of molecular interaction networks using graph theoretical approaches. With the aim to provide information on the wide range of applications of network biology approaches in understanding the interaction and regulation of genes, proteins, enzymes and metabolites at different molecular levels, we have reviewed the methods that utilize network topological properties, emerging hybrid network-based approaches and applications that integrate machine learning techniques to analyse molecular interaction networks. Further, we have discussed the applications of these approaches in biomedical research with a note on future prospects.
Affiliation(s)
- Gauri Panditrao
  - Chemical Engineering and Process Development Division, CSIR-National Chemical Laboratory, Pune, 411008 India
- Rupa Bhowmick
  - Chemical Engineering and Process Development Division, CSIR-National Chemical Laboratory, Pune, 411008 India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002 India
- Chandrakala Meena
  - Chemical Engineering and Process Development Division, CSIR-National Chemical Laboratory, Pune, 411008 India
- Ram Rup Sarkar
  - Chemical Engineering and Process Development Division, CSIR-National Chemical Laboratory, Pune, 411008 India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002 India
6. Wu G, Li X, Guo W, Wei Z, Hu T, Shan Y, Gu J. JEBIN: analyzing gene co-expressions across multiple datasets by joint network embedding. Brief Bioinform 2022; 23:6519533. [PMID: 35134135 DOI: 10.1093/bib/bbab603] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2021] [Revised: 12/15/2021] [Accepted: 12/27/2021] [Indexed: 11/13/2022]
Abstract
The inference of gene co-expression associations is one of the fundamental tasks in large-scale transcriptomic data analysis. Due to the high dimensionality and high noise of transcriptomic data, it is difficult to infer stable gene co-expression associations from a single dataset. Meta-analysis of multisource data can effectively tackle this problem. We proposed Joint Embedding of multiple BIpartite Networks (JEBIN) to learn the low-dimensional consensus representation for genes by integrating multiple expression datasets. JEBIN infers gene co-expression associations in a nonlinear and global similarity manner and can integrate datasets with different distributions with time complexity linear in the number of genes and the total sample size. The effectiveness and scalability of JEBIN were verified by simulation experiments, and its superiority over the commonly used integration methods was demonstrated by three indexes on real biological datasets. Then, JEBIN was applied to study the gene co-expression patterns of hepatocellular carcinoma (HCC) based on multiple expression datasets of HCC and adjacent normal tissues, and further on the latest HCC single-cell RNA-seq data. Results show that gene co-expressions are highly different between bulk and single-cell datasets. Finally, many differentially co-expressed ligand-receptor pairs were discovered by comparing HCC with adjacent normal data, providing candidate HCC targets for abnormal cell-cell communications.
Affiliation(s)
- Guiying Wu
  - MOE Key Laboratory of Bioinformatics, BNRIST Bioinformatics Division, Department of Automation, Tsinghua University, Beijing 100084, China
- Xiangyu Li
  - School of Software Engineering, Beijing Jiaotong University, Beijing 100044, China
- Wenbo Guo
  - MOE Key Laboratory of Bioinformatics, BNRIST Bioinformatics Division, Department of Automation, Tsinghua University, Beijing 100084, China
- Zheng Wei
  - MOE Key Laboratory of Bioinformatics, BNRIST Bioinformatics Division, Department of Automation, Tsinghua University, Beijing 100084, China
- Tao Hu
  - MOE Key Laboratory of Bioinformatics, BNRIST Bioinformatics Division, Department of Automation, Tsinghua University, Beijing 100084, China
- Yiran Shan
  - MOE Key Laboratory of Bioinformatics, BNRIST Bioinformatics Division, Department of Automation, Tsinghua University, Beijing 100084, China
- Jin Gu
  - MOE Key Laboratory of Bioinformatics, BNRIST Bioinformatics Division, Department of Automation, Tsinghua University, Beijing 100084, China
7. Han M, Yuan L, Huang Y, Wang G, Du C, Wang Q, Zhang G. Integrated co-expression network analysis uncovers novel tissue-specific genes in major depressive disorder and bipolar disorder. Front Psychiatry 2022; 13:980315. [PMID: 36081461 PMCID: PMC9445988 DOI: 10.3389/fpsyt.2022.980315] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/28/2022] [Accepted: 08/01/2022] [Indexed: 11/13/2022]
Abstract
Tissue-specific gene expression has been found to be associated with multiple complex diseases, including cancer, metabolic disease and aging. However, few studies of brain-tissue-specific gene expression patterns have been reported, especially in psychiatric disorders. In this study, we performed joint analysis on large-scale multi-tissue transcriptome data to investigate tissue-specific expression patterns in major depressive disorder (MDD) and bipolar disorder (BP). We established strategies for identifying tissue-specific modules, annotated pathways that elucidate the biological functions of tissues, and tissue-specific genes, based on weighted gene co-expression network analysis (WGCNA) and robust rank aggregation (RRA) with transcriptional profiling data from different human tissues and genome-wide association study (GWAS) data; these strategies were extended to overlapping tissue-specific modules and genes shared between MDD and BP. Nine tissue-specific modules were identified and distributed across the four tissues in MDD, and six modules in BP. In general, the annotated biological functions of differentially expressed genes (DEGs) in blood were mainly involved in MDD and BP progression through immune response, while those in the brain were in neuron and neuroendocrine response. Tissue-specific genes of the prefrontal cortex (PFC) in MDD, such as IGFBP2 and HTR1A, were involved in disease-related functions, such as response to glucocorticoid and taste transduction, and tissue-specific genes of the PFC in BP, such as CHRM5 and LTB4R2, were involved in neuroactive ligand-receptor interaction. We also found that PFC tissue-specific genes including SST and CRHBP were shared between MDD and BP; SST was enriched in neuroactive ligand-receptor interaction, and CRHBP was related to the regulation of hormone secretion and hormone transport.
Affiliation(s)
- Mengyao Han
  - Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration of Ministry of Education, Orthopaedic Department of Tongji Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
  - CAS Key Laboratory of Computational Biology, Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Liyun Yuan
  - CAS Key Laboratory of Computational Biology, Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yuwei Huang
  - CAS Key Laboratory of Computational Biology, Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Guiying Wang
  - Shanghai Key Laboratory of Signaling and Disease Research, Clinical and Translational Research Center of Shanghai First Maternity and Infant Hospital, Frontier Science Center for Stem Cell Research, National Stem Cell Translational Resource Center, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Changsheng Du
  - Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration of Ministry of Education, Orthopaedic Department of Tongji Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Qingzhong Wang
  - CAS Key Laboratory of Computational Biology, Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
  - Shanghai Key Laboratory of Compound Chinese Medicines, Institute of Chinese Materia Medica, Shanghai University of Traditional Chinese Medicine, Shanghai, China
- Guoqing Zhang
  - CAS Key Laboratory of Computational Biology, Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
8. WGCNA-Based Identification of Hub Genes and Key Pathways Involved in Nonalcoholic Fatty Liver Disease. BIOMED RESEARCH INTERNATIONAL 2021; 2021:5633211. [PMID: 34938809 PMCID: PMC8687832 DOI: 10.1155/2021/5633211] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/05/2021] [Revised: 11/14/2021] [Accepted: 11/23/2021] [Indexed: 12/27/2022]
Abstract
Background The morbidity of nonalcoholic fatty liver disease (NAFLD) has been rising, but the pathogenesis of NAFLD is still elusive. This study aimed to determine NAFLD-related hub genes based on weighted gene coexpression network analysis (WGCNA). Methods Coexpression networks were constructed from the GSE126848 dataset using WGCNA. The Database for Annotation, Visualization, and Integrated Discovery (DAVID) was utilized for Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis. Hub genes were identified and validated in independent datasets and a mouse model. Results We found that the steelblue module was most significantly correlated with NAFLD. A total of 15 hub genes (NDUFA9, UQCRQ, NDUFB8, COPS5, RPS17, UBL5, PSMA3, PSMA1, SF3B5, MRPL27, RPL26, PDCD5, PFDN6, SNRPD2, PSMB3) were derived from both the coexpression and PPI networks and considered “true” hub genes. Functional enrichment analysis showed that the hub genes were related to the NAFLD pathway and oxidative phosphorylation. Independent dataset-based analysis and the establishment of a NAFLD mouse model confirmed the involvement of two hub genes, NDUFA9 and UQCRQ, in the pathogenesis of NAFLD. Conclusions Oxidative phosphorylation and the NAFLD pathway may be crucially involved in the pathogenesis of NAFLD, and the two hub genes NDUFA9 and UQCRQ might be diagnostic biomarkers and therapeutic targets for NAFLD.
9. Burns JJR, Shealy BT, Greer MS, Hadish JA, McGowan MT, Biggs T, Smith MC, Feltus FA, Ficklin SP. Addressing noise in co-expression network construction. Brief Bioinform 2021; 23:6446269. [PMID: 34850822 PMCID: PMC8769892 DOI: 10.1093/bib/bbab495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2021] [Revised: 10/25/2021] [Accepted: 10/28/2021] [Indexed: 11/13/2022]
Abstract
Gene co-expression networks (GCNs) provide multiple benefits to molecular research including hypothesis generation and biomarker discovery. Transcriptome profiles serve as input for GCN construction and are derived from increasingly larger studies with samples across multiple experimental conditions, treatments, time points, genotypes, etc. Such experiments with larger numbers of variables confound discovery of true network edges, exclude edges and inhibit discovery of context (or condition) specific network edges. To demonstrate this problem, a 475-sample dataset is used to show that up to 97% of GCN edges can be misleading because correlations are false or incorrect. False and incorrect correlations can occur when tests are applied without ensuring assumptions are met, and pairwise gene expression may not meet test assumptions if the expression of at least one gene in the pairwise comparison is a function of multiple confounding variables. The ‘one-size-fits-all’ approach to GCN construction is therefore problematic for large, multivariable datasets. Recently, the Knowledge Independent Network Construction toolkit has been used in multiple studies to provide a dynamic approach to GCN construction that ensures statistical tests meet assumptions and confounding variables are addressed. Additionally, it can associate experimental context for each edge of the network resulting in context-specific GCNs (csGCNs). To help researchers recognize such challenges in GCN construction, and the creation of csGCNs, we provide a review of the workflow.
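The confounding problem described in this abstract can be illustrated with a small synthetic example. The sketch below is not the Knowledge Independent Network Construction toolkit and uses none of its code; it only shows, under assumed toy parameters, how two genes that are independent within each condition can still show a strong pooled Pearson correlation when both respond to the same condition variable.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100  # samples per condition (hypothetical)

# Two genes with no relationship inside either condition, but both are
# shifted upward in condition 1 (the confounding variable).
gene_x = np.concatenate([rng.normal(0, 1, n), rng.normal(5, 1, n)])
gene_y = np.concatenate([rng.normal(0, 1, n), rng.normal(5, 1, n)])
condition = np.repeat([0, 1], n)

def pearson(a, b):
    """Pearson correlation coefficient of two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

# Correlation over all samples versus correlation within each condition.
pooled_r = pearson(gene_x, gene_y)
within_r = [pearson(gene_x[condition == c], gene_y[condition == c]) for c in (0, 1)]

print(f"pooled r = {pooled_r:.2f}")
print("within-condition r =", [round(r, 2) for r in within_r])
```

The pooled coefficient is large because it mostly reflects the shift between conditions, while the within-condition coefficients stay near zero. An edge drawn from the pooled value would be the kind of misleading edge the abstract warns about.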
Affiliation(s)
- Joshua J R Burns
  - Department of Horticulture, 149 Johnson Hall. Washington State University, Pullman, WA 99164. USA
- Benjamin T Shealy
  - Department of Electrical & Computer Engineering, 105 Riggs Hall. Clemson University, Clemson, SC 29631. USA
- Mitchell S Greer
  - School of Electrical Engineering and Computer Science, EME 102. Washington State University, Pullman, WA 99164. USA
- John A Hadish
  - Molecular Plant Sciences Program, French Ad 324g. Washington State University, Pullman, WA 99164. USA
- Matthew T McGowan
  - Molecular Plant Sciences Program, French Ad 324g. Washington State University, Pullman, WA 99164. USA
- Tyler Biggs
  - Department of Horticulture, 149 Johnson Hall. Washington State University, Pullman, WA 99164. USA
- Melissa C Smith
  - Department of Electrical & Computer Engineering, 105 Riggs Hall. Clemson University, Clemson, SC 29631. USA
- F Alex Feltus
  - Department of Genetics and Biochemistry, 130 McGinty Court. Clemson University, Clemson, SC 29634. USA
  - Biomedical Data Science & Informatics Program, 100 McAdams Hall. Clemson University, Clemson, SC 29634. USA
  - Clemson Center for Human Genetics, 114 Gregor Mendel Circle, Greenwood, SC 29646. USA
- Stephen P Ficklin
  - Department of Horticulture, 149 Johnson Hall. Washington State University, Pullman, WA 99164. USA
  - School of Electrical Engineering and Computer Science, EME 102. Washington State University, Pullman, WA 99164. USA
10. Zhang P, Southey BR, Sweedler JV, Pradhan A, Rodriguez-Zas SL. Enhanced Understanding of Molecular Interactions and Function Underlying Pain Processes Through Networks of Transcript Isoforms, Genes, and Gene Families. Adv Appl Bioinform Chem 2021; 14:49-69. [PMID: 33633454 PMCID: PMC7901473 DOI: 10.2147/aabc.s284986] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/05/2020] [Accepted: 01/05/2021] [Indexed: 11/23/2022]
Abstract
Introduction Molecular networks based on the abundance of mRNA at the gene level and pathway networks that relate families or groups of paralog genes have supported the understanding of interactions between molecules. However, multiple molecular mechanisms underlying health and behavior, such as pain signal processing, are modulated by the abundances of the transcript isoforms that originate from alternative splicing, in addition to gene abundances. Alternative splice variants of growth factors, ion channels, and G-protein-coupled receptors can code for proteoforms that can have different effects on pain and nociception. Therefore, networks inferred using abundance from more agglomerative molecular units (eg, gene family, or gene) have limitations in capturing interactions at a more granular level (eg, gene, or transcript isoform, respectively) and do not account for changes in the abundance at the transcript isoform level. Objective The objective of this study was to evaluate the relative benefits of network inference using abundance patterns at various aggregate levels. Methods Sparse networks were inferred using Gaussian Markov random fields and a novel aggregation criterion was used to aggregate network edges. The relative advantages of network aggregation were evaluated on two molecular systems that have different dimensions and connectivity, circadian rhythm and Toll-like receptor pathways, using RNA-sequencing data from mice representing two pain level groups, opioid-induced hyperalgesia (OIH) and control, and two central nervous system regions, the nucleus accumbens and the trigeminal ganglia. Results The inferred networks were benchmarked against the Kyoto Encyclopedia of Genes and Genomes reference pathways using multiple criteria. Networks inferred using more granular information performed better than networks inferred using more aggregate information. The advantage of granular inference varied with the pathway and data set used.
Discussion The differences in inferred network structure between data sets highlight the differences in OIH effect between central nervous system regions. Our findings suggest that inference of networks using alternative splicing variants can offer complementary insights into the relationship between genes and gene paralog groups.
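In a Gaussian Markov random field, an edge between two variables corresponds to a nonzero entry of the precision (inverse covariance) matrix, that is, a nonzero partial correlation. The sketch below illustrates only that idea on synthetic data. It inverts the sample covariance directly rather than using a sparse estimator, and it does not reproduce the authors' inference procedure or their aggregation criterion; the `partial_correlations` helper and the three-variable chain are hypothetical.

```python
import numpy as np

def partial_correlations(data):
    """Partial correlation matrix from the inverse sample covariance.

    data: array of shape (n_samples, n_variables).
    """
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Synthetic chain x -> y -> z: x and z are related only through y.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
y = x + rng.normal(size=n)
z = y + rng.normal(size=n)
data = np.column_stack([x, y, z])

marginal = np.corrcoef(data, rowvar=False)
partial = partial_correlations(data)

print(f"marginal r(x, z) = {marginal[0, 2]:.2f}")
print(f"partial  r(x, z | y) = {partial[0, 2]:.2f}")
```

The marginal correlation between x and z is substantial, but their partial correlation given y is close to zero, so a network built on the precision matrix keeps the x-y and y-z edges and drops the indirect x-z edge.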
Affiliation(s)
- Pan Zhang
  - Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Bruce R Southey
  - Department of Animal Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Jonathan V Sweedler
  - Department of Chemistry and the Beckman Institute, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Amynah Pradhan
  - Department of Psychiatry, University of Illinois at Chicago, Chicago, IL, USA
- Sandra L Rodriguez-Zas
  - Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, IL, USA
  - Department of Animal Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, USA
  - Department of Statistics, University of Illinois at Urbana-Champaign, Urbana, IL, USA
11
Portes LL, Small M. Navigating differential structures in complex networks. Phys Rev E 2021; 102:062301. [PMID: 33466036 DOI: 10.1103/physreve.102.062301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2020] [Accepted: 11/20/2020] [Indexed: 11/07/2022]
Abstract
Structural changes in a network representation of a system, due to different experimental conditions, different connectivity across layers, or to its time evolution, can provide insight on its organization, function, and on how it responds to external perturbations. The deeper understanding of how gene networks cope with diseases and treatments is maybe the most incisive demonstration of the gains obtained through this differential network analysis point of view, which led to an explosion of new numeric techniques in the last decade. However, where to focus one's attention, or how to navigate through the differential structures in the context of large networks, can be overwhelming even for a few experimental conditions. In this paper, we propose a theory and a methodological implementation for the characterization of shared "structural roles" of nodes simultaneously within and between networks. Inspired by recent methodological advances in chaotic phase synchronization analysis, we show how the information about the shared structures of a set of networks can be split and organized in an automatic fashion, in scenarios with very different (i) community sizes, (ii) total number of communities, and (iii) even for a large number of 100 networks compared using numerical benchmarks generated by a stochastic block model. Then, we investigate how the network size, number of networks, and mean size of communities influence the method performance in a series of Monte Carlo experiments. To illustrate its potential use in a more challenging scenario with real-world data, we show evidence that the method can still split and organize the structural information of a set of four gene coexpression networks obtained from two cell types × two treatments (interferon-β stimulated or control). 
Aside from its potential use as an automatic feature extraction and preprocessing tool, we argue that another strength of the method is its "story-telling"-like characterization of the information encoded in a set of networks, which can be used to pinpoint unexpected shared structure, leading to further investigations and providing new insights. Finally, the method is flexible enough to address different research-field-specific questions, because it does not restrict which scientifically meaningful characteristic (or relevant feature) of a node is used.
Affiliation(s)
- Leonardo L Portes
  - Complex Systems Group, Department of Mathematics and Statistics, University of Western Australia, Nedlands, Perth, WA 6009, Australia
- Michael Small
  - Complex Systems Group, Department of Mathematics and Statistics, University of Western Australia, Nedlands, Perth, WA 6009, Australia
  - Mineral Resources, CSIRO, Kensington, Perth, WA 6151, Australia
12
Zoppi J, Guillaume JF, Neunlist M, Chaffron S. MiBiOmics: an interactive web application for multi-omics data exploration and integration. BMC Bioinformatics 2021; 22:6. [PMID: 33407076 PMCID: PMC7789220 DOI: 10.1186/s12859-020-03921-8] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 06/26/2020] [Accepted: 12/02/2020] [Indexed: 12/12/2022] Open
Abstract
Background Multi-omics experimental approaches are becoming common practice in biological and medical sciences, underlining the need to design new integrative techniques and applications that enable the multi-scale characterization of biological systems. The integrative analysis of heterogeneous datasets generally yields additional insights and novel hypotheses about a given biological system. However, it can become challenging given the often-large size of omics datasets and the diversity of existing techniques. Moreover, visualization tools for interpretation are usually inaccessible to biologists without programming skills. Results Here, we present MiBiOmics, a web-based and standalone application that facilitates multi-omics data visualization, exploration, integration, and analysis by providing easy access to dedicated and interactive protocols. It implements classical ordination techniques and the inference of omics-based (multilayer) networks to mine complex biological systems and identify robust biomarkers linked to specific contextual parameters or biological states. Conclusions MiBiOmics provides easy access to exploratory ordination techniques and to a network-based approach for integrative multi-omics analyses through an intuitive and interactive interface. MiBiOmics is currently available as a Shiny app at https://shiny-bird.univ-nantes.fr/app/Mibiomics and as a standalone application at https://gitlab.univ-nantes.fr/combi-ls2n/mibiomics.
Affiliation(s)
- Jean-François Guillaume
  - CHU Nantes, Inserm, CNRS, SFR Santé, Inserm UMS016, CNRS UMS 3556, Université de Nantes, 44000, Nantes, France
- Samuel Chaffron
  - CNRS UMR6004, LS2N, Université de Nantes, 44000, Nantes, France
  - Research Federation (FR2022) Tara Oceans GO-SEE, Paris, France
13
Savino A, Provero P, Poli V. Differential Co-Expression Analyses Allow the Identification of Critical Signalling Pathways Altered during Tumour Transformation and Progression. Int J Mol Sci 2020; 21:E9461. [PMID: 33322692 PMCID: PMC7764314 DOI: 10.3390/ijms21249461] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 10/30/2020] [Revised: 12/02/2020] [Accepted: 12/09/2020] [Indexed: 02/02/2023] Open
Abstract
Biological systems respond to perturbations through the rewiring of molecular interactions, organised in gene regulatory networks (GRNs). Among these, the increasingly high availability of transcriptomic data makes gene co-expression networks the most exploited ones. Differential co-expression networks are useful tools to identify changes in response to an external perturbation, such as mutations predisposing to cancer development, and leading to changes in the activity of gene expression regulators or signalling. They can help explain the robustness of cancer cells to perturbations and identify promising candidates for targeted therapy, moreover providing higher specificity with respect to standard co-expression methods. Here, we comprehensively review the literature about the methods developed to assess differential co-expression and their applications to cancer biology. Via the comparison of normal and diseased conditions and of different tumour stages, studies based on these methods led to the definition of pathways involved in gene network reorganisation upon oncogenes' mutations and tumour progression, often converging on immune system signalling. A relevant implementation still lagging behind is the integration of different data types, which would greatly improve network interpretability. Most importantly, performance and predictivity evaluation of the large variety of mathematical models proposed would urgently require experimental validations and systematic comparisons. We believe that future work on differential gene co-expression networks, complemented with additional omics data and experimentally tested, will considerably improve our insights into the biology of tumours.
Affiliation(s)
- Aurora Savino
  - Molecular Biotechnology Center, Department of Molecular Biotechnology and Health Sciences, University of Turin, Via Nizza 52, 10126 Turin, Italy
- Paolo Provero
  - Department of Neurosciences “Rita Levi Montalcini”, University of Turin, Corso Massimo D’Ázeglio 52, 10126 Turin, Italy
  - Center for Omics Sciences, Ospedale San Raffaele IRCCS, Via Olgettina 60, 20132 Milan, Italy
- Valeria Poli
  - Molecular Biotechnology Center, Department of Molecular Biotechnology and Health Sciences, University of Turin, Via Nizza 52, 10126 Turin, Italy
14
Sailani MR, Metwally AA, Zhou W, Rose SMSF, Ahadi S, Contrepois K, Mishra T, Zhang MJ, Kidziński Ł, Chu TJ, Snyder MP. Deep longitudinal multiomics profiling reveals two biological seasonal patterns in California. Nat Commun 2020; 11:4933. [PMID: 33004787 PMCID: PMC7529769 DOI: 10.1038/s41467-020-18758-1] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 12/30/2019] [Accepted: 08/21/2020] [Indexed: 02/06/2023] Open
Abstract
The influence of seasons on biological processes is poorly understood. In order to identify biological seasonal patterns based on diverse molecular data, rather than calendar dates, we performed a deep longitudinal multiomics profiling of 105 individuals over 4 years. Here, we report more than 1000 seasonal variations in omics analytes and clinical measures. The different molecules group into two major seasonal patterns which correlate with peaks in late spring and late fall/early winter in California. The two patterns are enriched for molecules involved in human biological processes such as inflammation, immunity, cardiovascular health, as well as neurological and psychiatric conditions. Lastly, we identify molecules and microbes that demonstrate different seasonal patterns in insulin sensitive and insulin resistant individuals. The results of our study have important implications in healthcare and highlight the value of considering seasonality when assessing population wide health risk and management.
Affiliation(s)
- M Reza Sailani
  - Department of Genetics, Stanford University, Stanford, CA, 94305, USA
- Ahmed A Metwally
  - Department of Genetics, Stanford University, Stanford, CA, 94305, USA
- Wenyu Zhou
  - Department of Genetics, Stanford University, Stanford, CA, 94305, USA
- Sara Ahadi
  - Department of Genetics, Stanford University, Stanford, CA, 94305, USA
- Kevin Contrepois
  - Department of Genetics, Stanford University, Stanford, CA, 94305, USA
- Tejaswini Mishra
  - Department of Genetics, Stanford University, Stanford, CA, 94305, USA
- Martin Jinye Zhang
  - Department of Electrical Engineering, Stanford University, Stanford, CA, 94305, USA
- Łukasz Kidziński
  - Department of Bioengineering, Stanford University, Stanford, CA, 94305, USA
- Theodore J Chu
  - Department of Pediatrics, Division of Allergy and Immunology, Stanford University, Stanford, CA, 94305, USA
- Michael P Snyder
  - Department of Genetics, Stanford University, Stanford, CA, 94305, USA
15
Choi D, Lee S. SNeCT: Scalable Network Constrained Tucker Decomposition for Multi-Platform Data Profiling. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1785-1796. [PMID: 30908262 DOI: 10.1109/tcbb.2019.2906205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
How do we integratively profile large-scale multi-platform genomic data that are high dimensional and sparse? Furthermore, how can we systematically incorporate prior knowledge, such as the association between genes, in the analysis to find better latent relationships? To solve this problem, we propose a Scalable Network Constrained Tucker decomposition method (SNeCT). SNeCT adopts a parallel stochastic gradient descent approach on the proposed parallelizable network constrained optimization function. SNeCT decomposition is applied to a tensor constructed from a large-scale multi-platform multi-cohort cancer dataset, PanCan12, constrained on a network built from the PathwayCommons database. The decomposed factor matrices are applied to stratify cancers, to search for top-k similar patients given a new patient, and to illustrate how the matrices can be used to identify significant genomic patterns in each patient. In the stratification test, combined twelve-cohort data is clustered to form thirteen subclasses. The similarity of the top-k patients to the query was high for 23 clinical features, including estrogen/progesterone receptor statuses of BRCA patients, with average precision values ranging from 0.72 to 0.86 and from 0.68 to 0.86, respectively. We also illustrate how the factor matrices can be used for identifying significant patterns for each patient. Resources are available at: https://github.com/leesael/SNeCT.
16
Chowdhury HA, Bhattacharyya DK, Kalita JK. (Differential) Co-Expression Analysis of Gene Expression: A Survey of Best Practices. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1154-1173. [PMID: 30668502 DOI: 10.1109/tcbb.2019.2893170] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 06/09/2023]
Abstract
Analysis of gene expression data is widely used in transcriptomic studies to understand functions of molecules inside a cell and interactions among molecules. Differential co-expression analysis studies diseases and phenotypic variations by finding modules of genes whose co-expression patterns vary across conditions. We review the best practices in gene expression data analysis in terms of analysis of (differential) co-expression, co-expression network, differential networking, and differential connectivity considering both microarray and RNA-seq data along with comparisons. We highlight hurdles in RNA-seq data analysis using methods developed for microarrays. We include discussion of necessary tools for gene expression analysis throughout the paper. In addition, we shed light on scRNA-seq data analysis by including preprocessing and scRNA-seq in co-expression analysis along with useful tools specific to scRNA-seq. To provide insights, biological interpretation and functional profiling are included. Finally, we provide guidelines for the analyst, along with research issues and challenges which should be addressed.
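The idea of co-expression that "varies across conditions" can be sketched in a few lines. The following is a minimal illustration, not a method from the survey: it assumes Pearson correlation as the co-expression measure, a hypothetical fixed threshold on the absolute change in correlation, and invented toy data. The names `pearson`, `differential_coexpression`, `control`, and `treated` are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes non-zero variance)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def differential_coexpression(cond_a, cond_b, threshold=1.0):
    """Return gene pairs whose correlation differs by at least `threshold`.

    cond_a and cond_b map gene name -> expression values across the samples
    of one condition. The threshold is an illustrative assumption.
    """
    changed = {}
    for g1, g2 in combinations(sorted(cond_a), 2):
        r_a = pearson(cond_a[g1], cond_a[g2])
        r_b = pearson(cond_b[g1], cond_b[g2])
        if abs(r_a - r_b) >= threshold:
            changed[(g1, g2)] = (r_a, r_b)
    return changed


# Toy data (hypothetical): geneB flips from following geneA to opposing it.
control = {"geneA": [1, 2, 3, 4], "geneB": [2, 4, 6, 8], "geneC": [5, 3, 4, 1]}
treated = {"geneA": [1, 2, 3, 4], "geneB": [8, 6, 4, 2], "geneC": [5, 3, 4, 1]}

rewired = differential_coexpression(control, treated)
for pair, (r_a, r_b) in rewired.items():
    print(pair, round(r_a, 3), round(r_b, 3))
```

In this toy case the pairs involving geneB are flagged, while the geneA/geneC pair is not, because its correlation is identical in both conditions. Real analyses would typically add a significance test and multiple-testing correction rather than rely on a fixed cutoff.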
17
Erola P, Björkegren JLM, Michoel T. Model-based clustering of multi-tissue gene expression data. Bioinformatics 2020; 36:1807-1813. [PMID: 31688915 PMCID: PMC7162352 DOI: 10.1093/bioinformatics/btz805] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 11/17/2018] [Revised: 09/05/2019] [Accepted: 10/31/2019] [Indexed: 02/06/2023] Open
Abstract
MOTIVATION Recently, it has become feasible to generate large-scale, multi-tissue gene expression data, where expression profiles are obtained from multiple tissues or organs sampled from dozens to hundreds of individuals. When traditional clustering methods are applied to this type of data, important information is lost, because they either require all tissues to be analyzed independently, ignoring dependencies and similarities between tissues, or to merge tissues in a single, monolithic dataset, ignoring individual characteristics of tissues. RESULTS We developed a Bayesian model-based multi-tissue clustering algorithm, revamp, which can incorporate prior information on physiological tissue similarity, and which results in a set of clusters, each consisting of a core set of genes conserved across tissues as well as differential sets of genes specific to one or more subsets of tissues. Using data from seven vascular and metabolic tissues from over 100 individuals in the STockholm Atherosclerosis Gene Expression (STAGE) study, we demonstrate that multi-tissue clusters inferred by revamp are more enriched for tissue-dependent protein-protein interactions compared to alternative approaches. We further demonstrate that revamp results in easily interpretable multi-tissue gene expression associations to key coronary artery disease processes and clinical phenotypes in the STAGE individuals. AVAILABILITY AND IMPLEMENTATION Revamp is implemented in the Lemon-Tree software, available at https://github.com/eb00/lemon-tree. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pau Erola
  - Division of Genetics and Genomics, The Roslin Institute, The University of Edinburgh, Midlothian EH25 9RG, UK
  - MRC Integrative Epidemiology Unit, University of Bristol, Bristol BS8 2BN, UK
- Johan L M Björkegren
  - Department of Genetics and Genomic Sciences, Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
  - Integrated Cardio Metabolic Centre (ICMC), Karolinska Institutet, Huddinge 141 57, Sweden
- Tom Michoel
  - Division of Genetics and Genomics, The Roslin Institute, The University of Edinburgh, Midlothian EH25 9RG, UK
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen N-5020, Norway
18
Wang J, Hossain MS, Lyu Z, Schmutz J, Stacey G, Xu D, Joshi T. SoyCSN: Soybean context-specific network analysis and prediction based on tissue-specific transcriptome data. Plant Direct 2019; 3:e00167. [PMID: 31549018 PMCID: PMC6747016 DOI: 10.1002/pld3.167] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 06/05/2019] [Revised: 08/12/2019] [Accepted: 08/20/2019] [Indexed: 05/04/2023]
Abstract
The Soybean Gene Atlas project provides a comprehensive map for understanding gene expression patterns in major soybean tissues from flower, root, leaf, nodule, seed, and shoot and stem. The RNA-Seq data generated in the project serve as a valuable resource for discovering tissue-specific transcriptome behavior of soybean genes in different tissues. We developed a computational pipeline for Soybean context-specific network (SoyCSN) inference with a suite of prediction tools to analyze, annotate, retrieve, and visualize soybean context-specific networks at both transcriptome and interactome levels. BicMix and Cross-Conditions Cluster Detection algorithms were applied to detect modules based on co-expression relationships across all the tissues. Soybean context-specific interactomes were predicted by combining soybean tissue gene expression and protein-protein interaction data. Functional analyses of these predicted networks provide insights into soybean tissue specificities. For example, under symbiotic, nitrogen-fixing conditions, the constructed soybean leaf network highlights the connection between the photosynthesis function and rhizobium-legume symbiosis. SoyCSN data and all its results are publicly available via an interactive web service within the Soybean Knowledge Base (SoyKB) at http://soykb.org/SoyCSN. SoyCSN gives soybean researchers and molecular breeders web-based access for systematically exploring context specificities in gene regulatory mechanisms and gene relationships.
Affiliation(s)
- Juexin Wang
  - Department of Electrical Engineering and Computer Science, University of Missouri, St. Louis, MO, USA
  - Christopher S. Bond Life Sciences Center, University of Missouri, St. Louis, MO, USA
- Md Shakhawat Hossain
  - Christopher S. Bond Life Sciences Center, University of Missouri, St. Louis, MO, USA
  - Divisions of Plant Science and Biochemistry, University of Missouri, St. Louis, MO, USA
- Zhen Lyu
  - Department of Electrical Engineering and Computer Science, University of Missouri, St. Louis, MO, USA
- Jeremy Schmutz
  - HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
  - DOE Joint Genome Institute, Walnut Creek, CA, USA
- Gary Stacey
  - Christopher S. Bond Life Sciences Center, University of Missouri, St. Louis, MO, USA
  - Divisions of Plant Science and Biochemistry, University of Missouri, St. Louis, MO, USA
- Dong Xu
  - Department of Electrical Engineering and Computer Science, University of Missouri, St. Louis, MO, USA
  - Christopher S. Bond Life Sciences Center, University of Missouri, St. Louis, MO, USA
  - Informatics Institute, University of Missouri, St. Louis, MO, USA
- Trupti Joshi
  - Christopher S. Bond Life Sciences Center, University of Missouri, St. Louis, MO, USA
  - Informatics Institute, University of Missouri, St. Louis, MO, USA
  - Department of Health Management and Informatics and Office of Research, School of Medicine, University of Missouri, St. Louis, MO, USA
19
Sonawane AR, Weiss ST, Glass K, Sharma A. Network Medicine in the Age of Biomedical Big Data. Front Genet 2019; 10:294. [PMID: 31031797 PMCID: PMC6470635 DOI: 10.3389/fgene.2019.00294] [Citation(s) in RCA: 111] [Impact Index Per Article: 22.2] [Received: 12/26/2018] [Accepted: 03/19/2019] [Indexed: 12/13/2022] Open
Abstract
Network medicine is an emerging area of research dealing with molecular and genetic interactions, network biomarkers of disease, and therapeutic target discovery. Large-scale biomedical data generation offers a unique opportunity to assess the effect and impact of cellular heterogeneity and environmental perturbations on the observed phenotype. Marrying the two, network medicine with biomedical data provides a framework to build meaningful models and extract impactful results at a network level. In this review, we survey existing network types and biomedical data sources. More importantly, we delve into ways in which the network medicine approach, aided by phenotype-specific biomedical data, can be gainfully applied. We provide three paradigms, mainly dealing with three major biological network archetypes: protein-protein interaction, expression-based, and gene regulatory networks. For each of these paradigms, we discuss a broad overview of philosophies under which various network methods work. We also provide a few examples in each paradigm as a test case of its successful application. Finally, we delineate several opportunities and challenges in the field of network medicine. We hope this review provides a lexicon for researchers from biological sciences and network theory to come on the same page to work on research areas that require interdisciplinary expertise. Taken together, the understanding gained from combining biomedical data with networks can be useful for characterizing disease etiologies and identifying therapeutic targets, which, in turn, will lead to better preventive medicine with translational impact on personalized healthcare.
Affiliation(s)
- Abhijeet R. Sonawane
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, United States
  - Department of Medicine, Harvard Medical School, Boston, MA, United States
- Scott T. Weiss
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, United States
  - Department of Medicine, Harvard Medical School, Boston, MA, United States
- Kimberly Glass
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, United States
  - Department of Medicine, Harvard Medical School, Boston, MA, United States
- Amitabh Sharma
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, United States
  - Department of Medicine, Harvard Medical School, Boston, MA, United States
  - Center for Interdisciplinary Cardiovascular Sciences, Cardiovascular Division, Brigham and Women’s Hospital, Boston, MA, United States
20
Wang P, Gao L, Hu Y, Li F. Feature related multi-view nonnegative matrix factorization for identifying conserved functional modules in multiple biological networks. BMC Bioinformatics 2018; 19:394. [PMID: 30373534 PMCID: PMC6206826 DOI: 10.1186/s12859-018-2434-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 05/08/2018] [Accepted: 10/15/2018] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND Comprehensively analyzing multi-omics biological data in different conditions is important for understanding biological mechanisms at the system level. Multiple or multi-layer network models give us new insight into simultaneously analyzing these data, for instance, to identify conserved functional modules in multiple biological networks. However, because multiple networks are larger in scale and more complicated in structure than a single network, accurately and efficiently detecting conserved functional biological modules remains a significant challenge. RESULTS Here, we propose an efficient method, named ConMod, to discover conserved functional modules in multiple biological networks. We introduce two features to characterize multiple networks, so that all networks are compressed into two feature matrices. Module detection is performed only on the feature matrices by using multi-view non-negative matrix factorization (NMF), which is independent of the number of input networks. Experimental results on both synthetic and real biological networks demonstrate that our method is promising for identifying conserved modules in multiple networks, since it improves accuracy and efficiency compared with state-of-the-art methods. Furthermore, applying ConMod to co-expression networks of different cancers, we find gene modules shared across cancers, the majority of which have significant functional implications, such as ribosome biogenesis and immune response. In addition, by analyzing brain tissue-specific protein interaction networks, we detect conserved modules related to nervous system development, mRNA processing, etc. CONCLUSIONS ConMod facilitates finding conserved modules in any number of networks with low time and space complexity, thereby serving as a valuable tool for inferring shared traits and biological functions of multiple biological systems.
Affiliation(s)
- Peizhuo Wang
  - School of Computer Science and Technology, Xidian University, Xi’an, 710071 China
- Lin Gao
  - School of Computer Science and Technology, Xidian University, Xi’an, 710071 China
- Yuxuan Hu
  - School of Computer Science and Technology, Xidian University, Xi’an, 710071 China
- Feng Li
  - School of Computer Science and Technology, Xidian University, Xi’an, 710071 China
21
Condition-adaptive fused graphical lasso (CFGL): An adaptive procedure for inferring condition-specific gene co-expression network. PLoS Comput Biol 2018; 14:e1006436. [PMID: 30240439 PMCID: PMC6173447 DOI: 10.1371/journal.pcbi.1006436] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 03/20/2018] [Revised: 10/05/2018] [Accepted: 08/15/2018] [Indexed: 12/14/2022] Open
Abstract
Co-expression network analysis provides useful information for studying gene regulation in biological processes. Examining condition-specific patterns of co-expression can provide insights into the underlying cellular processes activated in a particular condition. One challenge in this type of analysis is that the sample sizes in each condition are usually small, making the statistical inference of co-expression patterns highly underpowered. A joint network construction that borrows information from related structures across conditions has the potential to improve the power of the analysis. One possible approach to constructing the co-expression network is to use the Gaussian graphical model. Though several methods are available for joint estimation of multiple graphical models, they do not fully account for the heterogeneity between samples and between co-expression patterns introduced by condition specificity. Here we develop the condition-adaptive fused graphical lasso (CFGL), a data-driven approach to incorporate condition specificity in the estimation of co-expression networks. We show that this method improves the accuracy with which networks are learned. The application of this method on a rat multi-tissue dataset and The Cancer Genome Atlas (TCGA) breast cancer dataset provides interesting biological insights. In both analyses, we identify numerous modules enriched for Gene Ontology functions and observe that the modules that are upregulated in a particular condition are often involved in condition-specific activities. Interestingly, we observe that the genes strongly associated with survival time in the TCGA dataset are less likely to be network hubs, suggesting that genes associated with cancer progression are likely to govern specific functions or execute final biological functions in pathways, rather than regulating a large number of biological processes. 
Additionally, we observed that the tumor-specific hub genes tend to have few shared edges with normal tissue, revealing tumor-specific regulatory mechanisms. Gene co-expression networks provide insights into the mechanism of cellular activity and gene regulation. Condition-specific mechanisms may be identified by constructing and comparing co-expression networks of multiple conditions. We propose a novel statistical method to jointly construct co-expression networks for gene expression profiles from multiple conditions. By using a data-driven approach to capture condition-specific co-expression patterns, this method is effective in identifying both co-expression patterns that are specific to a condition and patterns that are common across conditions. The application of this method to real datasets reveals interesting biological insights.
22
Aiello KA, Ponnapalli SP, Alter O. Mathematically universal and biologically consistent astrocytoma genotype encodes for transformation and predicts survival phenotype. APL Bioeng 2018; 2. [PMID: 30397684 PMCID: PMC6215493 DOI: 10.1063/1.5037882] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/25/2022] Open
Abstract
DNA alterations have been observed in astrocytoma for decades. A copy-number genotype predictive of a survival phenotype was only discovered by using the generalized singular value decomposition (GSVD) formulated as a comparative spectral decomposition. Here, we use the GSVD to compare whole-genome sequencing (WGS) profiles of patient-matched astrocytoma and normal DNA. First, the GSVD uncovers a genome-wide pattern of copy-number alterations, which is bounded by patterns recently uncovered by the GSVDs of microarray-profiled patient-matched glioblastoma (GBM) and, separately, lower-grade astrocytoma and normal genomes. Like the microarray patterns, the WGS pattern is correlated with an approximately one-year median survival time. By filling in gaps in the microarray patterns, the WGS pattern reveals that this biologically consistent genotype encodes for transformation via the Notch together with the Ras and Shh pathways. Second, like the GSVDs of the microarray profiles, the GSVD of the WGS profiles separates the tumor-exclusive pattern from normal copy-number variations and experimental inconsistencies. These include the WGS technology-specific effects of guanine-cytosine content variations across the genomes that are correlated with experimental batches. Third, by identifying the biologically consistent phenotype among the WGS-profiled tumors, the GBM pattern proves to be a technology-independent predictor of survival and response to chemotherapy and radiation, statistically better than the patient's age and tumor's grade, the best other indicators, and MGMT promoter methylation and IDH1 mutation. We conclude that by using the complex structure of the data, comparative spectral decompositions underlie a mathematically universal description of the genotype-phenotype relations in cancer that other methods miss.
Affiliation(s)
- Katherine A Aiello
  - Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, Utah 84112, USA; Department of Bioengineering, University of Utah, Salt Lake City, Utah 84112, USA
- Sri Priya Ponnapalli
  - Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, Utah 84112, USA
- Orly Alter
  - Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, Utah 84112, USA; Department of Bioengineering, University of Utah, Salt Lake City, Utah 84112, USA; Huntsman Cancer Institute and Department of Human Genetics, University of Utah, Salt Lake City, Utah 84112, USA
|
23
|
van Dam S, Võsa U, van der Graaf A, Franke L, de Magalhães JP. Gene co-expression analysis for functional classification and gene-disease predictions. Brief Bioinform 2018; 19:575-592. [PMID: 28077403 PMCID: PMC6054162 DOI: 10.1093/bib/bbw139] [Citation(s) in RCA: 431] [Impact Index Per Article: 71.8] [Received: 09/12/2016] [Revised: 12/01/2016] [Indexed: 01/06/2023]
Abstract
Gene co-expression networks can be used to associate genes of unknown function with biological processes, to prioritize candidate disease genes or to discern transcriptional regulatory programmes. With recent advances in transcriptomics and next-generation sequencing, co-expression networks constructed from RNA sequencing data also enable the inference of functions and disease associations for non-coding genes and splice variants. Although gene co-expression networks typically do not provide information about causality, emerging methods for differential co-expression analysis are enabling the identification of regulatory genes underlying various phenotypes. Here, we introduce and guide researchers through a (differential) co-expression analysis. We provide an overview of methods and tools used to create and analyse co-expression networks constructed from gene expression data, and we explain how these can be used to identify genes with a regulatory role in disease. Furthermore, we discuss the integration of other data types with co-expression networks and offer future perspectives of co-expression analysis.
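The core step of a (differential) co-expression analysis as described in this abstract can be illustrated with a minimal sketch. It assumes Pearson correlation as the similarity measure and a hard cutoff of |r| >= 0.8; the function names, gene names, and toy expression values are hypothetical and are not taken from the review, which surveys many alternative measures and tools.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def coexpression_edges(expr, threshold=0.8):
    """Return {(gene1, gene2): r} for gene pairs with |r| >= threshold."""
    edges = {}
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges[(g1, g2)] = r
    return edges


# Hypothetical expression profiles (one value per sample) in two conditions.
healthy = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.1, 6.0, 8.2, 10.0],
    "geneC": [5.0, 3.0, 4.0, 1.0, 2.5],
}
disease = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [3.0, 1.0, 4.0, 2.0, 3.5],
    "geneC": [5.2, 3.9, 3.1, 2.0, 1.1],
}

healthy_net = coexpression_edges(healthy)
disease_net = coexpression_edges(disease)

# Differential co-expression: edges present in only one condition.
lost = set(healthy_net) - set(disease_net)
gained = set(disease_net) - set(healthy_net)
print("lost in disease:", lost)
print("gained in disease:", gained)
```

In this toy data the geneA-geneB edge exists only in the healthy network and the geneA-geneC edge only in the disease network, which is the kind of rewiring a differential co-expression analysis looks for. A real analysis would use far more samples and a principled cutoff or a weighted network.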
Affiliation(s)
- Sipko van Dam
  - Department of Genetics, UMCG HPC CB50, RB Groningen, Netherlands
- Urmo Võsa
  - Department of Genetics, UMCG HPC CB50, RB Groningen, Netherlands
- Lude Franke
  - Department of Genetics, UMCG HPC CB50, RB Groningen, Netherlands
|
24
|
Abstract
The diversity and sheer volume of omics data have brought biological and biomedical research and applications into a big data era, much like the one that took hold in human society a decade ago. These data pose a new challenge, shifting from horizontal data ensembles (e.g., similar types of data collected from different labs or companies) to vertical data ensembles (e.g., different types of data collected for a group of persons with matched information). This shift requires integrative analysis in biology and biomedicine and calls for the urgent development of data integration methods to address the change from population-guided to individual-guided investigations. Data integration is an effective concept for solving complex problems and understanding complicated systems. Several benchmark studies have revealed the heterogeneity and trade-offs that exist in the analysis of omics data. Integrative analysis can combine and investigate many datasets in a cost-effective, reproducible way. Current integration approaches for biological data have two modes: a "bottom-up integration" mode with follow-up manual integration, and a "top-down integration" mode with follow-up in silico integration. This paper first summarizes combinatory analysis approaches to provide candidate protocols for biological experiment design for effective integrative studies in genomics, and then surveys data fusion approaches to provide guidance on computational model development for detecting biological significance; these approaches have also provided new data resources and analysis tools to support precision medicine that depends on big biomedical data. Finally, the problems and future directions of integrative analysis of omics big data are highlighted.
Affiliation(s)
- Xiang-Tian Yu
  - Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, China
- Tao Zeng
  - Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, China
|
25
|
Yu W, Zhao S, Wang Y, Zhao BN, Zhao W, Zhou X. Identification of cancer prognosis-associated functional modules using differential co-expression networks. Oncotarget 2017; 8:112928-112941. [PMID: 29348878 PMCID: PMC5762563 DOI: 10.18632/oncotarget.22878] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 07/12/2017] [Accepted: 11/15/2017] [Indexed: 01/23/2023]
Abstract
The rapid accumulation of cancer-related data owing to high-throughput technologies has provided unprecedented opportunities to understand the progression of cancer and discover functional networks in multiple cancers. Establishing co-expression networks will help us discover the systemic properties of carcinogenesis and the regulatory mechanisms of multiple cancers. Here, we proposed a computational workflow to identify differentially co-expressed gene modules across 8 cancer types by using combined gene differential expression analysis methods and a higher-order generalized singular value decomposition. Four co-expression modules were identified, and oncogenes and tumor suppressors were significantly enriched in these modules. Functional enrichment analysis demonstrated the significantly enriched pathways in these modules, including ECM-receptor interaction, focal adhesion, and the PI3K-Akt signaling pathway. The top-ranked miRNAs (mir-199, mir-29, mir-200) and transcription factors (FOXO4, E2A, NFAT, and MAZ) were identified; these play an important role in deregulating cellular energetics and in regulating angiogenesis and the cancer immune system. The clinical significance of the co-expressed gene clusters was assessed by evaluating their ability to predict cancer patients' survival. The predictive power of different clusters and subclusters was demonstrated. Our results will be valuable in cancer-related gene function annotation and for the evaluation of cancer patients' prognosis.
Affiliation(s)
- Wenshuai Yu
  - Key Laboratory of Embedded System and Service Computing, College of Electronics and Information Engineering, The Ministry of Education, Tongji University, Shanghai, China
- Shengjie Zhao
  - Key Laboratory of Embedded System and Service Computing, College of Electronics and Information Engineering, The Ministry of Education, Tongji University, Shanghai, China; College of Software Engineering, Tongji University, Shanghai, China
- Yongcui Wang
  - Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, China
- Weiling Zhao
  - Department of Radiology and Comprehensive Cancer Center, Wake Forest University School of Medicine, Winston Salem, NC, USA
- Xiaobo Zhou
  - College of Electronics and Information Engineering, Tongji University, Shanghai, China; Center for Big Data Sciences and Network Security, Tongji University, Shanghai, China; Center for Bioinformatics and System Biology, Wake Forest University School of Medicine, Winston Salem, NC, USA
|
26
|
Saha A, Kim Y, Gewirtz ADH, Jo B, Gao C, McDowell IC, Engelhardt BE, Battle A. Co-expression networks reveal the tissue-specific regulation of transcription and splicing. Genome Res 2017; 27:1843-1858. [PMID: 29021288 PMCID: PMC5668942 DOI: 10.1101/gr.216721.116] [Citation(s) in RCA: 106] [Impact Index Per Article: 15.1] [Received: 09/30/2016] [Accepted: 08/22/2017] [Indexed: 11/24/2022]
Abstract
Gene co-expression networks capture biologically important patterns in gene expression data, enabling functional analyses of genes, discovery of biomarkers, and interpretation of genetic variants. Most network analyses to date have been limited to assessing correlation between total gene expression levels in a single tissue or small sets of tissues. Here, we built networks that additionally capture the regulation of relative isoform abundance and splicing, along with tissue-specific connections unique to each of a diverse set of tissues. We used the Genotype-Tissue Expression (GTEx) project v6 RNA sequencing data across 50 tissues and 449 individuals. First, we developed a framework called Transcriptome-Wide Networks (TWNs) for combining total expression and relative isoform levels into a single sparse network, capturing the interplay between the regulation of splicing and transcription. We built TWNs for 16 tissues and found that hubs in these networks were strongly enriched for splicing and RNA binding genes, demonstrating their utility in unraveling regulation of splicing in the human transcriptome. Next, we used a Bayesian biclustering model that identifies network edges unique to a single tissue to reconstruct Tissue-Specific Networks (TSNs) for 26 distinct tissues and 10 groups of related tissues. Finally, we found genetic variants associated with pairs of adjacent nodes in our networks, supporting the estimated network structures and identifying 20 genetic variants with distant regulatory impact on transcription and splicing. Our networks provide an improved understanding of the complex relationships of the human transcriptome across tissues.
|
27
|
Ma S, Ding Z, Li P. Maize network analysis revealed gene modules involved in development, nutrients utilization, metabolism, and stress response. BMC PLANT BIOLOGY 2017; 17:131. [PMID: 28764653 PMCID: PMC5540570 DOI: 10.1186/s12870-017-1077-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 12/23/2016] [Accepted: 07/19/2017] [Indexed: 06/07/2023]
Abstract
BACKGROUND The advent of big data in biology offers opportunities while posing challenges for deriving biological insights. For maize, a large number of publicly available transcriptome datasets have been generated, but a comprehensive analysis is lacking. RESULTS We constructed a maize gene co-expression network based on the graphical Gaussian model, using massive RNA-seq data. The network, containing 20,269 genes, assembles into 964 gene modules that function in a variety of plant processes, such as cell organization, the development of inflorescences, ligules and kernels, the uptake and utilization of nutrients (e.g. nitrogen and phosphate), the metabolism of benzoxazinoids, oxylipins, flavonoids, and wax, and the response to stresses. Among them, the inflorescence development module is enriched with domestication genes (like ra1, ba1, gt1, tb1, tga1) that control plant architecture and kernel structure, while multiple other modules relate to diverse agronomic traits. Contained within these modules are transcription factors acting as known or potential expression regulators for the genes within the same modules, suggesting them as candidate regulators for related biological processes. A comparison with an established Arabidopsis network revealed conserved gene association patterns for specific modules involved in cell organization, nutrient uptake and utilization, and metabolism. The analysis also identified significant divergences between the two species for modules that orchestrate developmental pathways. CONCLUSIONS This network sheds light on how gene modules are organized between different species in the context of evolutionary divergence and highlights modules whose structure and gene content can provide important resources for maize gene functional studies with application potential.
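A graphical Gaussian model connects two genes only if they remain correlated after conditioning on the other genes, that is, if their partial correlation is non-zero. The sketch below illustrates that idea for the simplest case of three variables, where the partial correlation has a closed form. It is an illustration under stated assumptions, not the authors' pipeline: the function names and the toy profiles (a hypothetical regulator driving two targets) are invented for this example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Hypothetical profiles: a regulator drives both targets, and each target
# adds its own small, unrelated fluctuation.
regulator = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
target_x = [1.5, 1.5, 3.5, 3.5, 5.5, 5.5, 7.5, 7.5]
target_y = [1.5, 2.5, 2.5, 3.5, 5.5, 6.5, 6.5, 7.5]

raw = pearson(target_x, target_y)
conditioned = partial_corr(target_x, target_y, regulator)
print(f"raw correlation:     {raw:.3f}")
print(f"partial correlation: {conditioned:.3f}")
```

The two targets are strongly correlated in the raw data, but the association nearly vanishes once the regulator is accounted for, so a graphical Gaussian model would link each target to the regulator rather than to each other. With thousands of genes, the partial correlations are estimated from the inverse covariance matrix, usually with regularization or subsampling.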
Affiliation(s)
- Shisong Ma
  - School of Life Sciences, University of Science and Technology of China, Hefei, Anhui China
- Zehong Ding
  - The Institute of Tropical Bioscience and Biotechnology, Chinese Academy of Tropical Agricultural Sciences, Haikou, Hainan China
- Pinghua Li
  - State Key Laboratory of Crop Biology, College of Agronomy, Shandong Agricultural University, Tai’an, Shandong China
|
28
|
Integrative clustering of multi-level 'omic data based on non-negative matrix factorization algorithm. PLoS One 2017; 12:e0176278. [PMID: 28459819 PMCID: PMC5411077 DOI: 10.1371/journal.pone.0176278] [Citation(s) in RCA: 78] [Impact Index Per Article: 11.1] [Received: 09/13/2016] [Accepted: 04/07/2017] [Indexed: 11/30/2022]
Abstract
Integrative analyses of high-throughput 'omic data, such as DNA methylation, DNA copy number alteration, mRNA and protein expression levels, have created unprecedented opportunities to understand the molecular basis of human disease. In particular, integrative analyses have been the cornerstone in the study of cancer to determine molecular subtypes within a given cancer. As malignant tumors with similar morphological characteristics have been shown to exhibit entirely different molecular profiles, there has been significant interest in using multiple 'omic data for the identification of novel molecular subtypes of disease, which could impact treatment decisions. Therefore, we have developed intNMF, an integrative approach for disease subtype classification based on non-negative matrix factorization. The proposed approach carries out integrative clustering of multiple high-dimensional molecular data in a single comprehensive analysis, utilizing the information across multiple biological levels assessed on the same individual. As intNMF does not assume any distributional form for the data, it has obvious advantages over other model-based clustering methods which require specific distributional assumptions. Application of intNMF is illustrated using both simulated and real data from The Cancer Genome Atlas (TCGA).
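The idea of clustering the same individuals across several data types with one factorization can be sketched as a joint non-negative matrix factorization in which every data matrix X_i (samples by features) is approximated as W H_i with a single shared W. The sketch below is only an illustration: it uses standard multiplicative updates with equal weights for all data types, which is an assumption made here and not necessarily the algorithm implemented in intNMF, and the function names and toy matrices are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def total_error(datasets, w, hs):
    """Sum of squared reconstruction errors over all data types."""
    err = 0.0
    for x, h in zip(datasets, hs):
        approx = matmul(w, h)
        err += sum((xv - av) ** 2
                   for xr, ar in zip(x, approx) for xv, av in zip(xr, ar))
    return err


def joint_nmf(datasets, k, n_iter=200, seed=0, eps=1e-9):
    """Approximate each X_i (samples x features_i) as W @ H_i, sharing W."""
    rng = random.Random(seed)
    n = len(datasets[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    hs = [[[rng.random() + 0.1 for _ in range(len(x[0]))] for _ in range(k)]
          for x in datasets]
    for _ in range(n_iter):
        # H_i <- H_i * (W^T X_i) / (W^T W H_i), with W held fixed.
        wt = transpose(w)
        wtw = matmul(wt, w)
        for i, x in enumerate(datasets):
            num, den = matmul(wt, x), matmul(wtw, hs[i])
            hs[i] = [[h * a / (b + eps) for h, a, b in zip(hr, nr, dr)]
                     for hr, nr, dr in zip(hs[i], num, den)]
        # W <- W * sum_i(X_i H_i^T) / (W sum_i(H_i H_i^T)), with all H_i fixed.
        num = [[0.0] * k for _ in range(n)]
        hht = [[0.0] * k for _ in range(k)]
        for x, h in zip(datasets, hs):
            ht = transpose(h)
            num = [[a + b for a, b in zip(r1, r2)]
                   for r1, r2 in zip(num, matmul(x, ht))]
            hht = [[a + b for a, b in zip(r1, r2)]
                   for r1, r2 in zip(hht, matmul(h, ht))]
        den = matmul(w, hht)
        w = [[wv * a / (b + eps) for wv, a, b in zip(wr, nr, dr)]
             for wr, nr, dr in zip(w, num, den)]
    return w, hs


# Hypothetical data: four individuals measured on two data types.
expression = [[5.0, 5.0, 0.0], [4.0, 5.0, 0.0], [0.0, 1.0, 5.0], [0.0, 0.0, 4.0]]
methylation = [[3.0, 0.0], [3.0, 1.0], [0.0, 4.0], [1.0, 4.0]]
datasets = [expression, methylation]

w0, hs0 = joint_nmf(datasets, k=2, n_iter=0)   # random start, no updates
w, hs = joint_nmf(datasets, k=2, n_iter=200)
labels = [row.index(max(row)) for row in w]    # cluster = dominant column of W
print("error before:", total_error(datasets, w0, hs0))
print("error after: ", total_error(datasets, w, hs))
print("cluster labels:", labels)
```

Each individual is assigned to the cluster whose column of W carries the largest weight, so the assignment draws on all data types at once. Because no distribution is assumed for the data, only non-negativity, the same code applies unchanged to any non-negative measurement type.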
|
29
|
Moreno-Moral A, Pesce F, Behmoaras J, Petretto E. Systems Genetics as a Tool to Identify Master Genetic Regulators in Complex Disease. Methods Mol Biol 2017; 1488:337-362. [PMID: 27933533 DOI: 10.1007/978-1-4939-6427-7_16] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 12/12/2022]
Abstract
Systems genetics stems from systems biology and similarly employs integrative modeling approaches to describe the perturbations and phenotypic effects observed in a complex system. However, in the case of systems genetics the main source of perturbation is naturally occurring genetic variation, which can be analyzed at the systems-level to explain the observed variation in phenotypic traits. In contrast with conventional single-variant association approaches, the success of systems genetics has been in the identification of gene networks and molecular pathways that underlie complex disease. In addition, systems genetics has proven useful in the discovery of master trans-acting genetic regulators of functional networks and pathways, which in many cases revealed unexpected gene targets for disease. Here we detail the central components of a fully integrated systems genetics approach to complex disease, starting from assessment of genetic and gene expression variation, linking DNA sequence variation to mRNA (expression QTL mapping), gene regulatory network analysis and mapping the genetic control of regulatory networks. By summarizing a few illustrative (and successful) examples, we highlight how different data-modeling strategies can be effectively integrated in a systems genetics study.
Affiliation(s)
- Aida Moreno-Moral
  - Duke-NUS Medical School, 8 College Road, Singapore, 169857, Singapore
- Francesco Pesce
  - National Heart and Lung Institute, Faculty of Medicine, Imperial College London, Hammersmith Campus, Imperial Centre for Translational and Experimental Medicine, London, UK
- Jacques Behmoaras
  - Centre for Complement and Inflammation Research, Imperial College London, Hammersmith Hospital, Du Cane Road, London, W12 0NN, UK
- Enrico Petretto
  - Duke-NUS Medical School, 8 College Road, Singapore, 169857, Singapore
|
30
|
Aiello KA, Alter O. Platform-Independent Genome-Wide Pattern of DNA Copy-Number Alterations Predicting Astrocytoma Survival and Response to Treatment Revealed by the GSVD Formulated as a Comparative Spectral Decomposition. PLoS One 2016; 11:e0164546. [PMID: 27798635 PMCID: PMC5087864 DOI: 10.1371/journal.pone.0164546] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 06/06/2016] [Accepted: 09/27/2016] [Indexed: 01/07/2023]
Abstract
We use the generalized singular value decomposition (GSVD), formulated as a comparative spectral decomposition, to model patient-matched grades III and II, i.e., lower-grade astrocytoma (LGA) brain tumor and normal DNA copy-number profiles. A genome-wide tumor-exclusive pattern of DNA copy-number alterations (CNAs) is revealed, encompassed in that previously uncovered in glioblastoma (GBM), i.e., grade IV astrocytoma, where GBM-specific CNAs encode for enhanced opportunities for transformation and proliferation via growth and developmental signaling pathways in GBM relative to LGA. The GSVD separates the LGA pattern from other sources of biological and experimental variation, common to both, or exclusive to one of the tumor and normal datasets. We find, first, and computationally validate, that the LGA pattern is correlated with a patient's survival and response to treatment. Second, the GBM pattern identifies among the LGA patients a subtype, statistically indistinguishable from that among the GBM patients, where the CNA genotype is correlated with an approximately one-year survival phenotype. Third, cross-platform classification of the Affymetrix-measured LGA and GBM profiles by using the Agilent-derived GBM pattern shows that the GBM pattern is a platform-independent predictor of astrocytoma outcome. Statistically, the pattern is a better predictor (corresponding to greater median survival time difference, proportional hazard ratio, and concordance index) than the patient's age and the tumor's grade, which are the best indicators of astrocytoma currently in clinical use, and laboratory tests. The pattern is also statistically independent of these indicators, and, combined with either one, is an even better predictor of astrocytoma outcome. Recurring DNA CNAs have been observed in astrocytoma tumors' genomes for decades; however, copy-number subtypes that are predictive of patients' outcomes were not identified before. This is despite the growing number of datasets recording different aspects of the disease, and due to an existing fundamental need for mathematical frameworks that can simultaneously find similarities and dissimilarities across the datasets. This illustrates the ability of comparative spectral decompositions to find what other methods miss.
Affiliation(s)
- Katherine A. Aiello
  - Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, Utah, United States of America
  - Department of Bioengineering, University of Utah, Salt Lake City, Utah, United States of America
- Orly Alter
  - Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, Utah, United States of America
  - Department of Bioengineering, University of Utah, Salt Lake City, Utah, United States of America
  - Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah, United States of America
  - Department of Human Genetics, University of Utah, Salt Lake City, Utah, United States of America
|
31
|
Wang Y, Zhao W, Zhou X. Matrix factorization reveals aging-specific co-expression gene modules in the fat and muscle tissues in nonhuman primates. Sci Rep 2016; 6:34335. [PMID: 27703186 PMCID: PMC5050522 DOI: 10.1038/srep34335] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 03/02/2016] [Accepted: 09/12/2016] [Indexed: 11/29/2022]
Abstract
Accurate identification of coherent transcriptional modules (subnetworks) in adipose and muscle tissues is important for revealing the related mechanisms and co-regulated pathways involved in the development of aging-related diseases. Here, we proposed a systematic computational approach, called ICEGM, to Identify the Co-Expression Gene Modules through a novel mathematical framework of Higher-Order Generalized Singular Value Decomposition (HO-GSVD). ICEGM was applied to the adipose, heart, and skeletal muscle tissues of old and young female African green vervet monkeys. The genes associated with the development of inflammation, cardiovascular and skeletal disorders, and cancer were revealed by the ICEGM. Meanwhile, genes in the ICEGM modules were also enriched in adipocytes, smooth muscle cells, cardiac myocytes, and immune cells. Comprehensive disease annotation and canonical pathway analysis indicated that immune cells, adipocytes, cardiomyocytes, and smooth muscle cells played a synergistic role in cardiac and physical functions in the aged monkeys by regulating the biological processes associated with metabolism, inflammation, and atherosclerosis. In conclusion, the ICEGM provides an efficient, systematic framework for decoding the co-expression gene modules in multiple tissues. Analysis of genes in the ICEGM modules yielded important insights into the cooperative role of multiple tissues in the development of diseases.
Affiliation(s)
- Yongcui Wang
  - Center for Bioinformatics & Systems Biology, Department of Radiology, Wake Forest School of Medicine, Winston Salem, NC, USA
  - Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, China
- Weiling Zhao
  - Center for Bioinformatics & Systems Biology, Department of Radiology, Wake Forest School of Medicine, Winston Salem, NC, USA
- Xiaobo Zhou
  - Center for Bioinformatics & Systems Biology, Department of Radiology, Wake Forest School of Medicine, Winston Salem, NC, USA
|
32
|
Moreno-Moral A, Petretto E. From integrative genomics to systems genetics in the rat to link genotypes to phenotypes. Dis Model Mech 2016; 9:1097-1110. [PMID: 27736746 PMCID: PMC5087832 DOI: 10.1242/dmm.026104] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Indexed: 12/29/2022]
Abstract
Complementary to traditional gene mapping approaches used to identify the hereditary components of complex diseases, integrative genomics and systems genetics have emerged as powerful strategies to decipher the key genetic drivers of molecular pathways that underlie disease. Broadly speaking, integrative genomics aims to link cellular-level traits (such as mRNA expression) to the genome to identify their genetic determinants. With the characterization of several cellular-level traits within the same system, the integrative genomics approach evolved into a more comprehensive study design, called systems genetics, which aims to unravel the complex biological networks and pathways involved in disease, and in turn map their genetic control points. The first fully integrated systems genetics study was carried out in rats, and the results, which revealed conserved trans-acting genetic regulation of a pro-inflammatory network relevant to type 1 diabetes, were translated to humans. Many studies using different organisms subsequently stemmed from this example. The aim of this Review is to describe the most recent advances in the fields of integrative genomics and systems genetics applied in the rat, with a focus on studies of complex diseases ranging from inflammatory to cardiometabolic disorders. We aim to provide the genetics community with a comprehensive insight into how the systems genetics approach came to life, starting from the first integrative genomics strategies [such as expression quantitative trait loci (eQTLs) mapping] and concluding with the most sophisticated gene network-based analyses in multiple systems and disease states. Although not limited to studies that have been directly translated to humans, we will focus particularly on the successful investigations in the rat that have led to primary discoveries of genes and pathways relevant to human disease.
Affiliation(s)
- Aida Moreno-Moral
  - Program in Cardiovascular and Metabolic Disorders, Duke-National University of Singapore (NUS) Medical School, Singapore
- Enrico Petretto
  - Program in Cardiovascular and Metabolic Disorders, Duke-National University of Singapore (NUS) Medical School, Singapore
|
33
|
Nelms BD, Waldron L, Barrera LA, Weflen AW, Goettel JA, Guo G, Montgomery RK, Neutra MR, Breault DT, Snapper SB, Orkin SH, Bulyk ML, Huttenhower C, Lencer WI. CellMapper: rapid and accurate inference of gene expression in difficult-to-isolate cell types. Genome Biol 2016; 17:201. [PMID: 27687735 PMCID: PMC5043525 DOI: 10.1186/s13059-016-1062-5] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 06/09/2016] [Accepted: 09/13/2016] [Indexed: 02/25/2023]
Abstract
We present a sensitive approach to predict genes expressed selectively in specific cell types, by searching publicly available expression data for genes with a similar expression profile to known cell-specific markers. Our method, CellMapper, strongly outperforms previous computational algorithms to predict cell type-specific expression, especially for rare and difficult-to-isolate cell types. Furthermore, CellMapper makes accurate predictions for human brain cell types that have never been isolated, and can be rapidly applied to diverse cell types from many tissues. We demonstrate a clinically relevant application to prioritize candidate genes in disease susceptibility loci identified by GWAS.
Affiliation(s)
- Bradlee D Nelms
  - Division of Gastroenterology, Children's Hospital and Harvard Medical School, Boston, MA, 02115, USA; Graduate Program in Biophysics, Harvard University, Cambridge, MA, 02138, USA
- Levi Waldron
  - City University of New York School of Public Health, New York, NY, 10027, USA
- Luis A Barrera
  - Graduate Program in Biophysics, Harvard University, Cambridge, MA, 02138, USA; Division of Genetics, Department of Medicine and Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
- Andrew W Weflen
  - Division of Gastroenterology, Children's Hospital and Harvard Medical School, Boston, MA, 02115, USA
- Jeremy A Goettel
  - Division of Gastroenterology, Children's Hospital and Harvard Medical School, Boston, MA, 02115, USA
- Guoji Guo
  - Center of Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Zhejiang, 310058, People's Republic of China
- Robert K Montgomery
  - Division of Gastroenterology, Children's Hospital and Harvard Medical School, Boston, MA, 02115, USA
- Marian R Neutra
  - Division of Gastroenterology, Children's Hospital and Harvard Medical School, Boston, MA, 02115, USA; Harvard Digestive Diseases Center, Harvard Medical School, Boston, MA, 02115, USA
- David T Breault
  - Harvard Digestive Diseases Center, Harvard Medical School, Boston, MA, 02115, USA; Division of Endocrinology, Children's Hospital and Harvard Medical School, Boston, MA, 02115, USA
- Scott B Snapper
  - Division of Gastroenterology, Children's Hospital and Harvard Medical School, Boston, MA, 02115, USA; Harvard Digestive Diseases Center, Harvard Medical School, Boston, MA, 02115, USA; Department of Gastroenterology, Brigham and Women's Hospital, Boston, MA, 02115, USA
- Stuart H Orkin
  - Division of Hematology/Oncology and Harvard Stem Cell Institute, Children's Hospital and Harvard Medical School, Boston, MA, 02115, USA; Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, 02115, USA
- Martha L Bulyk
  - Division of Genetics, Department of Medicine and Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
- Curtis Huttenhower
  - Department of Biostatistics, Harvard School of Public Health, Boston, MA, 02115, USA
- Wayne I Lencer
  - Division of Gastroenterology, Children's Hospital and Harvard Medical School, Boston, MA, 02115, USA; Graduate Program in Biophysics, Harvard University, Cambridge, MA, 02138, USA; Harvard Digestive Diseases Center, Harvard Medical School, Boston, MA, 02115, USA
|
34
|
Petereit J, Smith S, Harris FC, Schlauch KA. petal: Co-expression network modelling in R. BMC SYSTEMS BIOLOGY 2016; 10 Suppl 2:51. [PMID: 27490697 PMCID: PMC4977474 DOI: 10.1186/s12918-016-0298-8] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Indexed: 11/25/2022]
Abstract
BACKGROUND Networks provide effective models to study complex biological systems, such as gene and protein interaction networks. With the advent of new sequencing technologies, many life scientists are grasping for user-friendly methods and tools to examine biological components at the whole-systems level. Gene co-expression network analysis approaches are frequently used to successfully associate genes with biological processes and demonstrate great potential to gain further insights into the functionality of genes, thus becoming a standard approach in Systems Biology. Here the objective is the construction of biologically meaningful and statistically strong co-expression networks, the identification of research-dependent subnetworks, and the presentation of self-contained results. RESULTS We introduce petal, a novel approach to generate gene co-expression network models based on experimental gene expression measures. petal focuses on statistical, mathematical, and biological characteristics of both input data and output network models. Often-overlooked issues of current co-expression analysis tools include the assumption of data normality, which seldom holds for high-throughput expression data obtained from RNA-seq technologies. petal does not assume data normality, making it a statistically appropriate method for RNA-seq data. Also, network models are rarely tested for their known typical architecture: scale-free and small-world. petal explicitly constructs networks based on both these characteristics, thereby generating biologically meaningful models. Furthermore, many network analysis tools require a number of user-defined input variables; these often require tuning and/or an understanding of the underlying algorithm. petal requires no user input other than experimental data. This allows for reproducible results and simplifies the use of petal. Lastly, this approach is specifically designed for very large high-throughput datasets; this way, petal's network models represent as much of the entire system as possible to provide a whole-system approach. CONCLUSION petal is a novel tool for generating co-expression network models of whole-genomics experiments. It is implemented in R and available as a library. Its application to several whole-genome experiments has generated novel meaningful results and has led the way to new testable hypotheses for further biological investigation. Electronic supplementary material: The online version of this article (doi:10.1186/s12918-016-0298-8) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Juli Petereit
- University of Nevada, Reno, 1664 N. Virginia Street, Reno, 89557, USA
- Sebastian Smith
- University of Nevada, Reno, 1664 N. Virginia Street, Reno, 89557, USA
- Karen A Schlauch
- University of Nevada, Reno, 1664 N. Virginia Street, Reno, 89557, USA
|
35
|
van der Kloet FM, Sebastián-León P, Conesa A, Smilde AK, Westerhuis JA. Separating common from distinctive variation. BMC Bioinformatics 2016; 17 Suppl 5:195. [PMID: 27294690 PMCID: PMC4905617 DOI: 10.1186/s12859-016-1037-2] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 11/10/2022]
Abstract
Background Joint and individual variation explained (JIVE), distinct and common simultaneous component analysis (DISCO) and O2-PLS, a two-block (X-Y) latent variable regression method with an integral OSC filter, can all be used for the integrated analysis of multiple data sets and decompose them into three terms: a low(er)-rank approximation capturing common variation across data sets, low(er)-rank approximations for structured variation distinctive for each data set, and residual noise. In this paper these three methods are compared with respect to their mathematical properties and their respective ways of defining common and distinctive variation. Results The methods are all applied to simulated data and to mRNA and miRNA data sets from glioblastoma multiforme (GBM) brain tumors to examine their overlap and differences. When the common variation is abundant, all methods are able to find the correct solution. With real data, however, complexities in the data are treated differently by the three methods. Conclusions All three methods have their own approach to estimating common and distinctive variation, with their specific strengths and weaknesses. Because of their orthogonality properties and the algorithms they use, their views of the data differ slightly. By assuming orthogonality between common and distinctive variation, true natural or biological phenomena that may not be orthogonal at all might be misinterpreted. Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-1037-2) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Frans M van der Kloet
- Biosystems Data Analysis, Swammerdam Institute for Life Sciences, University of Amsterdam, Science Park 904, 1098 XH Amsterdam, The Netherlands
- Ana Conesa
- Computational Genomics Program, Centro de Investigaciones Príncipe Felipe, Valencia, Spain
- Age K Smilde
- Biosystems Data Analysis, Swammerdam Institute for Life Sciences, University of Amsterdam, Science Park 904, 1098 XH Amsterdam, The Netherlands
- Johan A Westerhuis
- Biosystems Data Analysis, Swammerdam Institute for Life Sciences, University of Amsterdam, Science Park 904, 1098 XH Amsterdam, The Netherlands
|
36
|
Zhou C, York SR, Chen JY, Pondick JV, Motola DL, Chung RT, Mullen AC. Long noncoding RNAs expressed in human hepatic stellate cells form networks with extracellular matrix proteins. Genome Med 2016; 8:31. [PMID: 27007663 PMCID: PMC4804564 DOI: 10.1186/s13073-016-0285-0] [Citation(s) in RCA: 52] [Impact Index Per Article: 6.5] [Received: 01/22/2016] [Accepted: 03/03/2016] [Indexed: 02/01/2023]
Abstract
BACKGROUND Hepatic fibrosis is the underlying cause of cirrhosis and liver failure in nearly every form of chronic liver disease, and hepatic stellate cells (HSCs) are the primary cell type responsible for fibrosis. Long noncoding RNAs (lncRNAs) are increasingly recognized as regulators of development and disease; however, little is known about their expression in human HSCs and their function in hepatic fibrosis. METHODS We performed RNA sequencing and ab initio assembly of RNA transcripts to define the lncRNAs expressed in human HSC myofibroblasts. We analyzed chromatin immunoprecipitation data and expression data to identify lncRNAs that were regulated by transforming growth factor beta (TGF-β) signaling, associated with super-enhancers and restricted in expression to HSCs compared with 43 human tissues and cell types. Co-expression network analyses were performed to discover functional modules of lncRNAs, and principal component analysis and k-means clustering were used to compare lncRNA expression in HSCs with other myofibroblast cell types. RESULTS We identified over 3600 lncRNAs that are expressed in human HSC myofibroblasts. Many are regulated by TGF-β, a major fibrotic signal, and form networks with genes encoding key components of the extracellular matrix (ECM), which is the substrate of the fibrotic scar. The lncRNAs directly regulated by TGF-β signaling are also enriched at super-enhancers. More than 400 of the lncRNAs identified in HSCs are uniquely expressed in HSCs compared with 43 other human tissues and cell types, and HSC myofibroblasts demonstrate different patterns of lncRNA expression compared with myofibroblasts originating from other tissues. Co-expression analyses identified a subset of lncRNAs that are tightly linked to collagen genes and numerous proteins that regulate the ECM during formation of the fibrotic scar. Finally, we identified lncRNAs that are induced during progression of human liver disease. 
CONCLUSIONS lncRNAs are likely key contributors to the formation and progression of fibrosis in human liver disease.
Affiliation(s)
- Chan Zhou
- Gastrointestinal Unit, Department of Medicine, Massachusetts General Hospital, Harvard Medical School, 55 Fruit Street, Boston, MA 02114 USA
- Samuel R. York
- Gastrointestinal Unit, Department of Medicine, Massachusetts General Hospital, Harvard Medical School, 55 Fruit Street, Boston, MA 02114 USA
- Jennifer Y. Chen
- Gastrointestinal Unit, Department of Medicine, Massachusetts General Hospital, Harvard Medical School, 55 Fruit Street, Boston, MA 02114 USA
- Joshua V. Pondick
- Gastrointestinal Unit, Department of Medicine, Massachusetts General Hospital, Harvard Medical School, 55 Fruit Street, Boston, MA 02114 USA
- Daniel L. Motola
- Gastrointestinal Unit, Department of Medicine, Massachusetts General Hospital, Harvard Medical School, 55 Fruit Street, Boston, MA 02114 USA
- Raymond T. Chung
- Gastrointestinal Unit, Department of Medicine, Massachusetts General Hospital, Harvard Medical School, 55 Fruit Street, Boston, MA 02114 USA
- Alan C. Mullen
- Gastrointestinal Unit, Department of Medicine, Massachusetts General Hospital, Harvard Medical School, 55 Fruit Street, Boston, MA 02114 USA
- Harvard Stem Cell Institute, Cambridge, MA 02138 USA
|
37
|
Pushing the annotation of cellular activities to a higher resolution: Predicting functions at the isoform level. Methods 2015; 93:110-8. [PMID: 26238263 DOI: 10.1016/j.ymeth.2015.07.016] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 06/01/2015] [Revised: 07/20/2015] [Accepted: 07/29/2015] [Indexed: 12/23/2022]
Abstract
In past decades, the experimental determination of protein functions was expensive and time-consuming, so numerous computational methods were developed to speed up and guide the process. However, most of these methods predict protein functions at the gene level and do not consider the fact that protein isoforms (translated from alternatively spliced transcripts), not genes, are the actual function carriers. Now, high-throughput RNA-seq technology is providing unprecedented opportunities to unravel protein functions at the isoform level. In this article, we review recent progress in the high-resolution functional annotations of protein isoforms, focusing on two methods developed by the authors. Both methods can integrate multiple RNA-seq datasets for comprehensively characterizing functions of protein isoforms.
|
38
|
Wang Z, Yuan W, Montana G. Sparse multi-view matrix factorization: a multivariate approach to multiple tissue comparisons. Bioinformatics 2015; 31:3163-71. [DOI: 10.1093/bioinformatics/btv344] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 02/26/2015] [Accepted: 05/29/2015] [Indexed: 12/25/2022]
|
39
|
Goldinger A, Shakhbazov K, Henders AK, McRae AF, Montgomery GW, Powell JE. Seasonal effects on gene expression. PLoS One 2015; 10:e0126995. [PMID: 26023781 PMCID: PMC4449160 DOI: 10.1371/journal.pone.0126995] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Received: 10/21/2014] [Accepted: 04/09/2015] [Indexed: 12/16/2022]
Abstract
Many health conditions, ranging from psychiatric disorders to cardiovascular disease, display notable seasonal variation in severity and onset. In order to understand the molecular processes underlying this phenomenon, we have examined seasonal variation in the transcriptome of 606 healthy individuals. We show that 74 transcripts associated with a 12-month seasonal cycle were enriched for processes involved in DNA repair and binding. An additional 94 transcripts demonstrated significant seasonal variability that was largely influenced by blood cell count levels. These transcripts were enriched for immune function, protein production, and specific cellular markers for lymphocytes. Accordingly, cell counts for erythrocytes, platelets, neutrophils, monocytes, and CD19 cells demonstrated significant association with a 12-month seasonal cycle. These results demonstrate that seasonal variation is an important environmental regulator of gene expression and blood cell composition. Notable changes in leukocyte counts and genes involved in immune function indicate that immune cell physiology varies throughout the year in healthy individuals.
Affiliation(s)
- Anita Goldinger
- University of Queensland Diamantina Institute, The Translational Research Institute, Brisbane, Queensland 4102, Australia
- The Queensland Brain Institute, The University of Queensland, Brisbane, Queensland 4072, Australia
- Konstantin Shakhbazov
- University of Queensland Diamantina Institute, The Translational Research Institute, Brisbane, Queensland 4102, Australia
- Anjali K. Henders
- The Queensland Brain Institute, The University of Queensland, Brisbane, Queensland 4072, Australia
- Queensland Institute of Medical Research, Herston, Brisbane, QLD 4006, Australia
- Allan F. McRae
- University of Queensland Diamantina Institute, The Translational Research Institute, Brisbane, Queensland 4102, Australia
- The Queensland Brain Institute, The University of Queensland, Brisbane, Queensland 4072, Australia
- Queensland Institute of Medical Research, Herston, Brisbane, QLD 4006, Australia
- Grant W. Montgomery
- Queensland Institute of Medical Research, Herston, Brisbane, QLD 4006, Australia
- Joseph E. Powell
- The Queensland Brain Institute, The University of Queensland, Brisbane, Queensland 4072, Australia
|
40
|
Micale G, Ferro A, Pulvirenti A, Giugno R. SPECTRA: An Integrated Knowledge Base for Comparing Tissue and Tumor-Specific PPI Networks in Human. Front Bioeng Biotechnol 2015; 3:58. [PMID: 26005672 PMCID: PMC4424906 DOI: 10.3389/fbioe.2015.00058] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 10/31/2014] [Accepted: 04/17/2015] [Indexed: 12/11/2022]
Abstract
Protein–protein interaction (PPI) networks available in public repositories usually represent relationships between proteins within the cell. They ignore the specific set of tissues or tumors where the interactions take place. Indeed, proteins can form tissue-selective complexes, while they remain inactive in other tissues. For these reasons, great attention has recently been paid to tissue-specific PPI networks, in which nodes are proteins of the global PPI network whose corresponding genes are preferentially expressed in specific tissues. In this paper, we present SPECTRA, a knowledge base to build and compare tissue- or tumor-specific PPI networks. SPECTRA integrates gene expression and protein interaction data from the most authoritative online repositories. We also provide tools for visualizing and comparing such networks, in order to identify the expression and interaction changes of proteins across tissues, or between the normal and pathological states of the same tissue. SPECTRA is available as a web server at http://alpha.dmi.unict.it/spectra.
Affiliation(s)
- Giovanni Micale
- Department of Computer Science, University of Pisa, Pisa, Italy
- Alfredo Ferro
- Department of Clinical and Molecular Biomedicine, University of Catania, Catania, Italy
- Alfredo Pulvirenti
- Department of Clinical and Molecular Biomedicine, University of Catania, Catania, Italy
- Rosalba Giugno
- Department of Clinical and Molecular Biomedicine, University of Catania, Catania, Italy
|
41
|
Mal C, Aftabuddin M, Kundu S. No3CoGP: non-conserved and conserved coexpressed gene pairs. BMC Res Notes 2014; 7:886. [PMID: 25487059 PMCID: PMC4295278 DOI: 10.1186/1756-0500-7-886] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/05/2014] [Accepted: 11/17/2014] [Indexed: 12/15/2022]
Abstract
Background By analyzing microarray data from different conditions, one can identify conserved and condition-specific genes and gene modules, and thus infer the underlying cellular activities. The available tools based on Bioconductor and R packages differ in how they extract differential coexpression and at what level they study it. There is a need for a user-friendly, flexible tool that can start the analysis from raw or preprocessed microarray data and report different levels of useful information. Findings We present GUI software, No3CoGP (Non-Conserved and Conserved Coexpressed Gene Pairs), which takes Affymetrix microarray data (.CEL files or log2-normalized .txt files) along with an annotation file (.csv file), a Chip Definition File (CDF file) and a probe file as inputs, and utilizes the concept of network density cut-off and Fisher’s z-test to extract biologically relevant information. It can identify four possible types of gene pairs based on their coexpression relationships: (i) gene pairs showing coexpression in one condition but not in the other, (ii) gene pairs that are positively coexpressed in one condition but negatively coexpressed in the other, and gene pairs that are (iii) positively or (iv) negatively coexpressed in both conditions. Further, it can generate modules of coexpressed genes. Conclusion The easy-to-use GUI enables researchers without knowledge of the R language to use No3CoGP. Utilization of one or more CPU cores, depending on availability, speeds up the program. The output files stored in the respective directories under the user-defined project allow researchers to unravel condition-specific functionalities of genes, gene sets or modules. Electronic supplementary material The online version of this article (doi:10.1186/1756-0500-7-886) contains supplementary material, which is available to authorized users.
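As a rough illustration of the Fisher's z-test named in the abstract above, the following sketch compares the Pearson correlation of one gene pair between two conditions. It is not taken from No3CoGP: the function name `fisher_z_test`, its arguments, and the example numbers are assumptions chosen for illustration, and the sketch assumes the two conditions use independent samples.

```python
import math


def fisher_z_test(r1, n1, r2, n2):
    """Two-sided test for a difference between two independent Pearson correlations.

    r1, r2: correlations of the same gene pair in condition 1 and condition 2
            (each must lie strictly between -1 and 1).
    n1, n2: number of samples in each condition (each must exceed 3).
    Returns (z, p): the test statistic and its two-sided p-value.
    """
    if min(n1, n2) <= 3:
        raise ValueError("each condition needs more than 3 samples")
    # Fisher transformation: atanh(r) is approximately normal with variance 1/(n - 3).
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical gene pair: strongly coexpressed in condition 1, not in condition 2.
z, p = fisher_z_test(0.9, 30, -0.2, 30)
print(f"z = {z:.3f}, p = {p:.3g}")
```

A small p-value suggests that the pair's coexpression differs between the two conditions; in a genome-wide screen the p-values would still need correction for multiple testing.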
Affiliation(s)
- Sudip Kundu
- Department of Biophysics, Molecular Biology & Bioinformatics, University of Calcutta, 92 A.P.C. Road, Kolkata 700009, India
|
42
|
van Dam S, Craig T, de Magalhães JP. GeneFriends: a human RNA-seq-based gene and transcript co-expression database. Nucleic Acids Res 2014; 43:D1124-32. [PMID: 25361971 PMCID: PMC4383890 DOI: 10.1093/nar/gku1042] [Citation(s) in RCA: 81] [Impact Index Per Article: 8.1] [Indexed: 12/20/2022]
Abstract
Co-expression networks have proven effective at assigning putative functions to genes based on the functional annotation of their co-expressed partners, in candidate gene prioritization studies and in improving our understanding of regulatory networks. The growing number of genome resequencing efforts and genome-wide association studies often identify loci containing novel genes and there is a need to infer their functions and interaction partners. To facilitate this we have expanded GeneFriends, an online database that allows users to identify co-expressed genes with one or more user-defined genes. This expansion entails an RNA-seq-based co-expression map that includes genes and transcripts that are not present in the microarray-based co-expression maps, including over 10 000 non-coding RNAs. The results users obtain from GeneFriends include a co-expression network as well as a summary of the functional enrichment among the co-expressed genes. Novel insights can be gathered from this database for different splice variants and ncRNAs, such as microRNAs and lincRNAs. Furthermore, our updated tool allows candidate transcripts to be linked to diseases and processes using a guilt-by-association approach. GeneFriends is freely available from http://www.GeneFriends.org and can be used to quickly identify and rank candidate targets relevant to the process or disease under study.
Affiliation(s)
- Sipko van Dam
- Integrative Genomics of Ageing Group, Institute of Integrative Biology, University of Liverpool, Liverpool, UK
- Thomas Craig
- Integrative Genomics of Ageing Group, Institute of Integrative Biology, University of Liverpool, Liverpool, UK
- João Pedro de Magalhães
- Integrative Genomics of Ageing Group, Institute of Integrative Biology, University of Liverpool, Liverpool, UK
|
43
|
Acar E, Papalexakis EE, Gürdeniz G, Rasmussen MA, Lawaetz AJ, Nilsson M, Bro R. Structure-revealing data fusion. BMC Bioinformatics 2014; 15:239. [PMID: 25015427 PMCID: PMC4117975 DOI: 10.1186/1471-2105-15-239] [Citation(s) in RCA: 73] [Impact Index Per Article: 7.3] [Received: 12/31/2013] [Accepted: 06/26/2014] [Indexed: 11/10/2022]
Abstract
BACKGROUND Analysis of data from multiple sources has the potential to enhance knowledge discovery by capturing underlying structures, which are, otherwise, difficult to extract. Fusing data from multiple sources has already proved useful in many applications in social network analysis, signal processing and bioinformatics. However, data fusion is challenging since data from multiple sources are often (i) heterogeneous (i.e., in the form of higher-order tensors and matrices), (ii) incomplete, and (iii) have both shared and unshared components. In order to address these challenges, in this paper, we introduce a novel unsupervised data fusion model based on joint factorization of matrices and higher-order tensors. RESULTS While the traditional formulation of coupled matrix and tensor factorizations modeling only shared factors fails to capture the underlying structures in the presence of both shared and unshared factors, the proposed data fusion model has the potential to automatically reveal shared and unshared components through modeling constraints. Using numerical experiments, we demonstrate the effectiveness of the proposed approach in terms of identifying shared and unshared components. Furthermore, we measure a set of mixtures with known chemical composition using both LC-MS (Liquid Chromatography - Mass Spectrometry) and NMR (Nuclear Magnetic Resonance) and demonstrate that the structure-revealing data fusion model can (i) successfully capture the chemicals in the mixtures and extract the relative concentrations of the chemicals accurately, (ii) provide promising results in terms of identifying shared and unshared chemicals, and (iii) reveal the relevant patterns in LC-MS by coupling with the diffusion NMR data. 
CONCLUSIONS We have proposed a structure-revealing data fusion model that can jointly analyze heterogeneous, incomplete data sets with shared and unshared components and demonstrated its promising performance as well as potential limitations on both simulated and real data.
Affiliation(s)
- Evrim Acar
- Department of Food Science, Faculty of Science, University of Copenhagen, Frederiksberg C, Denmark
|