1
Kasimanickam R, Kasimanickam V. MicroRNAs in the Pathogenesis of Preeclampsia-A Case-Control In Silico Analysis. Curr Issues Mol Biol 2024; 46:3438-3459. [PMID: 38666946 PMCID: PMC11048894 DOI: 10.3390/cimb46040216] [Received: 02/29/2024] [Revised: 04/03/2024] [Accepted: 04/04/2024] [Indexed: 04/28/2024]
Abstract
Preeclampsia (PE) occurs in 5% to 7% of all pregnancies, and PE resulting from abnormal placentation is a primary cause of maternal and neonatal morbidity and mortality. The objective of this secondary analysis was to elucidate the pathogenesis of PE by probing protein-protein interactions identified through in silico analysis of PE and normal placental transcriptomes from the Gene Expression Omnibus (GSE149812). The pathogenesis of PE appears to be determined by the associations of miRNA molecules and their target genes, and the degree of their expression changes, with irregularities in hemostasis, vascular function, and inflammatory processes at the fetal-maternal interface. These irregularities ultimately lead to impaired placental growth and hypoxic injury, generally manifesting as placental insufficiency. These differentially expressed miRNAs or genes in placental tissue and/or blood can serve as novel diagnostic and therapeutic biomarkers.
Affiliation(s)
- Ramanathan Kasimanickam
- Department of Veterinary Clinical Sciences, College of Veterinary Medicine, Washington State University, Pullman, WA 99164, USA
- Vanmathy Kasimanickam
- Center for Reproductive Biology, College of Veterinary Medicine, Washington State University, Pullman, WA 99164, USA
2
Chang LY, Lee MZ, Wu Y, Lee WK, Ma CL, Chang JM, Chen CW, Huang TC, Lee CH, Lee JC, Tseng YY, Lin CY. Gene set correlation enrichment analysis for interpreting and annotating gene expression profiles. Nucleic Acids Res 2024; 52:e17. [PMID: 38096046 PMCID: PMC10853793 DOI: 10.1093/nar/gkad1187] [Received: 01/17/2022] [Revised: 11/17/2023] [Accepted: 11/29/2023] [Indexed: 02/10/2024]
Abstract
Pathway analysis, including nontopology-based (non-TB) and topology-based (TB) methods, is widely used to interpret the biological phenomena underlying differences in expression data between two phenotypes. By considering dependencies and interactions between genes, TB methods usually perform better than non-TB methods in identifying pathways that include closely relevant or directly causative genes for a given phenotype. However, most TB methods may be limited by incomplete pathway data used as the reference network or by difficulties in selecting appropriate reference networks for different research topics. Here, we propose a gene set correlation enrichment analysis method, Gscore, based on an expression dataset-derived coexpression network to examine whether a differentially expressed gene (DEG) list (or each of its DEGs) is associated with a known gene set. Gscore is better able to identify target pathways in 89 human disease expression datasets than eight other state-of-the-art methods and offers insight into how disease-wide and pathway-wide associations reflect clinical outcomes. When applied to RNA-seq data from COVID-19-related cells and patient samples, Gscore provided a means for studying how DEGs are implicated in COVID-19-related pathways. In summary, Gscore offers a powerful analytical approach for annotating individual DEGs, DEG lists, and genome-wide expression profiles based on existing biological knowledge.
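The abstract does not spell out the Gscore statistic itself. As a rough illustration only, the sketch below shows the general idea of scoring a DEG list against a known gene set with a coexpression measure computed from the same expression data: here the mean absolute Pearson correlation between DEGs and gene-set members, with an empirical p-value obtained from random gene sets of equal size. The function names, the scoring rule, and the permutation scheme are assumptions for illustration, not the published method.

```python
import math
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(sxx * syy)


def coexpr_score(expr, degs, gene_set):
    """Hypothetical score: mean |r| over all (DEG, gene-set member) pairs, self-pairs skipped."""
    rs = [abs(pearson(expr[d], expr[g]))
          for d in degs for g in gene_set if d != g]
    return mean(rs) if rs else 0.0


def permutation_pvalue(expr, degs, gene_set, n_perm=1000, seed=0):
    """Compare the observed score with random same-size gene sets drawn from non-DEG genes.

    Returns (observed_score, empirical_p), using the (hits + 1) / (n_perm + 1) convention.
    """
    rng = random.Random(seed)
    background = [g for g in expr if g not in degs]
    observed = coexpr_score(expr, degs, gene_set)
    k = len(gene_set)
    hits = sum(
        coexpr_score(expr, degs, rng.sample(background, k)) >= observed
        for _ in range(n_perm)
    )
    return observed, (hits + 1) / (n_perm + 1)
```

Here `expr` maps gene names to per-sample expression values. A gene set whose members track the DEGs across samples scores high and rarely loses to random sets; a real analysis would need a proper coexpression network and multiple-testing control, which this sketch omits.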
Affiliation(s)
- Lan-Yun Chang
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Meng-Zhan Lee
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Yujia Wu
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Wen-Kai Lee
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Chia-Liang Ma
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Jun-Mao Chang
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Ciao-Wen Chen
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Tzu-Chun Huang
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Chia-Hwa Lee
- School of Medical Laboratory Science and Biotechnology, College of Medical Science and Technology, Taipei Medical University, New Taipei City 235, Taiwan
- Center for Intelligent Drug Systems and Smart Bio-devices (IDSB), National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- TMU Research Center of Cancer Translational Medicine, Taipei Medical University, Taipei 110, Taiwan
- Ph.D. Program in Medical Biotechnology, College of Medical Science and Technology, Taipei Medical University, New Taipei City 235, Taiwan
- Jih-Chin Lee
- Department of Otolaryngology-Head and Neck Surgery, Tri-Service General Hospital, National Defense Medical Center, Taipei 110, Taiwan
- Yu-Yao Tseng
- Department of Food Science, Nutrition, and Nutraceutical Biotechnology, Shih Chien University, Taipei 104, Taiwan
- Chun-Yu Lin
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Center for Intelligent Drug Systems and Smart Bio-devices (IDSB), National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- Cancer and Immunology Research Center, National Yang Ming Chiao Tung University, Taipei 112, Taiwan
- Institute of Data Science and Engineering, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
- School of Dentistry, Kaohsiung Medical University, Kaohsiung 807, Taiwan
3
O’Connor LM, O’Connor BA, Zeng J, Lo CH. Data Mining of Microarray Datasets in Translational Neuroscience. Brain Sci 2023; 13:1318. [PMID: 37759919 PMCID: PMC10527016 DOI: 10.3390/brainsci13091318] [Received: 07/25/2023] [Revised: 09/04/2023] [Accepted: 09/10/2023] [Indexed: 09/29/2023]
Abstract
Data mining involves the computational analysis of a plethora of publicly available datasets to generate new hypotheses that can be further validated by experiments for an improved understanding of the pathogenesis of neurodegenerative diseases. Although the number of sequencing datasets is on the rise, microarray analyses conducted on diverse biological samples represent a large collection of datasets, supported by multiple web-based programs that enable efficient and convenient data analysis. In this review, we first discuss the selection of biological samples associated with neurological disorders, and the possibility of combining datasets from various types of samples to conduct an integrated analysis in order to achieve a holistic understanding of the alterations in the examined biological system. We then summarize key approaches and studies that have made use of the data mining of microarray datasets to obtain insights into translational neuroscience applications, including biomarker discovery, therapeutic development, and the elucidation of the pathogenic mechanisms of neurodegenerative diseases. We further discuss the gap to be bridged between microarray and sequencing studies to improve the utilization and combination of different types of datasets, together with experimental validation, for more comprehensive analyses. We conclude by providing future perspectives on integrating multi-omics to advance precision phenotyping and personalized medicine for neurodegenerative diseases.
Affiliation(s)
- Lance M. O’Connor
- College of Biological Sciences, University of Minnesota, Minneapolis, MN 55455, USA
- Blake A. O’Connor
- School of Pharmacy, University of Wisconsin, Madison, WI 53705, USA
- Jialiu Zeng
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore 308232, Singapore
- Chih Hung Lo
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore 308232, Singapore
4
Mankovich N, Kehoe E, Peterson A, Kirby M. Pathway expression analysis. Sci Rep 2022; 12:21839. [PMID: 36528702 PMCID: PMC9759056 DOI: 10.1038/s41598-022-26381-x] [Received: 08/20/2022] [Accepted: 12/14/2022] [Indexed: 12/23/2022]
Abstract
This paper introduces a pathway expression framework as an approach for constructing derived biomarkers. The pathway expression framework incorporates the biological connections of genes, leading to a biologically relevant model. Using this framework, we distinguish between shedding subjects post-infection and all subjects pre-infection in human blood transcriptomic samples challenged with various respiratory viruses: H1N1, H3N2, HRV (Human Rhinoviruses), and RSV (Respiratory Syncytial Virus). Additionally, pathway expression data are used to select discriminatory pathways from these experiments. The classification results and selected pathways are benchmarked against standard gene-expression-based classification and pathway ranking methodologies. We find that using the pathway expression data along with the selected pathways, which have minimal overlap with high-ranking pathways found by traditional methods, improves classification rates across experiments.
Affiliation(s)
- Nathan Mankovich
- Department of Mathematics, Colorado State University, Fort Collins, CO 80523, USA
- Eric Kehoe
- Department of Mathematics, Colorado State University, Fort Collins, CO 80523, USA
- Amy Peterson
- Department of Mathematics, Colorado State University, Fort Collins, CO 80523, USA
- Michael Kirby
- Department of Mathematics, Colorado State University, Fort Collins, CO 80523, USA
5
Maghsoudi Z, Nguyen H, Tavakkoli A, Nguyen T. A comprehensive survey of the approaches for pathway analysis using multi-omics data integration. Brief Bioinform 2022; 23:6761962. [PMID: 36252928 PMCID: PMC9677478 DOI: 10.1093/bib/bbac435] [Received: 04/28/2022] [Revised: 08/26/2022] [Accepted: 09/08/2022] [Indexed: 02/07/2023]
Abstract
Pathway analysis has been widely used to detect pathways and functions associated with complex disease phenotypes. The proliferation of this approach is due to the better interpretability of its results and its higher statistical power compared with gene-level statistics. A plethora of pathway analysis methods that use a multi-omics setup, rather than just transcriptomics or proteomics, have recently been developed to discover novel pathways and biomarkers. Since multi-omics gives multiple views of the same problem, different approaches are employed to aggregate these views into a comprehensive biological context. As a result, a variety of novel hypotheses regarding disease ideation and treatment targets can be formulated. In this article, we review 32 such pathway analysis methods developed for multi-omics and multi-cohort data. We discuss their availability and implementation, assumptions, supported omics types and databases, pathway analysis techniques, and integration strategies. A comprehensive assessment of each method's practicality and a thorough discussion of the strengths and drawbacks of each technique are provided. The main objective of this survey is to provide a thorough examination of existing methods to assist potential users and researchers in selecting suitable tools for their data and analysis purposes, while highlighting outstanding challenges in the field that remain to be addressed in future development.
Affiliation(s)
- Zeynab Maghsoudi
- Department of Computer Science and Engineering, University of Nevada, Reno, NV 89557, USA
- Ha Nguyen
- Department of Computer Science and Engineering, University of Nevada, Reno, NV 89557, USA
- Alireza Tavakkoli
- Department of Computer Science and Engineering, University of Nevada, Reno, NV 89557, USA
- Tin Nguyen
- Department of Computer Science and Engineering, University of Nevada, Reno, NV, USA (corresponding author; Tel.: +1-775-784-6619)
6
Wang Y, Hong Y, Mao S, Jiang Y, Cui Y, Pan J, Luo Y. An Interaction-Based Method for Refining Results From Gene Set Enrichment Analysis. Front Genet 2022; 13:890672. [PMID: 35706447 PMCID: PMC9189359 DOI: 10.3389/fgene.2022.890672] [Received: 03/06/2022] [Accepted: 05/04/2022] [Indexed: 11/13/2022]
Abstract
Purpose: To demonstrate an interaction-based method for the refinement of Gene Set Enrichment Analysis (GSEA) results. Method: Intravitreal injection of miR-124-3p antagomir was used to knock down the expression of miR-124-3p in mouse retina at postnatal day 3 (P3). Whole retinal RNA was extracted for mRNA transcriptome sequencing at P9. After preprocessing the dataset, GSEA was performed, and the leading-edge subsets were obtained. The Apriori algorithm was used to identify the frequent genes or gene sets from the union of the leading-edge subsets. A new statistic d was introduced to evaluate the frequent genes or gene sets. Reverse transcription quantitative PCR (RT-qPCR) was performed to validate the expression trend of candidate genes after the knockdown of miR-124-3p. Results: A total of 115,140 assembled transcript sequences were obtained from the clean data. With GSEA, the NOD-like receptor signaling pathway, C-type-like lectin receptor signaling pathway, phagosome, necroptosis, JAK-STAT signaling pathway, Toll-like receptor signaling pathway, leukocyte transendothelial migration, chemokine signaling pathway, NF-kappa B signaling pathway, and RIG-I-like signaling pathway were identified as the top 10 enriched pathways, and their leading-edge subsets were obtained. After refinement by the Apriori algorithm and sorting by the modulus of d, Prkcd, Irf9, Stat3, Cxcl12, Stat1, Stat2, Isg15, Eif2ak2, Il6st, Pdgfra, Socs4, and Csf2ra had a significant number of interactions and the greatest values of d to downstream genes among all frequent transactions. RT-qPCR validation of candidate gene expression after the knockdown of miR-124-3p showed a trend similar to the RNA-Seq results. Conclusion: This study indicates that applying the Apriori algorithm and defining the statistic d is a novel way to refine GSEA results. We hope this approach helps carry the intricacies of computational results over into low-throughput experiments and allows experimental investigations to be planned more specifically.
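The abstract describes running the Apriori algorithm over the union of GSEA leading-edge subsets to find frequent genes or gene sets. As a rough illustration only, the sketch below implements textbook level-wise Apriori (join, prune, count), treating each pathway's leading-edge subset as one transaction. The toy transactions and the minimum support are hypothetical, and the authors' statistic d is not reproduced because its definition is not given here.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every itemset that occurs
    in at least `min_support` transactions (classic level-wise Apriori)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual genes.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = list(frequent)
        # Join step: unions of two frequent (k-1)-itemsets that form a k-itemset.
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        # Count step: support = number of transactions containing the candidate.
        frequent = {}
        for c in candidates:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_support:
                frequent[c] = n
        result.update(frequent)
        k += 1
    return result


# Hypothetical leading-edge subsets, one per enriched pathway.
leading_edges = [
    {"Stat1", "Stat2", "Irf9"},
    {"Stat1", "Stat2", "Isg15"},
    {"Stat1", "Irf9", "Socs4"},
    {"Stat1", "Stat2", "Irf9"},
]
frequent_sets = apriori(leading_edges, min_support=3)
```

With these toy inputs, Stat1, Stat2, and Irf9 are frequent singletons and {Stat1, Stat2} and {Stat1, Irf9} are frequent pairs, while {Stat2, Irf9} (support 2) is dropped, so no triple survives the prune step. The prune step is what keeps Apriori tractable: any superset of an infrequent set is discarded without scanning the transactions.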
Affiliation(s)
- Yishen Wang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangzhou, China
- Yiwen Hong
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangzhou, China
- Shudi Mao
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangzhou, China
- Yukang Jiang
- Department of Statistical Science, School of Mathematics, Sun Yat-Sen University, Guangzhou, China
- Yamei Cui
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangzhou, China
- Jianying Pan
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangzhou, China
- Yan Luo
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangzhou, China
- Corresponding author: Yan Luo
7
Mubeen S, Tom Kodamullil A, Hofmann-Apitius M, Domingo-Fernández D. On the influence of several factors on pathway enrichment analysis. Brief Bioinform 2022; 23:bbac143. [PMID: 35453140 PMCID: PMC9116215 DOI: 10.1093/bib/bbac143] [Received: 01/17/2022] [Revised: 03/21/2022] [Accepted: 03/30/2022] [Indexed: 02/01/2023]
Abstract
Pathway enrichment analysis has become a widely used knowledge-based approach for the interpretation of biomedical data. Its popularity has led to an explosion of both enrichment methods and pathway databases. While the elegance of pathway enrichment lies in its simplicity, multiple factors can impact the results of such an analysis, and these may not be accounted for. Researchers may fail to give influential aspects their due, resorting instead to popular methods and gene set collections, or to default settings. Despite ongoing efforts to establish guidelines, meaningful results are still hampered by a lack of consensus or gold standards on how enrichment analysis should be conducted. Nonetheless, such concerns have prompted a series of benchmark studies specifically focused on evaluating the influence of various factors on pathway enrichment results. In this review, we organize and summarize the findings of these benchmarks to provide a comprehensive overview of the influence of these factors. Our work covers a broad spectrum of factors, ranging from methodological assumptions to those related to prior biological knowledge, such as pathway definitions and database choice. In doing so, we aim to shed light on how these aspects can lead to insignificant, uninteresting, or even contradictory results. Finally, we conclude the review by proposing future benchmarks as well as solutions to overcome some of the challenges that originate from the outlined factors.
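One commonly cited factor, the choice of background gene universe, is easy to demonstrate with the simplest non-topology-based method, the hypergeometric over-representation test. The sketch below is a minimal, stdlib-only implementation of the one-sided p-value for observing at least k pathway genes among n selected genes, drawn from a universe of N genes of which K belong to the pathway. The numbers in the example are hypothetical and are not taken from any of the benchmarks reviewed.

```python
from math import comb


def ora_pvalue(n_universe, n_pathway, n_selected, n_overlap):
    """One-sided hypergeometric p-value P(X >= n_overlap), where X is the number of
    pathway genes among `n_selected` genes drawn without replacement from a universe
    of `n_universe` genes, `n_pathway` of which belong to the pathway."""
    upper = min(n_pathway, n_selected)
    total = comb(n_universe, n_selected)
    # Sum the hypergeometric probabilities from the observed overlap upwards,
    # using exact integer arithmetic before the final division.
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Same pathway (100 genes), same 200 selected genes, same overlap of 5,
# evaluated against two different (hypothetical) background universes.
p_genome = ora_pvalue(20000, 100, 200, 5)   # expected overlap: 200 * 100 / 20000 = 1
p_assayed = ora_pvalue(5000, 100, 200, 5)   # expected overlap: 200 * 100 / 5000 = 4
```

With a 20,000-gene universe the expected overlap is 1, so observing 5 looks strongly enriched; with a 5,000-gene universe (for example, only the genes actually measured) the expected overlap is 4 and the same observation is far less remarkable. Nothing about the data changed, only the assumed background.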
Affiliation(s)
- Sarah Mubeen
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
- Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, 53115 Bonn, Germany
- Fraunhofer Center for Machine Learning, Germany
- Alpha Tom Kodamullil
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
- Martin Hofmann-Apitius
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
- Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, 53115 Bonn, Germany
- Daniel Domingo-Fernández
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin 53757, Germany
- Fraunhofer Center for Machine Learning, Germany
- Enveda Biosciences, Boulder, CO, 80301, USA
8
Sun Y, Luo Z, Fan X. Robust structured heterogeneity analysis approach for high-dimensional data. Stat Med 2022; 41:3229-3259. [PMID: 35460280 DOI: 10.1002/sim.9414] [Received: 09/03/2021] [Revised: 02/07/2022] [Accepted: 04/05/2022] [Indexed: 11/12/2022]
Abstract
Revealing relationships between genes and disease phenotypes is a critical problem in biomedical studies. This problem is complicated by the heterogeneity of diseases: patients with what is perceived as the same disease may form multiple subgroups, and different subgroups have distinct sets of important genes. It is hence imperative to discover the latent subgroups and reveal the subgroup-specific important genes. Some heterogeneity analysis methods have been proposed in the recent literature. Despite considerable successes, most existing studies are still limited because they cannot accommodate data contamination and they ignore the interconnections among genes. To address these shortcomings, we develop a robust structured heterogeneity analysis approach to identify subgroups, select important genes, and estimate their effects on the phenotype of interest. Possible data contamination is accommodated by employing the Huber loss function. A sparse overlapping group lasso penalty is imposed to conduct regularized estimation and gene identification while taking into account the possibly overlapping cluster structure of genes. This approach adopts an iterative strategy similar in spirit to K-means clustering. Simulations demonstrate that the proposed approach outperforms alternatives in revealing the heterogeneity and selecting important genes for each subgroup. The analysis of Cancer Cell Line Encyclopedia data leads to biologically meaningful findings with improved prediction and grouping stability.
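The Huber loss is quadratic for residuals with |r| <= delta and linear beyond, so a single contaminated observation has bounded influence on the fit. The sketch below shows the loss, its derivative (the residual clipped to [-delta, delta]), and a Huber M-estimate of location computed by iteratively reweighted least squares, the simplest setting in which the robustness is visible. It does not implement the authors' sparse overlapping group lasso or their subgroup iteration; delta = 1.345 is a conventional default for unit-scale residuals, and no scale estimate is applied here, which is a simplifying assumption.

```python
from statistics import median


def huber_loss(r, delta=1.345):
    """Quadratic for |r| <= delta, linear (slope delta) beyond it."""
    a = abs(r)
    return 0.5 * r * r if a <= delta else delta * (a - 0.5 * delta)


def huber_psi(r, delta=1.345):
    """Derivative of the Huber loss: the residual clipped to [-delta, delta]."""
    return max(-delta, min(delta, r))


def huber_location(x, delta=1.345, tol=1e-10, max_iter=100):
    """Huber M-estimate of location via iteratively reweighted least squares.

    Weights are psi(r) / r: 1 inside the threshold, delta / |r| outside,
    so far-out points are down-weighted instead of dominating the mean.
    """
    mu = median(x)  # robust starting value
    for _ in range(max_iter):
        w = [1.0 if abs(xi - mu) <= delta else delta / abs(xi - mu) for xi in x]
        new_mu = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


data = [1.0, 2.0, 3.0, 4.0, 100.0]   # one contaminated value
robust = huber_location(data)
```

On this toy sample the ordinary mean is 22, pulled far from the bulk of the data by the single value 100, whereas the Huber estimate stays close to 3. In a regression setting the same clipped-residual weights are applied to the model residuals, and a penalty such as the group lasso would be added on top.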
Affiliation(s)
- Yifan Sun
- Center for Applied Statistics, Renmin University of China, Beijing, China
- School of Statistics, Renmin University of China, Beijing, China
- Ziye Luo
- School of Statistics, Renmin University of China, Beijing, China
- Xinyan Fan
- Center for Applied Statistics, Renmin University of China, Beijing, China
- School of Statistics, Renmin University of China, Beijing, China
9
Thistlethwaite LR, Li X, Burrage LC, Riehle K, Hacia JG, Braverman N, Wangler MF, Miller MJ, Elsea SH, Milosavljevic A. Clinical diagnosis of metabolic disorders using untargeted metabolomic profiling and disease-specific networks learned from profiling data. Sci Rep 2022; 12:6556. [PMID: 35449147 PMCID: PMC9023513 DOI: 10.1038/s41598-022-10415-5] [Received: 06/05/2021] [Accepted: 03/14/2022] [Indexed: 02/06/2023]
Abstract
Untargeted metabolomics is a global molecular profiling technology that can be used to screen for inborn errors of metabolism (IEMs). Metabolite perturbations are evaluated based on current knowledge of specific metabolic pathway deficiencies, a manual diagnostic process that is qualitative, has limited scalability, and is not equipped to learn from accumulating clinical data. Our purpose was to improve upon manual diagnosis of IEMs in the clinic by developing novel computational methods for analyzing untargeted metabolomics data. We employed CTD, an automated computational diagnostic method that "connects the dots" between metabolite perturbations observed in individual metabolomics profiling data and modules identified in disease-specific metabolite co-perturbation networks learned from prior profiling data. We also extended CTD to calculate distances between any two individuals (CTDncd) and between an individual and a disease state (CTDdm), to provide additional network-quantified predictors for use in diagnosis. We show that across 539 plasma samples, CTD-based network-quantified measures can reproduce accurate diagnosis of 16 different IEMs, including adenylosuccinase deficiency, argininemia, argininosuccinic aciduria, aromatic L-amino acid decarboxylase deficiency, cerebral creatine deficiency syndrome type 2, citrullinemia, cobalamin biosynthesis defect, GABA-transaminase deficiency, glutaric acidemia type 1, maple syrup urine disease, methylmalonic aciduria, ornithine transcarbamylase deficiency, phenylketonuria, propionic acidemia, rhizomelic chondrodysplasia punctata, and the Zellweger spectrum disorders. Our approach can be used to supplement information from biochemical pathways and has the potential to significantly enhance the interpretation of variants of uncertain significance uncovered by exome sequencing. 
CTD, CTDdm, and CTDncd can serve as an essential toolset for biological interpretation of untargeted metabolomics data that overcomes limitations associated with manual diagnosis to assist diagnosticians in clinical decision-making. By automating and quantifying the interpretation of perturbation patterns, CTD can improve the speed and confidence by which clinical laboratory directors make diagnostic and treatment decisions, while automatically improving performance with new case data.
Affiliation(s)
- Lillian R Thistlethwaite
- Quantitative and Computational Biosciences Program, Baylor College of Medicine, One Baylor Plaza, 400D, Houston, TX, 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Xiqi Li
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Lindsay C Burrage
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Texas Children's Hospital, Houston, TX, USA
- Kevin Riehle
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Joseph G Hacia
- Department of Biochemistry and Molecular Medicine, Keck School of Medicine of the University of Southern California, Los Angeles, CA, USA
- Nancy Braverman
- Department of Pediatrics and Human Genetics, McGill University, Montreal, QC, Canada
- Michael F Wangler
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Texas Children's Hospital, Houston, TX, USA
- Jan and Dan Duncan Texas Children's Hospital Neurological Research Institute, Houston, TX, USA
- Marcus J Miller
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA
- Sarah H Elsea
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Aleksandar Milosavljevic
- Quantitative and Computational Biosciences Program, Baylor College of Medicine, One Baylor Plaza, 400D, Houston, TX, 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
10
Suomi T, Elo LL. Statistical and machine learning methods to study human CD4+ T cell proteome profiles. Immunol Lett 2022; 245:8-17. [DOI: 10.1016/j.imlet.2022.03.006] [Received: 11/30/2021] [Revised: 03/11/2022] [Accepted: 03/15/2022] [Indexed: 11/05/2022]
11
Leysen H, Walter D, Christiaenssen B, Vandoren R, Harputluoğlu İ, Van Loon N, Maudsley S. GPCRs Are Optimal Regulators of Complex Biological Systems and Orchestrate the Interface between Health and Disease. Int J Mol Sci 2021; 22:ijms222413387. [PMID: 34948182 PMCID: PMC8708147 DOI: 10.3390/ijms222413387] [Received: 11/24/2021] [Revised: 12/08/2021] [Accepted: 12/09/2021] [Indexed: 02/06/2023]
Abstract
GPCRs arguably represent the most effective current therapeutic targets for a plethora of diseases. GPCRs also play a pivotal role in regulating the physiological balance between healthy and pathological conditions; thus, their importance in systems biology cannot be overstated. The molecular diversity of GPCR signaling systems is likely to be closely associated with disease-associated changes in organismal tissue complexity and compartmentalization, thus enabling a nuanced GPCR-based capacity to interdict multiple disease pathomechanisms at a systemic level. GPCRs have long been considered controllers of communication between tissues and cells. This communication involves the ligand-mediated control of cell surface receptors that then direct their stimuli to impact cell physiology. Given the tremendous success of GPCRs as therapeutic targets, considerable focus has been placed on the ability of these therapeutics to modulate diseases by acting at cell surface receptors. In the past decade, however, attention has turned to how stable multiprotein GPCR superstructures, termed receptorsomes, both at the cell surface membrane and in the intracellular domain, dictate and condition long-term GPCR activities associated with the regulation of protein expression patterns, cellular stress responses, and DNA integrity management. The ability of these receptorsomes (often in the absence of typical cell surface ligands) to control complex cellular activities implicates them as key controllers of the functional balance between health and disease. A greater understanding of this function of GPCRs is likely to significantly augment our ability to further employ these proteins in a multitude of diseases.
Affiliation(s)
- Hanne Leysen
- Receptor Biology Lab, University of Antwerp, 2610 Wilrijk, Belgium
- Deborah Walter
- Receptor Biology Lab, University of Antwerp, 2610 Wilrijk, Belgium
- Bregje Christiaenssen
- Receptor Biology Lab, University of Antwerp, 2610 Wilrijk, Belgium
- Romi Vandoren
- Receptor Biology Lab, University of Antwerp, 2610 Wilrijk, Belgium
- İrem Harputluoğlu
- Receptor Biology Lab, University of Antwerp, 2610 Wilrijk, Belgium
- Department of Chemistry, Middle East Technical University, Çankaya, Ankara 06800, Turkey
- Nore Van Loon
- Receptor Biology Lab, University of Antwerp, 2610 Wilrijk, Belgium
- Stuart Maudsley
- Receptor Biology Lab, University of Antwerp, 2610 Wilrijk, Belgium
- Corresponding author
12
Mokhtar MM, El Allali A, Hegazy MEF, Atia MAM. PlantPathMarks (PPMdb): an interactive hub for pathways-based markers in plant genomes. Sci Rep 2021; 11:21300. [PMID: 34716373 PMCID: PMC8556342 DOI: 10.1038/s41598-021-00504-2] [Received: 03/10/2021] [Accepted: 09/06/2021] [Indexed: 11/12/2022]
Abstract
Over the past decade, the problem of finding an efficient gene-targeting marker set or signature for plant trait characterization has remained challenging. Many databases focusing on pathway mining have been released, but they share one major deficiency: they do not develop marker sets that target only the genes controlling a specific pathway or biological process. Herein, we present the PlantPathMarks database (PPMdb) as a comprehensive, web-based, user-friendly, and interactive hub for pathway-based markers in plant genomes. Based on our newly developed pathway gene set mining approach, two novel pathway-based marker systems, pathway gene-targeted markers (PGTMs) and pathway microsatellite-targeted markers (PMTMs), were developed as a novel class of annotation-based markers. In the PPMdb database, 2,690,742 pathway-based markers reflecting 9,894 marker panels were developed across 82 plant genomes. The markers include 691,555 PGTMs and 1,999,187 PMTMs. Across these genomes, 165,378 enzyme-coding genes were mapped against 126 KEGG reference pathway maps. PPMdb is furnished with three interactive visualization tools (Map Browse, JBrowse, and Species Comparison) to visualize, map, and compare the developed markers over their KEGG reference pathway maps. All the stored marker panels can be freely downloaded. PPMdb promises to create a radical shift in the paradigm of the area of molecular marker research. The use of PPMdb as a mega-tool represents an impediment for non-bioinformatician plant scientists and breeders. PPMdb is freely available at http://ppmdb.easyomics.org.
Affiliation(s)
- Morad M Mokhtar
- African Genome Center, Mohammed VI Polytechnic University, Ben Guerir, Morocco
- Achraf El Allali
- African Genome Center, Mohammed VI Polytechnic University, Ben Guerir, Morocco
- Mohamed A M Atia
- Molecular Genetics and Genome Mapping Laboratory, Genome Mapping Department, Agricultural Genetic Engineering Research Institute (AGERI), Agriculture Research Center (ARC), Giza, 12619, Egypt
13
Minadakis G, Muñoz-Pomer Fuentes A, Tsouloupas G, Papatheodorou I, Spyrou GM. PathExNET: A tool for extracting pathway expression networks from gene expression statistics. Comput Struct Biotechnol J 2021; 19:4336-4344. [PMID: 34429851 PMCID: PMC8363825 DOI: 10.1016/j.csbj.2021.07.033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2021] [Revised: 07/12/2021] [Accepted: 07/28/2021] [Indexed: 11/26/2022] Open
Abstract
A fundamental issue in understanding molecular mechanisms is the way in which common pathways act across different biological experiments related to complex diseases. Using network-based approaches, this work aims to provide a numeric characterization of pathways across different biological experiments, with the prospect of creating unique footprints that may characterise a specific disease under study at the pathway network level. Along these lines, we propose PathExNET, a web service that creates pathway-to-pathway expression networks holding the over- and under-expression information obtained from differential gene expression analyses. The unique numeric characterization of pathway expression status related to a specific biological experiment (or disease), together with the diverse combinations of pathway networks generated by PathExNET, is expected to contribute concretely towards the individualization of disease, and further to more precise personalised medicine and management of treatment. PathExNET is available at: https://bioinformatics.cing.ac.cy/PathExNET and at https://pathexnet.cing-big.hpcf.cyi.ac.cy/.
Affiliation(s)
- George Minadakis
- Bioinformatics Department, The Cyprus Institute of Neurology & Genetics, 6 Iroon Avenue, 2371 Ayios Dometios, Nicosia, Cyprus
- The Cyprus School of Molecular Medicine, 6 Iroon Avenue, 2371 Ayios Dometios, Nicosia, Cyprus
- George Tsouloupas
- HPC Facility, The Cyprus Institute, 20 Konstantinou Kavafi Street, 2121, Aglantzia, Nicosia, Cyprus
- Irene Papatheodorou
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
- George M. Spyrou
- Bioinformatics Department, The Cyprus Institute of Neurology & Genetics, 6 Iroon Avenue, 2371 Ayios Dometios, Nicosia, Cyprus
- The Cyprus School of Molecular Medicine, 6 Iroon Avenue, 2371 Ayios Dometios, Nicosia, Cyprus
14
Nguyen H, Tran D, Galazka JM, Costes SV, Beheshti A, Petereit J, Draghici S, Nguyen T. CPA: a web-based platform for consensus pathway analysis and interactive visualization. Nucleic Acids Res 2021; 49:W114-W124. [PMID: 34037798 PMCID: PMC8262702 DOI: 10.1093/nar/gkab421] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 03/09/2021] [Revised: 04/16/2021] [Accepted: 05/05/2021] [Indexed: 01/06/2023] Open
Abstract
In molecular biology and genetics, there is a large gap between the ease of data collection and our ability to extract knowledge from these data. Contributing to this gap is the fact that living organisms are complex systems whose emerging phenotypes are the results of multiple complex interactions taking place on various pathways. This demands powerful yet user-friendly pathway analysis tools to translate the now abundant high-throughput data into a better understanding of the underlying biological phenomena. Here we introduce Consensus Pathway Analysis (CPA), a web-based platform that allows researchers to (i) perform pathway analysis using eight established methods (GSEA, GSA, FGSEA, PADOG, Impact Analysis, ORA/Webgestalt, KS-test, Wilcox-test), (ii) perform meta-analysis of multiple datasets, (iii) combine methods and datasets to accurately identify the impacted pathways underlying the studied condition and (iv) interactively explore impacted pathways, and browse relationships between pathways and genes. The platform supports three types of input: (i) a list of differentially expressed genes, (ii) genes and fold changes and (iii) an expression matrix. It also allows users to import data from NCBI GEO. The CPA platform currently supports the analysis of multiple organisms using KEGG and Gene Ontology, and it is freely available at http://cpa.tinnguyen-lab.com.
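Of the eight methods CPA offers, over-representation analysis (ORA) is the simplest to state: given a list of differentially expressed (DE) genes, it asks whether a pathway contains more of them than chance would predict. The sketch below assumes the common one-sided hypergeometric formulation of ORA; it is not CPA's implementation, and the function name `ora_pvalue` and the toy counts are hypothetical.

```python
from math import comb


def ora_pvalue(n_universe: int, n_pathway: int, n_de: int, n_overlap: int) -> float:
    """One-sided hypergeometric p-value P(X >= n_overlap).

    n_universe: number of genes measured (the background)
    n_pathway:  number of those genes annotated to the pathway
    n_de:       number of differentially expressed genes
    n_overlap:  number of DE genes that fall in the pathway
    """
    total = comb(n_universe, n_de)
    upper = min(n_pathway, n_de)
    # Count every way of drawing at least n_overlap pathway genes among the DE genes.
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total  # exact integer ratio, correctly rounded to float


# Hypothetical example: 10 of 500 DE genes hit a 100-gene pathway out of 20,000 genes.
p = ora_pvalue(n_universe=20000, n_pathway=100, n_de=500, n_overlap=10)
print(f"ORA p-value: {p:.3g}")
```

Here the expected overlap is 100 × 500 / 20,000 = 2.5 genes, so observing 10 yields a small p-value. Exact integer arithmetic avoids underflow for moderate gene counts.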
Affiliation(s)
- Hung Nguyen
- University of Nevada Reno, Department of Computer Science and Engineering, Reno, NV 89557, USA
- Duc Tran
- University of Nevada Reno, Department of Computer Science and Engineering, Reno, NV 89557, USA
- Jonathan M Galazka
- NASA Ames Research Center, Space Biosciences Division, Moffett Field, CA 94035, USA
- Sylvain V Costes
- NASA Ames Research Center, Space Biosciences Division, Moffett Field, CA 94035, USA
- Afshin Beheshti
- KBR, NASA Ames Research Center, Space Biosciences Division, Moffett Field, CA 94035, USA
- Juli Petereit
- University of Nevada Reno, Nevada Bioinformatics Center, Reno, NV 89557, USA
- Sorin Draghici
- Wayne State University, Department of Computer Science, Detroit, MI 48202, USA
- Tin Nguyen
- University of Nevada Reno, Department of Computer Science and Engineering, Reno, NV 89557, USA
15
Hellstern M, Ma J, Yue K, Shojaie A. netgsa: Fast computation and interactive visualization for topology-based pathway enrichment analysis. PLoS Comput Biol 2021; 17:e1008979. [PMID: 34115744 PMCID: PMC8221786 DOI: 10.1371/journal.pcbi.1008979] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/15/2020] [Revised: 06/23/2021] [Accepted: 04/18/2021] [Indexed: 01/26/2023] Open
Abstract
Existing software tools for topology-based pathway enrichment analysis are either computationally inefficient, have limited statistical power, or require expert knowledge to leverage the methods' capabilities. To address these limitations, we have overhauled NetGSA, an existing topology-based method, to provide a computationally efficient, user-friendly tool that offers interactive visualization. Pathway enrichment analysis for thousands of genes can be performed in minutes on a personal computer without sacrificing statistical power. The new software also removes the need for expert knowledge by directly curating gene-gene interaction information from multiple external databases. Lastly, by utilizing the capabilities of Cytoscape, the new software also offers interactive and intuitive network visualization.
Affiliation(s)
- Michael Hellstern
- Department of Biostatistics, University of Washington, Seattle, Washington
- Jing Ma
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington
- Kun Yue
- Department of Biostatistics, University of Washington, Seattle, Washington
- Ali Shojaie
- Department of Biostatistics, University of Washington, Seattle, Washington
16
Carter KA, Simpson CD, Raftery D, Baker MG. Short Report: Using Targeted Urine Metabolomics to Distinguish Between Manganese Exposed and Unexposed Workers in a Small Occupational Cohort. Front Public Health 2021; 9:666787. [PMID: 34095069 PMCID: PMC8172780 DOI: 10.3389/fpubh.2021.666787] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2021] [Accepted: 04/09/2021] [Indexed: 11/13/2022] Open
Abstract
Objectives: Despite the widespread use of manganese (Mn) in industrial settings and its association with adverse neurological outcomes, a validated and reliable biomarker for Mn exposure is still elusive. Here, we utilize targeted metabolomics to investigate metabolic differences between Mn-exposed and -unexposed workers, which could inform a putative biomarker for Mn and lead to increased understanding of Mn toxicity. Methods: End-of-shift spot urine samples collected from Mn-exposed (n = 17) and unexposed (n = 15) workers underwent a targeted assay of 362 metabolites using LC-MS/MS; 224 were quantified and retained for analysis. Differences in metabolite abundances between exposed and unexposed workers were tested with a Wilcoxon rank-sum test, with Benjamini-Hochberg adjustment for multiple comparisons. We explored perturbed pathways related to exposure using a pathway analysis. Results: Seven metabolites were significantly differentially abundant between exposed and unexposed workers (FDR ≤ 0.1): n-isobutyrylglycine, cholic acid, anserine, beta-alanine, methionine, n-isovalerylglycine, and threonine. Three pathways were significantly perturbed in exposed workers and had an impact score >0.5: beta-alanine metabolism, histidine metabolism, and glycine, serine, and threonine metabolism. Conclusion: This is one of the few studies utilizing targeted metabolomics to explore differences between Mn-exposed and -unexposed workers. Metabolite and pathway analysis showed that amino acid metabolism was perturbed in these Mn-exposed workers. Amino acids have also been shown to be perturbed in other occupational cohorts exposed to Mn. Additional research is needed to characterize the biological importance of amino acids in the Mn exposure-disease continuum, and to determine how to appropriately utilize and interpret metabolomics data collected from occupational cohorts.
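The Benjamini-Hochberg (BH) step named in Methods turns one raw p-value per metabolite into an FDR-adjusted value that can be compared with the 0.1 threshold used in Results. The sketch below is a minimal, standard-library illustration of the usual BH procedure, assuming the raw p-values (for example, from per-metabolite Wilcoxon rank-sum tests) have already been computed; it is not the authors' code, and the four example p-values are invented.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # caps adjusted values at 1 and enforces monotonicity
    # Walk from the largest p-value (rank m) down to the smallest (rank 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented raw p-values for four metabolites.
raw = [0.001, 0.02, 0.04, 0.30]
qvals = benjamini_hochberg(raw)
significant = [q <= 0.1 for q in qvals]  # FDR threshold of 0.1
print(qvals, significant)
```

Walking from the largest rank down and keeping a running minimum is what makes the adjusted values non-decreasing in the raw p-values, which the plain formula p × m / rank alone does not guarantee.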
Affiliation(s)
- Kayla A Carter
- Department of Epidemiology, University of Washington, Seattle, WA, United States
- Christopher D Simpson
- Department of Environmental and Occupational Health Sciences, University of Washington, Seattle, WA, United States
- Daniel Raftery
- Northwest Metabolomics Research Center, University of Washington, Seattle, WA, United States
- Marissa G Baker
- Department of Environmental and Occupational Health Sciences, University of Washington, Seattle, WA, United States
17
Katz S, Song J, Webb KP, Lounsbury NW, Bryant CE, Fraser IDC. SIGNAL: A web-based iterative analysis platform integrating pathway and network approaches optimizes hit selection from genome-scale assays. Cell Syst 2021; 12:338-352.e5. [PMID: 33894945 DOI: 10.1016/j.cels.2021.03.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/16/2020] [Revised: 11/25/2020] [Accepted: 03/03/2021] [Indexed: 01/13/2023]
Abstract
Hit selection from high-throughput assays remains a critical bottleneck in realizing the potential of omic-scale studies in biology. Widely used methods such as setting cutoffs, prioritizing pathway enrichments, or incorporating predicted network interactions offer divergent solutions, yet each involves critical analytical trade-offs. The specific limitations of these individual approaches and the lack of a systematic way to integrate their rankings have contributed to limited overlap in the reported results of comparable genome-wide studies and to costly inefficiencies in secondary validation efforts. Using comparative analysis of parallel independent studies as a benchmark, we characterize the specific complementary contributions of each approach and demonstrate an optimal framework to integrate these methods. We describe selection by iterative pathway group and network analysis looping (SIGNAL), an integrated, iterative approach that uses both pathway and network methods to optimize gene prioritization. SIGNAL is accessible as a rapid, user-friendly web-based application (https://signal.niaid.nih.gov). A record of this paper's transparent peer review is included in the Supplemental information.
Affiliation(s)
- Samuel Katz
- NIAID, National Institutes of Health, Laboratory of Immune System Biology, Bethesda, MD 20892, USA; University of Cambridge, Department of Veterinary Medicine, Cambridge, UK
- Jian Song
- NIAID, National Institutes of Health, Laboratory of Immune System Biology, Bethesda, MD 20892, USA
- Kyle P Webb
- NIAID, National Institutes of Health, Laboratory of Immune System Biology, Bethesda, MD 20892, USA
- Nicolas W Lounsbury
- NIAID, National Institutes of Health, Laboratory of Immune System Biology, Bethesda, MD 20892, USA
- Clare E Bryant
- University of Cambridge, Department of Veterinary Medicine, Cambridge, UK
- Iain D C Fraser
- NIAID, National Institutes of Health, Laboratory of Immune System Biology, Bethesda, MD 20892, USA
18
Abstract
Perturbation in the normal function of cell signaling pathways often leads to disease. One factor that helps in understanding disease mechanisms is the precise identification and investigation of perturbed signaling pathways. Pathway analysis methods have been developed to identify perturbed signaling pathways under given conditions. Some of these methods consider pathway topology in their analysis and are referred to as topology-based methods. Most topology-based methods use simple graph-based models to incorporate topology, and these models have some limitations. We describe a new Pathway Analysis method using Petri nets (PAPet), which uses a Petri net to model signaling pathways, and we propose an algorithm to measure the perturbation of a given pathway under a given condition. Modeling with Petri nets has some advantages and can overcome the shortcomings of simple graph-based models. We illustrate the capabilities of the proposed method using sensitivity, prioritization, mean reciprocal rank, and false-positive rate metrics on 36 real datasets from various diseases. Comparing PAPet with five pathway analysis methods (FoPA, PADOG, GSEA, CePa and SPIA) shows that PAPet provides the best compromise across all metrics. In addition, when the methods are applied to gene expression profiles from normal and pancreatic ductal adenocarcinoma (PDAC) samples, PAPet achieves the best rank in finding the pathways that have been previously reported for PDAC. The PAPet method is available at https://github.com/fmansoori/PAPET.
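Mean reciprocal rank, one of the evaluation metrics listed above, averages 1/rank of the target (disease-relevant) pathway across datasets, so a method that ranks the target first everywhere scores 1. A minimal sketch, assuming a single target pathway per dataset and ranks starting at 1; the function name and the three ranks are hypothetical and do not come from the PAPet evaluation.

```python
def mean_reciprocal_rank(target_ranks: list[int]) -> float:
    """Average of 1/rank over datasets; rank 1 means the target pathway was ranked first."""
    if not target_ranks:
        raise ValueError("need at least one rank")
    if any(r < 1 for r in target_ranks):
        raise ValueError("ranks start at 1")
    return sum(1.0 / r for r in target_ranks) / len(target_ranks)


# Hypothetical ranks of the target pathway in three datasets.
print(mean_reciprocal_rank([1, 2, 4]))
```

Because each term is 1/rank, the metric rewards placing the target near the top far more than it penalizes moving it from, say, rank 20 to rank 40.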
19
Thistlethwaite LR, Petrosyan V, Li X, Miller MJ, Elsea SH, Milosavljevic A. CTD: An information-theoretic algorithm to interpret sets of metabolomic and transcriptomic perturbations in the context of graphical models. PLoS Comput Biol 2021; 17:e1008550. [PMID: 33513132 PMCID: PMC7875364 DOI: 10.1371/journal.pcbi.1008550] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/04/2020] [Revised: 02/10/2021] [Accepted: 11/16/2020] [Indexed: 01/17/2023] Open
Abstract
We consider the following general family of algorithmic problems that arises in transcriptomics, metabolomics and other fields: given a weighted graph G and a subset of its nodes S, find subsets of S that show significant connectedness within G. A specific solution to this problem may be defined by devising a scoring function, the Maximum Clique problem being a classic example, where S includes all nodes in G and the score is defined by the size of the largest subset of S fully connected within G. Major practical obstacles for the many algorithms addressing this type of problem include limited computational efficiency and, particularly for more complex scores that take edge weights into account, the computational cost of permutation testing, a statistical procedure required to obtain a bound on the p-value for a connectedness score. To address these problems, we developed CTD, "Connect the Dots", a fast algorithm based on data compression that detects highly connected subsets within S. CTD provides information-theoretic upper bounds on p-values when S contains a small fraction of nodes in G without requiring computationally costly permutation testing. We apply the CTD algorithm to interpret multi-metabolite perturbations due to inborn errors of metabolism and multi-transcript perturbations associated with breast cancer in the context of disease-specific Gaussian Markov Random Field networks learned directly from respective molecular profiling data.
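The Maximum Clique example in the abstract scores a node subset S by the size of its largest fully connected subset. The brute-force sketch below illustrates only that scoring function, assuming an unweighted, undirected graph stored as adjacency sets; it is not the CTD algorithm, which instead uses data compression to bound p-values. The graph and node names are hypothetical.

```python
from itertools import combinations


def max_clique_size(adj: dict[str, set[str]], subset: set[str]) -> int:
    """Size of the largest group within `subset` whose members are pairwise adjacent.

    Exhaustive search, exponential in len(subset): suitable only for toy examples.
    """
    nodes = sorted(subset)
    # Try the largest candidate groups first and stop at the first clique found.
    for size in range(len(nodes), 0, -1):
        for group in combinations(nodes, size):
            if all(b in adj[a] for a, b in combinations(group, 2)):
                return size
    return 0


# Toy undirected graph: a, b, c form a triangle; d hangs off c.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
adj: dict[str, set[str]] = {n: set() for e in edges for n in e}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

print(max_clique_size(adj, {"a", "b", "c", "d"}))  # 3
```

The exponential cost of this exhaustive search, and the further cost of repeating it under permutation to obtain a p-value, is the kind of practical obstacle the abstract says CTD is designed to avoid.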
Affiliation(s)
- Lillian R. Thistlethwaite
- Quantitative and Computational Biosciences Program, Baylor College of Medicine, Houston, Texas, United States of America
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Varduhi Petrosyan
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Xiqi Li
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Marcus J. Miller
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, Indiana, United States of America
- Sarah H. Elsea
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Aleksandar Milosavljevic
- Quantitative and Computational Biosciences Program, Baylor College of Medicine, Houston, Texas, United States of America
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
20
Yan S, Chi X, Chang X, Tian M. Analysing the meta-interaction between pathways by gene set topological impact analysis. BMC Genomics 2020; 21:748. [PMID: 33109101 PMCID: PMC7592530 DOI: 10.1186/s12864-020-07148-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/12/2020] [Accepted: 10/13/2020] [Indexed: 11/25/2022] Open
Abstract
Background: Pathway analysis is widely applied in transcriptome analysis. Given certain transcriptomic changes, current pathway analysis tools tend to search for the most impacted pathways, which provides insight into underlying biological mechanisms. Further refinement of the enriched pathways and extraction of functional modules by “crosstalk” analysis have been proposed. However, the upstream/downstream relationships between the modules, which may provide extra biological insights such as the coordination of different functional modules and the flow of signal transduction, have been ignored. Results: To quantitatively analyse the upstream/downstream relationships between functional modules, we developed a novel GEne Set Topological Impact Analysis (GESTIA), which can be used to assemble the enriched pathways and functional modules into a super-module with a topological structure. We showed the advantages of this analysis in exploring extra biological insight beyond the individual enriched pathways and functional modules. Conclusions: GESTIA can be applied to a broad range of pathway/module analysis results. We hope that GESTIA may help researchers get one step closer to understanding molecular mechanisms from pathway/module analysis results. Supplementary information: Supplementary information accompanies this paper at 10.1186/s12864-020-07148-y.
Affiliation(s)
- Shen Yan
- College of Agronomy, Sichuan Agricultural University, Chengdu, 611130, Sichuan, China
- Xu Chi
- Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 101300, China; China National Center for Bioinformation, Chaoyang, Beijing, 101300, China
- Xiao Chang
- Department of Dermatology and Venereal Disease, Xuanwu Hospital, Capital Medical University, Beijing, 100053, China
- Mengliang Tian
- College of Agronomy, Sichuan Agricultural University, Chengdu, 611130, Sichuan, China
21
Balomenos P, Dragomir A, Tsakalidis AK, Bezerianos A. Identification of differentially expressed subpathways via a bilevel consensus scoring of network topology and gene expression. Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference 2020; 2020:5316-5319. [PMID: 33019184 DOI: 10.1109/embc44109.2020.9176556] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
Abstract
Identifying differentially expressed subpathways connected to the emergence of a disease that can be considered as candidates for pharmacological intervention, with minimal off-target effects, is a daunting task. In this direction, we present a bilevel subpathway analysis method to identify differentially expressed subpathways that are connected with an experimental condition, while taking into account potential crosstalks between subpathways which arise due to their connectivity in a combined multi-pathway network. The efficacy of the method is demonstrated on a hematopoietic stem cell aging dataset, with findings corroborated using recent literature.
22
Vrahatis AG, Kotsireas IS, Vlamos P. Detecting Common Pathways and Key Molecules of Neurodegenerative Diseases from the Topology of Molecular Networks. Advances in Experimental Medicine and Biology 2020; 1194:409-421. [PMID: 32468556 DOI: 10.1007/978-3-030-32622-7_38] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/29/2023]
Abstract
Motivation: Neurodegenerative diseases (NDs), including amyotrophic lateral sclerosis, Parkinson's disease, Alzheimer's disease, and Huntington's disease, occur as a result of neurodegenerative processes. Thus, it has been increasingly appreciated that many neurodegenerative conditions overlap at multiple levels. However, traditional clinicopathological correlation approaches to better classify a disease have met with limited success. Discovering this overlap offers hope for therapeutic advances that could ameliorate many NDs simultaneously. In parallel, over the last decade, systems biology approaches have become a reliable choice in complex disease analysis for gaining finer biological insights and have enabled the comprehension of the higher-order functions of biological systems. Results: To this end, we developed a systems biology approach for identifying common links and pathways of NDs, based on well-established and novel topological and functional measures. For this purpose, a molecular pathway network was constructed using molecular interactions and relations of four main neurodegenerative diseases (Alzheimer's disease, Parkinson's disease, amyotrophic lateral sclerosis, and Huntington's disease). Our analysis captured the overlapping subregions forming molecular subpathways fully enriched in these four NDs. It also identified molecules that act as bridges, hubs, and key players in neurodegeneration with respect to either their topology or their functional role. Conclusion: Understanding these common links and central topologies from the perspective of systems biology and network theory provides greater insight into the complex processes of neurodegeneration.
Affiliation(s)
- Ilias S Kotsireas
- Department of Physics and Computer Science, Wilfrid Laurier University, Waterloo, Canada
23
Yeganeh PN, Mostafavi MT. Causal Disturbance Analysis: A Novel Graph Centrality Based Method for Pathway Enrichment Analysis. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1613-1624. [PMID: 30908237 DOI: 10.1109/tcbb.2019.2907246] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/09/2023]
Abstract
Pathway enrichment analysis models (PEM) are the premier methods for interpreting gene expression profiles from high-throughput experiments. PEM often use a priori background knowledge to infer the underlying biological functions and mechanisms. A shortcoming of standard PEM is that they disregard interactions for simplicity, which potentially results in partial and inaccurate inference. In this study, we introduce a graph-based PEM, Causal Disturbance Analysis (CADIA), that leverages gene interactions to quantify the topological importance of genes' expression profiles in pathway organization. In particular, CADIA uses a novel graph centrality model, Source/Sink Centrality, to measure topological importance. Source/Sink Centrality quantifies a gene's importance as a receiver and a sender of biological information, which allows prioritizing the genes that are more likely to disturb a pathway's functionality. CADIA infers an enrichment score for a pathway by deriving statistical evidence from the Source/Sink Centrality of the differentially expressed genes and combines it with classical over-representation analysis. Through real-world experimental and synthetic data evaluations, we show that CADIA can uniquely infer critical pathway enrichments that are not observable through other PEM. Our results indicate that CADIA is sensitive to topologically central gene-level changes and provides an informative framework for interpreting high-throughput data.
24
Maleki F, Ovens K, Hogan DJ, Kusalik AJ. Gene Set Analysis: Challenges, Opportunities, and Future Research. Front Genet 2020; 11:654. [PMID: 32695141 PMCID: PMC7339292 DOI: 10.3389/fgene.2020.00654] [Citation(s) in RCA: 90] [Impact Index Per Article: 22.5] [Received: 02/01/2020] [Accepted: 05/29/2020] [Indexed: 12/14/2022] Open
Abstract
Gene set analysis methods are widely used to provide insight into high-throughput gene expression data. There are many gene set analysis methods available. These methods rely on various assumptions and have different requirements, strengths and weaknesses. In this paper, we classify gene set analysis methods based on their components, describe the underlying requirements and assumptions for each class, and provide directions for future research in developing and evaluating gene set analysis methods.
25
Naderi Yeganeh P, Richardson C, Saule E, Loraine A, Taghi Mostafavi M. Revisiting the use of graph centrality models in biological pathway analysis. BioData Min 2020; 13:5. [PMID: 32549913 PMCID: PMC7296696 DOI: 10.1186/s13040-020-00214-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/08/2019] [Accepted: 05/12/2020] [Indexed: 12/15/2022] Open
Abstract
The use of graph theory models is widespread in biological pathway analyses, as it is often desirable to evaluate the position of genes and proteins in the interaction networks of biological systems. In this article, we argue that the common standard graph centrality measures do not sufficiently capture the informative topological organization of pathways and thus limit biological inference. While key pathway elements may appear both upstream and downstream in pathways, standard directed graph centralities attribute significant topological importance to the upstream elements and evaluate the downstream elements as having no importance. We present a directed graph framework, Source/Sink Centrality (SSC), to address the limitations of standard models. SSC separately measures the importance of a node in the upstream and the downstream of a pathway, as a sender and a receiver of biological signals, and combines the two terms to evaluate centrality. To validate SSC, we evaluate the topological position of known human cancer genes and mouse lethal genes in their respective KEGG-annotated pathways and show that SSC-derived centralities provide an effective framework for associating higher positional importance with the genes that have higher importance from a priori knowledge. While the presented work challenges some of the modeling assumptions in common pathway analyses, it provides a straightforward methodology to extend the existing models. The SSC extensions can result in more informative topological descriptions of pathways and, thus, more informative biological inference.
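The upstream/downstream argument can be made concrete on a toy directed pathway. The sketch below is not the SSC formula from the paper: it uses plain reachability counts as a hypothetical stand-in, assuming a "sender" score equal to the number of nodes a gene can signal to and a "receiver" score equal to the number of nodes that can signal to it. Summing the two is likewise an illustrative assumption. The point it shows is the one the abstract makes: a sender-only score gives the downstream sink zero importance, while the receiver term does not.

```python
def reachable(graph: dict[str, list[str]], start: str) -> set[str]:
    """Nodes reachable from `start` along directed edges (start itself excluded)."""
    seen: set[str] = set()
    stack = list(graph.get(start, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    seen.discard(start)  # in case a cycle leads back to the start node
    return seen


def reverse(graph: dict[str, list[str]]) -> dict[str, list[str]]:
    """Flip every edge so that reachability follows signals backwards."""
    rev: dict[str, list[str]] = {}
    for src, targets in graph.items():
        for dst in targets:
            rev.setdefault(dst, []).append(src)
    return rev


# Hypothetical linear cascade: receptor -> kinase -> tf -> target_gene (a sink).
pathway = {"receptor": ["kinase"], "kinase": ["tf"], "tf": ["target_gene"]}
upstream = reverse(pathway)

scores = {
    n: (len(reachable(pathway, n)), len(reachable(upstream, n)))
    for n in ["receptor", "kinase", "tf", "target_gene"]
}
for name, (sender, receiver) in scores.items():
    print(name, sender, receiver, sender + receiver)
```

With the sender term alone, `target_gene` scores 0 even though it is the cascade's output; adding the receiver term restores its importance.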
Affiliation(s)
- Pourya Naderi Yeganeh
- Beth Israel Deaconess Medical Center, Harvard Medical School, 330 Brookline Ave., Boston, 02215 MA USA; Department of Computer Science, The University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, 28223 NC USA
- Christine Richardson
- Department of Biological Sciences, The University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, 28223 NC USA
- Erik Saule
- Department of Computer Science, The University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, 28223 NC USA
- Ann Loraine
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, 28223 NC USA
- M Taghi Mostafavi
- Department of Computer Science, The University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, 28223 NC USA
26
Zyla J, Marczyk M, Domaszewska T, Kaufmann SHE, Polanska J, Weiner J. Gene set enrichment for reproducible science: comparison of CERNO and eight other algorithms. Bioinformatics 2019; 35:5146-5154. [PMID: 31165139 PMCID: PMC6954644 DOI: 10.1093/bioinformatics/btz447] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Received: 12/10/2018] [Revised: 05/08/2019] [Accepted: 06/10/2019] [Indexed: 01/12/2023] Open
Abstract
MOTIVATION: Analysis of gene set (GS) enrichment is an essential part of functional omics studies. Here, we complement the established evaluation metrics of GS enrichment algorithms with a novel approach to assess the practical reproducibility of scientific results obtained from GS enrichment tests when applied to related data from different studies. RESULTS: We evaluated eight established algorithms and one novel algorithm for reproducibility, sensitivity, prioritization, false positive rate and computational time. In addition to the eight established algorithms, we included Coincident Extreme Ranks in Numerical Observations (CERNO), a flexible and fast algorithm based on modified Fisher P-value integration. Using real-world datasets, we demonstrate that CERNO is robust to ranking metrics, as well as to sample and GS size. CERNO had the highest reproducibility while remaining sensitive, specific and fast. In the overall ranking, Pathway Analysis with Down-weighting of Overlapping Genes, CERNO and over-representation analysis performed best, while CERNO and GeneSetTest scored high in terms of reproducibility. AVAILABILITY AND IMPLEMENTATION: The tmod package implementing the CERNO algorithm is available from CRAN (cran.r-project.org/web/packages/tmod/index.html), and an online implementation can be found at http://tmod.online/. The datasets analyzed in this study are widely available in the KEGGdzPathwaysGEO and KEGGandMetacoreDzPathwaysGEO R packages and the GEO repository. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
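The "modified Fisher P-value integration" behind CERNO can be sketched in a few lines. Our reading of the tmod documentation is that, for a gene set whose k members occupy ranks r_1, ..., r_k in a ranked list of N genes, the statistic is -2 Σ ln(r_i / N), compared against a chi-squared distribution with 2k degrees of freedom; treat this as an assumption and check it against tmod's `tmodCERNOtest` before relying on it. The sketch below is standard-library only, using the closed-form chi-squared tail for even degrees of freedom; all function names and ranks are hypothetical.

```python
from math import exp, log


def cerno_statistic(ranks: list[int], n_genes: int) -> float:
    """-2 * sum(ln(r / N)) over gene-set member ranks (rank 1 = top of the list)."""
    return -2.0 * sum(log(r / n_genes) for r in ranks)


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper tail P(X > x) of a chi-squared variable with an even number of df.

    Closed form: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0  # the j = 0 term
    for j in range(1, df // 2):
        term *= half / j  # builds (x/2)^j / j! incrementally
        total += term
    return exp(-half) * total


def cerno_pvalue(ranks: list[int], n_genes: int) -> float:
    return chi2_sf_even_df(cerno_statistic(ranks, n_genes), 2 * len(ranks))


# Hypothetical gene set whose three members rank 1, 5 and 20 among 1,000 genes.
print(cerno_pvalue([1, 5, 20], 1000))
```

A quick sanity check of the formula: for a single gene at rank r, the statistic is -2 ln(r/N) with 2 degrees of freedom, so the p-value reduces to r/N, the gene's normalized rank.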
Affiliation(s)
- Joanna Zyla
- Data Mining Group, Faculty of Automatic Control, Electronic and Computer Science, Institute of Automatic Control, Silesian University of Technology, Gliwice, Poland
- Department of Immunology, Max Planck Institute for Infection Biology, Berlin, Germany
- Michal Marczyk
- Data Mining Group, Faculty of Automatic Control, Electronic and Computer Science, Institute of Automatic Control, Silesian University of Technology, Gliwice, Poland
- Yale School of Medicine, Yale Cancer Center, New Haven, CT 06510, USA
- Teresa Domaszewska
- Department of Immunology, Max Planck Institute for Infection Biology, Berlin, Germany
- Stefan H E Kaufmann
- Department of Immunology, Max Planck Institute for Infection Biology, Berlin, Germany
- Joanna Polanska
- Data Mining Group, Faculty of Automatic Control, Electronic and Computer Science, Institute of Automatic Control, Silesian University of Technology, Gliwice, Poland
- January Weiner
- Department of Immunology, Max Planck Institute for Infection Biology, Berlin, Germany
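The CERNO statistic mentioned in the abstract can be illustrated with a short Python sketch. It assumes the formulation documented for the tmod package, in which the ranks R_i of the k gene-set members among N ranked genes are combined as -2 Σ ln(R_i/N) and compared with a χ² distribution with 2k degrees of freedom. The function name `cerno_test` is hypothetical (it is not part of tmod), and tie handling and multiple-testing correction are left out.

```python
import math


def cerno_test(ranks, n_genes):
    """CERNO-style test for one gene set (illustrative sketch, not tmod's code).

    ranks: 1-based positions of the gene-set members in a list of n_genes genes
    sorted from most to least interesting (e.g. by p-value)."""
    k = len(ranks)
    if k == 0:
        raise ValueError("gene set has no genes in the ranked list")
    if any(r < 1 or r > n_genes for r in ranks):
        raise ValueError("ranks must lie between 1 and n_genes")
    # Fisher-style integration of normalized ranks: -2 * sum(ln(R_i / N)).
    stat = -2.0 * sum(math.log(r / n_genes) for r in ranks)
    # Under the null, stat ~ chi-squared with 2k degrees of freedom. For even df
    # the upper tail has a closed form: exp(-x/2) * sum_{j<k} (x/2)^j / j!.
    half = stat / 2.0
    term, tail = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        tail += term
    p_value = math.exp(-half) * tail
    return stat, p_value
```

With a single gene the test reduces to that gene's normalized rank, so `cerno_test([10], 100)` gives a p-value of 0.1, while a set whose members sit at ranks 1, 2 and 3 of 1000 genes gets a very small p-value.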
27
Ma J, Shojaie A, Michailidis G. A comparative study of topology-based pathway enrichment analysis methods. BMC Bioinformatics 2019; 20:546. [PMID: 31684881 PMCID: PMC6829999 DOI: 10.1186/s12859-019-3146-1] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 09/17/2018] [Accepted: 10/02/2019] [Indexed: 02/01/2023]
Abstract
BACKGROUND Pathway enrichment is extensively used in the analysis of Omics data to gain biological insights into the functional roles of pre-defined subsets of genes, proteins and metabolites. A large number of methods have been proposed in the literature for this task. The vast majority of these methods use as input the expression levels of the biomolecules under study together with their membership in pathways of interest. The latest generation of pathway enrichment methods also leverages information on the topology of the underlying pathways, which, as evidence from their evaluation reveals, leads to improved sensitivity and specificity. Nevertheless, a systematic empirical comparison of such methods is still lacking, making selection of the most suitable method for a specific experimental setting challenging. This comparative study of nine network-based methods for pathway enrichment analysis aims to provide a systematic evaluation of their performance based on three real data sets with different numbers of features (genes/metabolites) and samples. RESULTS The findings highlight both methodological and empirical differences across the nine methods. In particular, certain methods assess pathway enrichment due to differences both in expression levels and in the strength of the interconnectedness of the members of the pathway, while others only leverage differential expression levels. In the more challenging setting involving a metabolomics data set, the results show that methods that utilize both pieces of information (with NetGSA being a prototypical one) exhibit superior statistical power in detecting pathway enrichment. CONCLUSION The analysis reveals that a number of methods perform equally well when testing large pathways, which is the case with genomic data. On the other hand, NetGSA, which takes into consideration both the differential expression of the biomolecules in the pathway and changes in the topology, exhibits superior performance when testing small pathways, which is usually the case for metabolomics data.
Affiliation(s)
- Jing Ma
- Texas A&M University, Department of Statistics, College Station, 77840 USA
- Fred Hutchinson Cancer Research Center, Public Health Sciences Division, Seattle, 98107 USA
- Ali Shojaie
- University of Washington, Department of Biostatistics, Seattle, 98105 USA
28
Nguyen TM, Shafi A, Nguyen T, Draghici S. Identifying significantly impacted pathways: a comprehensive review and assessment. Genome Biol 2019; 20:203. [PMID: 31597578 PMCID: PMC6784345 DOI: 10.1186/s13059-019-1790-4] [Citation(s) in RCA: 90] [Impact Index Per Article: 18.0] [Received: 02/15/2019] [Accepted: 08/13/2019] [Indexed: 01/01/2023]
Abstract
BACKGROUND Many high-throughput experiments compare two phenotypes such as disease vs. healthy, with the goal of understanding the underlying biological phenomena characterizing the given phenotype. Because of the importance of this type of analysis, more than 70 pathway analysis methods have been proposed so far. These can be categorized into two main categories: non-topology-based (non-TB) and topology-based (TB). Although some review papers discuss this topic from different aspects, there is no systematic, large-scale assessment of such methods. Furthermore, the majority of the pathway analysis approaches rely on the assumption of uniformity of p values under the null hypothesis, which is often not true. RESULTS This article presents the most comprehensive comparative study on pathway analysis methods available to date. We compare the actual performance of 13 widely used pathway analysis methods in over 1085 analyses. These comparisons were performed using 2601 samples from 75 human disease data sets and 121 samples from 11 knockout mouse data sets. In addition, we investigate the extent to which each method is biased under the null hypothesis. Together, these data and results constitute a reliable benchmark against which future pathway analysis methods could and should be tested. CONCLUSION Overall, the result shows that no method is perfect. In general, TB methods appear to perform better than non-TB methods. This is somewhat expected since the TB methods take into consideration the structure of the pathway which is meant to describe the underlying phenomena. We also discover that most, if not all, listed approaches are biased and can produce skewed results under the null.
Affiliation(s)
- Tuan-Minh Nguyen
- Department of Computer Science, Wayne State University, Detroit, 48202 USA
- Adib Shafi
- Department of Computer Science, Wayne State University, Detroit, 48202 USA
- Tin Nguyen
- Department of Computer Science and Engineering, University of Nevada, Reno, 89557 USA
- Sorin Draghici
- Department of Computer Science, Wayne State University, Detroit, 48202 USA
- Department of Obstetrics and Gynecology, Wayne State University, Detroit, 48202 USA
29
Amadoz A, Hidalgo MR, Çubuk C, Carbonell-Caballero J, Dopazo J. A comparison of mechanistic signaling pathway activity analysis methods. Brief Bioinform 2019; 20:1655-1668. [PMID: 29868818 PMCID: PMC6917216 DOI: 10.1093/bib/bby040] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 01/08/2018] [Revised: 03/31/2018] [Indexed: 12/11/2022]
Abstract
Understanding the aspects of cell functionality that account for disease mechanisms or drug modes of action is a main challenge for precision medicine. Classical gene-based approaches ignore the modular nature of most human traits, whereas conventional pathway enrichment approaches produce only illustrative results of limited practical utility. Recently, a family of new methods has emerged that shifts the focus from whole pathways to the definition of elementary subpathways within them that have mechanistic significance, and to the study of their activities. Thus, mechanistic pathway activity (MPA) methods constitute a new paradigm that allows poorly informative genomic measurements to be recoded into quantitative values of cell activity and related to phenotypes. Here we provide a review of the available MPA methods and explain their contribution to systems medicine approaches for addressing challenges in the diagnosis and treatment of complex diseases.
Affiliation(s)
- Alicia Amadoz
- Department of Bioinformatics, Igenomix S.L., 46980 Valencia, Spain
- Marta R Hidalgo
- Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), CDCA, Hospital Virgen del Rocio, Sevilla 41013, Spain
- Cankut Çubuk
- Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), CDCA, Hospital Virgen del Rocio, Sevilla 41013, Spain
- José Carbonell-Caballero
- Chromatin and Gene expression Lab, Gene Regulation, Stem Cells and Cancer Program, Centre de Regulació Genòmica (CRG), The Barcelona Institute of Science and Technology, PRBB, Barcelona 08003, Spain
- Joaquín Dopazo
- Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), CDCA, Hospital Virgen del Rocio, Sevilla 41013, Spain
- Chromatin and Gene expression Lab, Gene Regulation, Stem Cells and Cancer Program, Centre de Regulació Genòmica (CRG), The Barcelona Institute of Science and Technology, PRBB, Barcelona 08003, Spain
- Functional Genomics Node (INB), FPS, Hospital Virgen del Rocío, Sevilla 41013, Spain
- Bioinformatics in Rare Diseases (BiER), Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), FPS, Hospital Virgen del Rocío, Sevilla 41013, Spain
30
Valenzuela JFB, Monterola C, Tong VJC, Fülöp T, Ng TP, Larbi A. Degree and centrality-based approaches in network-based variable selection: Insights from the Singapore Longitudinal Aging Study. PLoS One 2019; 14:e0219186. [PMID: 31318894 PMCID: PMC6638841 DOI: 10.1371/journal.pone.0219186] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/12/2018] [Accepted: 06/18/2019] [Indexed: 11/18/2022]
Abstract
We describe a network-based method to obtain a subset of representative variables from clinical data of subjects of the second Singapore Longitudinal Aging Study (SLAS-2), while preserving to a good extent the predictive performance of the full set with regard to a multi-faceted index of successful aging, SAGE. To examine differences in predictive performance of high-degree nodes (“hubs”) and high-centrality ones (“cores”), we implement four subsetting strategies (two degree-based, two centrality-based) and obtain four surrogate sets of variables, which we use as input features for machine learning models to predict the SAGE index of subjects. All four models have variables belonging to the physical, cardiovascular, cognitive and immunological domains among their fifteen most important predictors. A fifth domain (leisure-time activities, LTA) is also present in some form. From a comparison of the surrogate sets’ size and predictive performance, a centrality-based approach (selection of the most central variable-nodes within each cluster) yielded the smallest surrogate set while achieving high prediction accuracy (measured by its model’s area under the curve, AUC) compared with its analogous degree-based strategy (selection of the highest-degree nodes per cluster). Including the next most central variables yielded negligible changes in predictive performance while more than doubling the surrogate set size. The centrality-based approach thus yields a surrogate set that offers a good balance between number of variables and prediction performance, and can act as a representative subset of the SLAS-2 clinical dataset.
Affiliation(s)
- Jesus Felix Bayta Valenzuela
- Computing Science Department, Institute of High Performance Computing, Singapore, Singapore
- Analytics, Computing and Complex Systems Laboratory, Asian Institute of Management, Makati City, Philippines
- Aboitiz School of Innovation, Technology and Entrepreneurship, Asian Institute of Management, Makati City, Philippines
- Christopher Monterola
- Computing Science Department, Institute of High Performance Computing, Singapore, Singapore
- Analytics, Computing and Complex Systems Laboratory, Asian Institute of Management, Makati City, Philippines
- Aboitiz School of Innovation, Technology and Entrepreneurship, Asian Institute of Management, Makati City, Philippines
- Victor Joo Chuan Tong
- Social and Cognitive Computing Department, Institute of High Performance Computing, Singapore, Singapore
- Yong Loo Lin School of Medicine, Department of Biochemistry, National University of Singapore, Singapore, Singapore
- Tamás Fülöp
- Department of Medicine, University of Sherbrooke, Quebec, Canada
- Tze Pin Ng
- Yong Loo Lin School of Medicine, National University of Singapore, Department of Psychological Medicine, Singapore, Singapore
- Anis Larbi
- Department of Medicine, University of Sherbrooke, Quebec, Canada
- Singapore Immunology Network, Singapore, Singapore
- Yong Loo Lin School of Medicine, National University of Singapore, Department of Microbiology and Immunology, Singapore, Singapore
- School of Biological Sciences, Nanyang Technological University (NTU), Singapore, Singapore
- Department of Biology, Faculty of Sciences, Tunis El Manar University, Tunis, Tunisia
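The two per-cluster selection rules contrasted in this abstract, picking the highest-degree node ("hub") versus the most central node ("core") in each cluster, can be sketched in Python. This is an illustration under stated assumptions, not the authors' pipeline: the variable network and its cluster labels are taken as given, closeness centrality stands in for the centrality measure (the abstract does not name one), the graph is assumed connected and unweighted, and all function names are hypothetical.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def closeness(adj, source):
    """Closeness centrality: (nodes reached - 1) / sum of BFS distances from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0


def select_representatives(adj, cluster_of):
    """Per cluster, return the highest-degree node (hub) and the most central node (core)."""
    clusters = {}
    for node, label in cluster_of.items():
        clusters.setdefault(label, []).append(node)
    hubs = {label: max(nodes, key=lambda n: len(adj[n])) for label, nodes in clusters.items()}
    cores = {label: max(nodes, key=lambda n: closeness(adj, n)) for label, nodes in clusters.items()}
    return hubs, cores
```

On a toy graph where node `a1` has four neighbors at the edge of cluster A and `a5` bridges cluster A to cluster B, the degree rule picks `a1` for cluster A while the closeness rule picks `a5`, which shows how the two strategies can disagree.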
31
Li Y, Wu Y, Zhang X, Bai Y, Akthar LM, Lu X, Shi M, Zhao J, Jiang Q, Li Y. SCIA: A Novel Gene Set Analysis Applicable to Data With Different Characteristics. Front Genet 2019; 10:598. [PMID: 31293623 PMCID: PMC6603225 DOI: 10.3389/fgene.2019.00598] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 05/05/2019] [Accepted: 06/05/2019] [Indexed: 01/06/2023]
Abstract
Gene set analysis is commonly used in functional enrichment and molecular pathway analyses. Most current methods are based on competitive testing, which assumes that each gene is independent of the others. However, the false discovery rates of competitive methods are amplified when they are applied to datasets with high inter-gene correlations. Self-contained testing methods could solve this problem, but they impose other restrictions on data characteristics. Therefore, a statistically rigorous testing method applicable to datasets with various complex characteristics is needed to obtain unbiased and comparable results. We propose a self-contained and competitive incorporated analysis (SCIA) to alleviate the bias caused by the limited application scope of existing gene set analysis methods. This is accomplished through a novel permutation strategy that uses a priori biological networks to selectively permute gene labels with different probabilities. In simulation studies, SCIA was compared with four representative analysis methods (GSEA, CAMERA, ROAST, and NES) and produced the best performance in both false discovery rate and sensitivity under most conditions with different parameter settings. Further, KEGG pathway analysis of two real lung cancer datasets showed that SCIA yielded many more results than GSEA in both datasets, and most of these could be supported by the literature. Overall, SCIA promises to offer researchers more reliable and comparable results across different datasets.
Affiliation(s)
- Yiqun Li
- Department of Laboratory of Cancer Biology, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Ying Wu
- Department of Biostatistics, School of Public Health, Southern Medical University, Guangzhou, China
- Xiaohan Zhang
- Department of Laboratory of Cancer Biology, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Yunfan Bai
- Department of Laboratory of Cancer Biology, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Luqman Muhammad Akthar
- Department of Laboratory of Cancer Biology, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Xin Lu
- Department of Laboratory of Cancer Biology, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Ming Shi
- Department of Laboratory of Cancer Biology, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Jianxiang Zhao
- Department of Laboratory of Cancer Biology, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Qinghua Jiang
- Department of Laboratory of Cancer Biology, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Yu Li
- Department of Laboratory of Cancer Biology, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
32
Shah SD, Braun R. GeneSurrounder: network-based identification of disease genes in expression data. BMC Bioinformatics 2019; 20:229. [PMID: 31060502 PMCID: PMC6503437 DOI: 10.1186/s12859-019-2829-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 06/21/2018] [Accepted: 04/17/2019] [Indexed: 11/24/2022]
Abstract
Background A key challenge of identifying disease–associated genes is analyzing transcriptomic data in the context of regulatory networks that control cellular processes in order to capture multi-gene interactions and yield mechanistically interpretable results. One existing category of analysis techniques identifies groups of related genes using interaction networks, but these gene sets often comprise tens or hundreds of genes, making experimental follow-up challenging. A more recent category of methods identifies precise gene targets while incorporating systems-level information, but these techniques do not determine whether a gene is a driving source of changes in its network, an important characteristic when looking for potential drug targets. Results We introduce GeneSurrounder, an analysis method that integrates expression data and network information in a novel procedure to detect genes that are sources of dysregulation on the network. The key idea of our method is to score genes based on the evidence that they influence the dysregulation of their neighbors on the network in a manner that impacts cell function. Applying GeneSurrounder to real expression data, we show that our method is able to identify biologically relevant genes, integrate pathway and expression data, and yield more reproducible results across multiple studies of the same phenotype than competing methods. Conclusions Together these findings suggest that GeneSurrounder provides a new avenue for identifying individual genes that can be targeted therapeutically. The key innovation of GeneSurrounder is the combination of pathway network information with gene expression data to determine the degree to which a gene is a source of dysregulation on the network. By prioritizing genes in this way, our method provides insights into disease mechanisms and suggests diagnostic and therapeutic targets. Our method can be used to help biologists select among tens or hundreds of genes for further validation. 
The implementation in R is available at github.com/sahildshah1/gene-surrounder.
Affiliation(s)
- Sahil D Shah
- Engineering Sciences and Applied Mathematics, Northwestern University, Evanston, USA
- Rosemary Braun
- Engineering Sciences and Applied Mathematics, Northwestern University, Evanston, USA
- Biostatistics, Feinberg School of Medicine, Chicago, USA
- Northwestern Institute on Complex Systems, Northwestern University, Evanston, USA
33
Nguyen T, Mitrea C, Draghici S. Network-Based Approaches for Pathway Level Analysis. Curr Protoc Bioinformatics 2019; 61:8.25.1-8.25.24. [PMID: 30040185 DOI: 10.1002/cpbi.42] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Indexed: 11/10/2022]
Abstract
Identification of impacted pathways is an important problem because it allows us to gain insights into the underlying biology beyond the detection of differentially expressed genes. In the past decade, a plethora of methods have been developed for this purpose. The last generation of pathway analysis methods are designed to take into account various aspects of pathway topology in order to increase the accuracy of the findings. Here, we cover 34 such topology-based pathway analysis methods published in the past 13 years. We compare these methods on categories related to implementation, availability, input format, graph models, and statistical approaches used to compute pathway level statistics and statistical significance. We also discuss a number of critical challenges that need to be addressed, arising both in methodology and pathway representation, including inconsistent terminology, data format, lack of meaningful benchmarks, and, more importantly, a systematic bias that is present in most existing methods. © 2018 by John Wiley & Sons, Inc.
Affiliation(s)
- Tin Nguyen
- Department of Computer Science and Engineering, University of Nevada, Reno, Nevada
- Cristina Mitrea
- Department of Computer Science, Wayne State University, Detroit, Michigan
- Sorin Draghici
- Department of Computer Science, Wayne State University, Detroit, Michigan
- Department of Obstetrics and Gynecology, Wayne State University, Detroit, Michigan
34
Jaakkola MK, McGlinchey AJ, Klén R, Elo LL. PASI: A novel pathway method to identify delicate group effects. PLoS One 2018; 13:e0199991. [PMID: 29975740 PMCID: PMC6033442 DOI: 10.1371/journal.pone.0199991] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 12/04/2017] [Accepted: 06/17/2018] [Indexed: 01/02/2023]
Abstract
Pathway analysis is a common approach in diverse biomedical studies, yet the currently-available pathway tools do not typically support the increasingly popular personalized analyses. Another weakness of the currently-available pathway methods is their inability to handle challenging data with only modest group-based effects compared to natural individual variation. In an effort to address these issues, this study presents a novel pathway method PASI (Pathway Analysis for Sample-level Information) and demonstrates its performance on complex diseases with different levels of group-based differences in gene expression. PASI is freely available as an R package.
Affiliation(s)
- Maria K. Jaakkola
- Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Turku, Finland
- Department of Mathematics and Statistics, University of Turku, Turku, Finland
- Aidan J. McGlinchey
- Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Turku, Finland
- Riku Klén
- Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Turku, Finland
- Laura L. Elo
- Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Turku, Finland
35
Igolkina AA, Armoskus C, Newman JRB, Evgrafov OV, McIntyre LM, Nuzhdin SV, Samsonova MG. Analysis of Gene Expression Variance in Schizophrenia Using Structural Equation Modeling. Front Mol Neurosci 2018; 11:192. [PMID: 29942251 PMCID: PMC6004421 DOI: 10.3389/fnmol.2018.00192] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 01/31/2018] [Accepted: 05/15/2018] [Indexed: 01/02/2023]
Abstract
Schizophrenia (SCZ) is a psychiatric disorder of unknown etiology. There is evidence suggesting that aberrations in neurodevelopment are a significant attribute of schizophrenia pathogenesis and progression. To identify biologically relevant molecular abnormalities affecting neurodevelopment in SCZ, we used cultured neural progenitor cells derived from olfactory neuroepithelium (CNON cells). Here, we tested the hypothesis that variance in gene expression differs between individuals from SCZ and control groups. In CNON cells, variance in gene expression was significantly higher in SCZ samples than in control samples. Variance in gene expression was enriched in five molecular pathways: serine biosynthesis, PI3K-Akt, MAPK, neurotrophin and focal adhesion. More than 14% of variance in disease status was explained within the logistic regression model (C-value = 0.70) by predictors accounting for gene expression in 69 genes from these five pathways. Structural equation modeling (SEM) was applied to explore how the structure of these five pathways was altered between SCZ patients and controls. Four out of five pathways showed differences in the estimated relationships among genes: between KRAS and NF1, and KRAS and SOS1 in the MAPK pathway; between PSPH and SHMT2 in serine biosynthesis; between AKT3 and TSC2 in the PI3K-Akt signaling pathway; and between CRK and RAPGEF1 in the focal adhesion pathway. Our analysis provides evidence that variance in gene expression is an important characteristic of SCZ, and SEM is a promising method for uncovering altered relationships between specific genes, thus suggesting affected gene regulation associated with the disease. We identified altered gene-gene interactions in pathways enriched for genes with increased variance in expression in SCZ. These pathways and loci were previously implicated in SCZ, providing further support for the hypothesis that gene expression variance plays an important role in the etiology of SCZ.
Affiliation(s)
- Anna A Igolkina
- Institute of Applied Mathematics and Mechanics, Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russia
- Chris Armoskus
- Zilkha Neurogenetic Institute, Keck School of Medicine, University of Southern California, Los Angeles, CA, United States
- Jeremy R B Newman
- Department of Molecular Genetics & Microbiology, Genetics Institute, University of Florida, Gainesville, FL, United States
- Oleg V Evgrafov
- Department of Cell Biology, SUNY Downstate Medical Center, Brooklyn, NY, United States
- Lauren M McIntyre
- Department of Molecular Genetics & Microbiology, Genetics Institute, University of Florida, Gainesville, FL, United States
- Sergey V Nuzhdin
- Institute of Applied Mathematics and Mechanics, Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russia
- Molecular and Computational Biology, University of Southern California, Los Angeles, CA, United States
- Maria G Samsonova
- Institute of Applied Mathematics and Mechanics, Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russia
36
Ihnatova I, Popovici V, Budinska E. A critical comparison of topology-based pathway analysis methods. PLoS One 2018; 13:e0191154. [PMID: 29370226 PMCID: PMC5784953 DOI: 10.1371/journal.pone.0191154] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Received: 03/12/2017] [Accepted: 12/29/2017] [Indexed: 11/18/2022]
Abstract
One of the aims of high-throughput gene/protein profiling experiments is the identification of biological processes altered between two or more conditions. Pathway analysis is an umbrella term for a multitude of computational approaches used for this purpose. While in the beginning pathway analysis relied on enrichment-based approaches, a newer generation of methods is now available, exploiting pathway topologies in addition to gene/protein expression levels. However, little effort has been invested in their critical assessment with respect to their performance in different experimental setups. Here, we assessed the performance of seven representative methods identifying differentially expressed pathways between two groups of interest based on gene expression data with prior knowledge of pathway topologies: SPIA, PRS, CePa, TAPPA, TopologyGSA, Clipper and DEGraph. We performed a number of controlled experiments that investigated their sensitivity to sample and pathway size, threshold-based filtering of differentially expressed genes, ability to detect target pathways, ability to exploit the topological information and the sensitivity to different pre-processing strategies. We also verified type I error rates and described the influence of overexpression of single genes, gene sets and topological motifs of various sizes on the detection of a pathway as differentially expressed. The results of our experiments demonstrate a wide variability of the tested methods. We provide a set of recommendations for an informed selection of the proper method for a given data analysis task.
Affiliation(s)
- Ivana Ihnatova
- RECETOX, Faculty of Science, Masarykova Univerzita, Brno, Czech Republic
- Institute of Biostatistics and Analyses, Faculty of Medicine, Masarykova Univerzita, Brno, Czech Republic
- Vlad Popovici
- RECETOX, Faculty of Science, Masarykova Univerzita, Brno, Czech Republic
- Eva Budinska
- RECETOX, Faculty of Science, Masarykova Univerzita, Brno, Czech Republic
- Institute of Biostatistics and Analyses, Faculty of Medicine, Masarykova Univerzita, Brno, Czech Republic
37
Harrington LX, Way GP, Doherty JA, Greene CS. Functional network community detection can disaggregate and filter multiple underlying pathways in enrichment analyses. Pac Symp Biocomput 2018; 23:157-167. [PMID: 29218878 PMCID: PMC5760988] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
Differential expression experiments or other analyses often end in a list of genes. Pathway enrichment analysis is one method to discern important biological signals and patterns from noisy expression data. However, pathway enrichment analysis may perform suboptimally in situations where there are multiple implicated pathways - such as in the case of genes that define subtypes of complex diseases. Our simulation study shows that in this setting, standard overrepresentation analysis identifies many false positive pathways along with the true positives. These false positives hamper investigators' attempts to glean biological insights from enrichment analysis. We develop and evaluate an approach that combines community detection over functional networks with pathway enrichment to reduce false positives. Our simulation study demonstrates that a large reduction in false positives can be obtained with a small decrease in power. Though we hypothesized that multiple communities might underlie previously described subtypes of high-grade serous ovarian cancer and applied this approach, our results do not support this hypothesis. In summary, applying community detection before enrichment analysis may ease interpretation for complex gene sets that represent multiple distinct pathways.
Affiliation(s)
- Lia X Harrington
- Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth College, Hanover 03784, USA
38
Zyla J, Marczyk M, Weiner J, Polanska J. Ranking metrics in gene set enrichment analysis: do they matter? BMC Bioinformatics 2017; 18:256. [PMID: 28499413 PMCID: PMC5427619 DOI: 10.1186/s12859-017-1674-0] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 01/06/2017] [Accepted: 05/03/2017] [Indexed: 11/29/2022]
Abstract
Background: There exist many methods for describing the complex relation between changes of gene expression in molecular pathways or gene ontologies under different experimental conditions. Among them, Gene Set Enrichment Analysis seems to be one of the most commonly used (over 10,000 citations). An important parameter, which could affect the final result, is the choice of a metric for the ranking of genes. Applying a default ranking metric may lead to poor results. Methods and results: In this work 28 benchmark data sets were used to evaluate the sensitivity and false positive rate of gene set analysis for 16 different ranking metrics including new proposals. Furthermore, the robustness of the chosen methods to sample size was tested. Using k-means clustering algorithm a group of four metrics with the highest performance in terms of overall sensitivity, overall false positive rate and computational load was established i.e. absolute value of Moderated Welch Test statistic, Minimum Significant Difference, absolute value of Signal-To-Noise ratio and Baumgartner-Weiss-Schindler test statistic. In case of false positive rate estimation, all selected ranking metrics were robust with respect to sample size. In case of sensitivity, the absolute value of Moderated Welch Test statistic and absolute value of Signal-To-Noise ratio gave stable results, while Baumgartner-Weiss-Schindler and Minimum Significant Difference showed better results for larger sample size. Finally, the Gene Set Enrichment Analysis method with all tested ranking metrics was parallelised and implemented in MATLAB, and is available at https://github.com/ZAEDPolSl/MrGSEA. Conclusions: Choosing a ranking metric in Gene Set Enrichment Analysis has critical impact on results of pathway enrichment analysis. The absolute value of Moderated Welch Test has the best overall sensitivity and Minimum Significant Difference has the best overall specificity of gene set analysis. When the number of non-normally distributed genes is high, using Baumgartner-Weiss-Schindler test statistic gives better outcomes. Also, it finds more enriched pathways than other tested metrics, which may induce new biological discoveries. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-017-1674-0) contains supplementary material, which is available to authorized users.
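A ranking metric scores each gene by how strongly it separates two sample groups, before GSEA walks down the ranked list. As a hedged illustration, and not the MrGSEA MATLAB code, the plain signal-to-noise ratio (mean_A - mean_B) / (sd_A + sd_B) and the ordinary Welch t statistic can be computed as below. Three things are left out of this sketch:

- The moderated Welch variant evaluated in the paper also shrinks the variance estimates.
- Some GSEA implementations put a floor on the standard deviation.
- The paper's other metrics are not implemented.

The function names and the `expression` dictionary layout are assumptions.

```python
from math import sqrt
from statistics import mean, stdev, variance
from typing import Callable, Sequence

Metric = Callable[[Sequence[float], Sequence[float]], float]


def signal_to_noise(case: Sequence[float], control: Sequence[float]) -> float:
    """Plain signal-to-noise ratio: (mean_case - mean_control) / (sd_case + sd_control)."""
    return (mean(case) - mean(control)) / (stdev(case) + stdev(control))


def welch_t(case: Sequence[float], control: Sequence[float]) -> float:
    """Ordinary (unmoderated) Welch t statistic."""
    se = sqrt(variance(case) / len(case) + variance(control) / len(control))
    return (mean(case) - mean(control)) / se


def rank_genes(
    expression: dict[str, Sequence[float]],
    case_idx: Sequence[int],
    control_idx: Sequence[int],
    metric: Metric,
) -> list[str]:
    """Order genes by the absolute value of the chosen metric, largest first."""
    scores = {}
    for gene, values in expression.items():
        case = [values[i] for i in case_idx]
        control = [values[i] for i in control_idx]
        scores[gene] = abs(metric(case, control))
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical expression values: samples 0-2 are cases, 3-5 are controls.
    expression = {"gene_up": [5.1, 5.4, 5.0, 3.2, 3.0, 3.3], "gene_flat": [2.0, 2.2, 1.9, 2.1, 2.0, 2.0]}
    print(rank_genes(expression, [0, 1, 2], [3, 4, 5], signal_to_noise))
    print(rank_genes(expression, [0, 1, 2], [3, 4, 5], welch_t))
```

The absolute value is applied before ranking, matching the "absolute value of ..." metrics named in the abstract.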
Affiliation(s)
- Joanna Zyla
- Data Mining Group, Institute of Automatic Control, Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Akademicka 16, Gliwice, 44-100, Poland
- Michal Marczyk
- Data Mining Group, Institute of Automatic Control, Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Akademicka 16, Gliwice, 44-100, Poland
- January Weiner
- Max Planck Institute for Infection Biology, Charitéplatz 1, Berlin, 10117, Germany
- Joanna Polanska
- Data Mining Group, Institute of Automatic Control, Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Akademicka 16, Gliwice, 44-100, Poland
39
Lee H, Shin M. Mining pathway associations for disease-related pathway activity analysis based on gene expression and methylation data. BioData Min 2017; 10:3. [PMID: 28168005 PMCID: PMC5286825 DOI: 10.1186/s13040-017-0127-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 05/03/2016] [Accepted: 01/26/2017] [Indexed: 12/19/2022]
Abstract
BACKGROUND: The problem of discovering genetic markers as disease signatures is of great significance for the successful diagnosis, treatment, and prognosis of complex diseases. Even if many earlier studies worked on identifying disease markers from a variety of biological resources, they mostly focused on the markers of genes or gene-sets (i.e., pathways). However, these markers may not be enough to explain biological interactions between genetic variables that are related to diseases. Thus, in this study, our aim is to investigate distinctive associations among active pathways (i.e., pathway-sets) shown each in case and control samples which can be observed from gene expression and/or methylation data. RESULTS: The pathway-sets are obtained by identifying a set of associated pathways that are often active together over a significant number of class samples. For this purpose, gene expression or methylation profiles are first analyzed to identify significant (active) pathways via gene-set enrichment analysis. Then, regarding these active pathways, an association rule mining approach is applied to examine interesting pathway-sets in each class of samples (case or control). By doing so, the sets of associated pathways often working together in activity profiles are finally chosen as our distinctive signature of each class. The identified pathway-sets are aggregated into a pathway activity network (PAN), which facilitates the visualization of differential pathway associations between case and control samples. From our experiments with two publicly available datasets, we could find interesting PAN structures as the distinctive signatures of breast cancer and uterine leiomyoma cancer, respectively. CONCLUSIONS: Our pathway-set markers were shown to be superior or very comparable to other genetic markers (such as genes or gene-sets) in disease classification. Furthermore, the PAN structure, which can be constructed from the identified markers of pathway-sets, could provide deeper insights into distinctive associations between pathway activities in case and control samples.
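The pathway-set idea can be illustrated with the two basic association-rule quantities, support and confidence, computed over per-sample sets of active pathways. This is a brute-force sketch under stated assumptions:

- It enumerates small itemsets exhaustively rather than using an optimized algorithm such as Apriori.
- The function names and pathway labels are hypothetical.
- It does not reproduce the authors' enrichment-based activity calls or their PAN construction.

```python
from collections import Counter
from itertools import combinations


def frequent_pathway_sets(
    samples: list[set[str]], min_support: float, max_size: int = 2
) -> dict[tuple[str, ...], float]:
    """Support of every pathway-set (up to max_size pathways) that reaches min_support.

    Support = fraction of samples in which all pathways of the set are active.
    Exhaustive enumeration: fine for small examples, not for large pathway collections.
    """
    n = len(samples)
    counts: Counter = Counter()
    for active in samples:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(active), size):
                counts[combo] += 1
    return {combo: c / n for combo, c in counts.items() if c / n >= min_support}


def confidence(samples: list[set[str]], antecedent: set[str], consequent: set[str]) -> float:
    """Confidence of the rule antecedent -> consequent: support(A and B) / support(A)."""
    both = antecedent | consequent
    n_a = sum(1 for s in samples if antecedent <= s)
    n_both = sum(1 for s in samples if both <= s)
    return n_both / n_a if n_a else 0.0


if __name__ == "__main__":
    # Hypothetical active-pathway calls for four case samples.
    case_samples = [{"P1", "P2"}, {"P1", "P2", "P3"}, {"P1"}, {"P3"}]
    print(frequent_pathway_sets(case_samples, min_support=0.5))
    print(confidence(case_samples, {"P2"}, {"P1"}))
```

Running the same functions separately on case and control samples gives two sets of frequent pathway-sets that can then be compared.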
Affiliation(s)
- Hyeonjeong Lee
- Bio-Intelligence & Data Mining Laboratory, Graduate School of Electronics Engineering, Kyungpook National University, 80, Daehak-ro, Buk-gu, Daegu, 41566 Republic of Korea
- Miyoung Shin
- School of Electronics Engineering, Kyungpook National University, 80, Daehak-ro, Buk-gu, Daegu, 41566 Republic of Korea
40
Ozerov IV, Lezhnina KV, Izumchenko E, Artemov AV, Medintsev S, Vanhaelen Q, Aliper A, Vijg J, Osipov AN, Labat I, West MD, Buzdin A, Cantor CR, Nikolsky Y, Borisov N, Irincheeva I, Khokhlovich E, Sidransky D, Camargo ML, Zhavoronkov A. In silico Pathway Activation Network Decomposition Analysis (iPANDA) as a method for biomarker development. Nat Commun 2016; 7:13427. [PMID: 27848968 PMCID: PMC5116087 DOI: 10.1038/ncomms13427] [Citation(s) in RCA: 85] [Impact Index Per Article: 10.6] [Received: 03/07/2016] [Accepted: 10/03/2016] [Indexed: 01/02/2023]
Abstract
Signalling pathway activation analysis is a powerful approach for extracting biologically relevant features from large-scale transcriptomic and proteomic data. However, modern pathway-based methods often fail to provide stable pathway signatures of a specific phenotype or reliable disease biomarkers. In the present study, we introduce the in silico Pathway Activation Network Decomposition Analysis (iPANDA) as a scalable robust method for biomarker identification using gene expression data. The iPANDA method combines precalculated gene coexpression data with gene importance factors based on the degree of differential gene expression and pathway topology decomposition for obtaining pathway activation scores. Using Microarray Analysis Quality Control (MAQC) data sets and pretreatment data on Taxol-based neoadjuvant breast cancer therapy from multiple sources, we demonstrate that iPANDA provides significant noise reduction in transcriptomic data and identifies highly robust sets of biologically relevant pathway signatures. We successfully apply iPANDA for stratifying breast cancer patients according to their sensitivity to neoadjuvant therapy.
Affiliation(s)
- Ivan V Ozerov
- Pharmaceutical Artificial Intelligence Department, Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, B301, 1101 33rd Street, Baltimore, Maryland 21218, USA
- Ksenia V Lezhnina
- Pharmaceutical Artificial Intelligence Department, Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, B301, 1101 33rd Street, Baltimore, Maryland 21218, USA
- Evgeny Izumchenko
- The Johns Hopkins University, School of Medicine, Department of Otolaryngology, Head and Neck Cancer Research, 1550 Orleans Street, Baltimore, Maryland 21231, USA
- Artem V Artemov
- Pharmaceutical Artificial Intelligence Department, Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, B301, 1101 33rd Street, Baltimore, Maryland 21218, USA
- Sergey Medintsev
- Pharmaceutical Artificial Intelligence Department, Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, B301, 1101 33rd Street, Baltimore, Maryland 21218, USA
- Quentin Vanhaelen
- Pharmaceutical Artificial Intelligence Department, Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, B301, 1101 33rd Street, Baltimore, Maryland 21218, USA
- Alexander Aliper
- Pharmaceutical Artificial Intelligence Department, Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, B301, 1101 33rd Street, Baltimore, Maryland 21218, USA; Laboratory of Bioinformatics, D. Rogachev Federal Research and Clinical Center for Pediatric Hematology, Oncology and Immunology, Samory Mashela 1, Moscow 117997, Russia
- Jan Vijg
- Department of Genetics, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, New York 10461, USA
- Andreyan N Osipov
- Pharmaceutical Artificial Intelligence Department, Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, B301, 1101 33rd Street, Baltimore, Maryland 21218, USA; Laboratory of Bioinformatics, D. Rogachev Federal Research and Clinical Center for Pediatric Hematology, Oncology and Immunology, Samory Mashela 1, Moscow 117997, Russia
- Ivan Labat
- BioTime, Inc., 1010 Atlantic Avenue, Alameda, California 94501, USA
- Michael D West
- BioTime, Inc., 1010 Atlantic Avenue, Alameda, California 94501, USA
- Anton Buzdin
- Pharmaceutical Artificial Intelligence Department, Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, B301, 1101 33rd Street, Baltimore, Maryland 21218, USA; Laboratory of Bioinformatics, D. Rogachev Federal Research and Clinical Center for Pediatric Hematology, Oncology and Immunology, Samory Mashela 1, Moscow 117997, Russia; National Research Centre 'Kurchatov Institute', Centre for Convergence of Nano-, Bio-, Information and Cognitive Sciences and Technologies, 1, Akademika Kurchatova square, Moscow 123182, Russia
- Charles R Cantor
- Boston University, Department of Biomedical Engineering, 44 Cummington Street, Boston, Massachusetts 02215, USA
- Yuri Nikolsky
- Pharmaceutical Artificial Intelligence Department, Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, B301, 1101 33rd Street, Baltimore, Maryland 21218, USA; Skolkovo Foundation, 5 Nobelya street, Skolkovo Innovation Centre, Mozhajskij region, Moscow 143026, Russia
- Nikolay Borisov
- Pharmaceutical Artificial Intelligence Department, Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, B301, 1101 33rd Street, Baltimore, Maryland 21218, USA; Laboratory of Bioinformatics, D. Rogachev Federal Research and Clinical Center for Pediatric Hematology, Oncology and Immunology, Samory Mashela 1, Moscow 117997, Russia; National Research Centre 'Kurchatov Institute', Centre for Convergence of Nano-, Bio-, Information and Cognitive Sciences and Technologies, 1, Akademika Kurchatova square, Moscow 123182, Russia
- Irina Irincheeva
- Nutrition and Metabolic Health group, Nestlé Institute of Health Sciences, CH-1015 Lausanne, Switzerland
- Edward Khokhlovich
- Novartis Institutes for BioMedical Research, 250 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
- David Sidransky
- The Johns Hopkins University, School of Medicine, Department of Otolaryngology, Head and Neck Cancer Research, 1550 Orleans Street, Baltimore, Maryland 21231, USA
- Miguel Luiz Camargo
- Novartis Institutes for BioMedical Research, 250 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
- Alex Zhavoronkov
- Pharmaceutical Artificial Intelligence Department, Insilico Medicine, Inc., Emerging Technology Centers, Johns Hopkins University at Eastern, B301, 1101 33rd Street, Baltimore, Maryland 21218, USA; Laboratory of Bioinformatics, D. Rogachev Federal Research and Clinical Center for Pediatric Hematology, Oncology and Immunology, Samory Mashela 1, Moscow 117997, Russia; The Biogerontology Research Foundation, 2354 Chynoweth House, Trevissome Park, Truro TR4 8UN, UK
41
Disrupted pathways associated with neonatal sepsis: Combination of protein-protein interactions and pathway data. BioChip J 2016. [DOI: 10.1007/s13206-016-1101-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
42
Dong X, Hao Y, Wang X, Tian W. LEGO: a novel method for gene set over-representation analysis by incorporating network-based gene weights. Sci Rep 2016; 6:18871. [PMID: 26750448 PMCID: PMC4707541 DOI: 10.1038/srep18871] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 01/08/2015] [Accepted: 11/27/2015] [Indexed: 12/27/2022]
Abstract
Pathway or gene set over-representation analysis (ORA) has become a routine task in functional genomics studies. However, currently widely used ORA tools employ statistical methods such as Fisher’s exact test that reduce a pathway into a list of genes, ignoring the constitutive functional non-equivalent roles of genes and the complex gene-gene interactions. Here, we develop a novel method named LEGO (functional Link Enrichment of Gene Ontology or gene sets) that takes into consideration these two types of information by incorporating network-based gene weights in ORA analysis. In three benchmarks, LEGO achieves better performance than Fisher and three other network-based methods. To further evaluate LEGO’s usefulness, we compare LEGO with five gene expression-based and three pathway topology-based methods using a benchmark of 34 disease gene expression datasets compiled by a recent publication, and show that LEGO is among the top-ranked methods in terms of both sensitivity and prioritization for detecting target KEGG pathways. In addition, we develop a cluster-and-filter approach to reduce the redundancy among the enriched gene sets, making the results more interpretable to biologists. Finally, we apply LEGO to two lists of autism genes, and identify relevant gene sets to autism that could not be found by Fisher.
Affiliation(s)
- Xinran Dong
- State Key Laboratory of Genetic Engineering, Department of Biostatistics and Computational Biology, School of Life Sciences, Fudan University, Shanghai 200436, P.R. China
- Yun Hao
- State Key Laboratory of Genetic Engineering, Department of Biostatistics and Computational Biology, School of Life Sciences, Fudan University, Shanghai 200436, P.R. China
- Xiao Wang
- State Key Laboratory of Genetic Engineering, Department of Biostatistics and Computational Biology, School of Life Sciences, Fudan University, Shanghai 200436, P.R. China
- Weidong Tian
- State Key Laboratory of Genetic Engineering, Collaborative Innovation Center of Genetics and Development, Department of Biostatistics and Computational Biology, School of Life Sciences, Fudan University, Shanghai 100433, P.R. China; Children's Hospital of Fudan University, Shanghai 200433, P.R. China
43
Bayerlová M, Jung K, Kramer F, Klemm F, Bleckmann A, Beißbarth T. Comparative study on gene set and pathway topology-based enrichment methods. BMC Bioinformatics 2015; 16:334. [PMID: 26489510 PMCID: PMC4618947 DOI: 10.1186/s12859-015-0751-5] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.9] [Received: 08/20/2015] [Accepted: 09/29/2015] [Indexed: 01/08/2023]
Abstract
Background: Enrichment analysis is a popular approach to identify pathways or sets of genes which are significantly enriched in the context of differentially expressed genes. The traditional gene set enrichment approach considers a pathway as a simple gene list disregarding any knowledge of gene or protein interactions. In contrast, the new group of so called pathway topology-based methods integrates the topological structure of a pathway into the analysis. Methods: We comparatively investigated gene set and pathway topology-based enrichment approaches, considering three gene set and four topological methods. These methods were compared in two extensive simulation studies and on a benchmark of 36 real datasets, providing the same pathway input data for all methods. Results: In the benchmark data analysis both types of methods showed a comparable ability to detect enriched pathways. The first simulation study was conducted with KEGG pathways, which showed considerable gene overlaps between each other. In this study with original KEGG pathways, none of the topology-based methods outperformed the gene set approach. Therefore, a second simulation study was performed on non-overlapping pathways created by unique gene IDs. Here, methods accounting for pathway topology reached higher accuracy than the gene set methods, however their sensitivity was lower. Conclusions: We conducted one of the first comprehensive comparative works on evaluating gene set against pathway topology-based enrichment methods. The topological methods showed better performance in the simulation scenarios with non-overlapping pathways, however, they were not conclusively better in the other scenarios. This suggests that simple gene set approach might be sufficient to detect an enriched pathway under realistic circumstances. Nevertheless, more extensive studies and further benchmark data are needed to systematically evaluate these methods and to assess what gain and cost pathway topology information introduces into enrichment analysis. Both types of methods for enrichment analysis require further improvements in order to deal with the problem of pathway overlaps. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0751-5) contains supplementary material, which is available to authorized users.
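The abstract links part of the methods' behaviour to genes shared between KEGG pathways. One simple way to quantify that overlap is the Jaccard index of two pathways' gene sets. This measure was chosen for the sketch only; the authors are not stated to have used it. The function names, pathway names and genes below are hypothetical placeholders.

```python
from itertools import combinations


def jaccard(genes_a: set[str], genes_b: set[str]) -> float:
    """|A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlapping_pairs(pathways: dict[str, set[str]], threshold: float) -> list[tuple[str, str, float]]:
    """Return (pathway, pathway, Jaccard index) for every pair at or above `threshold`."""
    pairs = []
    for p, q in combinations(sorted(pathways), 2):
        score = jaccard(pathways[p], pathways[q])
        if score >= threshold:
            pairs.append((p, q, score))
    return pairs


if __name__ == "__main__":
    # Hypothetical pathways sharing some member genes.
    pathways = {
        "pathway_X": {"G1", "G2", "G3"},
        "pathway_Y": {"G2", "G3", "G4"},
        "pathway_Z": {"G9"},
    }
    print(overlapping_pairs(pathways, threshold=0.3))
```

Pathway pairs with a high index share many genes, so a signal in one can make the other look enriched as well. This is the overlap problem the abstract names as open for both families of methods.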
Affiliation(s)
- Michaela Bayerlová
- Department of Medical Statistics, University Medical Center Göttingen, 37099, Göttingen, Germany.
- Klaus Jung
- Department of Medical Statistics, University Medical Center Göttingen, 37099, Göttingen, Germany.
- Frank Kramer
- Department of Medical Statistics, University Medical Center Göttingen, 37099, Göttingen, Germany.
- Florian Klemm
- Department of Hematology and Medical Oncology, University Medical Center Göttingen, 37099, Göttingen, Germany.
- Annalen Bleckmann
- Department of Medical Statistics, University Medical Center Göttingen, 37099, Göttingen, Germany; Department of Hematology and Medical Oncology, University Medical Center Göttingen, 37099, Göttingen, Germany.
- Tim Beißbarth
- Department of Medical Statistics, University Medical Center Göttingen, 37099, Göttingen, Germany.
44
Xu L, Ziegelbauer J, Wang R, Wu WW, Shen RF, Juhl H, Zhang Y, Rosenberg A. Distinct Profiles for Mitochondrial t-RNAs and Small Nucleolar RNAs in Locally Invasive and Metastatic Colorectal Cancer. Clin Cancer Res 2015; 22:773-84. [PMID: 26384739 DOI: 10.1158/1078-0432.ccr-15-0737] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 03/26/2015] [Accepted: 09/02/2015] [Indexed: 01/01/2023]
Abstract
PURPOSE: To gain insight into factors involved in tumor progression and metastasis, we examined the role of noncoding RNAs in the biologic characteristics of colorectal carcinoma, in paired samples of tumor together with normal mucosa from the same colorectal carcinoma patient. The tumor and healthy tissue samples were collected and stored under stringent conditions, thereby minimizing warm ischemic time. EXPERIMENTAL DESIGN: We focused particularly on distinctions among high-stage tumors and tumors with known metastases, performing RNA-Seq analysis that quantifies transcript abundance and identifies novel transcripts. RESULTS: In comparing 35 colorectal carcinomas, including 9 metastatic tumors (metastases to lymph nodes and lymphatic vessels), with their matched healthy control mucosa, we found a distinct signature of mitochondrial transfer RNAs (MT-tRNA) and small nucleolar RNAs (snoRNA) for metastatic and high-stage colorectal carcinoma. We also found the following: (i) MT-TF (phenylalanine) and snord12B expression correlated with a substantial number of miRNAs and mRNAs in 14 colorectal carcinomas examined; (ii) an miRNA signature of oxidative stress, hypoxia, and a shift to glycolytic metabolism in 14 colorectal carcinomas, regardless of grade and stage; and (iii) heterogeneous MT-tRNA/snoRNA fingerprints for 35 pairs. CONCLUSIONS: These findings could potentially assist in more accurate and predictive staging of colorectal carcinoma, including identification of those colorectal carcinomas likely to metastasize.
Affiliation(s)
- Lai Xu
- OBP/DBRR-III, CDER, FDA, Silver Spring, Maryland
- Rong Wang
- OBP/DBRR-III, CDER, FDA, Silver Spring, Maryland
- Wells W Wu
- Facility for Biotechnology Resources, CBER, FDA, Silver Spring, Maryland
- Rong-Fong Shen
- Facility for Biotechnology Resources, CBER, FDA, Silver Spring, Maryland
- Yaqin Zhang
- OBP/DBRR-III, CDER, FDA, Silver Spring, Maryland
45
Abstract
Multiple methods have been proposed to estimate pathway activities from expression profiles, and yet, there is not enough information available about the performance of those methods. This makes selection of a suitable tool for pathway analysis difficult. Although methods based on simple gene lists have remained the most common approach, various methods that also consider pathway structure have emerged. To provide practical insight about the performance of both list-based and structure-based methods, we tested six different approaches to estimate pathway activities in two different case study settings of different characteristics. The first case study setting involved six renal cell cancer data sets, and the differences between expression profiles of case and control samples were relatively big. The second case study setting involved four type 1 diabetes data sets, and the profiles of case and control samples were more similar to each other. In general, there were marked differences in the outcomes of the different pathway tools even with the same input data. In the cancer studies, the results of a tested method were typically consistent across the different data sets, yet different between the methods. In the more challenging diabetes studies, almost all the tested methods detected as significant only few pathways if any.
46
Talukder AK, Ravishankar S, Sasmal K, Gandham S, Prabhukumar J, Achutharao PH, Barh D, Blasi F. XomAnnotate: Analysis of Heterogeneous and Complex Exome- A Step towards Translational Medicine. PLoS One 2015; 10:e0123569. [PMID: 25905921 PMCID: PMC4408095 DOI: 10.1371/journal.pone.0123569] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 08/27/2014] [Accepted: 02/20/2015] [Indexed: 12/14/2022]
Abstract
In translational cancer medicine, implicated pathways and the relevant master genes are of focus. Exome's specificity, processing-time, and cost advantage makes it a compelling tool for this purpose. However, analysis of exome lacks reliable combinatory analysis tools and techniques. In this paper we present XomAnnotate – a meta- and functional-analysis software for exome. We compared UnifiedGenotyper, Freebayes, Delly, and Lumpy algorithms that were designed for whole-genome and combined their strengths in XomAnnotate for exome data through meta-analysis to identify comprehensive mutation profile (SNPs/SNVs, short inserts/deletes, and SVs) of patients. The mutation profile is annotated followed by functional analysis through pathway enrichment and network analysis to identify most critical genes and pathways implicated in the disease genesis. The efficacy of the software is verified through MDS and clustering and tested with available 11 familial non-BRCA1/BRCA2 breast cancer exome data. The results showed that the most significantly affected pathways across all samples are cell communication and antigen processing and presentation. ESCO1, HYAL1, RAF1 and PRKCA emerged as the key genes. Network analysis further showed the purine and propanoate metabolism pathways along with RAF1 and PRKCA genes to be master regulators in these patients. Therefore, XomAnnotate is able to use exome data to identify entire mutation landscape, pathways, and the master genes accurately with wide concordance from earlier microarray and whole-genome studies -- making it a suitable biomedical software for using exome in next-generation translational medicine.
Affiliation(s)
- Asoke K. Talukder
- InterpretOmics India Pvt Ltd, #329, 7 Main, HAL 2 Stage, Indiranagar, Bangalore, 560 008, Karnataka, India
- Shashidhar Ravishankar
- InterpretOmics India Pvt Ltd, #329, 7 Main, HAL 2 Stage, Indiranagar, Bangalore, 560 008, Karnataka, India
- Krittika Sasmal
- InterpretOmics India Pvt Ltd, #329, 7 Main, HAL 2 Stage, Indiranagar, Bangalore, 560 008, Karnataka, India
- Santhosh Gandham
- InterpretOmics India Pvt Ltd, #329, 7 Main, HAL 2 Stage, Indiranagar, Bangalore, 560 008, Karnataka, India
- Jyothsna Prabhukumar
- InterpretOmics India Pvt Ltd, #329, 7 Main, HAL 2 Stage, Indiranagar, Bangalore, 560 008, Karnataka, India
- Prahalad H. Achutharao
- InterpretOmics India Pvt Ltd, #329, 7 Main, HAL 2 Stage, Indiranagar, Bangalore, 560 008, Karnataka, India
- Debmalya Barh
- InterpretOmics India Pvt Ltd, #329, 7 Main, HAL 2 Stage, Indiranagar, Bangalore, 560 008, Karnataka, India
- Centre for Genomics and Applied Gene Technology, Institute of Integrative Omics and Applied Biotechnology (IIOAB), Nonakuri, Purba Medinipur, West Bengal, 721172, India
- Francesco Blasi
- Laboratory of Transcriptional Regulation in Development and Cancer, IFOM (Fondazione Istituto FIRC di Oncologia Molecolare), Milano, Italy
47
Mirenda M, Toffali L, Montresor A, Scardoni G, Sorio C, Laudanna C. Protein tyrosine phosphatase receptor type γ is a JAK phosphatase and negatively regulates leukocyte integrin activation. J Immunol 2015; 194:2168-79. [PMID: 25624455 DOI: 10.4049/jimmunol.1401841] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Indexed: 01/13/2023]
Abstract
Regulation of signal transduction networks depends on protein kinase and phosphatase activities. Protein tyrosine kinases of the JAK family have been shown to regulate integrin affinity modulation by chemokines and mediated homing to secondary lymphoid organs of human T lymphocytes. However, the role of protein tyrosine phosphatases in leukocyte recruitment is still elusive. In this study, we address this issue by focusing on protein tyrosine phosphatase receptor type γ (PTPRG), a tyrosine phosphatase highly expressed in human primary monocytes. We developed a novel methodology to study the signaling role of receptor type tyrosine phosphatases and found that activated PTPRG blocks chemoattractant-induced β2 integrin activation. Specifically, triggering of LFA-1 to high-affinity state is prevented by PTPRG activation. High-throughput phosphoproteomics and computational analyses show that PTPRG activation affects the phosphorylation state of at least 31 signaling proteins. Deeper examination shows that JAKs are critically involved in integrin-mediated monocyte adhesion and that PTPRG activation leads to JAK2 dephosphorylation on the critical 1007-1008 phosphotyrosine residues, implying JAK2 inhibition and thus explaining the antiadhesive role of PTPRG. Overall, the data validate a new approach to study receptor tyrosine phosphatases and show that, by targeting JAKs, PTPRG downmodulates the rapid activation of integrin affinity in human monocytes, thus emerging as a potential novel critical regulator of leukocyte trafficking.
Affiliation(s)
- Michela Mirenda
- Division of General Pathology, Department of Pathology and Diagnostics, School of Medicine, University of Verona, Verona 37134, Italy
- Lara Toffali
- Division of General Pathology, Department of Pathology and Diagnostics, School of Medicine, University of Verona, Verona 37134, Italy; Center for Biomedical Computing, University of Verona, Verona 37134, Italy
- Alessio Montresor
- Division of General Pathology, Department of Pathology and Diagnostics, School of Medicine, University of Verona, Verona 37134, Italy; Center for Biomedical Computing, University of Verona, Verona 37134, Italy
- Giovanni Scardoni
- Center for Biomedical Computing, University of Verona, Verona 37134, Italy
- Claudio Sorio
- Division of General Pathology, Department of Pathology and Diagnostics, School of Medicine, University of Verona, Verona 37134, Italy
- Carlo Laudanna
- Division of General Pathology, Department of Pathology and Diagnostics, School of Medicine, University of Verona, Verona 37134, Italy; Center for Biomedical Computing, University of Verona, Verona 37134, Italy
48
Mooney MA, Nigg JT, McWeeney SK, Wilmot B. Functional and genomic context in pathway analysis of GWAS data. Trends Genet 2014; 30:390-400. [PMID: 25154796 DOI: 10.1016/j.tig.2014.07.004] [Citation(s) in RCA: 86] [Impact Index Per Article: 8.6] [Received: 06/12/2014] [Revised: 07/18/2014] [Accepted: 07/18/2014] [Indexed: 02/07/2023]
Abstract
Gene set analysis (GSA) is a promising tool for uncovering the polygenic effects associated with complex diseases. However, the available techniques reflect a wide variety of hypotheses about how genetic effects interact to contribute to disease susceptibility. The lack of consensus about the best way to perform GSA has led to confusion in the field and has made it difficult to compare results across methods. A clear understanding of the various choices made during GSA - such as how gene sets are defined, how single-nucleotide polymorphisms (SNPs) are assigned to genes, and how individual SNP-level effects are aggregated to produce gene- or pathway-level effects - will improve the interpretability and comparability of results across methods and studies. In this review we provide an overview of the various data sources used to construct gene sets and the statistical methods used to test for gene set association, as well as provide guidelines for ensuring the comparability of results.
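The review highlights how SNP-level effects are aggregated into a gene-level value. Two textbook combination rules illustrate that choice: Fisher's method and a Šidák-corrected minimum p-value. Both are assumed examples, not methods the review prescribes, and the function names are hypothetical.

Both rules treat the SNPs as independent. SNPs in linkage disequilibrium violate that assumption, so LD-aware calibration (for example, permutation or simulation from the LD structure) is commonly used in practice. The chi-square tail below uses the closed form available for even degrees of freedom, so only the standard library is needed.

```python
from math import exp, log
from typing import Sequence


def fisher_combined_pvalue(pvalues: Sequence[float]) -> float:
    """Fisher's method: X = -2 * sum(ln p_i) ~ chi-square with 2k df, assuming independent SNPs.

    p-values must lie in (0, 1]. Intended for modest k; very many tiny p-values can overflow.
    """
    k = len(pvalues)
    half_stat = -sum(log(p) for p in pvalues)  # X / 2
    # Chi-square survival function for 2k df: exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return exp(-half_stat) * total


def sidak_min_p(pvalues: Sequence[float]) -> float:
    """Gene-level p-value from the smallest SNP p-value, Šidák-corrected for k SNPs."""
    return 1.0 - (1.0 - min(pvalues)) ** len(pvalues)


if __name__ == "__main__":
    # Hypothetical p-values for SNPs assigned to one gene.
    snp_p = [0.01, 0.20, 0.45, 0.03]
    print(fisher_combined_pvalue(snp_p), sidak_min_p(snp_p))
```

The two rules encode different hypotheses. Fisher's method rewards many modest signals, while the minimum-p rule rewards a single strong SNP. This illustrates the review's point that method choice changes what a "significant" gene or pathway means.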
Affiliation(s)
- Michael A Mooney
- Division of Bioinformatics and Computational Biology, Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR, USA; OHSU Knight Cancer Institute, Portland, OR, USA
- Joel T Nigg
- Division of Psychology, Department of Psychiatry, Oregon Health & Science University, Portland, OR, USA; Department of Behavioral Neuroscience, Oregon Health & Science University, Portland, OR, USA
- Shannon K McWeeney
- Division of Bioinformatics and Computational Biology, Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR, USA; Oregon Clinical and Translational Research Institute, Portland, OR, USA; OHSU Knight Cancer Institute, Portland, OR, USA
- Beth Wilmot
- Division of Bioinformatics and Computational Biology, Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR, USA; Oregon Clinical and Translational Research Institute, Portland, OR, USA; OHSU Knight Cancer Institute, Portland, OR, USA
49
Peng Q, Schork NJ. Utility of network integrity methods in therapeutic target identification. Front Genet 2014; 5:12. [PMID: 24550933 PMCID: PMC3909879 DOI: 10.3389/fgene.2014.00012] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 11/25/2013] [Accepted: 01/13/2014] [Indexed: 01/05/2023]
Abstract
Analysis of the biological gene networks involved in a disease may lead to the identification of therapeutic targets. Such analysis requires exploring network properties, in particular the importance of individual network nodes (i.e., genes). Many measures quantify the importance of nodes in a network, and some may shed light on the biological significance and potential optimality of a gene or set of genes as therapeutic targets. This has been shown to be the case in cancer therapy. A dilemma exists, however, in finding the best therapeutic targets based on network analysis, since the optimal targets should be nodes that are highly influential in, but not toxic to, the functioning of the entire network. In addition, cancer therapeutics targeting a single gene often result in relapse, since compensatory, feedback and redundancy loops in the network may offset the activity associated with the targeted gene. Thus, multiple genes reflecting parallel functional cascades in a network should be targeted simultaneously, which requires that such targets first be identified. We propose a methodology that exploits centrality statistics characterizing the importance of nodes within a gene network that is constructed from the gene expression patterns in that network. We consider centrality measures based on both graph theory and spectral graph theory. We also consider the origins of a network topology and show how different available representations yield different node importance results. We apply our techniques to tumor gene expression data and suggest that identifying optimal therapeutic targets involving particular genes, pathways and sub-networks, based on an analysis of the nodes in that network, is possible and can facilitate individualized cancer treatments.
The proposed methods also have the potential to identify candidate cancer therapeutic targets that are not thought to be oncogenes but nonetheless play important roles in the functioning of a cancer-related network or pathway.
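The abstract's central idea, ranking genes by centrality statistics in a network built from expression data, and its point that different measures and representations give different importance results, can be sketched as follows. The sketch assumes an unweighted coexpression network in which two genes are linked when their absolute Pearson correlation reaches a threshold (0.5 is an arbitrary illustrative value), and compares one graph-theoretic measure (degree centrality) with one spectral measure (eigenvector centrality, the leading eigenvector of the adjacency matrix). The function names are hypothetical, and these two measures are not necessarily the ones Peng and Schork used.

```python
import numpy as np


def coexpression_adjacency(expr: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Unweighted gene network: link genes i and j when |Pearson r| >= threshold.
    `expr` is a samples x genes matrix."""
    r = np.abs(np.corrcoef(expr, rowvar=False))
    adj = (r >= threshold).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-loops
    return adj


def degree_centrality(adj: np.ndarray) -> np.ndarray:
    """Fraction of the other n - 1 genes that each gene is linked to."""
    return adj.sum(axis=1) / (adj.shape[0] - 1)


def eigenvector_centrality(adj: np.ndarray) -> np.ndarray:
    """Spectral centrality: entries of the leading eigenvector of the symmetric
    adjacency matrix, scaled so the most central gene scores 1."""
    _, vecs = np.linalg.eigh(adj)  # eigenvalues returned in ascending order
    v = np.abs(vecs[:, -1])        # leading eigenvector, sign fixed
    return v / v.max()


# A 5-gene path network 0-1-2-3-4: degree ties genes 1, 2 and 3,
# while eigenvector centrality singles out the middle gene.
path = np.zeros((5, 5))
for i in range(4):
    path[i, i + 1] = path[i + 1, i] = 1.0
print(degree_centrality(path))
print(eigenvector_centrality(path).round(3))
```

Degree centrality cannot separate genes 1, 2 and 3 in the path network, whereas eigenvector centrality ranks gene 2 highest. Changing the threshold, or using weighted correlations instead of a binary network, can change the rankings again.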
Affiliation(s)
- Qian Peng
- Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA, USA; Scripps Genomic Medicine, The Scripps Translational Science Institute, La Jolla, CA, USA
- Nicholas J Schork
- Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA, USA; Scripps Genomic Medicine, The Scripps Translational Science Institute, La Jolla, CA, USA
50
Rahmatallah Y, Emmert-Streib F, Glazko G. Gene Sets Net Correlations Analysis (GSNCA): a multivariate differential coexpression test for gene sets. Bioinformatics 2013; 30:360-8. [PMID: 24292935 PMCID: PMC4023302 DOI: 10.1093/bioinformatics/btt687] [Citation(s) in RCA: 80] [Impact Index Per Article: 7.3] [Indexed: 11/14/2022]
Abstract
MOTIVATION: To date, gene set analysis approaches primarily focus on identifying differentially expressed gene sets (pathways). Methods for identifying differentially coexpressed pathways also exist but are mostly based on aggregated pairwise correlations or other pairwise measures of coexpression. Instead, we propose Gene Sets Net Correlations Analysis (GSNCA), a multivariate differential coexpression test that accounts for the complete correlation structure between genes.
RESULTS: In GSNCA, weight factors are assigned to genes in proportion to the genes' cross-correlations (intergene correlations). The problem of finding the weight vectors is formulated as an eigenvector problem with a unique solution. GSNCA tests the null hypothesis that, for a gene set, there is no difference in the weight vectors of the genes between two conditions. In simulation studies and analyses of experimental data, we demonstrate that GSNCA captures changes in the structure of genes' cross-correlations rather than differences in the averaged pairwise correlations. Thus, GSNCA infers differences in coexpression networks while bypassing the method-dependent steps of network inference. As an additional result of GSNCA, we define hub genes as the genes with the largest weights and show that these genes frequently correspond to major and specific pathway regulators, as well as to the genes most affected by the biological difference between the two conditions. In summary, GSNCA is a new approach for analyzing differentially coexpressed pathways that also evaluates the importance of genes within those pathways, thus providing unique information that may lead to novel biological hypotheses.
AVAILABILITY AND IMPLEMENTATION: An implementation of the GSNCA test in R is available from the authors upon request.
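The weighting and testing steps summarized in the RESULTS paragraph can be sketched under explicit assumptions. This is not the authors' R implementation: it assumes the weight vector is the leading eigenvector of the matrix of absolute intergene correlations with the diagonal set to zero, scales the weights to sum to 1, uses the L1 distance between the two conditions' weight vectors as the test statistic, and obtains a p-value by permuting sample labels. Hub genes are taken as the genes with the largest weights. The function names and the simulated data (one "hub" gene driving the others) are hypothetical.

```python
import numpy as np


def gene_weights(expr: np.ndarray) -> np.ndarray:
    """Weights proportional to each gene's cross-correlations with the other genes.
    `expr` is a samples x genes matrix for one condition. Assumption: w solves
    A w = lambda w for the largest eigenvalue of A = |R| with zero diagonal; for a
    nonnegative symmetric matrix this eigenvector can be taken nonnegative."""
    a = np.abs(np.corrcoef(expr, rowvar=False))
    np.fill_diagonal(a, 0.0)
    _, vecs = np.linalg.eigh(a)   # eigenvalues returned in ascending order
    w = np.abs(vecs[:, -1])       # leading eigenvector, sign fixed
    return w / w.sum()            # scale weights to sum to 1


def weight_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Test statistic: L1 distance between the weight vectors of two conditions."""
    return float(np.abs(gene_weights(x) - gene_weights(y)).sum())


def permutation_test(x: np.ndarray, y: np.ndarray, n_perm: int = 200, seed: int = 0):
    """Null hypothesis: no difference in weight vectors between conditions.
    Sample labels are permuted; returns (statistic, permutation p-value)."""
    rng = np.random.default_rng(seed)
    observed = weight_distance(x, y)
    pooled = np.vstack([x, y])
    n_x = x.shape[0]
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[0])
        if weight_distance(pooled[idx[:n_x]], pooled[idx[n_x:]]) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


def simulate_hub(n: int, n_genes: int, hub: int, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical condition in which gene `hub` carries a signal shared by all genes."""
    z = rng.standard_normal(n)
    expr = z[:, None] + 2.0 * rng.standard_normal((n, n_genes))
    expr[:, hub] = z  # the hub gene carries the shared signal without extra noise
    return expr


rng = np.random.default_rng(1)
cond_a = simulate_hub(2000, 5, hub=0, rng=rng)
cond_b = simulate_hub(2000, 5, hub=3, rng=rng)
print("hub gene in A:", int(np.argmax(gene_weights(cond_a))))
print("hub gene in B:", int(np.argmax(gene_weights(cond_b))))
print("statistic, p-value:", permutation_test(cond_a, cond_b))
```

In the simulated data the hub gene differs between conditions (gene 0 versus gene 3), so the largest weight moves and the permutation p-value should be small even though each condition has a similar average correlation. Applying this to real expression data would require checking the weight definition and normalization against the original paper.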
Affiliation(s)
- Yasir Rahmatallah
- Division of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA; Computational Biology and Machine Learning Laboratory, Center for Cancer Research and Cell Biology, School of Medicine, Dentistry and Biomedical Sciences, Queen's University Belfast, Belfast BT9 7BL, UK