1. Almohaywi M, Sugita BM, Centa A, Fonseca AS, Antunes VC, Fadda P, Mannion CM, Abijo T, Goldberg SL, Campbell MC, Copeland RL, Kanaan Y, Cavalli LR. Deregulated miRNA Expression in Triple-Negative Breast Cancer of Ancestral Genomic-Characterized Latina Patients. Int J Mol Sci 2023; 24:13046. [PMID: 37685851 PMCID: PMC10487916 DOI: 10.3390/ijms241713046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Revised: 08/03/2023] [Accepted: 08/05/2023] [Indexed: 09/10/2023]
Abstract
Among patients with triple-negative breast cancer (TNBC), several studies have suggested that deregulated microRNA (miRNA) expression may be associated with a more aggressive phenotype. Although tumor molecular signatures may be race- and/or ethnicity-specific, there is limited information on the molecular profiles in women with TNBC of Hispanic and Latin American ancestry. We simultaneously profiled TNBC biopsies for genome-wide copy number and global miRNA expression from 28 Latina women and identified a panel of 28 miRNAs associated with copy number alterations (CNAs). Four selected miRNAs (miR-141-3p, miR-150-5p, miR-182-5p, and miR-661) were validated in a subset of tumor and adjacent non-tumor tissue samples, with miR-182-5p being the most discriminatory among tissue groups (AUC value > 0.8). MiR-141-3p up-regulation was associated with increased cancer recurrence; miR-661 down-regulation with larger tumor size; and down-regulation of miR-150-5p with larger tumor size, high p53 expression, increased cancer recurrence, presence of distant metastasis, and deceased status. This study reinforces the importance of integrative analysis of CNAs and miRNAs in TNBC, allowing for the identification of interactions among molecular mechanisms. Additionally, this study emphasizes the significance of considering the patients' ancestral background when examining TNBC, as it can influence the relationship between intrinsic tumor molecular characteristics and clinical manifestations of the disease.
Affiliation(s)
- Maram Almohaywi
  - Microbiology Department, Howard University Cancer Center, Howard University, Washington, DC 20059, USA
- Bruna M. Sugita
  - Research Institute Pelé Pequeno Príncipe, Faculdades Pequeno Príncipe, Curitiba 80250-060, PR, Brazil
- Ariana Centa
  - Research Institute Pelé Pequeno Príncipe, Faculdades Pequeno Príncipe, Curitiba 80250-060, PR, Brazil
- Aline S. Fonseca
  - Research Institute Pelé Pequeno Príncipe, Faculdades Pequeno Príncipe, Curitiba 80250-060, PR, Brazil
- Valquiria C. Antunes
  - Research Institute Pelé Pequeno Príncipe, Faculdades Pequeno Príncipe, Curitiba 80250-060, PR, Brazil
- Paolo Fadda
  - Genomics Shared Resource, Comprehensive Cancer Center, The Ohio State University, Columbus, OH 43210, USA
- Ciaran M. Mannion
  - Department of Pathology, Hackensack University Medical Center, Hackensack, NJ 07701, USA
- Tomilowo Abijo
  - National Institute of Diabetes and Kidney Diseases, National Institute of Health, Bethesda, MD 20814, USA
- Stuart L. Goldberg
  - John Theurer Cancer Center, Hackensack Meridian School of Medicine, Hackensack, NJ 07701, USA
  - COTA, Inc., New York, NY 10014, USA
- Michael C. Campbell
  - Department of Biological Sciences, Human and Evolutionary Biology Section, University of Southern California, Los Angeles, CA 90089, USA
- Robert L. Copeland
  - Pharmacology Department, Howard University Cancer Center, Howard University, Washington, DC 20059, USA
- Yasmine Kanaan
  - Microbiology Department, Howard University Cancer Center, Howard University, Washington, DC 20059, USA
- Luciane R. Cavalli
  - Research Institute Pelé Pequeno Príncipe, Faculdades Pequeno Príncipe, Curitiba 80250-060, PR, Brazil
  - Oncology Department, Lombardi Comprehensive Cancer Center, Georgetown University, Washington, DC 20007, USA
2. Yue Z, Zheng Q, Neylon MT, Yoo M, Shin J, Zhao Z, Tan AC, Chen JY. PAGER 2.0: an update to the pathway, annotated-list and gene-signature electronic repository for Human Network Biology. Nucleic Acids Res 2018; 46:D668-D676. [PMID: 29126216 PMCID: PMC5753198 DOI: 10.1093/nar/gkx1040] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 08/16/2017] [Accepted: 11/03/2017] [Indexed: 12/14/2022]
Abstract
Integrative Gene-set, Network and Pathway Analysis (GNPA) is a powerful data analysis approach developed to help interpret high-throughput omics data. In PAGER 1.0, we demonstrated that researchers can gain unbiased and reproducible biological insights with the introduction of PAGs (Pathways, Annotated-lists and Gene-signatures) as the basic data representation elements. In PAGER 2.0, we improve the utility of integrative GNPA by significantly expanding the coverage of PAGs and PAG-to-PAG relationships in the database, defining a new metric to quantify PAG data qualities, and developing new software features to simplify online integrative GNPA. Specifically, we included 84 282 PAGs spanning 24 different data sources that cover human diseases, published gene-expression signatures, drug-gene, miRNA-gene interactions, pathways and tissue-specific gene expressions. We introduced a new normalized Cohesion Coefficient (nCoCo) score to assess the biological relevance of genes inside a PAG, and RP-score to rank genes and assign gene-specific weights inside a PAG. The companion web interface contains numerous features to help users query and navigate the database content. The database content can be freely downloaded and is compatible with third-party Gene Set Enrichment Analysis tools. We expect PAGER 2.0 to become a major resource in integrative GNPA. PAGER 2.0 is available at http://discovery.informatics.uab.edu/PAGER/.
Affiliation(s)
- Zongliang Yue
  - Informatics Institute, School of Medicine, the University of Alabama at Birmingham, AL 35294, USA
- Qi Zheng
  - Informatics Institute, School of Medicine, the University of Alabama at Birmingham, AL 35294, USA
  - School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou, Guangdong 510006, China
- Michael T Neylon
  - Indiana University School of Informatics and Computing, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202, USA
- Minjae Yoo
  - Division of Medical Oncology, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Jimin Shin
  - Division of Medical Oncology, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Zhiying Zhao
  - Informatics Institute, School of Medicine, the University of Alabama at Birmingham, AL 35294, USA
  - School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China
- Aik Choon Tan
  - Division of Medical Oncology, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Jake Y Chen
  - Informatics Institute, School of Medicine, the University of Alabama at Birmingham, AL 35294, USA
3. Stoney RA, Schwartz JM, Robertson DL, Nenadic G. Using set theory to reduce redundancy in pathway sets. BMC Bioinformatics 2018; 19:386. [PMID: 30340461 PMCID: PMC6194563 DOI: 10.1186/s12859-018-2355-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 09/13/2017] [Accepted: 08/31/2018] [Indexed: 02/03/2023]
Abstract
BACKGROUND: The consolidation of pathway databases, such as KEGG, Reactome and ConsensusPathDB, has generated widespread biological interest; however, the issue of pathway redundancy impedes the use of these consolidated datasets. Attempts to reduce this redundancy have focused on visualizing pathway overlap or merging pathways, but the resulting pathways may be of heterogeneous sizes and cover multiple biological functions. Efforts have also been made to deal with redundancy in pathway data by consolidating enriched pathways into a number of clusters or concepts. We present an alternative approach, which generates pathway subsets capable of covering all of the genes present within either pathway databases or enrichment results, generating substantial reductions in redundancy.
RESULTS: We propose a method that uses set cover to reduce pathway redundancy, without merging pathways. The proposed approach considers three objectives: removal of pathway redundancy, controlling pathway size and coverage of the gene set. By applying set cover to the ConsensusPathDB dataset we were able to produce a reduced set of pathways, representing 100% of the genes in the original data set with 74% less redundancy, or 95% of the genes with 88% less redundancy. We also developed an algorithm to simplify enrichment data and applied it to a set of enriched osteoarthritis pathways, revealing that within the top ten pathways, five were redundant subsets of more enriched pathways. Applying set cover to the enrichment results removed these redundant pathways, allowing more informative pathways to take their place.
CONCLUSION: Our method provides an alternative approach for handling pathway redundancy, while ensuring that the pathways are of homogeneous size and gene coverage is maximised. Pathways are not altered from their original form, allowing biological knowledge regarding the data set to be directly applicable. We demonstrate the ability of the algorithms to prioritise redundancy reduction, pathway size control or gene set coverage. The application of set cover to pathway enrichment results produces an optimised summary of the pathways that best represent the differentially regulated gene set.
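The set-cover idea summarised in this abstract can be illustrated with a minimal sketch. It assumes the textbook greedy heuristic for set cover, which is not necessarily the algorithm the authors implemented; the function name `greedy_set_cover`, the `coverage` parameter, and the toy pathways and gene labels are all hypothetical and chosen only for illustration.

```python
def greedy_set_cover(pathways, coverage=1.0):
    """Greedy set cover over pathways (hypothetical helper, not the paper's code).

    pathways: dict mapping pathway name -> set of gene identifiers.
    coverage: fraction of all genes that must be covered (1.0 = every gene).
    Returns (chosen pathway names in selection order, set of covered genes).
    """
    universe = set().union(*pathways.values())
    target = coverage * len(universe)
    covered, chosen = set(), []
    remaining = dict(pathways)
    while len(covered) < target and remaining:
        # Pick the pathway adding the most uncovered genes;
        # iterating in sorted order makes ties resolve alphabetically.
        best = max(sorted(remaining), key=lambda name: len(remaining[name] - covered))
        gain = remaining[best] - covered
        if not gain:
            break  # nothing left adds new genes
        chosen.append(best)
        covered |= gain
        del remaining[best]
    return chosen, covered


# Toy data (assumed): pathway "B" is a redundant subset of "A".
example_pathways = {
    "A": {"g1", "g2", "g3"},
    "B": {"g2", "g3"},
    "C": {"g4"},
    "D": {"g3", "g4"},
}
chosen, covered = greedy_set_cover(example_pathways)
print(chosen)  # pathways kept in their original, unmerged form
```

Pathways are selected whole rather than merged, which matches the abstract's point that they are not altered from their original form. Lowering `coverage` (for example to 0.95) lets the loop stop earlier, trading gene coverage for fewer selected pathways.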
Affiliation(s)
- Jean-Marc Schwartz
  - School of Biological Sciences, University of Manchester, Manchester, M13 9PT UK
- David L Robertson
  - School of Biological Sciences, University of Manchester, Manchester, M13 9PT UK
  - MRC-University of Glasgow Centre for Virus Research, Garscube Campus, Glasgow, G61 1QH UK
- Goran Nenadic
  - School of Computer Science, University of Manchester, Manchester, M13 9PL UK
  - Manchester Institute of Biotechnology, University of Manchester, Manchester, M1 7DN UK
4. Chen KM, Tan J, Way GP, Doing G, Hogan DA, Greene CS. PathCORE-T: identifying and visualizing globally co-occurring pathways in large transcriptomic compendia. BioData Min 2018; 11:14. [PMID: 29988723 PMCID: PMC6029133 DOI: 10.1186/s13040-018-0175-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 04/16/2018] [Accepted: 06/18/2018] [Indexed: 12/29/2022]
Abstract
Background: Investigators often interpret genome-wide data by analyzing the expression levels of genes within pathways. While this within-pathway analysis is routine, the products of any one pathway can affect the activity of other pathways. Past efforts to identify relationships between biological processes have evaluated overlap in knowledge bases or evaluated changes that occur after specific treatments. Individual experiments can highlight condition-specific pathway-pathway relationships; however, constructing a complete network of such relationships across many conditions requires analyzing results from many studies.
Results: We developed the PathCORE-T framework by implementing existing methods to identify pathway-pathway transcriptional relationships evident across a broad data compendium. PathCORE-T is applied to the output of feature construction algorithms; it identifies pairs of pathways observed in features more than expected by chance as functionally co-occurring. We demonstrate PathCORE-T by analyzing an existing eADAGE model of a microbial compendium and building and analyzing NMF features from the TCGA dataset of 33 cancer types. The PathCORE-T framework includes a demonstration web interface, with source code, that users can launch to (1) visualize the network and (2) review the expression levels of associated genes in the original data. PathCORE-T creates and displays the network of globally co-occurring pathways based on features observed in a machine learning analysis of gene expression data.
Conclusions: The PathCORE-T framework identifies transcriptionally co-occurring pathways from the results of unsupervised analysis of gene expression data and visualizes the relationships between pathways as a network. PathCORE-T recapitulated previously described pathway-pathway relationships and suggested experimentally testable additional hypotheses that remain to be explored.
Electronic supplementary material: The online version of this article (10.1186/s13040-018-0175-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Kathleen M Chen
  - Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, 3400 Civic Center Blvd., Philadelphia, PA 19104 USA
- Jie Tan
  - Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, NH 03755 USA
- Gregory P Way
  - Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, 3400 Civic Center Blvd., Philadelphia, PA 19104 USA
- Georgia Doing
  - Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, NH 03755 USA
- Deborah A Hogan
  - Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, NH 03755 USA
- Casey S Greene
  - Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, 3400 Civic Center Blvd., Philadelphia, PA 19104 USA
5. Barradas-Bautista D, Rosell M, Pallara C, Fernández-Recio J. Structural Prediction of Protein–Protein Interactions by Docking: Application to Biomedical Problems. Protein-Protein Interactions in Human Disease, Part A 2018; 110:203-249. [DOI: 10.1016/bs.apcsb.2017.06.003] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 12/31/2022]
6. Chen JY, Pandey R, Nguyen TM. HAPPI-2: a Comprehensive and High-quality Map of Human Annotated and Predicted Protein Interactions. BMC Genomics 2017; 18:182. [PMID: 28212602 PMCID: PMC5314692 DOI: 10.1186/s12864-017-3512-1] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 05/08/2015] [Accepted: 01/24/2017] [Indexed: 01/07/2023]
Abstract
BACKGROUND: Human protein-protein interaction (PPI) data is essential to network and systems biology studies. PPI data can help biochemists hypothesize how proteins form complexes by binding to each other, how extracellular signals propagate through post-translational modification of de-activated signaling molecules, and how chemical reactions are coupled by enzymes involved in a complex biological process. Our capability to develop good public database resources for human PPI data has a direct impact on the quality of future research on genome biology and medicine.
RESULTS: The database of Human Annotated and Predicted Protein Interactions (HAPPI) version 2.0 is a major update to the original HAPPI 1.0 database. It contains 2,922,202 unique protein-protein interactions (PPI) linked by 23,060 human proteins, making it the most comprehensive database covering human PPI data today. These PPIs contain both physical/direct interactions and high-quality functional/indirect interactions. Compared with the HAPPI 1.0 database release, HAPPI database version 2.0 (HAPPI-2) represents a 485% increase in human PPI data coverage and a 73% increase in protein coverage. The revamped HAPPI web portal provides users with a friendly search, curation, and data retrieval interface, allowing them to retrieve human PPIs and available annotation information on the interaction type, interaction quality, interacting partner drug targeting data, and disease information. The updated HAPPI-2 can be freely accessed by academic users at http://discovery.informatics.uab.edu/HAPPI.
CONCLUSIONS: While the underlying data for HAPPI-2 are integrated from diverse data sources, the new HAPPI-2 release represents a good balance between data coverage and data quality of human PPIs, making it ideally suited for network biology.
Affiliation(s)
- Jake Y Chen
  - Wenzhou Medical University First Affiliate Hospital, Wenzhou, Zhejiang Province, China
  - Medeolinx, LLC, Indianapolis, IN, 46280, USA
  - The Informatics Institute, University of Alabama at Birmingham School of Medicine, Birmingham, AL, 35294, USA
  - Indiana Center for Systems Biology and Personalized Medicine, Indiana University School of Informatics and Computing, Indianapolis, IN, 46202, USA
- Thanh M Nguyen
  - Indiana Center for Systems Biology and Personalized Medicine, Indiana University School of Informatics and Computing, Indianapolis, IN, 46202, USA
7. Simple and complex retinal dystrophies are associated with profoundly different disease networks. Sci Rep 2017; 7:41835. [PMID: 28139756 PMCID: PMC5282568 DOI: 10.1038/srep41835] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 09/15/2016] [Accepted: 12/28/2016] [Indexed: 12/20/2022]
Abstract
Retinopathies are a group of monogenetic or complex retinal diseases associated with high unmet medical need. Monogenic disorders are caused by rare genetic variation and usually arise early in life. Other diseases, such as age-related macular degeneration (AMD), develop late in life and are considered to be of complex origin as they develop from a combination of genetic, ageing, environmental and lifestyle risk factors. Here, we contrast the underlying disease networks and pathological mechanisms of monogenic as opposed to complex retinopathies, using AMD as an example of the latter. We show that, surprisingly, genes associated with the different forms of retinopathies in general do not overlap despite their overlapping retinal phenotypes. Further, AMD risk genes participate in multiple networks with interaction partners that link to different ubiquitous pathways affecting general tissue integrity and homeostasis. Thus AMD most likely represents an endophenotype with differing underlying pathogenesis in different subjects. Localising these pathomechanisms and processes within and across different retinal anatomical compartments provides a novel representation of AMD that may be extended to complex disease in general. This approach may generate improved treatment options that target multiple processes with the aim of restoring tissue homeostasis and maintaining vision.
8. Yue Z, Kshirsagar MM, Nguyen T, Suphavilai C, Neylon MT, Zhu L, Ratliff T, Chen JY. PAGER: constructing PAGs and new PAG-PAG relationships for network biology. Bioinformatics 2015; 31:i250-7. [PMID: 26072489 PMCID: PMC4553834 DOI: 10.1093/bioinformatics/btv265] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Indexed: 01/02/2023]
Abstract
In this article, we describe a new database framework to perform integrative “gene-set, network, and pathway analysis” (GNPA). In this framework, we integrated heterogeneous data on pathways, annotated lists, and gene-sets (PAGs) into a PAG electronic repository (PAGER). PAGs in the PAGER database are organized into P-type, A-type and G-type PAGs with a three-letter-code standard naming convention. The PAGER database currently compiles 44 313 genes from 5 species including human, 38 663 PAGs, 324 830 gene–gene relationships and 3 174 323 PAG–PAG regulatory relationships of two types: co-membership based and regulatory relationship based. To help users assess each PAG’s biological relevance, we developed a cohesion measure called Cohesion Coefficient (CoCo), which is capable of disambiguating between biologically significant PAGs and random PAGs with an area-under-curve performance of 0.98. The PAGER database was set up to help users search and retrieve PAGs from its online web interface. PAGER enables advanced users to build PAG–PAG regulatory networks that provide complementary biological insights not found in gene set analysis or individual gene network analysis. We provide a case study using cancer functional genomics data sets to demonstrate how integrative GNPA helps improve network biology data coverage and therefore biological interpretability. The PAGER database is openly accessible at http://discovery.informatics.iupui.edu/PAGER/. Contact: jakechen@iupui.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zongliang Yue
  - Indiana University School of Informatics and Computing, Department of Computer and Information Science, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202, Purdue University Center for Cancer Research, West Lafayette, IN 47906 and Institute of Biopharmaceutical Informatics and Technology, Wenzhou Medical University, WenZhou, Zhe Jiang Province, China
- Madhura M Kshirsagar
  - Indiana University School of Informatics and Computing, Department of Computer and Information Science, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202, Purdue University Center for Cancer Research, West Lafayette, IN 47906 and Institute of Biopharmaceutical Informatics and Technology, Wenzhou Medical University, WenZhou, Zhe Jiang Province, China
- Thanh Nguyen
  - Indiana University School of Informatics and Computing, Department of Computer and Information Science, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202, Purdue University Center for Cancer Research, West Lafayette, IN 47906 and Institute of Biopharmaceutical Informatics and Technology, Wenzhou Medical University, WenZhou, Zhe Jiang Province, China
- Chayaporn Suphavilai
  - Indiana University School of Informatics and Computing, Department of Computer and Information Science, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202, Purdue University Center for Cancer Research, West Lafayette, IN 47906 and Institute of Biopharmaceutical Informatics and Technology, Wenzhou Medical University, WenZhou, Zhe Jiang Province, China
- Michael T Neylon
  - Indiana University School of Informatics and Computing, Department of Computer and Information Science, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202, Purdue University Center for Cancer Research, West Lafayette, IN 47906 and Institute of Biopharmaceutical Informatics and Technology, Wenzhou Medical University, WenZhou, Zhe Jiang Province, China
- Liugen Zhu
  - Indiana University School of Informatics and Computing, Department of Computer and Information Science, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202, Purdue University Center for Cancer Research, West Lafayette, IN 47906 and Institute of Biopharmaceutical Informatics and Technology, Wenzhou Medical University, WenZhou, Zhe Jiang Province, China
- Timothy Ratliff
  - Indiana University School of Informatics and Computing, Department of Computer and Information Science, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202, Purdue University Center for Cancer Research, West Lafayette, IN 47906 and Institute of Biopharmaceutical Informatics and Technology, Wenzhou Medical University, WenZhou, Zhe Jiang Province, China
- Jake Y Chen
  - Indiana University School of Informatics and Computing, Department of Computer and Information Science, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202, Purdue University Center for Cancer Research, West Lafayette, IN 47906 and Institute of Biopharmaceutical Informatics and Technology, Wenzhou Medical University, WenZhou, Zhe Jiang Province, China
9. Suphavilai C, Zhu L, Chen JY. A method for developing regulatory gene set networks to characterize complex biological systems. BMC Genomics 2015; 16 Suppl 11:S4. [PMID: 26576648 PMCID: PMC4652563 DOI: 10.1186/1471-2164-16-s11-s4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/18/2023]
Abstract
Background: Traditional approaches to studying molecular networks are based on linking genes or proteins. Higher-level networks linking gene sets or pathways have been proposed recently. Several types of gene set networks have been used to study complex molecular networks, such as co-membership gene set networks (M-GSNs) and co-enrichment gene set networks (E-GSNs). Gene set networks are useful for studying the biological mechanisms of diseases and drug perturbations.
Results: In this study, we proposed a new approach for constructing directed, regulatory gene set networks (R-GSNs) to reveal novel relationships among gene sets or pathways. We collected several gene set collections and high-quality gene regulation data in order to construct R-GSNs in a comparative study with co-membership gene set networks (M-GSNs). We described a method for constructing both global and disease-specific R-GSNs and determining their significance. To demonstrate the potential applications to disease biology studies, we constructed and analysed an R-GSN specifically built for Alzheimer's disease.
Conclusions: R-GSNs can provide new biological insights complementary to those derived at the protein regulatory network level or from M-GSNs. When integrated properly with functional genomics data, R-GSNs can help enable future research on systems biology and translational bioinformatics.
10. Karimpour-Fard A, Epperson LE, Hunter LE. A survey of computational tools for downstream analysis of proteomic and other omic datasets. Hum Genomics 2015; 9:28. [PMID: 26510531 PMCID: PMC4624643 DOI: 10.1186/s40246-015-0050-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 07/28/2015] [Accepted: 10/06/2015] [Indexed: 12/19/2022]
Abstract
Proteomics is an expanding area of research into biological systems with significance for biomedical and therapeutic applications ranging from understanding the molecular basis of diseases to testing new treatments, studying the toxicity of drugs, or biotechnological improvements in agriculture. Progress in proteomic technologies and growing interest have resulted in rapid accumulation of proteomic data, and consequently, a great number of tools have become available. In this paper, we review the well-known and ready-to-use tools for classification, clustering and validation, interpretation, and generation of biological information from experimental data. We suggest some rules of thumb for the reader on choosing the most suitable learning method for a particular dataset and conclude with pathway and functional analysis and then provide information about submitting final results to a repository.
Affiliation(s)
- Anis Karimpour-Fard
  - Department of Pharmacology, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- L Elaine Epperson
  - Integrated Center for Genes, Environment, and Health, National Jewish Health, Denver, CO, 80206, USA
- Lawrence E Hunter
  - Department of Pharmacology, University of Colorado School of Medicine, Aurora, CO, 80045, USA
11. Belinky F, Nativ N, Stelzer G, Zimmerman S, Iny Stein T, Safran M, Lancet D. PathCards: multi-source consolidation of human biological pathways. Database: The Journal of Biological Databases and Curation 2015; 2015:bav006. [PMID: 25725062 PMCID: PMC4343183 DOI: 10.1093/database/bav006] [Citation(s) in RCA: 174] [Impact Index Per Article: 19.3] [Indexed: 12/19/2022]
Abstract
The study of biological pathways is key to a large number of systems analyses. However, many relevant tools consider a limited number of pathway sources, missing out on many genes and gene-to-gene connections. Simply pooling several pathway sources would result in redundancy and the lack of systematic pathway interrelations. To address this, we exercised a combination of hierarchical clustering and nearest neighbor graph representation, with judiciously selected cutoff values, thereby consolidating 3215 human pathways from 12 sources into a set of 1073 SuperPaths. Our unification algorithm finds a balance between reducing redundancy and optimizing the level of pathway-related informativeness for individual genes. We show a substantial enhancement of the SuperPaths’ capacity to infer gene-to-gene relationships when compared with individual pathway sources, separately or taken together. Further, we demonstrate that the chosen 12 sources entail nearly exhaustive gene coverage. The computed SuperPaths are presented in a new online database, PathCards, showing each SuperPath, its constituent network of pathways, and its contained genes. This provides researchers with a rich, searchable systems analysis resource. Database URL: http://pathcards.genecards.org/
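As a rough illustration of consolidating overlapping pathways from several sources, the sketch below groups pathways whose gene content is similar. It is not the PathCards unification algorithm: it assumes Jaccard similarity as the overlap measure and single-linkage grouping at one fixed cutoff, and the function names, the 0.5 threshold, and the toy pathway names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two gene sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_pathways(pathways, threshold=0.5):
    """Single-linkage grouping of pathways at a fixed similarity cutoff.

    pathways: dict mapping pathway name -> set of gene symbols.
    threshold: assumed cutoff; pairs at or above it end up in one cluster.
    Returns a sorted list of clusters, each a sorted list of pathway names.
    """
    parent = {name: name for name in pathways}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(sorted(pathways), 2):
        if jaccard(pathways[a], pathways[b]) >= threshold:
            parent[find(b)] = find(a)

    clusters = {}
    for name in sorted(pathways):
        clusters.setdefault(find(name), []).append(name)
    return sorted(clusters.values())


# Toy data (assumed): two sources describe glycolysis with 3 of 5 genes shared.
toy_pathways = {
    "glycolysis_src1": {"HK1", "PFKM", "PKM", "GAPDH"},
    "glycolysis_src2": {"HK1", "PFKM", "PKM", "ALDOA"},
    "apoptosis_src1": {"CASP3", "CASP9", "BAX"},
}
clusters = cluster_pathways(toy_pathways)
print(clusters)
```

Raising the cutoff keeps more pathways separate and lowering it merges more aggressively, which is the redundancy versus informativeness trade-off the abstract describes.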
Affiliation(s)
- Frida Belinky
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 7610001, Israel
- Noam Nativ
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 7610001, Israel
- Gil Stelzer
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 7610001, Israel
- Shahar Zimmerman
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 7610001, Israel
- Tsippi Iny Stein
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 7610001, Israel
- Marilyn Safran
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 7610001, Israel
- Doron Lancet
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 7610001, Israel
12. Chen YA, Tripathi LP, Dessailly BH, Nyström-Persson J, Ahmad S, Mizuguchi K. Integrated pathway clusters with coherent biological themes for target prioritisation. PLoS One 2014; 9:e99030. [PMID: 24918583 PMCID: PMC4053319 DOI: 10.1371/journal.pone.0099030] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 02/20/2014] [Accepted: 05/07/2014] [Indexed: 12/15/2022]
Abstract
Prioritising candidate genes for further experimental characterisation is an essential, yet challenging task in biomedical research. One way of achieving this goal is to identify specific biological themes that are enriched within the gene set of interest to obtain insights into the biological phenomena under study. Biological pathway data have been particularly useful in identifying functional associations of genes and/or gene sets. However, biological pathway information as compiled in varied repositories often differs in scope and content, preventing a more effective and comprehensive characterisation of gene sets. Here we describe a new approach to constructing biologically coherent gene sets from pathway data in major public repositories and employing them for functional analysis of large gene sets. We first revealed significant overlaps in gene content between different pathways and then defined a clustering method based on the shared gene content and the similarity of gene overlap patterns. We established the biological relevance of the constructed pathway clusters using independent quantitative measures and we finally demonstrated the effectiveness of the constructed pathway clusters in comparative functional enrichment analysis of gene sets associated with diverse human diseases gathered from the literature. The pathway clusters and gene mappings have been integrated into the TargetMine data warehouse and are likely to provide a concise, manageable and biologically relevant means of functional analysis of gene sets and to facilitate candidate gene prioritisation.
Affiliation(s)
- Yi-An Chen
  - National Institute of Biomedical Innovation, Ibaraki, Osaka, Japan
- Shandar Ahmad
  - National Institute of Biomedical Innovation, Ibaraki, Osaka, Japan
- Kenji Mizuguchi
  - National Institute of Biomedical Innovation, Ibaraki, Osaka, Japan
|
13
|
Pathway and network analysis in proteomics. J Theor Biol 2014; 362:44-52. [PMID: 24911777 DOI: 10.1016/j.jtbi.2014.05.031] [Citation(s) in RCA: 68] [Impact Index Per Article: 6.8] [Received: 01/01/2014] [Revised: 05/15/2014] [Accepted: 05/21/2014] [Indexed: 12/14/2022]
Abstract
Proteomics is inherently a systems science that studies not only measured proteins and their expression in a cell, but also the interplay of proteins, protein complexes, signaling pathways, and network modules. There has been a rapid accumulation of proteomics data in recent years. However, proteomics data are highly variable, with results sensitive to data preparation methods, sample condition, instrument types, and analytical methods. To address the challenge in proteomics data analysis, we review current tools being developed to incorporate biological function and network topological information. We categorize these tools into four types: tools with basic functional information and few topological features (e.g., GO category analysis), tools with rich functional information and few topological features (e.g., GSEA), tools with basic functional information and rich topological features (e.g., Cytoscape), and tools with rich functional information and rich topological features (e.g., PathwayExpress). We first review the potential application of these tools to proteomics; then we review tools that can achieve automated learning of pathway modules and features, and tools that help perform integrated network visual analytics.
|
14
|
Mitrea C, Taghavi Z, Bokanizad B, Hanoudi S, Tagett R, Donato M, Voichiţa C, Drăghici S. Methods and approaches in the topology-based analysis of biological pathways. Front Physiol 2013; 4:278. [PMID: 24133454 PMCID: PMC3794382 DOI: 10.3389/fphys.2013.00278] [Citation(s) in RCA: 136] [Impact Index Per Article: 12.4] [Received: 05/31/2013] [Accepted: 09/15/2013] [Indexed: 11/21/2022] Open
Abstract
The goal of pathway analysis is to identify the pathways significantly impacted in a given phenotype. Many current methods are based on algorithms that consider pathways as simple gene lists, dramatically under-utilizing the knowledge that such pathways are meant to capture. During the past few years, a plethora of methods claiming to incorporate various aspects of the pathway topology have been proposed. These topology-based methods, sometimes referred to as “third generation,” have the potential to better model the phenomena described by pathways. Although there is now a large variety of approaches used for this purpose, no review is currently available to offer guidance for potential users and developers. This review covers 22 such topology-based pathway analysis methods published in the last decade. We compare these methods based on: type of pathways analyzed (e.g., signaling or metabolic), input (subset of genes, all genes, fold changes, gene p-values, etc.), mathematical models, pathway scoring approaches, output (one or more pathway scores, p-values, etc.) and implementation (web-based, standalone, etc.). We identify and discuss challenges, arising both in methodology and in pathway representation, including inconsistent terminology, different data formats, lack of meaningful benchmarks, and the lack of tissue and condition specificity.
Affiliation(s)
- Cristina Mitrea
  - Department of Computer Science, Wayne State University Detroit, MI, USA
|
15
|
Abstract
BACKGROUND Early detection of breast cancer in blood is both appealing clinically and challenging technically due to the disease's elusive nature and heterogeneity. Today, even though major breast cancer subtypes have been characterized, i.e., luminal A, luminal B, HER2+, and basal-like, little is known about the heterogeneity of breast cancer in blood, which could help to discover minimally invasive protein biomarkers with which clinical researchers can detect, classify, and monitor different breast cancer subtypes. RESULTS In this study, we performed an integrative pathway-assisted clustering analysis of breast cancer subtypes from plasma proteome samples collected from 80 patients diagnosed with breast cancer and 80 healthy women. First, four breast cancer subtypes and an additional unknown subtype (according to existing annotation) were determined based on pathology lab test results in primary tumors of enrolled patients. Next, we developed and applied four distance metrics, i.e., Protein Intensity, Q-Value, Pathway Profile, and Distance Score Function, to measure and characterize these cancer subtypes. Then, we developed a permutation test to evaluate the significant protein level changes in each biological pathway for each breast cancer subtype, using q-value. Lastly, we developed a pathway-protein matrix for each of the four distance methods to estimate the distance between breast cancer subtypes, for which further Pathway Association Network analyses were performed. CONCLUSIONS We found that 1) the luminal group (luminal A and luminal B) are clustered together, as well as the basal group (basal-like and HER2+) and 2) luminal A and luminal B are closer to each other than basal-like and HER2+ are to each other. Our results were consistent with recent independent breast cancer research from the Cancer Genome Atlas Network using genomic DNA copy number arrays, DNA methylation, exome sequencing, messenger RNA arrays, microRNA sequencing and reverse-phase protein arrays. Our results showed that changes of different breast cancer subtypes at the pathway level are more profound and less variable than those at the molecular level. Similar subtypes share distinct yet similar pathway activation networks, while dissimilar subtypes are different also at the level of pathway activation networks. The results also showed that distance or similarity of cancer subtypes based on pathway analysis might be able to provide further insight into the intrinsic relationship of breast cancer subtypes. We believe the integrative pathway-assisted proteomics analysis described here can become a model for reliable clustering or classification of other cancer subtypes.
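To make the idea of a permutation test for protein-level changes concrete, the sketch below computes a two-sided permutation p-value for the difference in mean intensity between two sample groups. It is a generic illustration under stated assumptions, not the authors' procedure: the function name `permutation_pvalue`, the mean-difference statistic, the number of permutations, and the toy intensities are all hypothetical, and the q-value (multiple-testing) step mentioned in the abstract is omitted.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def permutation_pvalue(cases, controls, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Group labels are shuffled n_perm times; the p-value is the share of
    shuffles whose absolute mean difference is at least as large as the
    observed one (with the usual +1 correction so p is never zero).
    """
    rng = random.Random(seed)
    observed = abs(_mean(cases) - _mean(controls))
    pooled = list(cases) + list(controls)
    k = len(cases)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(_mean(pooled[:k]) - _mean(pooled[k:]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy intensities for one protein (illustrative values only).
patients = [10, 11, 12, 13, 14]
healthy = [1, 2, 3, 4, 5]
print(permutation_pvalue(patients, healthy))
```

In a pathway-level setting, one would typically repeat such a test per protein or per pathway score and then adjust the resulting p-values for multiple testing before interpreting them.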
Affiliation(s)
- Fan Zhang
  - Department of Academic and Institutional Resources and Technology, University of North Texas Health Science Center, Fort Worth 76107, USA
- Jake Y Chen
  - School of Informatics, Indiana University, Indianapolis, IN 46202, USA
  - Department of Computer and Information Science, School of Science, Purdue University, Indianapolis, IN 46202, USA
  - Indiana Center for Systems Biology and Personalized Medicine, Indianapolis, IN 46202, USA
|
16
|
Friedman Y, Balaga O, Linial M. Working together: combinatorial regulation by microRNAs. Advances in Experimental Medicine and Biology 2013; 774:317-37. [PMID: 23377980 DOI: 10.1007/978-94-007-5590-1_16] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Indexed: 12/19/2022]
Abstract
MicroRNAs (miRNAs) negatively regulate gene expression level of mRNA post-transcriptionally. Deep sequencing and large-scale screening methods have yielded about 1,500 miRNA sequences in human. Each miRNA contains a seed sequence that is required, but not sufficient, for the correct matching with its targets. Recent technological advances make it possible to capture the miRNAs with their cognate mRNAs at the RISC complex. These experiments have revealed thousands of validated mRNA-miRNA pairing events. In the context of human stem cells, 90% of the identified transcripts appear to be paired with at least two different miRNAs. In this chapter, we present a comprehensive outline for a combinatorial regulation mode by miRNAs. Initially, we summarize the computational and experimental evidence that supports a combined effect of multiple miRNAs. Then, we describe miRror2.0, a platform specifically designed to consider the likelihood of miRNA cooperativity in view of the targets, tissues and cell lines. We show that results from miRror2.0 can be further refined by an iterative procedure, called Psi-miRror, that gauges the robustness of the regulation. We illustrate the combinatorial regulation projected onto graphs of human pathways and show that these pathways are amenable to disruption by a small set of miRNAs. Finally, we propose that miRNA combinatorial regulation is an attractive regulatory strategy not only at the level of a single target, but also at the level of pathways and cellular homeostasis. The joint operation of miRNAs is a powerful means to overcome the low specificity inherent in each individual miRNA.
Affiliation(s)
- Yitzhak Friedman
  - Department of Biological Chemistry, The Hebrew University of Jerusalem, Jerusalem, Israel
|
17
|
Doderer MS, Anguiano Z, Suresh U, Dashnamoorthy R, Bishop AJR, Chen Y. Pathway Distiller - multisource biological pathway consolidation. BMC Genomics 2012; 13 Suppl 6:S18. [PMID: 23134636 PMCID: PMC3481446 DOI: 10.1186/1471-2164-13-s6-s18] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 11/26/2022] Open
Abstract
Background One method to understand and evaluate an experiment that produces a large set of genes, such as a gene expression microarray analysis, is to identify overrepresentation or enrichment for biological pathways. Because pathways are able to functionally describe the set of genes, much effort has been made to collect curated biological pathways into publicly accessible databases. When combining disparate databases, highly related or redundant pathways exist, making their consolidation into pathway concepts essential. This will facilitate unbiased, comprehensive yet streamlined analysis of experiments that result in large gene sets. Methods After gene set enrichment finds representative pathways for large gene sets, pathways are consolidated into representative pathway concepts. Three complementary, but different methods of pathway consolidation are explored. Enrichment Consolidation combines the set of pathways enriched for the signature gene list through iterative combining of enriched pathways with other pathways with similar signature gene sets; Weighted Consolidation utilizes a Protein-Protein Interaction network based gene-weighting approach that finds clusters of both enriched and non-enriched pathways limited to the experiment's resultant gene list; and finally the de novo Consolidation method uses several measurements of pathway similarity to find static pathway clusters independent of any given experiment. Results We demonstrate that the three consolidation methods provide unified yet different functional insights of a resultant gene set derived from a genome-wide profiling experiment. Results from the methods are presented, demonstrating their applications in biological studies and comparing with a pathway web-based framework that also combines several pathway databases. Additionally, a web-based consolidation framework that encompasses all three methods discussed in this paper, Pathway Distiller (http://cbbiweb.uthscsa.edu/PathwayDistiller), is established to allow researchers access to the methods and example microarray data described in this manuscript, and the ability to analyze their own gene list by using our unique consolidation methods. Conclusions By combining several pathway systems, implementing different, but complementary pathway consolidation methods, and providing a user-friendly web-accessible tool, we have enabled users to extract functional explanations of their genome-wide experiments.
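The overrepresentation step mentioned above is commonly computed with a one-sided hypergeometric test. The sketch below shows that standard test as a generic illustration; whether Pathway Distiller uses this exact statistic is an assumption, and the function name `hypergeom_enrichment_p` and the example numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, pathway_size, list_size, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(universe, pathway, list).

    universe_size: number of genes that could have been selected
    pathway_size:  number of those genes annotated to the pathway
    list_size:     number of genes in the experimental gene list
    overlap:       number of list genes that fall in the pathway
    """
    total = comb(universe_size, list_size)
    upper = min(pathway_size, list_size)
    # Sum the upper tail; comb() returns 0 for impossible draws.
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Example: 20,000 genes, a 100-gene pathway, a 200-gene list, 8 shared genes.
print(hypergeom_enrichment_p(20000, 100, 200, 8))
```

A small p-value suggests the pathway is overrepresented in the gene list. When many pathways are tested, the p-values would normally be corrected for multiple testing before consolidation.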
Affiliation(s)
- Mark S Doderer
  - Greehey Children's Cancer Research Institute, The University of Texas Health Science Center at San Antonio, San Antonio, TX, USA
|
18
|
Zhang F, Drabier R. IPAD: the Integrated Pathway Analysis Database for Systematic Enrichment Analysis. BMC Bioinformatics 2012; 13 Suppl 15:S7. [PMID: 23046449 PMCID: PMC3439721 DOI: 10.1186/1471-2105-13-s15-s7] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.3] [Indexed: 12/16/2022] Open
Abstract
Background Next-Generation Sequencing (NGS) technologies and Genome-Wide Association Studies (GWAS) generate millions of reads and hundreds of datasets, and there is an urgent need for a better way to accurately interpret and distill such large amounts of data. Extensive pathway and network analysis allow for the discovery of highly significant pathways from a set of disease vs. healthy samples in the NGS and GWAS. Knowledge of activation of these processes will lead to elucidation of the complex biological pathways affected by drug treatment, to patient stratification studies of new and existing drug treatments, and to understanding the underlying anti-cancer drug effects. There are approximately 141 biological human pathway resources as of Jan 2012 according to the Pathguide database. However, most currently available resources do not contain disease, drug or organ specificity information such as disease-pathway, drug-pathway, and organ-pathway associations. Systematically integrating pathway, disease, drug and organ specificity together becomes increasingly crucial for understanding the interrelationships between signaling, metabolic and regulatory pathway, drug action, disease susceptibility, and organ specificity from high-throughput omics data (genomics, transcriptomics, proteomics and metabolomics). 
Results We designed the Integrated Pathway Analysis Database for Systematic Enrichment Analysis (IPAD, http://bioinfo.hsc.unt.edu/ipad), defining inter-association between pathway, disease, drug and organ specificity, based on six criteria: 1) comprehensive pathway coverage; 2) gene/protein to pathway/disease/drug/organ association; 3) inter-association between pathway, disease, drug, and organ; 4) multiple and quantitative measurement of enrichment and inter-association; 5) assessment of enrichment and inter-association analysis with the context of the existing biological knowledge and a "gold standard" constructed from reputable and reliable sources; and 6) cross-linking of multiple available data sources. IPAD is a comprehensive database covering about 22,498 genes, 25,469 proteins, 1956 pathways, 6704 diseases, 5615 drugs, and 52 organs integrated from databases including the BioCarta, KEGG, NCI-Nature curated, Reactome, CTD, PharmGKB, DrugBank, PharmGKB, and HOMER. The database has a web-based user interface that allows users to perform enrichment analysis from genes/proteins/molecules and inter-association analysis from a pathway, disease, drug, and organ. Moreover, the quality of the database was validated with the context of the existing biological knowledge and a "gold standard" constructed from reputable and reliable sources. Two case studies were also presented to demonstrate: 1) self-validation of enrichment analysis and inter-association analysis on brain-specific markers, and 2) identification of previously undiscovered components by the enrichment analysis from a prostate cancer study. Conclusions IPAD is a new resource for analyzing, identifying, and validating pathway, disease, drug, organ specificity and their inter-associations. The statistical method we developed for enrichment and similarity measurement and the two criteria we described for setting the threshold parameters can be extended to other enrichment applications. 
Enriched pathways, diseases, drugs, organs and their inter-associations can be searched, displayed, and downloaded from our online user interface. The current IPAD database can help users address a wide range of biological pathway related, disease susceptibility related, drug target related and organ specificity related questions in human disease studies.
Affiliation(s)
- Fan Zhang
  - Department of Academic and Institutional Resources and Technology, University of North Texas Health Science Center, Fort Worth, USA
|
19
|
Huang H, Wu X, Sonachalam M, Mandape SN, Pandey R, MacDorman KF, Wan P, Chen JY. PAGED: a pathway and gene-set enrichment database to enable molecular phenotype discoveries. BMC Bioinformatics 2012; 13 Suppl 15:S2. [PMID: 23046413 PMCID: PMC3439733 DOI: 10.1186/1471-2105-13-s15-s2] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Indexed: 12/24/2022] Open
Abstract
Background Over the past decade, pathway and gene-set enrichment analysis has evolved into the study of high-throughput functional genomics. Owing to poorly annotated and incomplete pathway data, researchers have begun to combine pathway and gene-set enrichment analysis as well as network module-based approaches to identify crucial relationships between different molecular mechanisms. Methods To meet the new challenge of molecular phenotype discovery, in this work, we have developed an integrated online database, the Pathway And Gene Enrichment Database (PAGED), to enable comprehensive searches for disease-specific pathways, gene signatures, microRNA targets, and network modules by integrating gene-set-based prior knowledge as molecular patterns from multiple levels: the genome, transcriptome, post-transcriptome, and proteome. Results The online database we developed, PAGED (http://bio.informatics.iupui.edu/PAGED), is by far the most comprehensive public compilation of gene sets. In its current release, PAGED contains a total of 25,242 gene sets, 61,413 genes, 20 organisms, and 1,275,560 records from five major categories. Beyond its size, the advantage of PAGED lies in the exploration of relationships between gene sets as gene-set association networks (GSANs). Using colorectal cancer expression data analysis as a case study, we demonstrate how to query this database resource to discover crucial pathways, gene signatures, and gene network modules specific to colorectal cancer functional genomics. Conclusions This integrated online database lays a foundation for developing tools beyond third-generation pathway analysis approaches for discovering molecular phenotypes, especially for disease-associated pathway/gene-set enrichment analysis.
Affiliation(s)
- Hui Huang
  - School of Informatics, Indiana University, Indianapolis, IN 46202, USA
|
20
|
Kwofie SK, Schaefer U, Sundararajan VS, Bajic VB, Christoffels A. HCVpro: Hepatitis C virus protein interaction database. Infection, Genetics and Evolution 2011; 11:1971-7. [DOI: 10.1016/j.meegid.2011.09.001] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.0] [Received: 05/13/2011] [Revised: 08/24/2011] [Accepted: 09/02/2011] [Indexed: 02/07/2023]
|
21
|
Yu N, Seo J, Rho K, Jang Y, Park J, Kim WK, Lee S. hiPathDB: a human-integrated pathway database with facile visualization. Nucleic Acids Res 2011; 40:D797-802. [PMID: 22123737 PMCID: PMC3245021 DOI: 10.1093/nar/gkr1127] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Indexed: 01/15/2023] Open
Abstract
One of the biggest challenges in the study of biological regulatory networks is the systematic organization and integration of complex interactions taking place within various biological pathways. Currently, the information of the biological pathways is dispersed in multiple databases in various formats. hiPathDB is an integrated pathway database that combines the curated human pathway data of NCI-Nature PID, Reactome, BioCarta and KEGG. In total, it includes 1661 pathways consisting of 8976 distinct physical entities. hiPathDB provides two different types of integration. The pathway-level integration, conceptually a simple collection of individual pathways, was achieved by devising an elaborate model that takes distinct features of four databases into account and subsequently reformatting all pathways in accordance with our model. The entity-level integration creates a single unified pathway that encompasses all pathways by merging common components. Even though the detailed molecular-level information such as complex formation or post-translational modifications tends to be lost, such integration makes it possible to investigate signaling network over the entire pathways and allows identification of pathway cross-talks. Another strong merit of hiPathDB is the built-in pathway visualization module that supports explorative studies of complex networks in an interactive fashion. The layout algorithm is optimized for virtually automatic visualization of the pathways. hiPathDB is available at http://hiPathDB.kobic.re.kr.
Affiliation(s)
- Namhee Yu
  - Korean Bioinformation Center, KRIBB, Daejeon 305-806, Korea
|
22
|
Abstract
BACKGROUND Each organ has a specific function in the body. "Organ-specificity" refers to differential expression of the same gene across different organs. An organ-specific gene/protein is defined as a gene/protein whose expression is significantly elevated in a specific human organ. An "organ-specific marker" is defined as an organ-specific gene/protein that is also implicated in human diseases related to the organ. Previous studies have shown that identifying the organ in which a gene or protein is significantly differentially expressed can lead to discovery of its function. Most currently available resources for organ-specific genes/proteins either allow users to access tissue-specific expression over a limited range of organs, or do not contain disease information such as disease-organ relationship and disease-gene relationship. RESULTS We designed an integrated Human Organ-specific Molecular Electronic Repository (HOMER, http://bio.informatics.iupui.edu/homer), defining human organ-specific genes/proteins, based on five criteria: 1) comprehensive organ coverage; 2) gene/protein to disease association; 3) disease-organ association; 4) quantification of organ-specificity; and 5) cross-linking of multiple available data sources. HOMER is a comprehensive database covering about 22,598 proteins, 52 organs, and 4,290 diseases integrated and filtered from organ-specific proteins/genes and disease databases like dbEST, TiSGeD, HPA, CTD, and Disease Ontology. The database has a Web-based user interface that allows users to find organ-specific genes/proteins by gene, protein, organ or disease, to explore the histogram of an organ-specific gene/protein, and to identify disease-related organ-specific genes by browsing the disease data online. Moreover, the quality of the database was validated by comparison with other known databases and two case studies: 1) an association analysis of organ-specific genes with disease and 2) a gene set enrichment analysis of organ-specific gene expression data. CONCLUSIONS HOMER is a new resource for analyzing, identifying, and characterizing organ-specific molecules in association with disease-organ and disease-gene relationships. The statistical method we developed for organ-specific gene identification can be applied to other organisms. The current HOMER database can successfully answer a variety of questions related to organ specificity in human diseases and can help researchers in discovering and characterizing organ-specific genes/proteins with disease relevance.
Affiliation(s)
- Fan Zhang
  - School of Informatics, Indiana University, Indianapolis, IN 46202, USA
|
23
|
Stobbe MD, Houten SM, Jansen GA, van Kampen AHC, Moerland PD. Critical assessment of human metabolic pathway databases: a stepping stone for future integration. BMC Systems Biology 2011; 5:165. [PMID: 21999653 PMCID: PMC3271347 DOI: 10.1186/1752-0509-5-165] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.1] [Received: 07/29/2011] [Accepted: 10/14/2011] [Indexed: 01/17/2023]
Abstract
Background Multiple pathway databases are available that describe the human metabolic network and have proven their usefulness in many applications, ranging from the analysis and interpretation of high-throughput data to their use as a reference repository. However, so far the various human metabolic networks described by these databases have not been systematically compared and contrasted, nor has the extent to which they differ been quantified. For a researcher using these databases for particular analyses of human metabolism, it is crucial to know the extent of the differences in content and their underlying causes. Moreover, the outcomes of such a comparison are important for ongoing integration efforts. Results We compared the genes, EC numbers and reactions of five frequently used human metabolic pathway databases. The overlap is surprisingly low, especially on reaction level, where the databases agree on 3% of the 6968 reactions they have combined. Even for the well-established tricarboxylic acid cycle the databases agree on only 5 out of the 30 reactions in total. We identified the main causes for the lack of overlap. Importantly, the databases are partly complementary. Other explanations include the number of steps a conversion is described in and the number of possible alternative substrates listed. Missing metabolite identifiers and ambiguous names for metabolites also affect the comparison. Conclusions Our results show that each of the five networks compared provides us with a valuable piece of the puzzle of the complete reconstruction of the human metabolic network. To enable integration of the networks, next to a need for standardizing the metabolite names and identifiers, the conceptual differences between the databases should be resolved. Considerable manual intervention is required to reach the ultimate goal of a unified and biologically accurate model for studying the systems biology of human metabolism. 
Our comparison provides a stepping stone for such an endeavor.
Affiliation(s)
- Miranda D Stobbe
  - Bioinformatics Laboratory, Academic Medical Center, University of Amsterdam, PO Box 22700, 1100 DE, Amsterdam, the Netherlands
|
24
|
Zhang GL, DeLuca DS, Brusic V. Database resources for proteomics-based analysis of cancer. Methods Mol Biol 2011; 723:349-64. [PMID: 21370076 DOI: 10.1007/978-1-61779-043-0_22] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 12/31/2022]
Abstract
Biological/bioinformatics databases are essential for medical and biological studies. They integrate and organize biologically related information in a structured format and provide researchers with easy access to a variety of relevant data. This review presents an overview of publicly available databases relevant to proteomics studies in cancer research. They include gene/protein expression databases, gene mutation and single nucleotide polymorphisms databases, tumor antigen databases, protein-protein interaction, and biological pathway databases. Automated information retrieval from these databases enables efficient large-scale proteomics data analysis.
Affiliation(s)
- Guang Lan Zhang
  - Cancer Vaccine Center, Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
|
25
|
Chowbina S, Deng Y, Ai J, Wu X, Guan X, Wilbanks MS, Escalon BL, Meyer SA, Perkins EJ, Chen JY. A new approach to construct pathway connected networks and its application in dose responsive gene expression profiles of rat liver regulated by 2,4DNT. BMC Genomics 2010; 11 Suppl 3:S4. [PMID: 21143786 PMCID: PMC2999349 DOI: 10.1186/1471-2164-11-s3-s4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022] Open
Abstract
Background Military and industrial activities have led to reported release of 2,4-dinitrotoluene (2,4DNT) into soil, groundwater or surface water. It has been reported that 2,4DNT can induce toxic effects on humans and other organisms. However, the mechanism of 2,4DNT-induced toxicity is still unclear. Although a series of methods for gene network construction have been developed, few instances of applying such technology to generate pathway connected networks have been reported. Results Microarray analyses were conducted using liver tissue of rats collected 24 h after exposure to a single oral gavage with one of five concentrations of 2,4DNT. We observed a strong dose response of differentially expressed genes after 2,4DNT treatment. The most affected pathways included long term depression, breast cancer regulation by stathmin1, WNT signaling, and PI3K signaling pathways. In addition, we propose a new approach to construct pathway connected networks regulated by 2,4DNT. We also observed clear dose response pathway networks regulated by 2,4DNT. Conclusions We developed a new method for constructing pathway connected networks. This new method was successfully applied to microarray data from liver tissue of 2,4DNT-exposed animals and resulted in the identification of unique dose responsive biomarkers with regard to affected pathways.
Affiliation(s)
- Sudhir Chowbina
- Indiana University School of Informatics, Indianapolis, IN 46202, USA.
|
26
|
Zhang F, Chen JY. Discovery of pathway biomarkers from coupled proteomics and systems biology methods. BMC Genomics 2010; 11 Suppl 2:S12. [PMID: 21047379 PMCID: PMC2975409 DOI: 10.1186/1471-2164-11-s2-s12] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Indexed: 12/02/2022] Open
Abstract
Background: Breast cancer is the second most common type of cancer worldwide after lung cancer. Plasma proteome profiling may have a higher chance of identifying protein changes between plasma samples, such as those from healthy individuals and breast cancer patients. Breast cancer cell lines have long been used by researchers as a model system for identifying protein biomarkers. A comparison of the set of proteins that change in plasma with previously published findings from proteomic analysis of human breast cancer cell lines may identify, with higher confidence, a subset of candidate protein biomarkers. Results: In this study, we analyzed a liquid chromatography (LC) coupled tandem mass spectrometry (MS/MS) proteomics dataset from plasma samples of 40 healthy women and 40 women diagnosed with breast cancer. Using two-sample t-statistics and a permutation procedure, we identified 254 statistically significant, differentially expressed proteins, among which 208 are over-expressed and 46 are under-expressed in breast cancer plasma. We validated this result against previously published proteomic results of human breast cancer cell lines and signaling pathways to derive a panel of 25 candidate protein biomarkers. Using pathway analysis, we observed that the 25 “activated” plasma proteins were present in several cancer pathways, including ‘Complement and coagulation cascades’, ‘Regulation of actin cytoskeleton’, and ‘Focal adhesion’, and match well with previously reported studies. Additional gene ontology analysis of the 25 proteins also showed that cellular metabolic process and response to external stimulus (especially proteolysis and acute inflammatory response) were enriched functional annotations of the proteins identified in the breast cancer plasma samples. By cross-validation using two additional proteomics studies, we obtained 86% and 83% similarities in the pathway-protein matrix between the first study and the two testing studies, which is much better than the similarity we measured with proteins. Conclusions: We presented a ‘systems biology’ method to identify, characterize, analyze and validate panel biomarkers in breast cancer proteomics data, which includes 1) t-statistics and a permutation process, 2) network, pathway and function annotation analysis, and 3) cross-validation of multiple studies. Our results showed that the systems biology approach is essential to understanding the molecular mechanisms of panel protein biomarkers.
Affiliation(s)
- Fan Zhang
- Indiana University School of Informatics, Indianapolis, IN 46202, USA.
|
27
|
Zhou A, Zhang F, Chen JY. PEPPI: a peptidomic database of human protein isoforms for proteomics experiments. BMC Bioinformatics 2010; 11 Suppl 6:S7. [PMID: 20946618 PMCID: PMC3026381 DOI: 10.1186/1471-2105-11-s6-s7] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Indexed: 12/24/2022] Open
Affiliation(s)
- Ao Zhou
- School of Informatics, Indiana University, Indianapolis, IN 46202, USA
|
28
|
Wren JD, Kupfer DM, Perkins EJ, Bridges S, Berleant D. Proceedings of the 2010 MidSouth Computational Biology and Bioinformatics Society (MCBIOS) Conference. BMC Bioinformatics 2010; 11 Suppl 6:S1. [PMID: 20946592 PMCID: PMC3026356 DOI: 10.1186/1471-2105-11-s6-s1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022] Open
|
29
|
Li HM, Sun L, Mittapalli O, Muir WM, Xie J, Wu J, Schemerhorn BJ, Jannasch A, Chen JY, Zhang F, Adamec J, Murdock LL, Pittendrigh BR. Bowman-Birk inhibitor affects pathways associated with energy metabolism in Drosophila melanogaster. Insect Molecular Biology 2010; 19:303-313. [PMID: 20113373 DOI: 10.1111/j.1365-2583.2009.00984.x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 05/28/2023]
Abstract
Bowman-Birk inhibitor (BBI) is toxic when fed to certain insects, including the fruit fly, Drosophila melanogaster. Dietary BBI has been demonstrated to slow growth and increase insect mortality by inhibiting the digestive enzymes trypsin and chymotrypsin, resulting in a reduced supply of amino acids. In mammals, BBI influences cellular energy metabolism. Therefore, we tested the hypothesis that dietary BBI affects energy-associated pathways in the D. melanogaster midgut. Through microarray and metabolomic analyses, we show that dietary BBI affects energy utilization pathways in the midgut cells of D. melanogaster. In addition, ultrastructure studies indicate that microvilli are significantly shortened in BBI-fed larvae. These data provide further insights into the complex cellular response of insects to dietary protease inhibitors.
Affiliation(s)
- H-M Li
- Department of Entomology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
|
30
|
Naylor S, Chen JY. Unraveling human complexity and disease with systems biology and personalized medicine. Per Med 2010; 7:275-289. [PMID: 20577569 PMCID: PMC2888109 DOI: 10.2217/pme.10.16] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.9] [Indexed: 12/16/2022]
Abstract
We are all perplexed that current medical practice often appears maladroit in curing our individual illnesses or diseases. However, as is often the case, a lack of understanding, tools and technologies is the root cause of such situations. Human individuality is an often-quoted term but, in the context of human biology, it is poorly understood. This is compounded when there is a need to consider the variability of human populations. In the case of the former, it is possible to quantify human complexity as determined by the 35,000 genes of the human genome, the 1-10 million proteins (including antibodies) and the 2000-3000 metabolites of the human metabolome. Human variability is much more difficult to assess, since many of the variables, such as the definition of race, are not even clearly agreed on. In order to accommodate human complexity, variability and their influence on health and disease, it is necessary to undertake a systematic approach. In the past decade, the emergence of analytical platforms and bioinformatics tools has led to the development of systems biology. Such an approach offers enormous potential in defining key pathways and networks involved in optimal human health, as well as disease onset, progression and treatment. The tools and technologies now available in systems biology analyses offer exciting opportunities to exploit the emerging areas of personalized medicine. In this article, we discuss the current status of human complexity, and how systems biology and personalized medicine can have an impact at the individual and population level.
Affiliation(s)
- Stephen Naylor
- Predictive Physiology & Medicine (PPM) Inc., 409 Patterson Street, Bloomington, IN 47403, USA
- Jake Y Chen
- School of Informatics, Indiana University, Indianapolis, IN 46202, USA
- Indiana Center for Systems Biology & Personalized Medicine, IN 46202, USA
- Department of Computer & Information Science, School of Science, Purdue University, Indianapolis, IN 46202, USA
|
31
|
Huan T, Wu X, Chen JY. Systems biology visualization tools for drug target discovery. Expert Opin Drug Discov 2010; 5:425-39. [DOI: 10.1517/17460441003725102] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/05/2022]
|
32
|
Wren JD, Gusev Y, Isokpehi RD, Berleant D, Braga-Neto U, Wilkins D, Bridges S. Proceedings of the 2009 MidSouth Computational Biology and Bioinformatics Society (MCBIOS) Conference. BMC Bioinformatics 2009; 10 Suppl 11:S1. [PMID: 19811674 PMCID: PMC3313274 DOI: 10.1186/1471-2105-10-s11-s1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022] Open
|