1. Smell Detection Agent Optimisation Framework and Systems Biology Approach to Detect Dys-Regulated Subnetwork in Cancer Data. Biomolecules 2021; 12:biom12010037. [PMID: 35053185 PMCID: PMC8774275 DOI: 10.3390/biom12010037] [Received: 09/05/2021] [Revised: 12/01/2021] [Accepted: 12/02/2021]
Abstract
Network biology has become a key tool in unravelling the mechanisms of complex diseases. Detecting dys-regulated subnetworks from molecular networks is a task that requires efficient computational methods. In this work, we constructed an integrated network using gene interaction data as well as protein–protein interaction data of differentially expressed genes derived from microarray gene expression data. We considered the level of differential expression as well as the topological weight of proteins in the interaction network to quantify dys-regulation. Then, a nature-inspired Smell Detection Agent (SDA) optimisation algorithm is designed, with multiple agents traversing various paths in the network. Finally, the algorithm provides a maximum weighted module as the optimum dys-regulated subnetwork. The analysis is performed on samples of triple-negative breast cancer as well as colorectal cancer. Biological significance analysis of the module genes is also performed to validate the results. The breast cancer subnetwork is found to contain (i) valid biomarkers including PIK3CA, PTEN, BRCA1, AR and EGFR; (ii) validated drug targets TOP2A, CDK4, HDAC1, IL6, BRCA1, HSP90AA1 and AR; and (iii) synergistic drug targets EGFR and BIRC5. Moreover, based on the weight values assigned to nodes in the subnetwork, PLK1, CTNNB1, IGF1, AURKA, PCNA, HSPA4 and GAPDH are proposed as drug targets for further studies. For the colorectal cancer module, the analysis revealed the occurrence of approved drug targets TYMS, TOP1, BRAF and EGFR. Considering their higher weight values, HSP90AA1, CCNB1, AKT1 and CXCL8 are proposed as drug targets for experimentation. The derived subnetworks also contain cancer-related pathways. The SDA-derived breast cancer subnetwork was compared with those derived by tools such as MCODE and Minimum Spanning Tree, and showed a higher enrichment (75%) of significant elements.
Thus, the proposed nature-inspired algorithm is a novel approach to deriving the optimum dys-regulated subnetwork from a large molecular network.
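The abstract only outlines the SDA idea: score each protein by differential expression and topological weight, let several agents traverse paths, and keep the heaviest module. The Python sketch below illustrates that idea under loud assumptions. The weight |log2FC| × (degree / maximum degree), the weighted random walk, and the names `node_weights` and `agent_search` are hypothetical stand-ins chosen for illustration; they are not the published SDA update rules, and the returned module is restricted to a simple path.

```python
import random
from typing import Dict, List, Set, Tuple

def node_weights(adj: Dict[str, Set[str]], log2fc: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical dys-regulation weight: |log2FC| scaled by normalised degree."""
    max_deg = max(len(nbrs) for nbrs in adj.values()) or 1
    return {g: abs(log2fc[g]) * len(adj[g]) / max_deg for g in adj}

def agent_search(adj: Dict[str, Set[str]], weights: Dict[str, float],
                 n_agents: int = 20, max_steps: int = 10,
                 seed: int = 0) -> Tuple[List[str], float]:
    """Each agent walks a simple path from a random start node; the path with
    the largest summed node weight over all agents is returned (illustrative only)."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    best_path: List[str] = []
    best_score = float("-inf")
    for _ in range(n_agents):
        path = [rng.choice(nodes)]
        visited = {path[0]}
        for _ in range(max_steps):
            frontier = [n for n in sorted(adj[path[-1]]) if n not in visited]
            if not frontier:
                break
            # Heavier neighbours are more likely; randomness lets agents diverge.
            # The tiny epsilon keeps random.choices valid if all weights are zero.
            nxt = rng.choices(frontier, weights=[weights[n] + 1e-9 for n in frontier])[0]
            path.append(nxt)
            visited.add(nxt)
        score = sum(weights[n] for n in path)
        if score > best_score:
            best_path, best_score = path, score
    return best_path, best_score

# Usage on a toy network (hypothetical genes and fold changes).
adj = {"A": {"B", "C"}, "B": {"A", "C", "D"}, "C": {"A", "B"}, "D": {"B"}}
fc = {"A": 2.0, "B": 1.5, "C": 0.2, "D": 3.0}
w = node_weights(adj, fc)
path, score = agent_search(adj, w)
```

Using several independent agents trades determinism for coverage: no single walk has to be optimal, only the best one across agents.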
3. Jung S. KEDDY: a knowledge-based statistical gene set test method to detect differential functional protein-protein interactions. Bioinformatics 2019; 35:619-627. [PMID: 30101275 DOI: 10.1093/bioinformatics/bty686] [Received: 11/14/2017] [Revised: 07/18/2018] [Accepted: 08/06/2018]
Abstract
MOTIVATION Identifying differential patterns between conditions is a popular approach to understanding the discrepancy between different biological contexts. Although many statistical tests have been proposed for identifying gene sets with differential patterns based on different definitions of differentiality, few methods have been suggested to identify gene sets with differential functional protein networks due to computational complexity. RESULTS We propose a method of Knowledge-based Evaluation of Dependency DifferentialitY (KEDDY), which is a statistical test for differential functional protein networks of a set of genes between two conditions that utilizes known functional protein-protein interaction information. Unlike other approaches focused on differential expressions of individual genes or differentiality of individual interactions, KEDDY compares two conditions by evaluating the probability distributions of functional protein networks based on known functional protein-protein interactions. The method has been evaluated and compared with previous methods through simulation studies, where KEDDY achieves significantly better accuracy and speed than the previous method that does not use prior knowledge, and better performance in identifying gene sets with differential interactions than other methods evaluating changes in gene expressions. Applications to cancer data sets show that KEDDY identifies alternative cancer subtype-related differential gene sets compared to other differential expression-based methods, and the results also provide detailed gene regulatory information that drives the differentiality of the gene sets. AVAILABILITY AND IMPLEMENTATION The Java implementation of KEDDY is freely available to non-commercial users at https://sites.google.com/site/sjunggsm/keddy. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sungwon Jung
- Department of Genome Medicine and Science, Gachon University College of Medicine, Incheon, Republic of Korea.,Gachon Institute of Genome Medicine and Science, Gachon University Gil Medical Center, Incheon, Republic of Korea
4. Farahmand S, Foroughmand-Araabi MH, Goliaei S, Razaghi-Moghadam Z. CytoGTA: A cytoscape plugin for identifying discriminative subnetwork markers using a game theoretic approach. PLoS One 2017; 12:e0185016. [PMID: 28968407 PMCID: PMC5624584 DOI: 10.1371/journal.pone.0185016] [Received: 02/15/2017] [Accepted: 09/04/2017]
Abstract
In recent years, analyzing genome-wide expression profiles to find genetic markers has received much attention as a challenging field of research aiming at unveiling biological mechanisms behind complex disorders. The identification of reliable and reproducible markers has lately been achieved by integrating genome-scale functional relationships and transcriptome datasets, and a number of algorithms have been developed to support this strategy. In this paper, we present a promising and easily applicable tool to accomplish this goal, namely CytoGTA, which is a Cytoscape plug-in that relies on an optimistic game theoretic approach (GTA) for identifying subnetwork markers. Given transcriptomic data of two phenotype classes and interactome data, this plug-in offers discriminative markers for the two classes. The high performance of CytoGTA would not have been achieved if the strategy of GTA had not been implemented in Cytoscape. This plug-in provides a simple-to-use platform, convenient for biological researchers to interactively work with and visualize the structure of subnetwork markers. CytoGTA is one of the few available Cytoscape plug-ins for marker identification, and it shows superior performance to existing methods.
Affiliation(s)
- S. Farahmand
- Faculty of New Sciences and Technologies, University of Tehran, Tehran, Iran
- College of Science and Mathematics, University of Massachusetts Boston, Boston, Massachusetts, United States of America
- Z. Razaghi-Moghadam
- Faculty of New Sciences and Technologies, University of Tehran, Tehran, Iran
5. Prior knowledge guided active modules identification: an integrated multi-objective approach. BMC Syst Biol 2017; 11:8. [PMID: 28361699 PMCID: PMC5374590 DOI: 10.1186/s12918-017-0388-2]
Abstract
BACKGROUND An active module, defined as a region of a biological network that shows striking changes in molecular activity or phenotypic signatures, is important for revealing dynamic and process-specific information that is correlated with cellular or disease states. METHODS A prior-information-guided active module identification approach is proposed to detect modules that are both active and enriched by prior knowledge. We formulate the active module identification problem as a multi-objective optimisation problem, which consists of two conflicting objective functions: maximising the coverage of known biological pathways and maximising the activity of the active module, simultaneously. The network is constructed from a protein-protein interaction database. A beta-uniform-mixture model is used to estimate the distribution of p-values and generate scores for activity measurement from microarray data. A multi-objective evolutionary algorithm is used to search for Pareto optimal solutions. We also incorporate a novel constraint based on algebraic connectivity to ensure the connectedness of the identified active modules. RESULTS Application of the proposed algorithm to a small yeast molecular network shows that it can identify modules with high activities and with more cross-talk nodes between related functional groups. The Pareto solutions generated by the algorithm provide different trade-offs between prior knowledge and novel information from data. The approach is then applied to microarray data from diclofenac-treated yeast cells to build a network and identify modules that elucidate the molecular mechanisms of diclofenac toxicity and resistance. Gene ontology analysis is applied to the identified modules for biological interpretation. CONCLUSIONS Integrating knowledge of functional groups into the identification of active modules is an effective method and provides flexible control of the balance between a purely data-driven method and prior information guidance.
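A beta-uniform mixture (BUM) model treats p-values as a mixture of a uniform (null) component and a Beta(a, 1) (signal) component, f(p) = λ + (1 - λ)·a·p^(a-1) with 0 < a < 1. The Python sketch below fits λ and a by a coarse grid-search maximum likelihood and converts each p-value into an activity score with the log-likelihood-ratio form s(p) = (a - 1)(ln p - ln τ) used by the Heinz method. Whether the cited paper uses exactly this score is an assumption, and the threshold `tau` is a hypothetical fixed parameter here rather than one derived from a false-discovery rate.

```python
import math
from typing import List, Tuple

def bum_loglik(pvals: List[float], lam: float, a: float) -> float:
    """Log-likelihood of f(p) = lam + (1 - lam) * a * p**(a - 1); p must be in (0, 1]."""
    return sum(math.log(lam + (1.0 - lam) * a * p ** (a - 1.0)) for p in pvals)

def fit_bum(pvals: List[float], steps: int = 99) -> Tuple[float, float]:
    """Coarse grid-search maximum likelihood over lam in (0, 1) and a in (0, 1)."""
    grid = [i / (steps + 1) for i in range(1, steps + 1)]
    _, lam, a = max((bum_loglik(pvals, lam, a), lam, a) for lam in grid for a in grid)
    return lam, a

def bum_score(p: float, a: float, tau: float) -> float:
    """Positive for p < tau (evidence of activity), zero at tau, negative above it."""
    return (a - 1.0) * (math.log(p) - math.log(tau))

# Usage: a few small p-values mixed with roughly uniform ones (toy data).
pvals = [0.001, 0.002, 0.004, 0.008, 0.01] + [i / 20 for i in range(1, 20)]
lam, a = fit_bum(pvals)
scores = {p: bum_score(p, a, tau=0.05) for p in (0.001, 0.05, 0.5)}
```

Because a < 1, the factor (a - 1) is negative, so genes with p-values below τ receive positive scores and the rest are penalised; a module-search algorithm can then maximise the summed score.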
6. Speyer G, Mahendra D, Tran HJ, Kiefer J, Schreiber SL, Clemons PA, Dhruv H, Berens M, Kim S. Differential pathway dependency discovery associated with drug response across cancer cell lines. Pac Symp Biocomput 2017; 22:497-508. [PMID: 27897001 PMCID: PMC5180601 DOI: 10.1142/9789813207813_0046]
Abstract
The effort to personalize treatment plans for cancer patients involves the identification of drug treatments that can effectively target the disease while minimizing the likelihood of adverse reactions. In this study, the gene-expression profile of 810 cancer cell lines and their response data to 368 small molecules from the Cancer Therapeutics Research Portal (CTRP) are analyzed to identify pathways with significant rewiring between genes, or differential gene dependency, between sensitive and non-sensitive cell lines. Identified pathways and their corresponding differential dependency networks are further analyzed to discover essentiality and specificity mediators of cell line response to drugs/compounds. For analysis we use the previously published method EDDY (Evaluation of Differential DependencY). EDDY first constructs likelihood distributions of gene-dependency networks, aided by known gene-gene interactions, for two given conditions, for example, sensitive cell lines vs. non-sensitive cell lines. These sets of networks yield a divergence value between two distributions of network likelihoods that can be assessed for significance using permutation tests. Resulting differential dependency networks are then further analyzed to identify genes, termed mediators, which may play important roles in biological signaling in certain cell lines that are sensitive or non-sensitive to the drugs. Establishing statistical correspondence between compounds and mediators can improve understanding of known gene dependencies associated with drug response while also discovering new dependencies. Millions of compute hours resulted in thousands of these statistical discoveries. EDDY identified 8,811 statistically significant pathways leading to 26,822 compound-pathway-mediator triplets. By incorporating STITCH and STRING databases, we could construct evidence networks for 14,415 compound-pathway-mediator triplets for support.
The results of this analysis are presented in a searchable website to aid researchers in studying potential molecular mechanisms underlying cells' drug response as well as in designing experiments for the purpose of personalized treatment regimens.
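EDDY's full procedure (building dependency networks from resampled data and scoring their likelihoods) is too long for a short example, but its final step, a divergence between two distributions of likelihood scores assessed by permutation, can be sketched. In the Python sketch below the per-sample scores are taken as given, the divergence is a Jensen-Shannon divergence over a shared histogram (the binning is an assumption), and condition labels are permuted on the pooled scores; a full implementation would typically rebuild the networks for every permuted labelling.

```python
import math
import random
from typing import List, Sequence, Tuple

def _hist(values: Sequence[float], lo: float, hi: float, bins: int) -> List[float]:
    """Normalised histogram on a fixed [lo, hi] range shared by both conditions."""
    counts = [0] * bins
    width = (hi - lo) / bins or 1.0  # guard against a zero-width range
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return [c / len(values) for c in counts]

def js_divergence(p: List[float], q: List[float]) -> float:
    """Jensen-Shannon divergence in bits, so it lies in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(x: List[float], y: List[float]) -> float:
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def divergence(x: Sequence[float], y: Sequence[float], bins: int = 10) -> float:
    lo, hi = min(min(x), min(y)), max(max(x), max(y))
    return js_divergence(_hist(x, lo, hi, bins), _hist(y, lo, hi, bins))

def permutation_pvalue(x: Sequence[float], y: Sequence[float],
                       n_perm: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Observed divergence and a permutation p-value with the +1 correction."""
    rng = random.Random(seed)
    observed = divergence(x, y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if divergence(pooled[:len(x)], pooled[len(x):]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Usage: two clearly separated sets of (toy) likelihood scores.
x = [0.1 * i for i in range(20)]
y = [5 + 0.1 * i for i in range(20)]
obs, p = permutation_pvalue(x, y, n_perm=500)
```

The +1 in the numerator and denominator keeps the estimated p-value strictly positive, which matters when thousands of pathways are tested and later corrected for multiple comparisons.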
Affiliation(s)
- Gil Speyer
- The Translational Genomics Research Institute, Phoenix, AZ 85004, U.S.A.
7. Doungpan N, Engchuan W, Chan JH, Meechai A. GSNFS: Gene subnetwork biomarker identification of lung cancer expression data. BMC Med Genomics 2016; 9:70. [PMID: 28117655 PMCID: PMC5260788 DOI: 10.1186/s12920-016-0231-4]
Abstract
Background Gene expression has been used to identify disease gene biomarkers, but there are ongoing challenges. Single gene or gene-set biomarkers are inadequate to provide sufficient understanding of complex disease mechanisms and the relationships among those genes. Network-based methods have thus been considered for inferring the interactions within a group of genes to further study the disease mechanism. Recently, the Gene-Network-based Feature Set (GNFS), which is capable of handling case-control and multiclass expression data for gene biomarker identification, has been proposed, partly taking network topology into account. However, its performance relies on a greedy search for building subnetworks and thus requires further improvement. In this work, we establish a new approach named Gene Sub-Network-based Feature Selection (GSNFS) by implementing the GNFS framework with two proposed searching and scoring algorithms, namely gene-set-based (GS) search and parent-node-based (PN) search, to identify subnetworks. An additional dataset is used to validate the results. Methods The two proposed searching algorithms of the GSNFS method for subnetwork expansion are concerned with the degree of connectivity and the scoring scheme for building subnetworks and their topology. In each iteration of expansion, the neighbour genes of the current subnetwork whose expression data improve the overall subnetwork score are recruited. While the GS search calculates the subnetwork score using the activity score of the current subnetwork and the gene expression values of its neighbours, the PN search uses the expression value of the corresponding parent of each neighbour gene. Four lung cancer expression datasets were used for subnetwork identification. In addition, the use of pathway data and protein-protein interactions as network data to capture the interactions among significant genes was discussed.
Classification was performed to compare the performance of the identified gene subnetworks with that of three subnetwork identification algorithms. Results The two searching algorithms resulted in better classification and gene/gene-set agreement compared to the original greedy search of the GNFS method. The lung cancer subnetwork identified using the proposed searching algorithms resulted in an improvement of the cross-dataset validation and an increase in the consistency of findings between two independent datasets. The homogeneity of the datasets was measured to assess dataset compatibility in cross-dataset validation. The lung cancer dataset with higher homogeneity showed a better result when using the GS search, while the dataset with low homogeneity showed a better result when using the PN search. The 10-fold cross-dataset validation on the independent lung cancer datasets showed higher classification performance of the proposed algorithms when compared with the greedy search in the original GNFS method. Conclusions The proposed searching algorithms provide a higher number of genes in the subnetwork expansion step than the greedy algorithm. As a result, the performance of the subnetworks identified by the GSNFS method was improved in terms of classification performance and gene/gene-set level agreement, depending on the homogeneity of the datasets used in the analysis. Some common genes obtained from the four datasets using different searching algorithms are known to play a role in lung cancer. The improvement in classification performance and gene/gene-set level agreement, together with the biological relevance, indicates the effectiveness of the GSNFS method for gene subnetwork identification using expression data.
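The expansion step described above (recruit neighbours whose addition improves the subnetwork score) can be illustrated in a few lines. In this hedged Python sketch the activity score is the aggregate z-score Σz/√k popularised by earlier active-module work, used here as a stand-in for the GS and PN scoring schemes, which the abstract does not specify. Every improving neighbour is recruited in one iteration, echoing the observation that the proposed searches add more genes per step than a strict greedy search. The function names are hypothetical.

```python
import math
from typing import Dict, Set

def aggregate_z(genes: Set[str], z: Dict[str, float]) -> float:
    """Stand-in activity score: sum of gene z-scores divided by sqrt(size)."""
    return sum(z[g] for g in genes) / math.sqrt(len(genes))

def expand_subnetwork(seed: str, adj: Dict[str, Set[str]], z: Dict[str, float],
                      max_iter: int = 10) -> Set[str]:
    sub = {seed}
    for _ in range(max_iter):
        current = aggregate_z(sub, z)
        neighbours = sorted({n for g in sub for n in adj[g]} - sub)
        # Recruit every neighbour whose individual addition raises the score,
        # rather than only the single best one; the joint score is not re-checked.
        recruits = {n for n in neighbours if aggregate_z(sub | {n}, z) > current}
        if not recruits:
            break
        sub |= recruits
    return sub

# Usage on a toy network (hypothetical genes and z-scores).
adj = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A"}, "D": {"B"}}
z = {"A": 3.0, "B": 2.5, "C": -1.0, "D": 0.1}
module = expand_subnetwork("A", adj, z)
```

Here B is recruited because (3.0 + 2.5)/√2 exceeds 3.0, while C and D would both lower the score and are left out.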
Affiliation(s)
- Narumol Doungpan
- Biological Engineering Program, Faculty of Engineering, King Mongkut's University of Technology Thonburi, Bangkok, Thailand
- Worrawat Engchuan
- The Centre for Applied Genomics, Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON, Canada
- Jonathan H Chan
- Data Science and Engineering Laboratory, School of Information Technology, King Mongkut's University of Technology Thonburi, Bangkok, Thailand
- Asawin Meechai
- Department of Chemical Engineering, Faculty of Engineering, King Mongkut's University of Technology Thonburi, Bangkok, Thailand
8. A comprehensive assessment of networks and pathways of hypoxia-associated proteins and identification of responsive protein modules. ACTA ACUST UNITED AC 2016. [DOI: 10.1007/s13721-016-0123-8]
9. Farahmand S, Goliaei S, Ansari-Pour N, Razaghi-Moghadam Z. GTA: a game theoretic approach to identifying cancer subnetwork markers. Mol Biosyst 2016; 12:818-25. [DOI: 10.1039/c5mb00684h]
Abstract
The identification of genetic markers (e.g. genes, pathways and subnetworks) for cancer has been one of the most challenging research areas in recent years.
Affiliation(s)
- S. Farahmand
- Research Laboratory of Computational Biology, Faculty of New Sciences and Technology, University of Tehran, Tehran, Iran
- S. Goliaei
- Research Laboratory of Computational Biology, Faculty of New Sciences and Technology, University of Tehran, Tehran, Iran
- N. Ansari-Pour
- Faculty of New Sciences and Technology, University of Tehran, Tehran, Iran
- School of Biological Sciences
- Z. Razaghi-Moghadam
- Faculty of New Sciences and Technology, University of Tehran, Tehran, Iran
- School of Biological Sciences
10. Speyer G, Kiefer J, Dhruv H, Berens M, Kim S. Knowledge-assisted approach to identify pathways with differential dependencies. Pac Symp Biocomput 2016; 21:33-44. [PMID: 26776171 PMCID: PMC4721243]
Abstract
We have previously developed a statistical method to identify gene sets enriched with condition-specific genetic dependencies. The method constructs gene dependency networks from bootstrapped samples in one condition and computes the divergence between distributions of network likelihood scores from different conditions. It was shown to be capable of sensitive and specific identification of pathways with phenotype-specific dysregulation, i.e., rewiring of dependencies between genes in different conditions. We now present an extension of the method that incorporates prior knowledge into the inference of networks. The degree of prior knowledge incorporation has a substantial effect on the sensitivity of the method, as the data are the source of condition specificity, while prior knowledge can provide additional support for dependencies that are only partially supported by the data. The use of prior knowledge also significantly improved the interpretability of the results. Further analysis of topological characteristics of gene differential dependency networks provides a new approach to identify genes that could play important roles in biological signaling in a specific condition and, hence, promising targets customized to that condition. Through analysis of TCGA glioblastoma multiforme data, we demonstrate that the method can identify not only potentially promising targets but also the underlying biology for new targets.
Affiliation(s)
- Gil Speyer
- Integrated Cancer Genomics Division, The Translational Genomics Research Institute, Phoenix, AZ 85004, U.S.A.
- Jeff Kiefer
- Integrated Cancer Genomics Division, The Translational Genomics Research Institute, Phoenix, AZ 85004, U.S.A.
- Harshil Dhruv
- Cancer Cell Biology Division, The Translational Genomics Research Institute, Phoenix, AZ 85004, U.S.A.
- Michael Berens
- Cancer Cell Biology Division, The Translational Genomics Research Institute, Phoenix, AZ 85004, U.S.A.
- Seungchan Kim
- Integrated Cancer Genomics Division, The Translational Genomics Research Institute, Phoenix, AZ 85004, U.S.A.
11. Zhang Y, Liu ZL, Song M. ChiNet uncovers rewired transcription subnetworks in tolerant yeast for advanced biofuels conversion. Nucleic Acids Res 2015; 43:4393-407. [PMID: 25897127 PMCID: PMC4482087 DOI: 10.1093/nar/gkv358] [Received: 09/25/2014] [Accepted: 04/06/2015]
Abstract
Analysis of rewired upstream subnetworks impacting downstream differential gene expression aids the delineation of evolving molecular mechanisms. Cumulative statistics based on conventional differential correlation are limited for subnetwork rewiring analysis, since rewiring is not necessarily equivalent to a change in correlation coefficients. Here we present a computational method, ChiNet, to quantify subnetwork rewiring by statistical heterogeneity, which enables detection of potential genotype changes causing altered transcription regulation in evolving organisms. Given a differentially expressed downstream gene set, ChiNet backtracks a rewired upstream subnetwork from a super-network including gene interactions known to occur under various molecular contexts. We benchmarked ChiNet against two differential-correlation-based subnetwork rewiring approaches and found it highly accurate in distinguishing rewired artificial subnetworks, in silico yeast transcription-metabolic subnetworks, and rewired transcription subnetworks for Candida albicans versus Saccharomyces cerevisiae. Then, using transcriptome data from the tolerant S. cerevisiae strain NRRL Y-50049 and a wild-type intolerant strain, ChiNet identified 44 metabolic pathways affected by rewired transcription subnetworks anchored to the major adaptively activated transcription factor genes YAP1, RPN4, SFP1 and ROX1, in response to toxic chemical challenges involved in lignocellulose-to-biofuels conversion. These findings support the use of ChiNet in rewiring analysis of subnetworks where differential interaction patterns resulting from divergent nonlinear dynamics abound.
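ChiNet quantifies rewiring through statistical heterogeneity. One common form of a chi-square heterogeneity statistic compares the sum of per-condition chi-square values with the chi-square of the pooled contingency table: identical association patterns give a value near zero, while opposite patterns give a large one. The Python sketch below computes that statistic and its degrees of freedom for K tables of the same shape (for example, discretised parent-versus-child expression counts). Treating it as ChiNet's exact formula is an assumption, and the p-value lookup (for example, a chi-square survival function) is left out to keep the example dependency-free.

```python
from typing import List, Tuple

Table = List[List[float]]

def pearson_chi2(table: Table) -> float:
    """Pearson chi-square of a contingency table; cells with zero expectation are skipped."""
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    chi2 = 0.0
    for i, r in enumerate(table):
        for j, obs in enumerate(r):
            exp = rows[i] * cols[j] / n
            if exp > 0:
                chi2 += (obs - exp) ** 2 / exp
    return chi2

def heterogeneity_chi2(tables: List[Table]) -> Tuple[float, int]:
    """Sum of per-condition chi-squares minus the pooled chi-square, with its df."""
    r, c = len(tables[0]), len(tables[0][0])
    pooled = [[sum(t[i][j] for t in tables) for j in range(c)] for i in range(r)]
    total = sum(pearson_chi2(t) for t in tables)
    return total - pearson_chi2(pooled), (len(tables) - 1) * (r - 1) * (c - 1)

# Usage: the same parent-child pair with opposite patterns in two conditions.
t1 = [[10, 0], [0, 10]]
t2 = [[0, 10], [10, 0]]
rewired = heterogeneity_chi2([t1, t2])
unchanged = heterogeneity_chi2([t1, t1])
```

The opposite patterns cancel in the pooled table, so the whole per-condition signal shows up as heterogeneity; a differential-correlation test on the same data would also flag it, but heterogeneity does not require the relationship to be linear.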
Affiliation(s)
- Yang Zhang
- Department of Computer Science, New Mexico State University, Las Cruces, NM 88003, USA
- Z Lewis Liu
- National Center for Agricultural Utilization Research, Agricultural Research Service, U.S. Department of Agriculture, Peoria, IL 61604, USA
- Mingzhou Song
- Department of Computer Science, New Mexico State University, Las Cruces, NM 88003, USA
12. Chen H, Zhu Z, Zhu Y, Wang J, Mei Y, Cheng Y. Pathway mapping and development of disease-specific biomarkers: protein-based network biomarkers. J Cell Mol Med 2015; 19:297-314. [PMID: 25560835 PMCID: PMC4407592 DOI: 10.1111/jcmm.12447] [Received: 06/12/2014] [Accepted: 08/22/2014]
Abstract
It is known that a disease is rarely a consequence of an abnormality of a single gene, but reflects the interactions of various processes in a complex network. Annotated molecular networks offer new opportunities to understand diseases within a systems biology framework and provide an excellent substrate for network-based identification of biomarkers. Network biomarkers and dynamic network biomarkers (DNBs) represent new types of biomarkers with protein-protein or gene-gene interactions that can be monitored and evaluated at different stages and time-points during the development of disease. Clinical bioinformatics, as a new way to combine clinical measurements and signs with human tissue-generated bioinformatics, is crucial to translate biomarkers into clinical application, validate their disease specificity, and understand the role of biomarkers in clinical settings. In this article, recent advances and developments in network biomarkers and DNBs are comprehensively reviewed. How network biomarkers help achieve a better understanding of the molecular mechanisms of diseases, the advantages and constraints of network biomarkers for clinical application, clinical bioinformatics as a bridge to the development of disease-specific, stage-specific, severity-specific and therapy-predictive biomarkers, and the potential of network biomarkers are also discussed.
Affiliation(s)
- Hao Chen
- Department of Cardiothoracic Surgery, Tongji Hospital, Tongji University, Shanghai, China
13. Identification of phenotype deterministic genes using systemic analysis of transcriptional response. Sci Rep 2014; 4:4413. [PMID: 24642983 PMCID: PMC3958917 DOI: 10.1038/srep04413] [Received: 09/16/2013] [Accepted: 03/03/2014]
Abstract
Systemic identification of deterministic genes for different phenotypes is a primary application of high-throughput expression profiles. However, gene expression differences cannot be used when the differences between groups are not significant. Therefore, novel methods incorporating features other than expression differences are required. We developed a promising method using transcriptional response as an operational feature, which is quantified as the correlation between expression levels of pathway genes and target genes of the pathway. We applied this method to identify causative genes associated with chemo-sensitivity to tamoxifen and epirubicin. Genes whose transcriptional response was dysregulated only in the drug-resistant patient group were chosen for in vitro validation in human breast cancer cells. Finally, we discovered two genes responsible for tamoxifen sensitivity and three genes associated with epirubicin sensitivity. The method we propose here can be widely applied to identify deterministic genes for different phenotypes with only minor differences in gene expression levels.
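The transcriptional-response feature is defined above as the correlation between expression levels of pathway genes and target genes of the pathway. The Python sketch below assumes a hypothetical summary, the per-sample mean expression of the pathway genes, and computes its Pearson correlation with a target gene separately in each patient group. The cut-offs `keep` and `lost` used to call a response "dysregulated only in the resistant group" are illustrative, not values from the study, and all expression values are toy data.

```python
import math
from typing import Dict, List

def pearson(x: List[float], y: List[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def transcriptional_response(pathway_expr: Dict[str, List[float]],
                             target_expr: List[float]) -> float:
    """Correlation between mean pathway expression (assumed summary) and a target gene."""
    n = len(target_expr)
    activity = [sum(expr[s] for expr in pathway_expr.values()) / len(pathway_expr)
                for s in range(n)]
    return pearson(activity, target_expr)

def dysregulated_in_resistant(r_sensitive: float, r_resistant: float,
                              keep: float = 0.7, lost: float = 0.3) -> bool:
    """Hypothetical rule: strong response in sensitive patients, lost in resistant ones."""
    return abs(r_sensitive) >= keep and abs(r_resistant) < lost

# Usage: same pathway genes, target behaves differently in the two groups.
pathway = {"P1": [1, 2, 3, 4, 5], "P2": [2, 3, 4, 5, 6]}
r_s = transcriptional_response(pathway, [1.1, 2.0, 2.9, 4.2, 5.0])
r_r = transcriptional_response(pathway, [3, 1, 2, 5, 1])
```

Correlation is used rather than expression difference, which is why the feature can flag genes whose mean levels barely differ between groups.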
14. Jung S, Kim S. EDDY: a novel statistical gene set test method to detect differential genetic dependencies. Nucleic Acids Res 2014; 42:e60. [PMID: 24500204 PMCID: PMC3985670 DOI: 10.1093/nar/gku099]
Abstract
Identifying differential features between conditions is a popular approach to understanding molecular features and their mechanisms underlying a biological process of particular interest. Although many tests for identifying differential expression of genes or gene sets have been proposed, there has been limited success in developing methods for differential interactions of genes between conditions because of the computational complexity involved. We present a method for Evaluation of Dependency DifferentialitY (EDDY), which is a statistical test for differential dependencies of a set of genes between two conditions. Unlike previous methods focused on differential expression of individual genes or correlation changes of individual gene–gene interactions, EDDY compares two conditions by evaluating the probability distributions of dependency networks from genes. The method has been evaluated and compared with other methods through simulation studies, and application to glioblastoma multiforme data resulted in informative cancer and glioblastoma multiforme subtype-related findings. The comparison with Gene Set Enrichment Analysis, a differential expression-based method, revealed that EDDY identifies gene sets that are complementary to those identified by Gene Set Enrichment Analysis. EDDY also showed far fewer false positives than Gene Set Co-expression Analysis, a method based on correlation changes of individual gene–gene interactions, thus providing more informative results. The Java implementation of the algorithm is freely available to noncommercial users. Download from: http://biocomputing.tgen.org/software/EDDY.
Affiliation(s)
- Sungwon Jung
- Integrated Cancer Genomics Division, Biocomputing Unit, Translational Genomics Research Institute, 445 North 5th Street, Phoenix, AZ 85004, USA
15. Integrative approaches for finding modular structure in biological networks. Nat Rev Genet 2013; 14:719-32. [PMID: 24045689 DOI: 10.1038/nrg3552]
Abstract
A central goal of systems biology is to elucidate the structural and functional architecture of the cell. To this end, large and complex networks of molecular interactions are being rapidly generated for humans and model organisms. A recent focus of bioinformatics research has been to integrate these networks with each other and with diverse molecular profiles to identify sets of molecules and interactions that participate in a common biological function - that is, 'modules'. Here, we classify such integrative approaches into four broad categories, describe their bioinformatic principles and review their applications.
16. Peng CH, Jiang YZ, Tai AS, Liu CB, Peng SC, Liao CT, Yen TC, Hsieh WP. Causal inference of gene regulation with subnetwork assembly from genetical genomics data. Nucleic Acids Res 2013; 42:2803-19. [PMID: 24322297 PMCID: PMC3950678 DOI: 10.1093/nar/gkt1277]
Abstract
Deciphering the causal networks of gene interactions is critical for identifying disease pathways and disease-causing genes. We introduce a method to reconstruct causal networks based on exploring phenotype-specific modules in the human interactome and including the expression quantitative trait loci (eQTLs) that underlie the joint expression variation of each module. Closely associated eQTLs help anchor the orientation of the network. To overcome the inherent computational complexity of causal network reconstruction, we first deduce the local causality of individual subnetworks using the selected eQTLs and module transcripts. These subnetworks are then integrated to infer a global causal network using a random-field ranking method, which was motivated by animal sociology. We demonstrate how effectively the inferred causality restores the regulatory structure of the networks that mediate lymph node metastasis in oral cancer. Network rewiring clearly characterizes the dynamic regulatory systems of distinct disease states. This study is the first to associate an RXRB-causal network with increased risks of nodal metastasis, tumor relapse, distant metastases and poor survival for oral cancer. Thus, identifying crucial upstream drivers of a signal cascade can facilitate the discovery of potential biomarkers and effective therapeutic targets.
Affiliation(s)
- Chien-Hua Peng
- Departments of Resource Center for Clinical Research, Chang Gung Memorial Hospital, Taoyuan 33305, Taiwan, Republic of China, Institute of Statistics, National Tsing Hua University, Hsinchu 30013, Taiwan, Republic of China, Nuclear Medicine and Molecular Imaging Center, Chang Gung Memorial Hospital, Taoyuan 33305, Taiwan, Republic of China and Department of Otorhinolaryngology, Head and Neck Surgery, Chang Gung Memorial Hospital, Taoyuan 33305, Taiwan, Republic of China
17
Chung FH, Lee HHC, Lee HC. ToP: a trend-of-disease-progression procedure works well for identifying cancer genes from multi-state cohort gene expression data for human colorectal cancer. PLoS One 2013; 8:e65683. [PMID: 23799036 PMCID: PMC3683052 DOI: 10.1371/journal.pone.0065683] [Received: 12/04/2012] [Accepted: 04/26/2013]
Abstract
Significantly expressed genes extracted from microarray gene expression data have proved very useful for identifying genetic biomarkers of diseases, including cancer. However, deriving a disease-related inference from a list of differentially expressed genes has proven less than straightforward. In a systems disease such as cancer, how genes interact with each other should matter just as much as the level of gene expression. Here, in a novel approach, we used the network and disease progression properties of individual genes in state-specific gene-gene interaction networks (GGINs) to select cancer genes for human colorectal cancer (CRC), and obtained a much higher hit rate of known cancer genes than methods not based on network theory. We constructed GGINs by integrating gene expression microarray data from four states, namely healthy control (Nor), adenoma (Ade), inflammatory bowel disease (IBD) and CRC, with a protein-protein interaction database and Gene Ontology. We tracked changes in the network degrees and clustering coefficients of individual genes in the GGINs as the disease state changed from one to another. From these, we inferred that the state sequences Nor-Ade-CRC and Nor-IBD-CRC both exhibited a trend of (disease) progression (ToP) toward CRC, and devised a ToP procedure for selecting cancer genes for CRC. Of the 141 candidates selected using ToP, ∼50% had literature support as cancer genes, compared to hit rates of 20% to 30% for standard methods using only gene expression data. Among the 16 candidate cancer genes that encoded transcription factors, 13 were known to be tumorigenic and three were novel: CDK1, SNRPF, and ILF2. We identified 13 of the 141 predicted cancer genes as candidate markers for early detection of CRC: 11 at the Ade state and 2 at the IBD state.
Collapse
Affiliation(s)
- Feng-Hsiang Chung
- Institute of Systems Biology and Bioinformatics, National Central University, Zhongli, Taiwan
- Center for Dynamical Biomarkers and Translational Medicine, National Central University, Zhongli, Taiwan
- Henry Hsin-Chung Lee
- Institute of Systems Biology and Bioinformatics, National Central University, Zhongli, Taiwan
- Cathay Medical Research Institute, Cathay General Hospital, Taipei, Taiwan
- Hoong-Chien Lee
- Institute of Systems Biology and Bioinformatics, National Central University, Zhongli, Taiwan
- Cathay Medical Research Institute, Cathay General Hospital, Taipei, Taiwan
18
Zybailov BL, Glazko GV, Jaiswal M, Raney KD. Large Scale Chemical Cross-linking Mass Spectrometry Perspectives. ACTA ACUST UNITED AC 2013; 6:001. [PMID: 25045217 PMCID: PMC4101816 DOI: 10.4172/jpb.s2-001]
Abstract
The spectacular heterogeneity of a complex protein mixture from biological samples becomes even more difficult to tackle when one’s attention is shifted towards different protein complex topologies, transient interactions, or localization of protein-protein interactions (PPIs). Meticulous protein-by-protein affinity pull-downs and yeast two-hybrid screens are the two approaches currently used to decipher proteome-wide interaction networks. Another method is chemical cross-linking, which not only identifies interactors but can also provide information on the sites of interaction and interaction interfaces. Despite significant advances in mass spectrometry instrumentation over the last decade, mapping PPIs using chemical cross-linking remains time-consuming and requires substantial expertise, even in the simplest of systems. While robust methodologies and software exist for the analysis of binary PPIs and for single-protein structure refinement using cross-linking-derived constraints, undertaking a proteome-wide cross-linking study is highly complex. Difficulties include (i) identifying cross-linkers of the right length and selectivity to capture the interactions of interest; (ii) enriching the cross-linked species; and (iii) identifying and validating the cross-linked peptides and cross-linked sites. In this review, we examine the existing literature on large-scale protein cross-linking and discuss possible paths for improvement. We also discuss short-length cross-linkers of broad specificity, such as formaldehyde and diazirine-based photo-cross-linkers. These cross-linkers could potentially capture many types of interactions without a strict requirement for a particular amino acid to be present at a given protein-protein interface. How can these short-length, broad-specificity cross-linkers be applied to proteome-wide studies? We suggest specific advances in methodology, instrumentation and software that are needed to make such a leap.
Affiliation(s)
- Boris L Zybailov
- Department of Biochemistry and Molecular Biology, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Galina V Glazko
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Mihir Jaiswal
- UALR/UAMS Joint Bioinformatics Program, University of Arkansas Little Rock, Little Rock, AR, USA
- Kevin D Raney
- Department of Biochemistry and Molecular Biology, University of Arkansas for Medical Sciences, Little Rock, AR, USA
19
State of the art in silico tools for the study of signaling pathways in cancer. Int J Mol Sci 2012; 13:6561-6581. [PMID: 22837650 PMCID: PMC3397482 DOI: 10.3390/ijms13066561] [Received: 03/27/2012] [Revised: 05/03/2012] [Accepted: 05/10/2012]
Abstract
In the last several years, researchers have exhibited an intense interest in the evolutionarily conserved signaling pathways that have crucial roles during embryonic development. Interestingly, the malfunctioning of these signaling pathways leads to several human diseases, including cancer. The chemical and biophysical events that occur during cellular signaling, as well as the number of interactions within a signaling pathway, make these systems complex to study. In silico resources are tools used to aid the understanding of cellular signaling pathways. Systems approaches have provided a deeper knowledge of diverse biochemical processes, including individual metabolic pathways, signaling networks and genome-scale metabolic networks. In the future, these tools will be enormously valuable, if they continue to be developed in parallel with growing biological knowledge. In this study, an overview of the bioinformatics resources that are currently available for the analysis of biological networks is provided.
20
Nibbe RK, Chowdhury SA, Koyutürk M, Ewing R, Chance MR. Protein-protein interaction networks and subnetworks in the biology of disease. Wiley Interdiscip Rev Syst Biol Med 2011; 3:357-67. [PMID: 20865778 DOI: 10.1002/wsbm.121]
Abstract
The main goal of systems medicine is to provide predictive models of the pathophysiology of complex diseases as well as to define healthy states. The reason is clear: we hope accurate models will ultimately lead to more specific and sensitive markers of disease that will help clinicians better stratify their patient populations and optimize treatment plans. In addition, we expect that these models will define novel targets for combating disease. However, for many complex diseases, particularly at the clinical level, it is becoming increasingly clear that one or a few genomic variations alone (e.g., simple models) cannot adequately explain the multiple phenotypes related to disease states, or the variable risks that attend disease progression. We suggest that models that account for the activities of many interacting proteins will explain a wider range of the variability inherent in these phenotypes. These models, which encompass protein interaction networks dysregulated for specific diseases and specific patient sub-populations, will be constructed by integrating protein interaction data with multiple other types of relevant cellular information. Protein interaction databases are thus playing an increasingly important role in systems biology approaches to the study of disease. They present us with a static but highly functional view of the cellular state, and thus give us a better understanding not only of the normal phenotype but also of the overall disease phenotype at the level of the whole organism when certain interactions become dysregulated.
Affiliation(s)
- Rod K Nibbe
- Center for Proteomics and Bioinformatics, Case Western Reserve University, Cleveland, OH, USA.
21
He Z, Yu W. Stable feature selection for biomarker discovery. Comput Biol Chem 2010; 34:215-25. [PMID: 20702140 DOI: 10.1016/j.compbiolchem.2010.07.002] [Received: 05/09/2010] [Revised: 06/27/2010] [Accepted: 07/10/2010]
22
Minguez P, Dopazo J. Functional genomics and networks: new approaches in the extraction of complex gene modules. Expert Rev Proteomics 2010; 7:55-63. [PMID: 20121476 DOI: 10.1586/epr.09.103]
Abstract
The engine that makes the cell work is made of an intricate network of molecular interactions. Nowadays, the elements and relationships of this complex network can be studied with several types of high-throughput techniques. The dream of having a global picture of the cell from different perspectives that can jointly explain cell behavior is, at least technically, feasible. However, this task can only be accomplished by filling the gap between data and information. The availability of methods capable of accurately managing, integrating and analyzing the results from these experiments is crucial for this purpose. Here, we review the new challenges raised by the availability of different genomic data, as well as the new proposals presented to cope with the increasing data complexity. Special emphasis is given to approaches that explore the transcriptome trying to describe the modules of genes that account for the traits studied.
Affiliation(s)
- Pablo Minguez
- Department of Bioinformatics and Genomics, Centro de Investigación Príncipe Felipe, Valencia, Spain
23
Gu J, Chen Y, Li S, Li Y. Identification of responsive gene modules by network-based gene clustering and extending: application to inflammation and angiogenesis. BMC Syst Biol 2010; 4:47. [PMID: 20406493 PMCID: PMC2873318 DOI: 10.1186/1752-0509-4-47] [Received: 12/03/2009] [Accepted: 04/21/2010]
Abstract
BACKGROUND: Cell responses to environmental stimuli are usually organized as relatively separate responsive gene modules at the molecular level. Identifying responsive gene modules rather than individual differentially expressed (DE) genes will provide important information about the underlying molecular mechanisms. Most current methods formulate module identification as an optimization problem: find the active sub-networks in the genome-wide gene network by maximizing an objective function that considers gene differential expression and/or gene-gene co-expression information. Here we present a new formulation of this task: groups of closely connected, co-expressed DE genes in the gene network are regarded as the signatures of the underlying responsive gene modules, and the modules can be identified by finding these signatures and then recovering the "missing parts" by adding the intermediate genes that connect the DE genes in the gene network.
RESULTS: ClustEx, a two-step method based on the new formulation, was developed and applied to identify the responsive gene modules of human umbilical vein endothelial cells (HUVECs) in inflammation and angiogenesis models by integrating time-course microarray data and genome-wide PPI data. ClustEx showed better performance than several available module identification tools when tested on reference responsive gene sets. Gene set analysis of KEGG pathways, GO terms and microRNA (miRNA) target gene sets further supports the ClustEx predictions.
CONCLUSION: Taking the closely connected and co-expressed DE genes in the condition-specific gene network as the signatures of the underlying responsive gene modules provides a new strategy for solving the module identification problem. The identified responsive gene modules of HUVECs and the corresponding enriched pathways/miRNAs provide useful resources for understanding the inflammatory and angiogenic responses of vascular systems.
Affiliation(s)
- Jin Gu
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division, Tsinghua National Laboratory for Information Science and Technology (TNLIST) and Department of Automation, Tsinghua University, Beijing 100084, China
24
Fortney K, Kotlyar M, Jurisica I. Inferring the functions of longevity genes with modular subnetwork biomarkers of Caenorhabditis elegans aging. Genome Biol 2010; 11:R13. [PMID: 20128910 PMCID: PMC2872873 DOI: 10.1186/gb-2010-11-2-r13] [Received: 10/17/2009] [Revised: 01/29/2010] [Accepted: 02/03/2010]
Abstract
An algorithm for determining networks from gene expression data enables the identification of genes potentially linked to aging in worms. A central goal of biogerontology is to identify robust gene-expression biomarkers of aging. Here we develop a method where the biomarkers are networks of genes selected based on age-dependent activity and a graph-theoretic property called modularity. Tested on Caenorhabditis elegans, our algorithm yields better biomarkers than previous methods: they are more conserved across studies and better predictors of age. We apply these modular biomarkers to assign novel aging-related functions to poorly characterized longevity genes.
Affiliation(s)
- Kristen Fortney
- Department of Medical Biophysics, University of Toronto, 610 University Avenue, Toronto, M5G 2M9, Canada