1
Wu W, Yang T, Ma X, Zhang W, Li H, Huang J, Li Y, Cui J. Learning Specific and Conserved Features of Multi-layer Networks. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.11.150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2022]
2
3
Horvath A, Daniel B, Szeles L, Cuaranta-Monroy I, Czimmerer Z, Ozgyin L, Steiner L, Kiss M, Simandi Z, Poliska S, Giannakis N, Raineri E, Gut IG, Nagy B, Nagy L. Labelled regulatory elements are pervasive features of the macrophage genome and are dynamically utilized by classical and alternative polarization signals. Nucleic Acids Res 2019; 47:2778-2792. [PMID: 30799488 PMCID: PMC6451134 DOI: 10.1093/nar/gkz118] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 01/22/2019] [Accepted: 02/14/2019] [Indexed: 01/09/2023]
Abstract
The concept of tissue-specific gene expression posits that lineage-determining transcription factors (LDTFs) determine the open chromatin profile of a cell via collaborative binding, providing molecular beacons to signal-dependent transcription factors (SDTFs). However, the guiding principles of LDTF binding, chromatin accessibility and enhancer activity have not yet been systematically evaluated. We sought to study these features of the macrophage genome by the combination of experimental (ChIP-seq, ATAC-seq and GRO-seq) and computational approaches. We show that Random Forest and Support Vector Regression machine learning methods can accurately predict chromatin accessibility using the binding patterns of the LDTF PU.1 and four other key TFs of macrophages (IRF8, JUNB, CEBPA and RUNX1). Any of these TFs alone were not sufficient to predict open chromatin, indicating that TF binding is widespread at closed or weakly opened chromatin regions. Analysis of the PU.1 cistrome revealed that two-thirds of PU.1 binding occurs at low accessible chromatin. We termed these sites labelled regulatory elements (LREs), which may represent a dormant state of a future enhancer and contribute to macrophage cellular plasticity. Collectively, our work demonstrates the existence of LREs occupied by various key TFs, regulating specific gene expression programs triggered by divergent macrophage polarizing stimuli.
Affiliation(s)
- Attila Horvath
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, University of Debrecen, H-4032 Debrecen, Hungary
- Bence Daniel
- Johns Hopkins University School of Medicine, Department of Medicine and Biological Chemistry, Institute for Fundamental Biomedical Research, Johns Hopkins All Children's Hospital, Saint Petersburg, FL 33701, USA
- Lajos Szeles
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, University of Debrecen, H-4032 Debrecen, Hungary
- Ixchelt Cuaranta-Monroy
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, University of Debrecen, H-4032 Debrecen, Hungary
- Zsolt Czimmerer
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, University of Debrecen, H-4032 Debrecen, Hungary
- Lilla Ozgyin
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, University of Debrecen, H-4032 Debrecen, Hungary
- Laszlo Steiner
- UD-GenoMed Medical Genomic Technologies Ltd., Nagyerdei krt. 98., H-4032 Debrecen, Hungary
- Mate Kiss
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, University of Debrecen, H-4032 Debrecen, Hungary
- Zoltan Simandi
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, University of Debrecen, H-4032 Debrecen, Hungary
- Szilard Poliska
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, University of Debrecen, H-4032 Debrecen, Hungary; UD-GenoMed Medical Genomic Technologies Ltd., Nagyerdei krt. 98., H-4032 Debrecen, Hungary
- Nikolas Giannakis
- UD-GenoMed Medical Genomic Technologies Ltd., Nagyerdei krt. 98., H-4032 Debrecen, Hungary
- Emanuele Raineri
- Centro Nacional de Analisis Genomico (CNAG-CRG), Center for Genomic Regulation (CRG), Barcelona Institute for Science and Technology (BIST), C/Baldiri Reixac 4, 08028 Barcelona, Spain; Universitat Pompeu Fabra (UPF), Plaça de la Mercè 10, 08002, Barcelona, Spain
- Ivo G Gut
- Centro Nacional de Analisis Genomico (CNAG-CRG), Center for Genomic Regulation (CRG), Barcelona Institute for Science and Technology (BIST), C/Baldiri Reixac 4, 08028 Barcelona, Spain; Universitat Pompeu Fabra (UPF), Plaça de la Mercè 10, 08002, Barcelona, Spain
- Benedek Nagy
- Department of Mathematics, Eastern Mediterranean University, Famagusta, North Cyprus, Mersin 10, Turkey
- Laszlo Nagy
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, University of Debrecen, H-4032 Debrecen, Hungary; Johns Hopkins University School of Medicine, Department of Medicine and Biological Chemistry, Institute for Fundamental Biomedical Research, Johns Hopkins All Children's Hospital, Saint Petersburg, FL 33701, USA
4
Turki T, Wei Z, Wang JTL. A transfer learning approach via procrustes analysis and mean shift for cancer drug sensitivity prediction. J Bioinform Comput Biol 2019; 16:1840014. [PMID: 29945499 DOI: 10.1142/s0219720018400140] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 12/15/2022]
Abstract
Transfer learning (TL) algorithms aim to improve the prediction performance in a target task (e.g. the prediction of cisplatin sensitivity in triple-negative breast cancer patients) via transferring knowledge from auxiliary data of a related task (e.g. the prediction of docetaxel sensitivity in breast cancer patients), where the distribution and even the feature space of the data pertaining to the tasks can be different. In real-world applications, we sometimes have a limited training set in a target task while we have auxiliary data from a related task. To obtain a better prediction performance in the target task, supervised learning requires a sufficiently large training set in the target task to perform well in predicting future test examples of the target task. In this paper, we propose a TL approach for cancer drug sensitivity prediction, where our approach combines three techniques. First, we shift the representation of a subset of examples from auxiliary data of a related task to a representation closer to a target training set of a target task. Second, we align the shifted representation of the selected examples of the auxiliary data to the target training set to obtain examples with representation aligned to the target training set. Third, we train machine learning algorithms using both the target training set and the aligned examples. We evaluate the performance of our approach against baseline approaches using the Area Under the receiver operating characteristic (ROC) Curve (AUC) on real clinical trial datasets pertaining to multiple myeloma, nonsmall cell lung cancer, triple-negative breast cancer, and breast cancer. Experimental results show that our approach is better than the baseline approaches in terms of performance and statistical significance.
Affiliation(s)
- Turki Turki
- Department of Computer Science, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Zhi Wei
- Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102, USA
- Jason T L Wang
- Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102, USA
5
Mounika Inavolu S, Renbarger J, Radovich M, Vasudevaraja V, Kinnebrew GH, Zhang S, Cheng L. IODNE: An integrated optimization method for identifying the deregulated subnetwork for precision medicine in cancer. CPT Pharmacometrics Syst Pharmacol 2017; 6:168-176. [PMID: 28266149 PMCID: PMC5351413 DOI: 10.1002/psp4.12167] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 12/05/2016] [Revised: 01/05/2017] [Accepted: 01/06/2017] [Indexed: 12/18/2022]
Abstract
Subnetwork analysis can explore complex patterns of entire molecular pathways for the purpose of drug target identification. In this article, the gene expression profiles of a cohort of patients with breast cancer are integrated with protein‐protein interaction (PPI) networks using, simultaneously, both edge scoring and node scoring. A novel optimization algorithm, integrated optimization method to identify deregulated subnetwork (IODNE), is developed to search for the optimal dysregulated subnetwork of the merged gene and protein network. IODNE is applied to select subnetworks for Luminal‐A breast cancer from The Cancer Genome Atlas (TCGA) data. A large fraction of cancer‐related genes and the well‐known clinical targets, ER1/PR and HER2, are found by IODNE. This validates the utility of IODNE. When applying IODNE to the triple‐negative breast cancer (TNBC) subtype data, we identified subnetworks that contain genes such as ERBB2, HRAS, PGR, CAD, POLE, and SLC2A1.
Affiliation(s)
- S Mounika Inavolu
- Center for Computational Biology and Bioinformatics, School of Medicine, Indiana University, Indianapolis, Indiana, USA; Department of Medical and Molecular Genetics, School of Medicine, Indiana University, Indianapolis, Indiana, USA
- J Renbarger
- Department of Pediatrics, Hematology/Oncology, School of Medicine, Indiana University, Indianapolis, Indiana, USA
- M Radovich
- Center for Computational Biology and Bioinformatics, School of Medicine, Indiana University, Indianapolis, Indiana, USA; Department of Medical and Molecular Genetics, School of Medicine, Indiana University, Indianapolis, Indiana, USA
- V Vasudevaraja
- Center for Computational Biology and Bioinformatics, School of Medicine, Indiana University, Indianapolis, Indiana, USA; Department of Medical and Molecular Genetics, School of Medicine, Indiana University, Indianapolis, Indiana, USA
- G H Kinnebrew
- Center for Computational Biology and Bioinformatics, School of Medicine, Indiana University, Indianapolis, Indiana, USA; Department of Medical and Molecular Genetics, School of Medicine, Indiana University, Indianapolis, Indiana, USA
- S Zhang
- Center for Computational Biology and Bioinformatics, School of Medicine, Indiana University, Indianapolis, Indiana, USA; Department of Medical and Molecular Genetics, School of Medicine, Indiana University, Indianapolis, Indiana, USA
- L Cheng
- Center for Computational Biology and Bioinformatics, School of Medicine, Indiana University, Indianapolis, Indiana, USA; Department of Medical and Molecular Genetics, School of Medicine, Indiana University, Indianapolis, Indiana, USA; Department of Pediatrics, Hematology/Oncology, School of Medicine, Indiana University, Indianapolis, Indiana, USA
6
Xiao X, Zhang W, Zou X. A new asynchronous parallel algorithm for inferring large-scale gene regulatory networks. PLoS One 2015; 10:e0119294. [PMID: 25807392 PMCID: PMC4373852 DOI: 10.1371/journal.pone.0119294] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 10/23/2014] [Accepted: 01/29/2015] [Indexed: 11/18/2022]
Abstract
The reconstruction of gene regulatory networks (GRNs) from high-throughput experimental data has been considered one of the most important issues in systems biology research. With the development of high-throughput technology and the complexity of biological problems, we need to reconstruct GRNs that contain thousands of genes. However, when many existing algorithms are used to handle these large-scale problems, they will encounter two important issues: low accuracy and high computational cost. To overcome these difficulties, the main goal of this study is to design an effective parallel algorithm to infer large-scale GRNs based on high-performance parallel computing environments. In this study, we proposed a novel asynchronous parallel framework to improve the accuracy and lower the time complexity of large-scale GRN inference by combining splitting technology and ordinary differential equation (ODE)-based optimization. The presented algorithm uses the sparsity and modularity of GRNs to split whole large-scale GRNs into many small-scale modular subnetworks. Through the ODE-based optimization of all subnetworks in parallel and their asynchronous communications, we can easily obtain the parameters of the whole network. To test the performance of the proposed approach, we used well-known benchmark datasets from Dialogue for Reverse Engineering Assessments and Methods challenge (DREAM), experimentally determined GRN of Escherichia coli and one published dataset that contains more than 10 thousand genes to compare the proposed approach with several popular algorithms on the same high-performance computing environments in terms of both accuracy and time complexity. The numerical results demonstrate that our parallel algorithm exhibits obvious superiority in inferring large-scale GRNs.
Affiliation(s)
- Xiangyun Xiao
- School of Mathematics and Statistics, Wuhan University, Wuhan, China
- Wei Zhang
- School of Science, East China Jiaotong University, Nanchang, China
- Xiufen Zou
- School of Mathematics and Statistics, Wuhan University, Wuhan, China
7
Amgalan B, Lee H. WMAXC: a weighted maximum clique method for identifying condition-specific sub-network. PLoS One 2014; 9:e104993. [PMID: 25148538 PMCID: PMC4141761 DOI: 10.1371/journal.pone.0104993] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 03/24/2014] [Accepted: 07/07/2014] [Indexed: 11/19/2022]
Abstract
Sub-networks can expose complex patterns in an entire bio-molecular network by extracting interactions that depend on temporal or condition-specific contexts. When genes interact with each other during cellular processes, they may form differential co-expression patterns with other genes across different cell states. The identification of condition-specific sub-networks is of great importance in investigating how a living cell adapts to environmental changes. In this work, we propose the weighted MAXimum clique (WMAXC) method to identify a condition-specific sub-network. WMAXC first proposes scoring functions that jointly measure condition-specific changes to both individual genes and gene-gene co-expressions. It then employs a weaker formula of a general maximum clique problem and relates the maximum scored clique of a weighted graph to the optimization of a quadratic objective function under sparsity constraints. We combine a continuous genetic algorithm and a projection procedure to obtain a single optimal sub-network that maximizes the objective function (scoring function) over the standard simplex (sparsity constraints). We applied the WMAXC method to both simulated data and real data sets of ovarian and prostate cancer. Compared with previous methods, WMAXC selected a large fraction of cancer-related genes, which were enriched in cancer-related pathways. The results demonstrated that our method efficiently captured a subset of genes relevant under the investigated condition.
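The relation this abstract mentions, between a maximum clique and a quadratic objective maximized over the standard simplex, can be illustrated with the classical Motzkin-Straus formulation. The sketch below is not the WMAXC algorithm: it assumes an unweighted toy graph and swaps in replicator dynamics as the optimizer, whereas the abstract describes weighted scores, a continuous genetic algorithm, and a projection step. The names `replicator_max_clique` and `TOY_ADJ`, the iteration count, and the support threshold are all hypothetical choices made for illustration.

```python
def replicator_max_clique(adj, iters=200):
    """Approximate a maximum clique via the Motzkin-Straus program.

    Maximizes x^T A x over the standard simplex with replicator
    dynamics, starting from the uniform point (an assumed choice).
    Returns the support of the final point and the objective value.
    """
    n = len(adj)
    x = [1.0 / n] * n

    def payoff(vec):
        # (A vec)_i for every node i
        return [sum(adj[i][j] * vec[j] for j in range(n)) for i in range(n)]

    for _ in range(iters):
        ax = payoff(x)
        value = sum(x[i] * ax[i] for i in range(n))
        if value == 0.0:  # no edges among the supported nodes
            break
        # Replicator update: nodes with above-average payoff gain mass.
        x = [x[i] * ax[i] / value for i in range(n)]

    ax = payoff(x)
    value = sum(x[i] * ax[i] for i in range(n))
    support = [i for i in range(n) if x[i] > 1e-6]
    return support, value


# Hypothetical toy graph: a triangle {0, 1, 2} plus node 3 attached to node 2.
TOY_ADJ = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]

support, value = replicator_max_clique(TOY_ADJ)
print(support, round(value, 6))
```

By the Motzkin-Straus theorem the optimum of this program equals 1 - 1/ω for a graph with clique number ω, so on the toy graph the value approaches 2/3 and the support settles on the triangle.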
Affiliation(s)
- Bayarbaatar Amgalan
- School of Information and Communications, Gwangju Institute of Science and Technology, Gwangju, South Korea
- Hyunju Lee
- School of Information and Communications, Gwangju Institute of Science and Technology, Gwangju, South Korea
8
Chen X, Xuan J, Wang C, Shajahan AN, Riggins RB, Clarke R. Reconstruction of transcriptional regulatory networks by stability-based network component analysis. IEEE/ACM Trans Comput Biol Bioinform 2013; 10:1347-1358. [PMID: 24407294 PMCID: PMC3652899 DOI: 10.1109/tcbb.2012.146] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 06/03/2023]
Abstract
Reliable inference of transcription regulatory networks is a challenging task in computational biology. Network component analysis (NCA) has become a powerful scheme to uncover regulatory networks behind complex biological processes. However, the performance of NCA is impaired by the high rate of false connections in binding information. In this paper, we integrate stability analysis with NCA to form a novel scheme, namely stability-based NCA (sNCA), for regulatory network identification. The method mainly addresses the inconsistency between gene expression data and binding motif information. Small perturbations are introduced to the prior regulatory network, and the distance among multiple estimated transcription factor (TF) activities is computed to reflect the stability for each TF's binding network. For target gene identification, multivariate regression and t-statistic are used to calculate the significance for each TF-gene connection. Simulation studies are conducted and the experimental results show that sNCA can achieve an improved and robust performance in TF identification as compared to NCA. The approach for target gene identification is also demonstrated to be suitable for identifying true connections between TFs and their target genes. Furthermore, we have successfully applied sNCA to breast cancer data to uncover the role of TFs in regulating endocrine resistance in breast cancer.
Affiliation(s)
- Xi Chen
- Virginia Polytechnic Institute and State University, Arlington
- Jianhua Xuan
- Virginia Polytechnic Institute and State University, Arlington
- Chen Wang
- Virginia Polytechnic Institute and State University, Arlington
9
Cui Y, Zheng CH, Yang J. Identifying subspace gene clusters from microarray data using low-rank representation. PLoS One 2013; 8:e59377. [PMID: 23527177 PMCID: PMC3602020 DOI: 10.1371/journal.pone.0059377] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 10/23/2012] [Accepted: 02/13/2013] [Indexed: 12/23/2022]
Abstract
Identifying subspace gene clusters from the gene expression data is useful for discovering novel functional gene interactions. In this paper, we propose to use low-rank representation (LRR) to identify the subspace gene clusters from microarray data. LRR seeks the lowest-rank representation among all the candidates that can represent the genes as linear combinations of the bases in the dataset. The clusters can be extracted based on the block diagonal representation matrix obtained using LRR, and they can well capture the intrinsic patterns of genes with similar functions. Meanwhile, the parameter of LRR can balance the effect of noise so that the method is capable of extracting useful information from the data with high level of background noise. Compared with traditional methods, our approach can identify genes with similar functions yet without similar expression profiles. Also, it could assign one gene into different clusters. Moreover, our method is robust to the noise and can identify more biologically relevant gene clusters. When applied to three public datasets, the results show that the LRR based method is superior to existing methods for identifying subspace gene clusters.
Affiliation(s)
- Yan Cui
- School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing, Jiangsu, China
- Chun-Hou Zheng
- College of Electrical Engineering and Automation, Anhui University, Hefei, Anhui, China
- Jian Yang
- School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing, Jiangsu, China
10
PROSPER: an integrated feature-based tool for predicting protease substrate cleavage sites. PLoS One 2012; 7:e50300. [PMID: 23209700 PMCID: PMC3510211 DOI: 10.1371/journal.pone.0050300] [Citation(s) in RCA: 222] [Impact Index Per Article: 18.5] [Received: 07/16/2012] [Accepted: 10/18/2012] [Indexed: 12/04/2022]
Abstract
The ability to catalytically cleave protein substrates after synthesis is fundamental for all forms of life. Accordingly, site-specific proteolysis is one of the most important post-translational modifications. The key to understanding the physiological role of a protease is to identify its natural substrate(s). Knowledge of the substrate specificity of a protease can dramatically improve our ability to predict its target protein substrates, but this information must be utilized in an effective manner in order to efficiently identify protein substrates by in silico approaches. To address this problem, we present PROSPER, an integrated feature-based server for in silico identification of protease substrates and their cleavage sites for twenty-four different proteases. PROSPER utilizes established specificity information for these proteases (derived from the MEROPS database) with a machine learning approach to predict protease cleavage sites by using different, but complementary sequence and structure characteristics. Features used by PROSPER include local amino acid sequence profile, predicted secondary structure, solvent accessibility and predicted native disorder. Thus, for proteases with known amino acid specificity, PROSPER provides a convenient, pre-prepared tool for use in identifying protein substrates for the enzymes. Systematic prediction analysis for the twenty-four proteases thus far included in the database revealed that the features we have included in the tool strongly improve performance in terms of cleavage site prediction, as evidenced by their contribution to performance improvement in terms of identifying known cleavage sites in substrates for these enzymes. In comparison with two state-of-the-art prediction tools, PoPS and SitePrediction, PROSPER achieves greater accuracy and coverage. 
To our knowledge, PROSPER is the first comprehensive server capable of predicting cleavage sites of multiple proteases within a single substrate sequence using machine learning techniques. It is freely available at http://lightning.med.monash.edu.au/PROSPER/.
11
Chen L, Xuan J, Riggins RB, Wang Y, Clarke R. Identifying protein interaction subnetworks by a bagging Markov random field-based method. Nucleic Acids Res 2012; 41:e42. [PMID: 23161673 PMCID: PMC3553975 DOI: 10.1093/nar/gks951] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Indexed: 01/06/2023]
Abstract
Identification of differentially expressed subnetworks from protein–protein interaction (PPI) networks has become increasingly important to our global understanding of the molecular mechanisms that drive cancer. Several methods have been proposed for PPI subnetwork identification, but the dependency among network member genes is not explicitly considered, leaving many important hub genes largely unidentified. We present a new method, based on a bagging Markov random field (BMRF) framework, to improve subnetwork identification for mechanistic studies of breast cancer. The method follows a maximum a posteriori principle to form a novel network score that explicitly considers pairwise gene interactions in PPI networks, and it searches for subnetworks with maximal network scores. To improve their robustness across data sets, a bagging scheme based on bootstrapping samples is implemented to statistically select high confidence subnetworks. We first compared the BMRF-based method with existing methods on simulation data to demonstrate its improved performance. We then applied our method to breast cancer data to identify PPI subnetworks associated with breast cancer progression and/or tamoxifen resistance. The experimental results show that not only an improved prediction performance can be achieved by the BMRF approach when tested on independent data sets, but biologically meaningful subnetworks can also be revealed that are relevant to breast cancer and tamoxifen resistance.
Affiliation(s)
- Li Chen
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
12
Zhang X, Liu K, Liu ZP, Duval B, Richer JM, Zhao XM, Hao JK, Chen L. NARROMI: a noise and redundancy reduction technique improves accuracy of gene regulatory network inference. Bioinformatics 2012; 29:106-13. [DOI: 10.1093/bioinformatics/bts619] [Citation(s) in RCA: 107] [Impact Index Per Article: 8.9] [Indexed: 12/30/2022]
13
Gu J, Xuan J, Riggins RB, Chen L, Wang Y, Clarke R. Robust identification of transcriptional regulatory networks using a Gibbs sampler on outlier sum statistic. Bioinformatics 2012; 28:1990-7. [PMID: 22595208 DOI: 10.1093/bioinformatics/bts296] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Identification of transcriptional regulatory networks (TRNs) is of significant importance in computational biology for cancer research, providing a critical building block to unravel disease pathways. However, existing methods for TRN identification suffer from the inclusion of excessive 'noise' in microarray data and false-positives in binding data, especially when applied to human tumor-derived cell line studies. More robust methods that can counteract the imperfection of data sources are therefore needed for reliable identification of TRNs in this context.
RESULTS: In this article, we propose to establish a link between the quality of one target gene to represent its regulator and the uncertainty of its expression to represent other target genes. Specifically, an outlier sum statistic was used to measure the aggregated evidence for regulation events between target genes and their corresponding transcription factors. A Gibbs sampling method was then developed to estimate the marginal distribution of the outlier sum statistic, hence, to uncover underlying regulatory relationships. To evaluate the effectiveness of our proposed method, we compared its performance with that of an existing sampling-based method using both simulation data and yeast cell cycle data. The experimental results show that our method consistently outperforms the competing method in different settings of signal-to-noise ratio and network topology, indicating its robustness for biological applications. Finally, we applied our method to breast cancer cell line data and demonstrated its ability to extract biologically meaningful regulatory modules related to estrogen signaling and action in breast cancer.
AVAILABILITY AND IMPLEMENTATION: The Gibbs sampler MATLAB package is freely available at http://www.cbil.ece.vt.edu/software.htm.
CONTACT: xuan@vt.edu
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jinghua Gu
- Bradley Department of Electrical and Computer Engineering, Virginia Tech, Arlington, VA 22203, USA
14
Song J, Tan H, Wang M, Webb GI, Akutsu T. TANGLE: two-level support vector regression approach for protein backbone torsion angle prediction from primary sequences. PLoS One 2012; 7:e30361. [PMID: 22319565 PMCID: PMC3271071 DOI: 10.1371/journal.pone.0030361] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Received: 10/05/2011] [Accepted: 12/14/2011] [Indexed: 12/29/2022]
Abstract
Protein backbone torsion angles (Phi) and (Psi) involve two rotation angles rotating around the Cα-N bond (Phi) and the Cα-C bond (Psi). Due to the planarity of the linked rigid peptide bonds, these two angles can essentially determine the backbone geometry of proteins. Accordingly, the accurate prediction of protein backbone torsion angle from sequence information can assist the prediction of protein structures. In this study, we develop a new approach called TANGLE (Torsion ANGLE predictor) to predict the protein backbone torsion angles from amino acid sequences. TANGLE uses a two-level support vector regression approach to perform real-value torsion angle prediction using a variety of features derived from amino acid sequences, including the evolutionary profiles in the form of position-specific scoring matrices, predicted secondary structure, solvent accessibility and natively disordered region as well as other global sequence features. When evaluated based on a large benchmark dataset of 1,526 non-homologous proteins, the mean absolute errors (MAEs) of the Phi and Psi angle prediction are 27.8° and 44.6°, respectively, which are 1% and 3% respectively lower than that using one of the state-of-the-art prediction tools ANGLOR. Moreover, the prediction of TANGLE is significantly better than a random predictor that was built on the amino acid-specific basis, with the p-value<1.46e-147 and 7.97e-150, respectively by the Wilcoxon signed rank test. As a complementary approach to the current torsion angle prediction algorithms, TANGLE should prove useful in predicting protein structural properties and assisting protein fold recognition by applying the predicted torsion angles as useful restraints. TANGLE is freely accessible at http://sunflower.kuicr.kyoto-u.ac.jp/~sjn/TANGLE/.
Affiliation(s)
- Jiangning Song
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, Victoria, Australia
- National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, China
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto, Japan
- * E-mail: (JS); (GIW); (TA)
- Hao Tan
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, Victoria, Australia
- Mingjun Wang
- National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, China
- Geoffrey I. Webb
- Faculty of Information Technology, Monash University, Melbourne, Victoria, Australia
- * E-mail: (JS); (GIW); (TA)
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto, Japan
- * E-mail: (JS); (GIW); (TA)
15
Clarke R, Shajahan AN, Wang Y, Tyson JJ, Riggins RB, Weiner LM, Bauman WT, Xuan J, Zhang B, Facey C, Aiyer H, Cook K, Hickman FE, Tavassoly I, Verdugo A, Chen C, Zwart A, Wärri A, Hilakivi-Clarke LA. Endoplasmic reticulum stress, the unfolded protein response, and gene network modeling in antiestrogen resistant breast cancer. Horm Mol Biol Clin Investig 2011; 5:35-44. [PMID: 23930139 DOI: 10.1515/hmbci.2010.073] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.8] [Indexed: 02/07/2023]
Abstract
Lack of understanding of endocrine resistance remains one of the major challenges for breast cancer researchers, clinicians, and patients. Current reductionist approaches to understanding the molecular signaling driving resistance have offered mostly incremental progress over the past 10 years. As the field of systems biology has begun to mature, the approaches and network modeling tools being developed and applied therein offer a different way to think about how molecular signaling and the regulation of critical cellular functions are integrated. To gain novel insights, we first describe some of the key challenges facing network modeling of endocrine resistance, many of which arise from the properties of the data spaces being studied. We then use activation of the unfolded protein response (UPR) following induction of endoplasmic reticulum stress in breast cancer cells by antiestrogens, to illustrate our approaches to computational modeling. Activation of UPR is a key determinant of cell fate decision making and regulation of autophagy and apoptosis. These initial studies provide insight into a small subnetwork topology obtained using differential dependency network analysis and focused on the UPR gene XBP1. The XBP1 subnetwork topology incorporates BCAR3, BCL2, BIK, NFκB, and other genes as nodes; the connecting edges represent the dependency structures amongst these nodes. As data from ongoing cellular and molecular studies become available, we will build detailed mathematical models of this XBP1-UPR network.
Affiliation(s)
- Robert Clarke
- Department of Oncology, Georgetown University School of Medicine, Washington, DC 20057, U.S.A.; Lombardi Comprehensive Cancer Center, Georgetown University School of Medicine, Washington, DC 20057, U.S.A.