1
Chatterjee K, Mal S, Ghosh M, Chattopadhyay NR, Roy SD, Chakraborty K, Mukherjee S, Aier M, Choudhuri T. Blood-based DNA methylation in advanced Nasopharyngeal Carcinoma exhibited distinct CpG methylation signature. Sci Rep 2023; 13:22086. [PMID: 38086861 PMCID: PMC10716134 DOI: 10.1038/s41598-023-45001-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Accepted: 10/14/2023] [Indexed: 12/18/2023]
Abstract
The TNM staging system is currently used to determine cancer stage. Nevertheless, a small proportion of cancer patients relapse even after therapy, suggesting that more specific molecular tools are required to support stage-specific detection and prompt cancer diagnosis. Thus, we aimed to explore the blood-based DNA methylation signature of metastatic nasopharyngeal carcinoma (NPC) to establish a holistic methylation biomarker panel. To identify the methylation signature, an EPIC BeadChip-based array was performed. Comparative analysis for identifying unique probes, validation, and functional studies were carried out by analyzing GEO and TCGA datasets. We observed 4093 differentially methylated probes (DMPs), 1232 hydroxymethylated probes, and 25 CpG islands. Gene expression study revealed both upregulated and downregulated genes. Correlation analysis suggested positive (positive r, p ≤ 0.05) and negative (negative r, p ≤ 0.05) associations with different cancers. Transcription factor binding site (TFBS) analysis revealed binding sites for many transcription factors (TFs). Furthermore, gene enrichment analysis indicated the involvement of the identified genes in biological pathways. Overall, blood-based DNA methylation data uncovered a distinct DNA methylation pattern, which might have an additive role in NPC progression by altering TF binding. Moreover, based on tissue specificity, variation in the correlation between methylation and gene expression was noted in different cancers.
Affiliation(s)
- Koustav Chatterjee
- Department of Biotechnology, Visva-Bharati, Santiniketan, Birbhum, West Bengal, India, 731235
- Sudipa Mal
- Department of Biotechnology, Visva-Bharati, Santiniketan, Birbhum, West Bengal, India, 731235
- Monalisha Ghosh
- Department of Biotechnology, Visva-Bharati, Santiniketan, Birbhum, West Bengal, India, 731235
- Sankar Deb Roy
- Department of Radiation Oncology, Eden Medical Center, Dimapur, Nagaland, India
- Koushik Chakraborty
- Department of Biotechnology, Visva-Bharati, Santiniketan, Birbhum, West Bengal, India, 731235
- Syamantak Mukherjee
- Department of Biotechnology, Visva-Bharati, Santiniketan, Birbhum, West Bengal, India, 731235
- Moatoshi Aier
- Department of Pathology, Eden Medical Center, Dimapur, Nagaland, India
- Tathagata Choudhuri
- Department of Biotechnology, Visva-Bharati, Santiniketan, Birbhum, West Bengal, India, 731235.
2
Sghaier N, Essemine J, Ayed RB, Gorai M, Ben Marzoug R, Rebai A, Qu M. An Evidence Theory and Fuzzy Logic Combined Approach for the Prediction of Potential ARF-Regulated Genes in Quinoa. Plants (Basel, Switzerland) 2022; 12:71. [PMID: 36616201 PMCID: PMC9824623 DOI: 10.3390/plants12010071] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/15/2022] [Accepted: 11/26/2022] [Indexed: 06/17/2023]
Abstract
Quinoa is among the plants tolerant to challenging and harmful abiotic environmental factors. It was selected as one of the model crops for bio-saline agriculture that could contribute to staple food security for an ever-growing worldwide population under various climate change scenarios. The auxin response factors (ARFs) are the main contributors to plant adaptation to severe environmental conditions. Thus, determining the ARF-binding sites is a major step that could provide promising insights for plant breeding programs and for improving agronomic traits. However, determining the ARF-binding sites is a challenging task, particularly in species with large genome sizes. In this report, we present a data fusion approach based on Dempster-Shafer evidence theory and fuzzy set theory to predict the ARF-binding sites. We then performed an in silico identification of the ARF-binding sites in Chenopodium quinoa. The characterization of some known pathways implicated in auxin signaling in other higher plants confirms the reliability of our prediction. Furthermore, several pathways with little or no available information about their functions were identified as playing important roles in the adaptation of quinoa to environmental conditions. The predicted auxin response genes associated with the detected ARF-binding sites may help to explore the biological roles of some unknown genes newly identified in quinoa.
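To make the data fusion step concrete, the following is a minimal sketch of Dempster's rule of combination, the core operation of Dempster-Shafer evidence theory. The two evidence sources (`motif_score`, `conservation`) and all mass values are hypothetical illustrations; the abstract does not say which sources or fuzzy membership functions the authors used.

```python
from itertools import product


def combine(m1, m2):
    """Fuse two mass functions with Dempster's rule of combination.

    Each mass function maps a frozenset of hypotheses to a belief mass,
    and the masses of one function sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence supports the intersection of the two sets.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Disjoint sets contradict each other; track the conflicting mass.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize so the fused masses sum to 1 again.
    return {h: w / (1.0 - conflict) for h, w in combined.items()}


# Hypothetical frame of discernment: a candidate position is a binding site or not.
SITE = frozenset({"site"})
NOT = frozenset({"not_site"})
EITHER = SITE | NOT  # mass assigned to ignorance

# Hypothetical evidence sources with made-up masses.
motif_score = {SITE: 0.6, EITHER: 0.4}
conservation = {SITE: 0.5, NOT: 0.2, EITHER: 0.3}

fused = combine(motif_score, conservation)
print(fused)
```

Keeping an explicit mass on `EITHER` lets a weak source express ignorance instead of being forced to vote against a site, which is the usual motivation for using evidence theory over plain probability averaging.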
Affiliation(s)
- Nesrine Sghaier
- National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya 572024, China
- CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200032, China
- Laboratory of Advanced Technology and Intelligent Systems, National Engineering School of Sousse, Sousse 4023, Tunisia
- Jemaa Essemine
- CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200032, China
- Rayda Ben Ayed
- Department of Agronomy and Plant Biotechnology, National Institute of Agronomy of Tunisia (INAT), 43 Avenue Charles Nicolle, 1082 El Mahrajène, University of Carthage-Tunis, Tunis 1082, Tunisia
- Laboratory of Extremophile Plants, Centre of Biotechnology of Borj-Cédria, B.P. 901, Hammam Lif 2050, Tunisia
- Mustapha Gorai
- Higher Institute of Applied Biology Medenine, University of Gabes, Medenine 4119, Tunisia
- Riadh Ben Marzoug
- Laboratory of Molecular and Cellular Screening Processes, Sfax Biotechnology Center, B.P 1177, Sfax 3018, Tunisia
- Ahmed Rebai
- Laboratory of Molecular and Cellular Screening Processes, Sfax Biotechnology Center, B.P 1177, Sfax 3018, Tunisia
- Mingnan Qu
- National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya 572024, China
- CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200032, China
3
Xu Y, Chen J, Lyu A, Cheung WK, Zhang L. dynDeepDRIM: a dynamic deep learning model to infer direct regulatory interactions using time-course single-cell gene expression data. Brief Bioinform 2022; 23:6720420. [PMID: 36168811 DOI: 10.1093/bib/bbac424] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/23/2022] [Revised: 08/02/2022] [Accepted: 09/01/2022] [Indexed: 12/14/2022]
Abstract
Time-course single-cell RNA sequencing (scRNA-seq) data have been widely used to explore dynamic changes in gene expression of transcription factors (TFs) and their target genes. This information is useful for reconstructing cell-type-specific gene regulatory networks (GRNs). However, the existing tools are commonly designed to analyze either time-course bulk gene expression data or static scRNA-seq data via pseudo-time cell ordering. Few methods successfully utilize the information from multiple time points while also considering the characteristics of scRNA-seq data. We propose dynDeepDRIM, a novel deep learning model to reconstruct GRNs using time-course scRNA-seq data. It represents the joint expression of a gene pair as an image and utilizes the image of the target TF-gene pair and those of the potential neighbors to reconstruct GRNs from time-course scRNA-seq data. dynDeepDRIM can effectively remove transitive TF-gene interactions by considering neighborhood context and can model gene expression dynamics using high-dimensional tensors. We compared dynDeepDRIM with six GRN reconstruction methods on both simulated data and four real time-course scRNA-seq datasets. dynDeepDRIM achieved substantially better performance than the other methods in inferring TF-gene interactions and eliminated false positives effectively. We also applied dynDeepDRIM to annotate gene functions and found that it achieved evidently better performance than the other tools because it considers the neighbor genes.
Affiliation(s)
- Yu Xu
- Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong
- Jiaxing Chen
- Computer Science and Technology, Division of Science and Technology, BNU-HKBU United International College, Jintong Road, 519087, Zhuhai, China
- Aiping Lyu
- School of Chinese Medicine, Hong Kong Baptist University, Kowloon Tong, Hong Kong
- William K Cheung
- Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong
- Lu Zhang
- Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong
4
Poon SHL, Cheung JJC, Shih KC, Chan YK. A systematic review of multimodal clinical biomarkers in the management of thyroid eye disease. Rev Endocr Metab Disord 2022; 23:541-567. [PMID: 35066781 DOI: 10.1007/s11154-021-09702-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Accepted: 12/07/2021] [Indexed: 12/25/2022]
Abstract
Thyroid Eye Disease (TED) is an autoimmune disease that affects the extraocular muscles and periorbital fat. It most commonly occurs with Graves' Disease (GD) as an extrathyroidal manifestation; hence, the term is sometimes used interchangeably with Graves' Ophthalmopathy (GO). Well-known autoimmune markers for GD include thyroid stimulating hormone (TSH) receptor antibodies (TSH-R-Ab), which contribute to hyperthyroidism and ocular signs. Currently, apart from radiological investigations, detection of TED is based on clinical signs and symptoms, which are largely subjective, with no established biomarkers that could differentiate TED from GD alone. We evaluated a total of 28 studies on potential biomarkers for diagnosis of TED. Included articles were published in English and investigated clinical markers in tear fluid, orbital adipose-connective tissues, orbital fibroblasts and extraocular muscles, serum, and thyroid tissue, as well as imaging biomarkers. Results demonstrated that biomarkers with reported diagnostic power have high sensitivity and specificity for TED, including those using a combination of biomarkers to differentiate between TED and GD, as well as the use of magnetic resonance imaging (MRI). Other biomarkers that were upregulated in subjects with TED include cytokines, proinflammatory markers, and acute phase reactants, which are, however, deemed less specific to TED. Further clinical investigations of these biomarkers, scrutinising their specificity and sensitivity in a larger sample of patients, may point towards the selection of suitable biomarkers for aiding detection and prognosis of TED in the future.
Affiliation(s)
- Stephanie Hiu Ling Poon
- Department of Ophthalmology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, 301B Cyberport 4, 100 Cyberport Road, Pokfulam, Hong Kong SAR
- Kendrick Co Shih
- Department of Ophthalmology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, 301B Cyberport 4, 100 Cyberport Road, Pokfulam, Hong Kong SAR.
- Yau Kei Chan
- Department of Ophthalmology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, 301B Cyberport 4, 100 Cyberport Road, Pokfulam, Hong Kong SAR
5
Nian X, Li L, Ma X, Li X, Li W, Zhang N, Ohiolei JA, Li L, Dai G, Liu Y, Yan H, Fu B, Xiao S, Jia W. Understanding pathogen–host interplay by expression profiles of lncRNA and mRNA in the liver of Echinococcus multilocularis-infected mice. PLoS Negl Trop Dis 2022; 16:e0010435. [PMID: 35639780 PMCID: PMC9187083 DOI: 10.1371/journal.pntd.0010435] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/19/2021] [Revised: 06/10/2022] [Accepted: 04/20/2022] [Indexed: 11/18/2022]
Abstract
Almost all Echinococcus multilocularis (Em) infections occur in the liver of the intermediate host, causing a lethal zoonotic helminthic disease, alveolar echinococcosis (AE). However, the long non-coding RNA (lncRNA) expression profiles of the host and the potential regulatory function of lncRNAs during Em infection are poorly understood. In this study, the profiles of lncRNAs and mRNAs in the liver of mice at different time points after Em infection were explored by microarray. Thirty-one differentially expressed mRNAs (DEMs) and 68 differentially expressed lncRNAs (DELs) were found to be continuously dysregulated. These DEMs were notably enriched in the “antigen processing and presentation”, “Th1 and Th2 cell differentiation” and “Th17 cell differentiation” pathways. The predicted functions of the DELs suggested that most DELs might influence Th17 cell differentiation and the TGF-β/Smad pathway of the host by trans-regulating SMAD3, STAT1, and early growth response (EGR) genes. At 30 days post-infection (dpi), up-regulated DEMs were enriched in Toll-like and RIG-I-like receptor signaling pathways, which were validated by qRT-PCR, Western blotting and detection of downstream cytokines. Furthermore, flow cytometric analysis and serum levels of the corresponding cytokines confirmed the changes in cell-mediated immunity in the host during Em infection: Th1 and Th17-type CD4+ T-cells were predominant at the early infection stage, whereas Th2-type CD4+ T-cells were significantly higher at the middle/late stage. Collectively, our study revealed the potential regulatory functions of lncRNAs in modulating host Th cell subsets and provides novel clues for understanding the influence of Em infection on host innate and adaptive immune responses.
Affiliation(s)
- Xiaofeng Nian
- State Key Laboratory of Veterinary Etiological Biology, National Professional Laboratory for Animal Echinococcosis, Key Laboratory of Veterinary Parasitology of Gansu Province, Key Laboratory of Zoonoses of Agriculture Ministry, Lanzhou Veterinary Research Institute, CAAS, Lanzhou, Gansu, P. R. China
- College of Veterinary Medicine, Northwest A&F University, Yangling, Shaanxi, P. R. China
- Li Li
- State Key Laboratory of Veterinary Etiological Biology, National Professional Laboratory for Animal Echinococcosis, Key Laboratory of Veterinary Parasitology of Gansu Province, Key Laboratory of Zoonoses of Agriculture Ministry, Lanzhou Veterinary Research Institute, CAAS, Lanzhou, Gansu, P. R. China
- Xusheng Ma
- State Key Laboratory of Veterinary Etiological Biology, National Foot and Mouth Diseases Reference Laboratory, Key Laboratory of Animal Virology of Ministry of Agriculture, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Lanzhou, Gansu, P. R. China
- Xiurong Li
- State Key Laboratory of Veterinary Etiological Biology, National Professional Laboratory for Animal Echinococcosis, Key Laboratory of Veterinary Parasitology of Gansu Province, Key Laboratory of Zoonoses of Agriculture Ministry, Lanzhou Veterinary Research Institute, CAAS, Lanzhou, Gansu, P. R. China
- Wenhui Li
- State Key Laboratory of Veterinary Etiological Biology, National Professional Laboratory for Animal Echinococcosis, Key Laboratory of Veterinary Parasitology of Gansu Province, Key Laboratory of Zoonoses of Agriculture Ministry, Lanzhou Veterinary Research Institute, CAAS, Lanzhou, Gansu, P. R. China
- Nianzhang Zhang
- State Key Laboratory of Veterinary Etiological Biology, National Professional Laboratory for Animal Echinococcosis, Key Laboratory of Veterinary Parasitology of Gansu Province, Key Laboratory of Zoonoses of Agriculture Ministry, Lanzhou Veterinary Research Institute, CAAS, Lanzhou, Gansu, P. R. China
- John Asekhaen Ohiolei
- State Key Laboratory of Veterinary Etiological Biology, National Professional Laboratory for Animal Echinococcosis, Key Laboratory of Veterinary Parasitology of Gansu Province, Key Laboratory of Zoonoses of Agriculture Ministry, Lanzhou Veterinary Research Institute, CAAS, Lanzhou, Gansu, P. R. China
- Le Li
- State Key Laboratory of Veterinary Etiological Biology, National Professional Laboratory for Animal Echinococcosis, Key Laboratory of Veterinary Parasitology of Gansu Province, Key Laboratory of Zoonoses of Agriculture Ministry, Lanzhou Veterinary Research Institute, CAAS, Lanzhou, Gansu, P. R. China
- Guodong Dai
- State Key Laboratory of Veterinary Etiological Biology, National Professional Laboratory for Animal Echinococcosis, Key Laboratory of Veterinary Parasitology of Gansu Province, Key Laboratory of Zoonoses of Agriculture Ministry, Lanzhou Veterinary Research Institute, CAAS, Lanzhou, Gansu, P. R. China
- Yanhong Liu
- The Instrument Centre of State Key Laboratory of Veterinary Etiological Biology, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Lanzhou, Gansu, P. R. China
- Hongbin Yan
- State Key Laboratory of Veterinary Etiological Biology, National Professional Laboratory for Animal Echinococcosis, Key Laboratory of Veterinary Parasitology of Gansu Province, Key Laboratory of Zoonoses of Agriculture Ministry, Lanzhou Veterinary Research Institute, CAAS, Lanzhou, Gansu, P. R. China
- * E-mail: (HY); (SX); (WJ)
- Baoquan Fu
- State Key Laboratory of Veterinary Etiological Biology, National Professional Laboratory for Animal Echinococcosis, Key Laboratory of Veterinary Parasitology of Gansu Province, Key Laboratory of Zoonoses of Agriculture Ministry, Lanzhou Veterinary Research Institute, CAAS, Lanzhou, Gansu, P. R. China
- Jiangsu Co-innovation Center for Prevention and Control of Important Animal Infectious Disease, Yangzhou, Jiangsu, P. R. China
- Sa Xiao
- College of Veterinary Medicine, Northwest A&F University, Yangling, Shaanxi, P. R. China
- * E-mail: (HY); (SX); (WJ)
- Wanzhong Jia
- State Key Laboratory of Veterinary Etiological Biology, National Professional Laboratory for Animal Echinococcosis, Key Laboratory of Veterinary Parasitology of Gansu Province, Key Laboratory of Zoonoses of Agriculture Ministry, Lanzhou Veterinary Research Institute, CAAS, Lanzhou, Gansu, P. R. China
- Jiangsu Co-innovation Center for Prevention and Control of Important Animal Infectious Disease, Yangzhou, Jiangsu, P. R. China
- * E-mail: (HY); (SX); (WJ)
6
Jeong D, Lim S, Lee S, Oh M, Cho C, Seong H, Jung W, Kim S. Construction of Condition-Specific Gene Regulatory Network Using Kernel Canonical Correlation Analysis. Front Genet 2021; 12:652623. [PMID: 34093651 PMCID: PMC8172963 DOI: 10.3389/fgene.2021.652623] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/12/2021] [Accepted: 03/26/2021] [Indexed: 01/01/2023]
Abstract
A gene expression profile, or transcriptome, can represent cellular states; thus, understanding gene regulation mechanisms can help explain how cells respond to external stress. Interaction between a transcription factor (TF) and a target gene (TG) is one of the representative regulatory mechanisms in cells. Regulatory interactions between TFs and TGs are very complex, specifically multiple-to-multiple relations. Experimental data from TF chromatin immunoprecipitation sequencing are useful but produce one-to-multiple relations between a TF and its TGs. On the other hand, co-expression networks of genes can be useful for constructing condition-specific transcriptional networks, but they contain many false positive relations. In this paper, we propose a novel computational method to construct a condition-specific and combinatorial transcriptional network from transcriptome data, applying kernel canonical correlation analysis (kernel CCA) to identify multiple-to-multiple TF-TG relations in a given biological condition. Kernel CCA is a well-established statistical method for computing the correlation between one group of features and another group of features. We therefore employed kernel CCA to embed TFs and TGs into a new space in which the correlation between TFs and TGs is reflected. To demonstrate the usefulness of our network construction method, we used human blood transcriptome data to investigate the response to a high-fat diet and an Arabidopsis data set to investigate the response to cold/heat stress. Our method detected not only important regulatory interactions reported in previous studies but also novel TF-TG relations in which a module of TFs regulates a module of TGs under a specific stress.
Affiliation(s)
- Dabin Jeong
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea
- Sangsoo Lim
- Bioinformatics Institute, Seoul National University, Seoul, South Korea
- Sangseon Lee
- BK21 FOUR Intelligence Computing, Seoul National University, Seoul, South Korea
- Minsik Oh
- Department of Computer Science and Engineering, Seoul National University, Seoul, South Korea
- Changyun Cho
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea
- Hyeju Seong
- Department of Crop Science, Konkuk University, Seoul, South Korea
- Woosuk Jung
- Department of Crop Science, Konkuk University, Seoul, South Korea
- Sun Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea
- Bioinformatics Institute, Seoul National University, Seoul, South Korea
- Department of Computer Science and Engineering, Institute of Engineering Research, Seoul National University, Seoul, South Korea
7
Jo K, Santos-Buitrago B, Kim M, Rhee S, Talcott C, Kim S. Logic-based analysis of gene expression data predicts association between TNF, TGFB1 and EGF pathways in basal-like breast cancer. Methods 2020; 179:89-100. [PMID: 32445696 DOI: 10.1016/j.ymeth.2020.05.008] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/13/2020] [Revised: 04/30/2020] [Accepted: 05/13/2020] [Indexed: 12/16/2022]
Abstract
For breast cancer, clinically important subtypes are well characterized at the molecular level in terms of gene expression profiles. In addition, signaling pathways in breast cancer have been extensively studied as therapeutic targets due to their roles in tumor growth and metastasis. However, it is challenging to combine signaling pathways and gene expression profiles to characterize the biological mechanisms of breast cancer subtypes, since many signaling events result from post-translational modifications rather than gene expression differences. We designed a logic-based computational framework to explain the differences in gene expression profiles among breast cancer subtypes using Pathway Logic and transcriptional network information. Pathway Logic is a rewriting-logic-based formal system for modeling biological pathways, including post-translational modifications. Our method demonstrated its utility by constructing subtype-specific paths from key receptors (TNFR, TGFBR1 and EGFR) to key transcription factor (TF) regulators (RELA, ATF2, SMAD3 and ELK1) and by identifying potential associations between pathways via TFs in basal-specific paths, which could provide novel insight into aggressive breast cancer subtypes. Code and results are available at http://epigenomics.snu.ac.kr/PL/.
Affiliation(s)
- Kyuri Jo
- Department of Computer Engineering, Chungbuk National University, Cheongju, Republic of Korea
- Beatriz Santos-Buitrago
- Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea
- Minsu Kim
- Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Sungmin Rhee
- Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea
- Sun Kim
- Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea; Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea; Institute of Engineering Research, Seoul National University, Seoul, Republic of Korea; Bioinformatics Institute, Seoul National University, Seoul, Republic of Korea.
8
Deep learning for inferring gene relationships from single-cell expression data. Proc Natl Acad Sci U S A 2019; 116:27151-27158. [PMID: 31822622 DOI: 10.1073/pnas.1911536116] [Citation(s) in RCA: 100] [Impact Index Per Article: 20.0] [Indexed: 12/22/2022]
Abstract
Several methods were developed to mine gene-gene relationships from expression data. Examples include correlation and mutual information methods for coexpression analysis, clustering and undirected graphical models for functional assignments, and directed graphical models for pathway reconstruction. Using an encoding for gene expression data, followed by deep neural networks analysis, we present a framework that can successfully address all of these diverse tasks. We show that our method, convolutional neural network for coexpression (CNNC), improves upon prior methods in tasks ranging from predicting transcription factor targets to identifying disease-related genes to causality inference. CNNC's encoding provides insights about some of the decisions it makes and their biological basis. CNNC is flexible and can easily be extended to integrate additional types of genomics data, leading to further improvements in its performance.
9
Zhao Z, Dong Q, Liu X, Wei L, Liu L, Li Y, Wang X. Dynamic transcriptome profiling in DNA damage-induced cellular senescence and transient cell-cycle arrest. Genomics 2019; 112:1309-1317. [PMID: 31376528 DOI: 10.1016/j.ygeno.2019.07.020] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 04/07/2018] [Revised: 04/14/2019] [Accepted: 07/30/2019] [Indexed: 12/13/2022]
Abstract
Cellular senescence is an irreversible cell cycle arrest process associated with aging and senescence-related diseases. DNA damage is a widespread feature of cellular senescence and aging. Different levels of DNA damage can lead to cellular senescence or transient cell-cycle arrest, but the genetic regulatory mechanisms determining cell fate are still not clear. In this work, high-resolution time-course analysis of gene expression in DNA damage-induced cellular senescence and transient cell-cycle arrest was used to explore the transcriptomic differences between cell fates after the DNA damage response and to investigate the key regulatory factors affecting senescent cell fates. Pathways such as the cell cycle, DNA repair and cholesterol metabolism showed characteristic differential responses. A number of key transcription factors were predicted to regulate the cell cycle and DNA repair. Our study provides genome-wide insights into the molecular-level mechanisms of senescent cell fate decisions after the DNA damage response.
Affiliation(s)
- Zhen Zhao
- Ministry of Education Key Laboratory of Bioinformatics, Center for Synthetic and System Biology, BNRist, Department of Automation, Tsinghua University, Beijing 100084, China
- Qiongye Dong
- Ministry of Education Key Laboratory of Bioinformatics, Center for Synthetic and System Biology, BNRist, Department of Automation, Tsinghua University, Beijing 100084, China
- Xuehui Liu
- Ministry of Education Key Laboratory of Bioinformatics, Center for Synthetic and System Biology, BNRist, Department of Automation, Tsinghua University, Beijing 100084, China
- Lei Wei
- Ministry of Education Key Laboratory of Bioinformatics, Center for Synthetic and System Biology, BNRist, Department of Automation, Tsinghua University, Beijing 100084, China
- Liyang Liu
- Ministry of Education Key Laboratory of Bioinformatics, Center for Synthetic and System Biology, BNRist, Department of Automation, Tsinghua University, Beijing 100084, China
- Yanda Li
- Ministry of Education Key Laboratory of Bioinformatics, Center for Synthetic and System Biology, BNRist, Department of Automation, Tsinghua University, Beijing 100084, China
- Xiaowo Wang
- Ministry of Education Key Laboratory of Bioinformatics, Center for Synthetic and System Biology, BNRist, Department of Automation, Tsinghua University, Beijing 100084, China.
10
Yang HB, Jiang J, Li LL, Yang HQ, Zhang XY. Biomarker identification of thyroid associated ophthalmopathy using microarray data. Int J Ophthalmol 2018; 11:1482-1488. [PMID: 30225222 DOI: 10.18240/ijo.2018.09.09] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 09/05/2017] [Accepted: 01/03/2018] [Indexed: 01/08/2023]
Abstract
AIM: To uncover the underlying pathogenesis of thyroid associated ophthalmopathy (TAO) and explore potential biomarkers of this disease. METHODS: The expression profile GSE9340, which was downloaded from the Gene Expression Omnibus database, included 18 specimens from 10 TAO patients and 8 hyperthyroidism patients without ophthalmopathy. The platform was the HumanRef-8 v2 Expression BeadChip. Raw data were normalized using the preprocessCore package, and the differentially expressed genes (DEGs) were identified based on t-test with the limma package of R. Functional enrichment analyses were performed using the DAVID tool. Based on the STRING database, a protein-protein interaction (PPI) network was constructed, from which a module was extracted. The functional enrichment for genes in the module was performed with the BiNGO plugin. RESULTS: In total, 861 DEGs (433 up-regulated and 428 down-regulated) between TAO patients and hyperthyroidism patients without ophthalmopathy were identified. Crucial nodes in the PPI network included TPX2, CDCA5, PRC1, KIF23 and MKI67, which were also prominent in the module and were all enriched in the cell cycle process. Additionally, MKI67 was highly correlated with TAO. Besides, the DEGs GTF2F1, SMC3, USF1 and ZNF263 were predicted to be transcription factors (TFs). CONCLUSION: Several crucial genes were identified, such as TPX2, CDCA5, PRC1 and KIF23, all of which might play significant roles in TAO via regulation of the cell cycle process. Regulatory relationships between TPX2 and CDCA5, as well as between PRC1 and KIF23, may exist. Additionally, MKI67 may be a potent biomarker of TAO, and SMC3 and ZNF263 may exert their roles as TFs in TAO progression.
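The per-gene testing step in the METHODS can be illustrated with a plain Welch t-statistic computed for each gene between two groups. This is a deliberately simplified stand-in: limma fits linear models and uses moderated t-statistics, which this sketch does not reproduce. The gene names and expression values below are invented for illustration and are not from GSE9340.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t-statistic for two independent samples with unequal variances."""
    standard_error = math.sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    if standard_error == 0:
        raise ValueError("both groups are constant; t is undefined")
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical normalized expression values: (patient group, control group).
expression = {
    "gene_up": ([5.0, 6.0, 7.0], [1.0, 2.0, 3.0]),
    "gene_flat": ([4.0, 5.0, 6.0], [4.5, 5.0, 5.5]),
    "gene_down": ([1.0, 2.0, 3.0], [6.0, 7.0, 8.0]),
}

scores = {gene: welch_t(a, b) for gene, (a, b) in expression.items()}

# Rank genes by the magnitude of the statistic; the sign gives the direction.
ranked = sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)
for gene in ranked:
    direction = "up" if scores[gene] > 0 else "down" if scores[gene] < 0 else "none"
    print(f"{gene}: t = {scores[gene]:.3f} ({direction})")
```

In a real analysis the statistic would be converted to a p-value and corrected for multiple testing across all probes before calling a gene differentially expressed.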
Affiliation(s)
- Hong-Bin Yang
- Department of Ophthalmology, the First Affiliated Hospital of Harbin Medical University, Harbin 150080, Heilongjiang Province, China
- Jie Jiang
- Department of Ophthalmology, the First Affiliated Hospital of Harbin Medical University, Harbin 150080, Heilongjiang Province, China
- Lu-Lu Li
- Department of Neurosurgery, the Second Affiliated Hospital of Harbin Medical University, Harbin 150080, Heilongjiang Province, China
- Huang-Qiang Yang
- Department of Neurosurgery, the Second Affiliated Hospital of Harbin Medical University, Harbin 150080, Heilongjiang Province, China
- Xiao-Yu Zhang
- Department of Neurosurgery, the Second Affiliated Hospital of Harbin Medical University, Harbin 150080, Heilongjiang Province, China
|
11
|
Abernathy DG, Kim WK, McCoy MJ, Lake AM, Ouwenga R, Lee SW, Xing X, Li D, Lee HJ, Heuckeroth RO, Dougherty JD, Wang T, Yoo AS. MicroRNAs Induce a Permissive Chromatin Environment that Enables Neuronal Subtype-Specific Reprogramming of Adult Human Fibroblasts. Cell Stem Cell 2018; 21:332-348.e9. [PMID: 28886366 DOI: 10.1016/j.stem.2017.08.002] [Citation(s) in RCA: 94] [Impact Index Per Article: 15.7] [Received: 01/31/2017] [Revised: 06/26/2017] [Accepted: 08/09/2017] [Indexed: 12/19/2022]
Abstract
Directed reprogramming of human fibroblasts into fully differentiated neurons requires massive changes in epigenetic and transcriptional states. Induction of a chromatin environment permissive for acquiring neuronal subtype identity is therefore a major barrier to fate conversion. Here we show that the brain-enriched miRNAs miR-9/9∗ and miR-124 (miR-9/9∗-124) trigger reconfiguration of chromatin accessibility, DNA methylation, and mRNA expression to induce a default neuronal state. miR-9/9∗-124-induced neurons (miNs) are functionally excitable and uncommitted toward specific subtypes but possess open chromatin at neuronal subtype-specific loci, suggesting that such identity can be imparted by additional lineage-specific transcription factors. Consistently, we show that ISL1 and LHX3 selectively drive conversion to a highly homogeneous population of human spinal cord motor neurons. This study shows that modular synergism between miRNAs and neuronal subtype-specific transcription factors can drive lineage-specific neuronal reprogramming, providing a general platform for high-efficiency generation of distinct subtypes of human neurons.
Affiliation(s)
- Daniel G Abernathy
- Department of Developmental Biology, Washington University School of Medicine, St. Louis, MO 63110, USA; Program in Developmental, Regenerative, and Stem Cell Biology, Washington University School of Medicine, St. Louis, MO 63110, USA
- Woo Kyung Kim
- Department of Developmental Biology, Washington University School of Medicine, St. Louis, MO 63110, USA
- Matthew J McCoy
- Department of Developmental Biology, Washington University School of Medicine, St. Louis, MO 63110, USA; Program in Molecular Genetics & Genomics, Washington University School of Medicine, St. Louis, MO 63110, USA
- Allison M Lake
- Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
- Rebecca Ouwenga
- Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
- Seong Won Lee
- Department of Developmental Biology, Washington University School of Medicine, St. Louis, MO 63110, USA
- Xiaoyun Xing
- Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
- Daofeng Li
- Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
- Hyung Joo Lee
- Program in Molecular Genetics & Genomics, Washington University School of Medicine, St. Louis, MO 63110, USA; Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
- Robert O Heuckeroth
- Department of Pediatrics, The Perelman School of Medicine at the University of Pennsylvania, and The Children's Hospital of Philadelphia Research Institute, Philadelphia, PA 19104, USA
- Joseph D Dougherty
- Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
- Ting Wang
- Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
- Andrew S Yoo
- Department of Developmental Biology, Washington University School of Medicine, St. Louis, MO 63110, USA
|
12
|
Guo WL, Huang DS. An efficient method to transcription factor binding sites imputation via simultaneous completion of multiple matrices with positional consistency. Mol Biosyst 2018; 13:1827-1837. [PMID: 28718849 DOI: 10.1039/c7mb00155j] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Indexed: 12/20/2022]
Abstract
Transcription factors (TFs) are DNA-binding proteins that have a central role in regulating gene expression. Identification of DNA-binding sites of TFs is a key task in understanding transcriptional regulation, cellular processes and disease. Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) enables genome-wide identification of in vivo TF binding sites. However, it is still difficult to map every TF in every cell line owing to cost and biological material availability, which poses an enormous obstacle for integrated analysis of gene regulation. To address this problem, we propose a novel computational approach, TFBSImpute, for predicting additional TF binding profiles by leveraging information from available ChIP-seq TF binding data. TFBSImpute fuses the dataset to a 3-mode tensor and imputes missing TF binding signals via simultaneous completion of multiple TF binding matrices with positional consistency. We show that signals predicted by our method achieve overall similarity with experimental data and that TFBSImpute significantly outperforms baseline approaches, by assessing the performance of imputation methods against observed ChIP-seq TF binding profiles. In addition, motif analysis shows that TFBSImpute performs better in capturing binding motifs enriched in observed data compared with baselines, indicating that the higher performance of TFBSImpute is not simply due to averaging related samples. We anticipate that our approach will constitute a useful complement to experimental mapping of TF binding, which is beneficial for further study of regulation mechanisms and disease.
Affiliation(s)
- Wei-Li Guo
- Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai, 201804, China
|
13
|
Wang Y, Ung MH, Xia T, Cheng W, Cheng C. Cancer cell line specific co-factors modulate the FOXM1 cistrome. Oncotarget 2017; 8:76498-76515. [PMID: 29100329 PMCID: PMC5652723 DOI: 10.18632/oncotarget.20405] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 04/12/2017] [Accepted: 08/14/2017] [Indexed: 12/11/2022] Open
Abstract
ChIP-seq has been commonly applied to identify genomic occupation of transcription factors (TFs) in a context-specific manner. It is generally assumed that a TF should have similar binding patterns in cells from the same or closely related tissues. Surprisingly, this assumption has not been carefully examined. To this end, we systematically compared the genomic binding of the cell cycle regulator FOXM1 in eight cell lines from seven different human tissues at the binding signal, peak and target gene levels. We found that FOXM1 binding sites in the ER-positive breast cancer cell line MCF-7 are distinct from those in not only the non-breast cell lines but also MDA-MB-231, an ER-negative breast cancer cell line. However, binding sites in MDA-MB-231 and non-breast cell lines were highly consistent. The recruitment of estrogen receptor alpha (ERα) caused the unique FOXM1 binding patterns in MCF-7. Moreover, the activity of FOXM1 in MCF-7 reflects the regulatory functions of ERα, while in MDA-MB-231 and non-breast cell lines, FOXM1 activities regulate cell proliferation. Our results suggest that tissue similarity, in some specific contexts, does not hold precedence over TF-cofactor interactions in determining transcriptional states and that the genomic binding of a TF can be dramatically affected by a particular co-factor under certain conditions.
Affiliation(s)
- Yue Wang
- School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China; Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
- Matthew H Ung
- Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
- Tian Xia
- School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Wenqing Cheng
- School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Chao Cheng
- Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA; Norris Cotton Cancer Center, Geisel School of Medicine at Dartmouth, Lebanon, NH 03766, USA; Department of Biomedical Data Sciences, Geisel School of Medicine at Dartmouth, Lebanon, NH 03766, USA
|
14
|
Chang YM, Ling L, Chang YT, Chang YW, Li WH, Shih ACC, Chen CC. Three TF Co-expression Modules Regulate Pressure-Overload Cardiac Hypertrophy in Male Mice. Sci Rep 2017; 7:7560. [PMID: 28790436 PMCID: PMC5548763 DOI: 10.1038/s41598-017-07981-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 03/20/2017] [Accepted: 07/03/2017] [Indexed: 12/22/2022] Open
Abstract
Pathological cardiac hypertrophy, a dynamic remodeling process, is a major risk factor for heart failure. Although a number of key regulators and related genes have been identified, how the transcription factors (TFs) dynamically regulate the associated genes and control the morphological and electrophysiological changes during the hypertrophic process are still largely unknown. In this study, we obtained the time-course transcriptomes at five time points in four weeks from male murine hearts subjected to transverse aorta banding surgery. From a series of computational analyses, we identified three major co-expression modules of TF genes that may regulate the gene expression changes during the development of cardiac hypertrophy in mice. After pressure overload, the TF genes in Module 1 were up-regulated before the occurrence of significant morphological changes and one week later were down-regulated gradually, while those in Modules 2 and 3 took over the regulation as the heart size increased. Our analyses revealed that the TF genes up-regulated at the early stages likely initiated the cascading regulation and most of the well-known cardiac miRNAs were up-regulated at later stages for suppression. In addition, the constructed time-dependent regulatory network reveals some TFs including Egr2 as new candidate key regulators of cardiovascular-associated (CV) genes.
Affiliation(s)
- Yao-Ming Chang
- Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
- Li Ling
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Ya-Ting Chang
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Yu-Wang Chang
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Wen-Hsiung Li
- Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
- Department of Ecology and Evolution, University of Chicago, Chicago, IL, 60637, USA
- Chien-Chang Chen
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
|
15
|
Jo K, Jung I, Moon JH, Kim S. Influence maximization in time bounded network identifies transcription factors regulating perturbed pathways. Bioinformatics 2017; 32:i128-i136. [PMID: 27307609 PMCID: PMC4908359 DOI: 10.1093/bioinformatics/btw275] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 11/23/2022] Open
Abstract
Motivation: To understand the dynamic nature of the biological process, it is crucial to identify perturbed pathways in an altered environment and also to infer regulators that trigger the response. Current time-series analysis methods, however, are not powerful enough to identify perturbed pathways and regulators simultaneously. Widely used methods include methods to determine gene sets such as differentially expressed genes or gene clusters, and these gene sets need to be further interpreted in terms of biological pathways using other tools. Most pathway analysis methods are not designed for time series data and they do not consider gene-gene influence on the time dimension. Results: In this article, we propose a novel time-series analysis method, TimeTP, for determining transcription factors (TFs) regulating pathway perturbation, which narrows the focus to perturbed sub-pathways and utilizes the gene regulatory network and protein–protein interaction network to locate TFs triggering the perturbation. TimeTP first identifies perturbed sub-pathways that propagate the expression changes over time. Starting points of the perturbed sub-pathways are mapped into the network and the most influential TFs are determined by an influence maximization technique. The analysis result is visually summarized in a TF-Pathway map in time clock. TimeTP was applied to a PIK3CA knock-in dataset and found significant sub-pathways and their regulators relevant to the PIP3 signaling pathway. Availability and Implementation: TimeTP is implemented in Python and available at http://biohealth.snu.ac.kr/software/TimeTP/. Supplementary information: Supplementary data are available at Bioinformatics online. Contact: sunkim.bioinfo@snu.ac.kr
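As a rough illustration of the influence-maximization idea described in the abstract above, the following sketch greedily selects the regulators whose downstream reach, within a bounded number of steps, covers the most perturbed pathway entry points. It is a simplified stand-in and not the TimeTP implementation: the function names, the toy graph, and the use of plain hop-bounded reachability in place of a probabilistic diffusion model are all assumptions made for this example.

```python
from collections import deque


def reachable_within(graph, source, max_steps):
    """Return every node reachable from `source` using at most `max_steps` edges."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_steps:
            continue  # step bound reached, do not expand further
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return seen


def greedy_influencers(graph, candidates, targets, k, max_steps):
    """Greedily pick up to k candidates that together reach the most targets."""
    targets = set(targets)
    reach = {c: reachable_within(graph, c, max_steps) & targets for c in candidates}
    chosen, covered = [], set()
    for _ in range(k):
        remaining = [c for c in candidates if c not in chosen]
        if not remaining:
            break
        # Marginal gain = number of targets not yet covered by earlier picks.
        best = max(remaining, key=lambda c: len(reach[c] - covered))
        if not reach[best] - covered:
            break  # no candidate adds anything new
        chosen.append(best)
        covered |= reach[best]
    return chosen, covered


# Hypothetical toy network: TFs -> genes -> pathway entry points (p1, p2).
toy_graph = {
    "TF_A": ["g1", "g2"],
    "TF_B": ["g2", "g3"],
    "TF_C": ["g4"],
    "g1": ["p1"],
    "g3": ["p2"],
    "g4": ["p2"],
}
chosen, covered = greedy_influencers(
    toy_graph, ["TF_A", "TF_B", "TF_C"], ["p1", "p2"], k=2, max_steps=2
)
print(chosen, sorted(covered))
```

The greedy rule is used here only because it is the textbook heuristic for coverage-style objectives; the abstract does not state which influence model or selection strategy the authors apply.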
Affiliation(s)
- Kyuri Jo
- Department of Computer Science and Engineering
- Inuk Jung
- Interdisciplinary Program in Bioinformatics
- Sun Kim
- Department of Computer Science and Engineering; Interdisciplinary Program in Bioinformatics; Bioinformatics Institute, Seoul National University, Seoul 08826, Korea
|
16
|
Ruffalo M, Bar-Joseph Z. Genome wide predictions of miRNA regulation by transcription factors. Bioinformatics 2017; 32:i746-i754. [PMID: 27587697 DOI: 10.1093/bioinformatics/btw452] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 01/28/2023] Open
Abstract
MOTIVATION Reconstructing regulatory networks from expression and interaction data is a major goal of systems biology. While much work has focused on trying to experimentally and computationally determine the set of transcription-factors (TFs) and microRNAs (miRNAs) that regulate genes in these networks, relatively little work has focused on inferring the regulation of miRNAs by TFs. Such regulation can play an important role in several biological processes including development and disease. The main challenge for predicting such interactions is the very small positive training set currently available. Another challenge is the fact that a large fraction of miRNAs are encoded within genes making it hard to determine the specific way in which they are regulated. RESULTS To enable genome wide predictions of TF-miRNA interactions, we extended semi-supervised machine-learning approaches to integrate a large set of different types of data including sequence, expression, ChIP-seq and epigenetic data. As we show, the methods we develop achieve good performance on both a labeled test set, and when analyzing general co-expression networks. We next analyze mRNA and miRNA cancer expression data, demonstrating the advantage of using the predicted set of interactions for identifying more coherent and relevant modules, genes, and miRNAs. The complete set of predictions is available on the supporting website and can be used by any method that combines miRNAs, genes, and TFs. AVAILABILITY AND IMPLEMENTATION Code and full set of predictions are available from the supporting website: http://cs.cmu.edu/~mruffalo/tf-mirna/ CONTACT zivbj@cs.cmu.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Matthew Ruffalo
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA 15213
- Ziv Bar-Joseph
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA 15213
|
17
|
Liu S, Zibetti C, Wan J, Wang G, Blackshaw S, Qian J. Assessing the model transferability for prediction of transcription factor binding sites based on chromatin accessibility. BMC Bioinformatics 2017; 18:355. [PMID: 28750606 PMCID: PMC5530957 DOI: 10.1186/s12859-017-1769-7] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 05/03/2017] [Accepted: 07/19/2017] [Indexed: 12/04/2022] Open
Abstract
Background Computational prediction of transcription factor (TF) binding sites in different cell types is challenging. Recent technology development allows us to determine the genome-wide chromatin accessibility in various cellular and developmental contexts. The chromatin accessibility profiles provide useful information in prediction of TF binding events in various physiological conditions. Furthermore, ChIP-Seq analysis was used to determine genome-wide binding sites for a range of different TFs in multiple cell types. Integration of these two types of genomic information can improve the prediction of TF binding events. Results We assessed to what extent a model built upon other TFs and/or other cell types could be used to predict the binding sites of TFs of interest. A random forest model was built using a set of cell type-independent features such as specific sequences recognized by the TFs and evolutionary conservation, as well as cell type-specific features derived from chromatin accessibility data. Our analysis suggested that the models learned from other TFs and/or cell lines performed almost as well as the model learned from the target TF in the cell type of interest. Interestingly, models based on multiple TFs performed better than single-TF models. Finally, we proposed a universal model, BPAC, which was generated using ChIP-Seq data from multiple TFs in various cell types. Conclusion Integrating chromatin accessibility information with sequence information improves prediction of TF binding. The prediction of TF binding is transferable across TFs and/or cell lines, suggesting there is a set of universal “rules”. A computational tool was developed to predict TF binding sites based on the universal “rules”. Electronic supplementary material The online version of this article (doi:10.1186/s12859-017-1769-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Sheng Liu
- Department of Ophthalmology, Johns Hopkins University School of Medicine, Baltimore, 21287, MD, USA
- Cristina Zibetti
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, 21287, MD, USA
- Jun Wan
- Department of Ophthalmology, Johns Hopkins University School of Medicine, Baltimore, 21287, MD, USA
- Guohua Wang
- Department of Ophthalmology, Johns Hopkins University School of Medicine, Baltimore, 21287, MD, USA
- Seth Blackshaw
- Department of Ophthalmology, Johns Hopkins University School of Medicine, Baltimore, 21287, MD, USA; Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, 21287, MD, USA; Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, 21287, MD, USA; Centre for Human Systems Biology, Johns Hopkins University School of Medicine, Baltimore, 21287, MD, USA; Institute for Cell Engineering, Johns Hopkins University School of Medicine, Baltimore, 21287, MD, USA
- Jiang Qian
- Department of Ophthalmology, Johns Hopkins University School of Medicine, Baltimore, 21287, MD, USA
|
18
|
Trescher S, Münchmeyer J, Leser U. Estimating genome-wide regulatory activity from multi-omics data sets using mathematical optimization. BMC Syst Biol 2017; 11:41. [PMID: 28347313 PMCID: PMC5369021 DOI: 10.1186/s12918-017-0419-z] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 07/16/2016] [Accepted: 03/08/2017] [Indexed: 12/28/2022]
Abstract
Background Gene regulation is one of the most important cellular processes, indispensable for the adaptability of organisms and closely interlinked with several classes of pathogenesis and their progression. Elucidation of regulatory mechanisms can be approached by a multitude of experimental methods, yet integration of the resulting heterogeneous, large, and noisy data sets into comprehensive and tissue or disease-specific cellular models requires rigorous computational methods. Recently, several algorithms have been proposed which model genome-wide gene regulation as sets of (linear) equations over the activity and relationships of transcription factors, genes and other factors. Subsequent optimization finds those parameters that minimize the divergence of predicted and measured expression intensities. In various settings, these methods produced promising results in terms of estimating transcription factor activity and identifying key biomarkers for specific phenotypes. However, despite their common root in mathematical optimization, they vastly differ in the types of experimental data being integrated, the background knowledge necessary for their application, the granularity of their regulatory model, the concrete paradigm used for solving the optimization problem and the data sets used for evaluation. Results Here, we review five recent methods of this class in detail and compare them with respect to several key properties. Furthermore, we quantitatively compare the results of four of the presented methods based on publicly available data sets. Conclusions The results show that all methods seem to find biologically relevant information. However, we also observe that the mutual result overlaps are very low, which contradicts biological intuition. Our aim is to raise further awareness of the power of these methods, yet also to identify common shortcomings and necessary extensions enabling focused research on the critical points. 
Electronic supplementary material The online version of this article (doi:10.1186/s12918-017-0419-z) contains supplementary material, which is available to authorized users.
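The class of methods reviewed in the abstract above models expression as (linear) equations over transcription factor activities and then minimizes the gap between predicted and measured expression. A minimal sketch of that idea, under strong simplifying assumptions, is an ordinary least-squares fit for a single sample. It does not reproduce any of the reviewed tools: the function names, the binary TF-gene binding matrix, and the toy numbers are all hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: TF activities are not identifiable")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def estimate_tf_activity(binding, expression):
    """Return activities a minimising sum_g (sum_t binding[g][t] * a[t] - expression[g])^2."""
    n_tf = len(binding[0])
    # Normal equations: (B^T B) a = B^T e
    gram = [[sum(row[i] * row[j] for row in binding) for j in range(n_tf)]
            for i in range(n_tf)]
    rhs = [sum(row[i] * e for row, e in zip(binding, expression))
           for i in range(n_tf)]
    return solve_linear(gram, rhs)


# Hypothetical toy data: 4 genes, 2 TFs; entry 1 means "TF regulates gene".
binding = [[1, 0], [0, 1], [1, 1], [1, 0]]
expression = [2.0, -1.0, 1.0, 2.0]  # generated from activities (2, -1) with no noise
activity = estimate_tf_activity(binding, expression)
print(activity)
```

Real methods of this class add regularization, multiple samples, and further data types such as miRNA or methylation; those extensions are deliberately left out of this sketch.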
Affiliation(s)
- Saskia Trescher
- Knowledge Management in Bioinformatics, Computer Science Department, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099, Berlin, Germany
- Jannes Münchmeyer
- Knowledge Management in Bioinformatics, Computer Science Department, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099, Berlin, Germany
- Ulf Leser
- Knowledge Management in Bioinformatics, Computer Science Department, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099, Berlin, Germany
|
19
|
Qin Q, Feng J. Imputation for transcription factor binding predictions based on deep learning. PLoS Comput Biol 2017; 13:e1005403. [PMID: 28234893 PMCID: PMC5345877 DOI: 10.1371/journal.pcbi.1005403] [Citation(s) in RCA: 74] [Impact Index Per Article: 10.6] [Received: 04/18/2016] [Revised: 03/10/2017] [Accepted: 02/09/2017] [Indexed: 01/11/2023] Open
Abstract
Understanding the cell-specific binding patterns of transcription factors (TFs) is fundamental to studying gene regulatory networks in biological systems, for which ChIP-seq not only provides valuable data but is also considered as the gold standard. Despite tremendous efforts from the scientific community to conduct TF ChIP-seq experiments, the available data represent only a limited percentage of ChIP-seq experiments, considering all possible combinations of TFs and cell lines. In this study, we demonstrate a method for accurately predicting cell-specific TF binding for TF-cell line combinations based on only a small fraction (4%) of the combinations using available ChIP-seq data. The proposed model, termed TFImpute, is based on a deep neural network with a multi-task learning setting to borrow information across transcription factors and cell lines. Compared with existing methods, TFImpute achieves comparable accuracy on TF-cell line combinations with ChIP-seq data; moreover, TFImpute achieves better accuracy on TF-cell line combinations without ChIP-seq data. This approach can predict cell line specific enhancer activities in K562 and HepG2 cell lines, as measured by massively parallel reporter assays, and predicts the impact of SNPs on TF binding. Transcription factors play a central role in regulating various cellular processes. They bind to DNA in a cell-specific way. To study where a TF would bind to DNA, ChIP-seq experiment has been developed and widely adopted by the science community to study genome-wide in vivo protein-DNA interactions. However, for each TF, only a limited number of cell types have been explored by ChIP-seq experiments. To study the binding of a TF to a DNA sequence in a cell line without corresponding ChIP-seq data, researchers would check whether there is a motif for the TF in the sequence. However, motif alone contains only sequence information and therefore cannot reflect the cell specificity of TF binding. 
In this work, we demonstrate how to model the TF binding problem using deep learning and achieve cell specific binding prediction for TF-cell line combinations without ChIP-seq data.
Affiliation(s)
- Qian Qin
- Department of Bioinformatics, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Jianxing Feng
- Department of Bioinformatics, School of Life Sciences and Technology, Tongji University, Shanghai, China
|
20
|
Gustafsson M, Gawel DR, Alfredsson L, Baranzini S, Björkander J, Blomgran R, Hellberg S, Eklund D, Ernerudh J, Kockum I, Konstantinell A, Lahesmaa R, Lentini A, Liljenström HRI, Mattson L, Matussek A, Mellergård J, Mendez M, Olsson T, Pujana MA, Rasool O, Serra-Musach J, Stenmarker M, Tripathi S, Viitala M, Wang H, Zhang H, Nestor CE, Benson M. A validated gene regulatory network and GWAS identifies early regulators of T cell-associated diseases. Sci Transl Med 2016; 7:313ra178. [PMID: 26560356 DOI: 10.1126/scitranslmed.aad2722] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.9] [Indexed: 12/15/2022]
Abstract
Early regulators of disease may increase understanding of disease mechanisms and serve as markers for presymptomatic diagnosis and treatment. However, early regulators are difficult to identify because patients generally present after they are symptomatic. We hypothesized that early regulators of T cell-associated diseases could be found by identifying upstream transcription factors (TFs) in T cell differentiation and by prioritizing hub TFs that were enriched for disease-associated polymorphisms. A gene regulatory network (GRN) was constructed by time series profiling of the transcriptomes and methylomes of human CD4(+) T cells during in vitro differentiation into four helper T cell lineages, in combination with sequence-based TF binding predictions. The TFs GATA3, MAF, and MYB were identified as early regulators and validated by ChIP-seq (chromatin immunoprecipitation sequencing) and small interfering RNA knockdowns. Differential mRNA expression of the TFs and their targets in T cell-associated diseases supports their clinical relevance. To directly test if the TFs were altered early in disease, T cells from patients with two T cell-mediated diseases, multiple sclerosis and seasonal allergic rhinitis, were analyzed. Strikingly, the TFs were differentially expressed during asymptomatic stages of both diseases, whereas their targets showed altered expression during symptomatic stages. This analytical strategy to identify early regulators of disease by combining GRNs with genome-wide association studies may be generally applicable for functional and clinical studies of early disease development.
Affiliation(s)
- Mika Gustafsson
- The Centre for Individualised Medicine, Department of Clinical and Experimental Medicine, Division of Pediatrics, Linköping University, SE-581 83 Linköping, Sweden; Bioinformatics, Department of Physics, Chemistry, and Biology, Linköping University, SE-581 83 Linköping, Sweden
- Danuta R Gawel
- The Centre for Individualised Medicine, Department of Clinical and Experimental Medicine, Division of Pediatrics, Linköping University, SE-581 83 Linköping, Sweden
- Lars Alfredsson
- Institute of Environmental Medicine, Karolinska Institutet, SE-171 77 Solna, Sweden
- Sergio Baranzini
- Department of Neurology, University of California, San Francisco, CA 94158, USA
- Janne Björkander
- Futurum-Academy for Health and Care, County Council of Jönköping, SE-551 85 Jönköping, Sweden
- Robert Blomgran
- Department of Clinical and Experimental Medicine, Division of Microbiology and Molecular Medicine, Linköping University, SE-581 83 Linköping, Sweden
- Sandra Hellberg
- Department of Clinical and Experimental Medicine, Division of Clinical Immunology, Unit of Autoimmunity and Immune Regulation, Linköping University, SE-581 83 Linköping, Sweden
- Daniel Eklund
- Department of Clinical Immunology and Transfusion Medicine, Linköping University, SE-581 83 Linköping, Sweden
- Jan Ernerudh
- Department of Clinical and Experimental Medicine, Division of Clinical Immunology, Unit of Autoimmunity and Immune Regulation, Linköping University, SE-581 83 Linköping, Sweden; Department of Clinical Immunology and Transfusion Medicine, Linköping University, SE-581 83 Linköping, Sweden
- Ingrid Kockum
- Department of Clinical Neurosciences, Karolinska Institutet and Centrum for Molecular Medicine, SE-171 77 Stockholm, Sweden
- Aelita Konstantinell
- The Centre for Individualised Medicine, Department of Clinical and Experimental Medicine, Division of Pediatrics, Linköping University, SE-581 83 Linköping, Sweden; Department of Medical Biology, The Arctic University of Norway, NO-9037 Tromsø, Norway
- Riita Lahesmaa
- Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, FI-20520 Turku, Finland
- Antonio Lentini
- The Centre for Individualised Medicine, Department of Clinical and Experimental Medicine, Division of Pediatrics, Linköping University, SE-581 83 Linköping, Sweden
- H Robert I Liljenström
- The Centre for Individualised Medicine, Department of Clinical and Experimental Medicine, Division of Pediatrics, Linköping University, SE-581 83 Linköping, Sweden
- Lina Mattson
- The Centre for Individualised Medicine, Department of Clinical and Experimental Medicine, Division of Pediatrics, Linköping University, SE-581 83 Linköping, Sweden
- Andreas Matussek
- Futurum-Academy for Health and Care, County Council of Jönköping, SE-551 85 Jönköping, Sweden
- Johan Mellergård
- Department of Neurology and Department of Clinical and Experimental Medicine, Linköping University, SE-581 83 Linköping, Sweden
- Melissa Mendez
- Laboratorio de Investigación en Enfermedades Infecciosas, LID, Universidad Peruana Cayetano Heredia, Lima PE-15102, Peru
- Tomas Olsson
- Department of Clinical Neurosciences, Karolinska Institutet and Centrum for Molecular Medicine, SE-171 77 Stockholm, Sweden
- Miguel A Pujana
- Program Against Cancer Therapeutic Resistance (ProCURE), Cancer and Systems Biology Unit, Catalan Institute of Oncology, IDIBELL, L'Hospitalet del Llobregat, ES-08908 Barcelona, Spain
- Omid Rasool
- Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, FI-20520 Turku, Finland
- Jordi Serra-Musach
- Program Against Cancer Therapeutic Resistance (ProCURE), Cancer and Systems Biology Unit, Catalan Institute of Oncology, IDIBELL, L'Hospitalet del Llobregat, ES-08908 Barcelona, Spain
- Margaretha Stenmarker
- Futurum-Academy for Health and Care, County Council of Jönköping, SE-551 85 Jönköping, Sweden
- Subhash Tripathi
- Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, FI-20520 Turku, Finland
- Miro Viitala
- Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, FI-20520 Turku, Finland
- Hui Wang
- The Centre for Individualised Medicine, Department of Clinical and Experimental Medicine, Division of Pediatrics, Linköping University, SE-581 83 Linköping, Sweden; Department of Immunology, MD Anderson Cancer Center, Houston, TX 77030, USA
- Huan Zhang
- The Centre for Individualised Medicine, Department of Clinical and Experimental Medicine, Division of Pediatrics, Linköping University, SE-581 83 Linköping, Sweden
- Colm E Nestor
- The Centre for Individualised Medicine, Department of Clinical and Experimental Medicine, Division of Pediatrics, Linköping University, SE-581 83 Linköping, Sweden
| | - Mikael Benson
- The Centre for Individualised Medicine, Department of Clinical and Experimental Medicine, Division of Pediatrics, Linköping University, SE-581 83 Linköping, Sweden.
|
21
|
Gitter A, Bar-Joseph Z. The SDREM Method for Reconstructing Signaling and Regulatory Response Networks: Applications for Studying Disease Progression. Methods Mol Biol 2016; 1303:493-506. [PMID: 26235087 DOI: 10.1007/978-1-4939-2627-5_30] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/04/2023]
Abstract
The Signaling and Dynamic Regulatory Events Miner (SDREM) is a powerful computational approach for identifying which signaling pathways and transcription factors control the temporal cellular response to a stimulus. SDREM builds end-to-end response models by combining condition-independent protein-protein interactions and transcription factor binding data with two types of condition-specific data: source proteins that detect the stimulus and changes in gene expression over time. Here we describe how to apply SDREM to study human diseases, using epidermal growth factor (EGF) response impacting neurogenesis and Alzheimer's disease as an example.
|
22
|
Tsai DY, Hung KH, Lin IY, Su ST, Wu SY, Chung CH, Wang TC, Li WH, Shih ACC, Lin KI. Uncovering MicroRNA Regulatory Hubs that Modulate Plasma Cell Differentiation. Sci Rep 2015; 5:17957. [PMID: 26655851 PMCID: PMC4675970 DOI: 10.1038/srep17957] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 07/10/2015] [Accepted: 11/09/2015] [Indexed: 01/08/2023] Open
Abstract
Using genome-wide approaches, we studied the microRNA (miRNA) expression profile during human plasma cell (PC) differentiation induced by stimulation of human blood B cells with T follicular helper cell–dependent signals. Combining the profiles of differentially expressed genes in PC differentiation with gene ontology (GO) analysis revealed that a significant group of genes involved in the transcription factor (TF) activity was preferentially changed. We thus focused on studying the effects of differentially expressed miRNAs on several key TFs in PC differentiation. Cohorts of differentially expressed miRNAs cooperating as miRNA hubs were predicted and validated to modulate key TFs, including a down-regulated miRNA hub containing miR-101-3p, -125b-5p, and -223-3p contributing to induction of PRDM1 as well as an up-regulated miRNA hub containing miR-34a-5p, -148a-3p, and -183-5p suppressing BCL6, BACH2, and FOXP1. Induced expression of NF-κB and PRDM1 during PC differentiation controlled the expression of up- and down-regulated miRNA hubs, respectively. Co-expression of miR-101-3p, -125b-5p, and -223-3p in stimulated B cells showed synergistic effects on inhibition of PC formation, which can be rescued by re-introduction of PRDM1. Together, we catalogue the complex roadmap of miRNAs and their functional interplay in collaboratively directing PC differentiation.
Affiliation(s)
- Dong-Yan Tsai
- Genomics Research Center, Academia Sinica, Taipei 115, Taiwan; Institute of Biochemistry and Molecular Biology, National Yang-Ming University, Taipei 112, Taiwan
| | - Kuo-Hsuan Hung
- Genomics Research Center, Academia Sinica, Taipei 115, Taiwan; Institute of Microbiology and Immunology, National Yang-Ming University, Taipei 112, Taiwan
| | - I-Ying Lin
- Genomics Research Center, Academia Sinica, Taipei 115, Taiwan
| | - Shin-Tang Su
- Genomics Research Center, Academia Sinica, Taipei 115, Taiwan
| | - Shih-Ying Wu
- Institute of Information Science, Academia Sinica, Taipei 115, Taiwan
| | - Cheng-Han Chung
- Genomics Research Center, Academia Sinica, Taipei 115, Taiwan
| | - Tong-Cheng Wang
- Genomics Research Center, Academia Sinica, Taipei 115, Taiwan; Biodiversity Research Center, Academia Sinica, Taipei 115, Taiwan
| | - Wen-Hsiung Li
- Genomics Research Center, Academia Sinica, Taipei 115, Taiwan; Biodiversity Research Center, Academia Sinica, Taipei 115, Taiwan
| | - Kuo-I Lin
- Genomics Research Center, Academia Sinica, Taipei 115, Taiwan; Institute of Biochemistry and Molecular Biology, National Yang-Ming University, Taipei 112, Taiwan
|
23
|
Contribution of Sequence Motif, Chromatin State, and DNA Structure Features to Predictive Models of Transcription Factor Binding in Yeast. PLoS Comput Biol 2015; 11:e1004418. [PMID: 26291518 PMCID: PMC4546298 DOI: 10.1371/journal.pcbi.1004418] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 11/19/2014] [Accepted: 06/29/2015] [Indexed: 11/19/2022] Open
Abstract
Transcription factor (TF) binding is determined by the presence of specific sequence motifs (SM) and chromatin accessibility, where the latter is influenced by both chromatin state (CS) and DNA structure (DS) properties. Although SM, CS, and DS have been used to predict TF binding sites, a predictive model that jointly considers CS and DS has not been developed to predict either TF-specific binding or general binding properties of TFs. Using budding yeast as model, we found that machine learning classifiers trained with either CS or DS features alone perform better in predicting TF-specific binding compared to SM-based classifiers. In addition, simultaneously considering CS and DS further improves the accuracy of the TF binding predictions, indicating the highly complementary nature of these two properties. The contributions of SM, CS, and DS features to binding site predictions differ greatly between TFs, allowing TF-specific predictions and potentially reflecting different TF binding mechanisms. In addition, a "TF-agnostic" predictive model based on three DNA “intrinsic properties” (in silico predicted nucleosome occupancy, major groove geometry, and dinucleotide free energy) that can be calculated from genomic sequences alone has performance that rivals the model incorporating experiment-derived data. This intrinsic property model allows prediction of binding regions not only across TFs, but also across DNA-binding domain families with distinct structural folds. Furthermore, these predicted binding regions can help identify TF binding sites that have a significant impact on target gene expression. Because the intrinsic property model allows prediction of binding regions across DNA-binding domain families, it is TF agnostic and likely describes general binding potential of TFs. Thus, our findings suggest that it is feasible to establish a TF agnostic model for identifying functional regulatory regions in potentially any sequenced genome. 
Identification of transcription factor binding sites based on sequence motifs is typically accompanied by a high false positive rate. Increasing evidence suggests that there are many other factors besides DNA sequence that may affect the binding and interaction of TFs with DNA. Through the integration of sequence motif, chromatin state, and DNA structure properties, we show that TF binding can be better predicted. Moreover, considering chromatin state and DNA structure properties simultaneously yields a significant improvement. While the binding of some TFs can be readily predicted using either chromatin state information or DNA structure, other TFs need both. Thus, our findings provide insights on how different histone modifications and DNA structure properties may influence the binding of a particular TF and thus how TFs regulate gene expression. These features are referred to as sequence “intrinsic properties” because they can be predicted from sequences alone. These intrinsic properties can be used to build a TF binding prediction model that has a similar performance to considering all features. Moreover, the intrinsic property model allows TFBS predictions not only across TFs, but also across DNA-binding domain families that are present in most eukaryotes, suggesting that the model likely can be used across species.
|
24
|
Yang J, Ramsey SA. A DNA shape-based regulatory score improves position-weight matrix-based recognition of transcription factor binding sites. Bioinformatics 2015; 31:3445-50. [PMID: 26130577 DOI: 10.1093/bioinformatics/btv391] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 12/18/2014] [Accepted: 06/24/2015] [Indexed: 12/13/2022] Open
Abstract
MOTIVATION The position-weight matrix (PWM) is a useful representation of a transcription factor binding site (TFBS) sequence pattern because the PWM can be estimated from a small number of representative TFBS sequences. However, because the PWM probability model assumes independence between individual nucleotide positions, the PWMs for some TFs poorly discriminate binding sites from non-binding-sites that have similar sequence content. Since the local three-dimensional DNA structure ('shape') is a determinant of TF binding specificity and since DNA shape has a significant sequence-dependence, we combined DNA shape-derived features into a TF-generalized regulatory score and tested whether the score could improve PWM-based discrimination of TFBS from non-binding-sites. RESULTS We compared a traditional PWM model to a model that combines the PWM with a DNA shape feature-based regulatory potential score, for accuracy in detecting binding sites for 75 vertebrate transcription factors. The PWM+shape model was more accurate than the PWM-only model for 45% of TFs tested, with no significant loss of accuracy for the remaining TFs. AVAILABILITY AND IMPLEMENTATION The shape-based model is available as an open-source R package that is archived on the GitHub software repository at https://github.com/ramseylab/regshape/. CONTACT stephen.ramsey@oregonstate.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
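The PWM-only baseline described in this abstract can be illustrated with a minimal sketch. It is not the regshape package or the authors' code: the function names, the pseudocount, the uniform background frequency of 0.25, and the toy sites are all assumptions made for illustration. It shows how a log-odds PWM is estimated from a few aligned sites and why the independence assumption makes a site score a plain sum of per-position terms.

```python
import math

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Estimate a log-odds PWM from aligned, equal-length binding sites.

    pseudocount and background are illustrative assumptions, not values
    taken from the cited paper.
    """
    length = len(sites[0])
    pwm = []
    for pos in range(length):
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background) for b in "ACGT"})
    return pwm

def score(pwm, seq):
    # Independence assumption: the score is a sum of per-position log-odds.
    return sum(column[base] for column, base in zip(pwm, seq))

def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in a longer sequence."""
    width = len(pwm)
    windows = [(score(pwm, sequence[i:i + width]), i)
               for i in range(len(sequence) - width + 1)]
    return max(windows)

# Hypothetical aligned sites, used only to exercise the functions.
toy_sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]
toy_pwm = build_pwm(toy_sites)
print(best_hit(toy_pwm, "CCCCTGACTCACCCC"))
```

Because each position contributes independently, two sequences with similar base composition can receive similar scores regardless of their local structure, which is the limitation that a shape-derived score is meant to complement.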
Affiliation(s)
| | - Stephen A Ramsey
- Department of Biomedical Sciences and School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
|
25
|
Imrichová H, Hulselmans G, Atak ZK, Potier D, Aerts S. i-cisTarget 2015 update: generalized cis-regulatory enrichment analysis in human, mouse and fly. Nucleic Acids Res 2015; 43:W57-64. [PMID: 25925574 PMCID: PMC4489282 DOI: 10.1093/nar/gkv395] [Citation(s) in RCA: 125] [Impact Index Per Article: 13.9] [Received: 01/30/2015] [Accepted: 04/15/2015] [Indexed: 12/21/2022] Open
Abstract
i-cisTarget is a web tool to predict regulators of a set of genomic regions, such as ChIP-seq peaks or co-regulated/similar enhancers. i-cisTarget can also be used to identify upstream regulators and their target enhancers starting from a set of co-expressed genes. Whereas the original version of i-cisTarget was focused on Drosophila data, the 2015 update also provides support for human and mouse data. i-cisTarget detects transcription factor motifs (position weight matrices) and experimental data tracks (e.g. from ENCODE, Roadmap Epigenomics) that are enriched in the input set of regions. As experimental data tracks we include transcription factor ChIP-seq data, histone modification ChIP-seq data and open chromatin data. The underlying processing method is based on a ranking-and-recovery procedure, allowing accurate determination of enrichment across heterogeneous datasets, while also discriminating direct from indirect target regions through a ‘leading edge’ analysis. We illustrate i-cisTarget on various Ewing sarcoma datasets to identify EWS-FLI1 targets starting from ChIP-seq, differential ATAC-seq, differential H3K27ac and differential gene expression data. Use of i-cisTarget is free and open to all, and there is no login requirement. Address: http://gbiomed.kuleuven.be/apps/lcb/i-cisTarget.
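The general idea behind a ranking-and-recovery procedure can be sketched as follows. This is a generic illustration and not i-cisTarget's implementation: the function names, the `top_fraction` parameter, and the normalization by the best achievable area are assumptions. All candidate regions are ranked by one feature (for example a motif score), and the sketch measures how early the input regions are recovered when walking down that ranking.

```python
def recovery_curve(ranked_regions, input_set):
    """Cumulative number of input regions recovered along a ranking."""
    hits = 0
    curve = []
    for region in ranked_regions:
        if region in input_set:
            hits += 1
        curve.append(hits)
    return curve

def recovery_auc(ranked_regions, input_set, top_fraction=0.5):
    """Area under the recovery curve over the top of the ranking.

    The area is divided by the best achievable area (all recoverable
    input regions ranked first), so a perfect ranking scores 1.0.
    top_fraction is a hypothetical parameter chosen for illustration.
    """
    cutoff = max(1, int(len(ranked_regions) * top_fraction))
    curve = recovery_curve(ranked_regions[:cutoff], input_set)
    k = min(len(input_set), cutoff)
    best = k * (k + 1) // 2 + k * (cutoff - k)
    return sum(curve) / best if best else 0.0

# Hypothetical region identifiers, ranked from best to worst by one feature.
ranking = ["r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8"]
print(recovery_auc(ranking, {"r1", "r2"}))
print(recovery_auc(ranking, {"r7", "r8"}))
```

A feature whose ranking places the input regions near the top yields a high area, and comparing that area across many features is one way to rank candidate regulators on a common scale.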
Affiliation(s)
- Hana Imrichová
- Laboratory of Computational Biology, Center for Human Genetics, University of Leuven, 3000 Leuven, Belgium
| | - Gert Hulselmans
- Laboratory of Computational Biology, Center for Human Genetics, University of Leuven, 3000 Leuven, Belgium
| | - Zeynep Kalender Atak
- Laboratory of Computational Biology, Center for Human Genetics, University of Leuven, 3000 Leuven, Belgium
| | - Delphine Potier
- Laboratory of Computational Biology, Center for Human Genetics, University of Leuven, 3000 Leuven, Belgium
| | - Stein Aerts
- Laboratory of Computational Biology, Center for Human Genetics, University of Leuven, 3000 Leuven, Belgium
|
26
|
Decoding the regulatory landscape of melanoma reveals TEADS as regulators of the invasive cell state. Nat Commun 2015; 6:6683. [PMID: 25865119 PMCID: PMC4403341 DOI: 10.1038/ncomms7683] [Citation(s) in RCA: 294] [Impact Index Per Article: 32.7] [Received: 09/15/2014] [Accepted: 02/16/2015] [Indexed: 12/18/2022] Open
Abstract
Transcriptional reprogramming of proliferative melanoma cells into a phenotypically distinct invasive cell subpopulation is a critical event at the origin of metastatic spreading. Here we generate transcriptome, open chromatin and histone modification maps of melanoma cultures; and integrate this data with existing transcriptome and DNA methylation profiles from tumour biopsies to gain insight into the mechanisms underlying this key reprogramming event. This shows thousands of genomic regulatory regions underlying the proliferative and invasive states, identifying SOX10/MITF and AP-1/TEAD as regulators, respectively. Knockdown of TEADs shows a previously unrecognized role in the invasive gene network and establishes a causative link between these transcription factors, cell invasion and sensitivity to MAPK inhibitors. Using regulatory landscapes and in silico analysis, we show that transcriptional reprogramming underlies the distinct cellular states present in melanoma. Furthermore, it reveals an essential role for the TEADs, linking it to clinically relevant mechanisms such as invasion and resistance.
|
27
|
Gong W, Koyano-Nakagawa N, Li T, Garry DJ. Inferring dynamic gene regulatory networks in cardiac differentiation through the integration of multi-dimensional data. BMC Bioinformatics 2015; 16:74. [PMID: 25887857 PMCID: PMC4359553 DOI: 10.1186/s12859-015-0460-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 07/23/2014] [Accepted: 01/12/2015] [Indexed: 02/07/2023] Open
Abstract
Background Decoding the temporal control of gene expression patterns is key to the understanding of the complex mechanisms that govern developmental decisions during heart development. High-throughput methods have been employed to systematically study the dynamic and coordinated nature of cardiac differentiation at the global level with multiple dimensions. Therefore, there is a pressing need to develop a systems approach to integrate these data from individual studies and infer the dynamic regulatory networks in an unbiased fashion. Results We developed a two-step strategy to integrate data from (1) temporal RNA-seq, (2) temporal histone modification ChIP-seq, (3) transcription factor (TF) ChIP-seq and (4) gene perturbation experiments to reconstruct the dynamic network during heart development. First, we trained a logistic regression model to predict the probability (LR score) of any base being bound by 543 TFs with known positional weight matrices. Second, four dimensions of data were combined using a time-varying dynamic Bayesian network model to infer the dynamic networks at four developmental stages in the mouse [mouse embryonic stem cells (ESCs), mesoderm (MES), cardiac progenitors (CP) and cardiomyocytes (CM)]. Our method not only infers the time-varying networks between different stages of heart development, but it also identifies the TF binding sites associated with promoter or enhancers of downstream genes. The LR scores of experimentally verified ESCs and heart enhancers were significantly higher than random regions (p < 10^−100), suggesting that a high LR score is a reliable indicator for functional TF binding sites. Our network inference model identified a region with an elevated LR score approximately −9400 bp upstream of the transcriptional start site of Nkx2-5, which overlapped with a previously reported enhancer region (−9435 to −8922 bp). 
TFs such as Tead1, Gata4, Msx2, and Tgif1 were predicted to bind to this region and participate in the regulation of Nkx2-5 gene expression. Our model also predicted the key regulatory networks for the ESC-MES, MES-CP and CP-CM transitions. Conclusion We report a novel method to systematically integrate multi-dimensional -omics data and reconstruct the gene regulatory networks. This method will allow one to rapidly determine the cis-modules that regulate key genes during cardiac differentiation. Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0460-0) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Wuming Gong
- Lillehei Heart Institute, University of Minnesota, 2231 6th St S.E, 4-165 CCRB, Minneapolis, MN, 55114, USA.
| | - Naoko Koyano-Nakagawa
- Lillehei Heart Institute, University of Minnesota, 2231 6th St S.E, 4-165 CCRB, Minneapolis, MN, 55114, USA.
| | - Tongbin Li
- AccuraScience LLC, 5721 Merle Hay Road, Suite #16B, Johnston, IA, 50131, USA.
| | - Daniel J Garry
- Lillehei Heart Institute, University of Minnesota, 2231 6th St S.E, 4-165 CCRB, Minneapolis, MN, 55114, USA.
|
28
|
Su D, Wang X, Campbell MR, Song L, Safi A, Crawford GE, Bell DA. Interactions of chromatin context, binding site sequence content, and sequence evolution in stress-induced p53 occupancy and transactivation. PLoS Genet 2015; 11:e1004885. [PMID: 25569532 PMCID: PMC4287438 DOI: 10.1371/journal.pgen.1004885] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Received: 09/05/2014] [Accepted: 11/10/2014] [Indexed: 01/10/2023] Open
Abstract
Cellular stresses activate the tumor suppressor p53 protein leading to selective binding to DNA response elements (REs) and gene transactivation from a large pool of potential p53 REs (p53REs). To elucidate how p53RE sequences and local chromatin context interact to affect p53 binding and gene transactivation, we mapped genome-wide binding localizations of p53 and H3K4me3 in untreated and doxorubicin (DXR)-treated human lymphoblastoid cells. We examined the relationships among p53 occupancy, gene expression, H3K4me3, chromatin accessibility (DNase 1 hypersensitivity, DHS), ENCODE chromatin states, p53RE sequence, and evolutionary conservation. We observed that the inducible expression of p53-regulated genes was associated with the steady-state chromatin status of the cell. Most highly inducible p53-regulated genes were suppressed at baseline and marked by repressive histone modifications or displayed CTCF binding. Comparison of p53RE sequences residing in different chromatin contexts demonstrated that weaker p53REs resided in open promoters, while stronger p53REs were located within enhancers and repressed chromatin. p53 occupancy was strongly correlated with similarity of the target DNA sequences to the p53RE consensus, but surprisingly, inversely correlated with pre-existing nucleosome accessibility (DHS) and evolutionary conservation at the p53RE. Occupancy by p53 of REs that overlapped transposable element (TE) repeats was significantly higher (p < 10^−7) and correlated with stronger p53RE sequences (p < 10^−110) relative to nonTE-associated p53REs, particularly for MLT1H, LTR10B, and Mer61 TEs. However, binding at these elements was generally not associated with transactivation of adjacent genes. Occupied p53REs located in L2-like TEs were unique in displaying highly negative PhyloP scores (predicted fast-evolving) and being associated with altered H3K4me3 and DHS levels. These results underscore the systematic interaction between chromatin status and p53RE context in the induced transactivation response. This p53 regulated response appears to have been tuned via evolutionary processes that may have led to repression and/or utilization of p53REs originating from primate-specific transposon elements.
It is well established that p53 binds DNA elements near p53 target genes to regulate the response to cellular stress. To assess factors influencing binding to response elements and subsequent gene expression, we have analyzed 2932 p53-occupied response elements (p53REs) in the context of genome-wide chromatin state, DNA accessibility and dynamics, and considered roles for binding-sequence specificity and evolutionary conservation. While p53 occupancy level shows little apparent direct relationship to gene expression change, after grouping expressed genes by their chromatin status at baseline, a relationship between occupancy of p53REs and gene expression change emerged. Analysis of p53RE sequences demonstrated that p53 occupancy was strongly correlated with sequence similarity to p53RE consensus, but surprisingly, was inversely correlated with nucleosome accessibility (DHS) and evolutionary conservation. These data revealed a systematic interaction between p53RE content and chromatin context that affects both quantitative p53 occupancy and the induced transactivation response to exposure. Moreover, this interaction appears to have been tuned via evolutionary events involving transposable elements, which strongly bind p53, but in only a few instances affect gene expression levels. Models of p53-regulated gene expression response that consider both chromatin state and sequence context may prove useful in guiding strategies for cancer prevention or therapy.
Affiliation(s)
- Dan Su
- Environmental Genomics Group, Laboratory of Molecular Genetics, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, North Carolina, United States of America
| | - Xuting Wang
- Environmental Genomics Group, Laboratory of Molecular Genetics, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, North Carolina, United States of America
| | - Michelle R. Campbell
- Environmental Genomics Group, Laboratory of Molecular Genetics, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, North Carolina, United States of America
| | - Lingyun Song
- Institute for Genome Sciences and Policy, Duke University, Durham, North Carolina, United States of America
| | - Alexias Safi
- Institute for Genome Sciences and Policy, Duke University, Durham, North Carolina, United States of America
| | - Gregory E. Crawford
- Institute for Genome Sciences and Policy, Duke University, Durham, North Carolina, United States of America
| | - Douglas A. Bell
- Environmental Genomics Group, Laboratory of Molecular Genetics, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, North Carolina, United States of America
|
29
|
Jain S, Gitter A, Bar-Joseph Z. Multitask learning of signaling and regulatory networks with application to studying human response to flu. PLoS Comput Biol 2014; 10:e1003943. [PMID: 25522349 PMCID: PMC4270428 DOI: 10.1371/journal.pcbi.1003943] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 06/03/2014] [Accepted: 09/28/2014] [Indexed: 01/04/2023] Open
Abstract
Reconstructing regulatory and signaling response networks is one of the major goals of systems biology. While several successful methods have been suggested for this task, some integrating large and diverse datasets, these methods have so far been applied to reconstruct a single response network at a time, even when studying and modeling related conditions. To improve network reconstruction we developed MT-SDREM, a multi-task learning method which jointly models networks for several related conditions. In MT-SDREM, parameters are jointly constrained across the networks while still allowing for condition-specific pathways and regulation. We formulate the multi-task learning problem and discuss methods for optimizing the joint target function. We applied MT-SDREM to reconstruct dynamic human response networks for three flu strains: H1N1, H5N1 and H3N2. Our multi-task learning method was able to identify known and novel factors and genes, improving upon prior methods that model each condition independently. The MT-SDREM networks were also better at identifying proteins whose removal affects viral load indicating that joint learning can still lead to accurate, condition-specific, networks. Supporting website with MT-SDREM implementation: http://sb.cs.cmu.edu/mtsdrem
To understand why some flu strains are more virulent than others, researchers attempt to profile and model the molecular human response to these strains and identify similarities and differences between the resulting models. So far, the modeling and analysis part has been done independently for each strain and the results contrasted in a post-processing step. Here we present a new method, termed MT-SDREM, that simultaneously models the response to all strains allowing us to identify both, the core response elements that are shared among the strains, and factors that are uniquely activated or repressed by individual strains. We applied this method to study the human response to three flu strains: H1N1, H3N2 and H5N1. As we show, the method was able to correctly identify several common and known factors regulating immune response to such strains and also identified unique factors for each of the strains. The models reconstructed by the simultaneous analysis method improved upon those generated by methods that model each strain response separately. Our joint models can be used to identify strain specific treatments as well as treatments that are likely to be effective against all three strains.
Affiliation(s)
- Siddhartha Jain
- Computer Science Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
| | - Anthony Gitter
- Microsoft Research, Cambridge, Massachusetts, United States of America
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
| | - Ziv Bar-Joseph
- Lane Center for Computational Biology and Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
|
30
|
Wise A, Bar-Joseph Z. SMARTS: reconstructing disease response networks from multiple individuals using time series gene expression data. Bioinformatics 2014; 31:1250-7. [PMID: 25480376 DOI: 10.1093/bioinformatics/btu800] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 09/06/2014] [Accepted: 11/26/2014] [Indexed: 02/02/2023]
Abstract
MOTIVATION Current methods for reconstructing dynamic regulatory networks are focused on modeling a single response network using model organisms or cell lines. Unlike these models or cell lines, humans differ in their background expression profiles due to age, genetics and life factors. In addition, there are often differences in start and end times for time series human data and in the rate of progress based on the specific individual. Thus, new methods are required to integrate time series data from multiple individuals when modeling and constructing disease response networks. RESULTS We developed Scalable Models for the Analysis of Regulation from Time Series (SMARTS), a method integrating static and time series data from multiple individuals to reconstruct condition-specific response networks in an unsupervised way. Using probabilistic graphical models, SMARTS iterates between reconstructing different regulatory networks and assigning individuals to these networks, taking into account varying individual start times and response rates. These models can be used to group different sets of patients and to identify transcription factors that differentiate the observed responses between these groups. We applied SMARTS to analyze human response to influenza and mouse brain development. In both cases, it was able to greatly improve baseline groupings while identifying key relevant TFs that differ between the groups. Several of these groupings and TFs are known to regulate the relevant processes while others represent novel hypotheses regarding immune response and development. AVAILABILITY AND IMPLEMENTATION Software and Supplementary information are available at http://sb.cs.cmu.edu/smarts/. CONTACT zivbj@cs.cmu.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Aaron Wise
- Lane Center for Computational Biology and Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, USA
- Ziv Bar-Joseph
- Lane Center for Computational Biology and Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, USA
|
31
|
Nguyen N, Vo A, Choi I, Won KJ. A stationary wavelet entropy-based clustering approach accurately predicts gene expression. J Comput Biol 2014; 22:236-49. [PMID: 25383910 DOI: 10.1089/cmb.2014.0221] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 01/21/2023] Open
Abstract
Studying epigenetic landscapes is important to understand the condition for gene regulation. Clustering is a useful approach to study epigenetic landscapes by grouping genes based on their epigenetic conditions. However, classical clustering approaches that often use a representative value of the signals in a fixed-sized window do not fully use the information written in the epigenetic landscapes. Clustering approaches to maximize the information of the epigenetic signals are necessary for better understanding gene regulatory environments. For effective clustering of multidimensional epigenetic signals, we developed a method called Dewer, which uses the entropy of stationary wavelet of epigenetic signals inside enriched regions for gene clustering. Interestingly, the gene expression levels were highly correlated with the entropy levels of epigenetic signals. Dewer separates genes better than a window-based approach in the assessment using gene expression and achieved a correlation coefficient above 0.9 without using any training procedure. Our results show that the changes of the epigenetic signals are useful to study gene regulation.
Affiliation(s)
- Nha Nguyen
- 1 Department of Genetics, School of Medicine, University of Pennsylvania , Philadelphia, Pennsylvania
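The idea of scoring a signal by the entropy of its stationary wavelet coefficients can be illustrated with a small sketch. The code below is not the Dewer implementation; it assumes a one-level undecimated Haar transform with periodic boundaries and a Shannon entropy over normalized coefficient energies, and the example signals and function names are hypothetical.

```python
import math

def haar_swt_detail(signal):
    """One-level undecimated (stationary) Haar detail coefficients, periodic boundary."""
    n = len(signal)
    return [(signal[i] - signal[(i + 1) % n]) / math.sqrt(2) for i in range(n)]

def wavelet_entropy(signal):
    """Shannon entropy (bits) of the normalized energies of the detail coefficients."""
    energies = [d * d for d in haar_swt_detail(signal)]
    total = sum(energies)
    if total == 0:
        return 0.0  # a flat signal carries no detail energy
    return -sum((e / total) * math.log2(e / total) for e in energies if e > 0)

# Hypothetical signals over eight bins of a region.
flat = [3.0] * 8                     # no change at all
step = [0, 0, 0, 0, 5, 5, 5, 5]      # two sharp transitions (one from periodic wrap-around)
alternating = [0, 5] * 4             # change spread evenly over every bin

entropies = {name: wavelet_entropy(sig)
             for name, sig in [("flat", flat), ("step", step), ("alternating", alternating)]}
```

Under these assumptions, a signal whose changes are concentrated in a few positions gets a low entropy, and one whose changes are spread across the region gets a high entropy.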
|
32
|
Nguyen N, Vo A, Won KJ. A wavelet approach to detect enriched regions and explore epigenomic landscapes. J Comput Biol 2014; 21:846-54. [PMID: 25072902 DOI: 10.1089/cmb.2014.0095] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/02/2023] Open
Abstract
Epigenetic landscapes represent how cells regulate gene activity. To understand their effect on gene regulation, it is important to detect their occupancy in the genome. Unlike transcription factors whose binding regions are limited to narrow regions, histone modification marks are enriched over broader areas. The stochastic characteristics unique to each mark make it hard to detect their enrichment. Classically, a predefined window has been used to detect their enrichment. However, these approaches heavily rely on the predetermined parameters. Also, the window-based approaches cannot handle the enrichment of multiple marks. We propose a novel algorithm, called SeqW, to detect enrichment of multiple histone modification marks. SeqW applies a zooming approach to detect a broadly enriched domain. The zooming approach helps domain detection by increasing signal-to-noise ratio. The borders of the domains are detected by studying the characteristics of signals in the wavelet domain. We show that SeqW outperformed previous predictors in detecting broad peaks. Also, we applied SeqW in studying spatial combinations of histone modification patterns.
Affiliation(s)
- Nha Nguyen
- 1 Department of Genetics, School of Medicine, University of Pennsylvania , Philadelphia, Pennsylvania
|
33
|
Nie Y, Cheng X, Chen J, Sun X. Nucleosome organization in the vicinity of transcription factor binding sites in the human genome. BMC Genomics 2014; 15:493. [PMID: 24942981 PMCID: PMC4073502 DOI: 10.1186/1471-2164-15-493] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 12/24/2013] [Accepted: 06/10/2014] [Indexed: 12/23/2022] Open
Abstract
Background: The binding of transcription factors (TFs) to specific DNA sequences is an initial and crucial step of transcription. In eukaryotes, this process is highly dependent on the local chromatin state, which can be modified by recruiting chromatin remodelers. However, previous studies have focused mainly on nucleosome occupancy around the TF binding sites (TFBSs) of a few specific TFs. Here, we investigated the nucleosome occupancy profiles around computationally inferred binding sites, based on 519 TF binding motifs, in human GM12878 and K562 cells. Results: Although high nucleosome occupancy is intrinsically encoded at TFBSs in vitro, nucleosomes are generally depleted at TFBSs in vivo, and approximately a quarter of TFBSs showed well-positioned in vivo nucleosomes on both sides. RNA polymerase near the transcription start site (TSS) has a large effect on the nucleosome occupancy distribution around the binding sites located within one kilobase of the nearest TSS; fuzzier nucleosome positioning was thus observed around these sites. In addition, in contrast to yeast, repressors, rather than activators, were more likely to bind to nucleosomal DNA in the human cells, and nucleosomes around repressor sites were better positioned in vivo. Genes with repressor sites exhibiting well-positioned nucleosomes on both sides, and genes with activator sites occupied by nucleosomes, had significantly lower expression, suggesting that actions of activators and repressors are associated with the nucleosome occupancy around their binding sites. It was also interesting to note that most of the binding sites that were not in DNase I-hypersensitive regions were cell-type specific, and higher in vivo nucleosome occupancy was observed at these binding sites. Conclusions: This study demonstrated that RNA polymerase and the functions of bound TFs affected the local nucleosome occupancy around TFBSs, and nucleosome occupancy patterns around TFBSs were associated with the expression levels of target genes.
Affiliation(s)
- Xiao Sun
- State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University, 210096 Nanjing, China.
|
34
|
The E2F transcription factors regulate tumor development and metastasis in a mouse model of metastatic breast cancer. Mol Cell Biol 2014; 34:3229-43. [PMID: 24934442 DOI: 10.1128/mcb.00737-14] [Citation(s) in RCA: 84] [Impact Index Per Article: 8.4] [Indexed: 11/20/2022] Open
Abstract
While the E2F transcription factors (E2Fs) have a clearly defined role in cell cycle control, recent work has uncovered new functions. Using genomic signature methods, we predicted a role for the activator E2F transcription factors in the mouse mammary tumor virus (MMTV)-polyomavirus middle T oncoprotein (PyMT) mouse model of metastatic breast cancer. To genetically test the hypothesis that the E2Fs function to regulate tumor development and metastasis, we interbred MMTV-PyMT mice with E2F1, E2F2, or E2F3 knockout mice. With the ablation of individual E2Fs, we noted alterations of tumor latency, histology, and vasculature. Interestingly, we noted striking reductions in metastatic capacity and in the number of circulating tumor cells in both the E2F1 and E2F2 knockout backgrounds. Investigating E2F target genes that mediate metastasis, we found that E2F loss led to decreased levels of vascular endothelial growth factor (Vegfa), Bmp4, Cyr61, Nupr1, Plod2, P4ha1, Adamts1, Lgals3, and Angpt2. These gene expression changes indicate that the E2Fs control the expression of genes critical to angiogenesis, the remodeling of the extracellular matrix, tumor cell survival, and tumor cell interactions with vascular endothelial cells that facilitate metastasis to the lungs. Taken together, these results reveal that the E2F transcription factors play key roles in mediating tumor development and metastasis in addition to their well-characterized roles in cell cycle control.
|
35
|
Santra T. A bayesian framework that integrates heterogeneous data for inferring gene regulatory networks. Front Bioeng Biotechnol 2014; 2:13. [PMID: 25152886 PMCID: PMC4126456 DOI: 10.3389/fbioe.2014.00013] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 01/14/2014] [Accepted: 04/28/2014] [Indexed: 11/29/2022] Open
Abstract
Reconstruction of gene regulatory networks (GRNs) from experimental data is a fundamental challenge in systems biology. A number of computational approaches have been developed to infer GRNs from mRNA expression profiles. However, expression profiles alone are proving to be insufficient for inferring GRN topologies with reasonable accuracy. Recently, it has been shown that integration of external data sources (such as gene and protein sequence information, gene ontology data, protein-protein interactions) with mRNA expression profiles may increase the reliability of the inference process. Here, I propose a new approach that incorporates transcription factor binding sites (TFBS) and physical protein interactions (PPI) among transcription factors (TFs) in a Bayesian variable selection (BVS) algorithm which can infer GRNs from mRNA expression profiles subjected to genetic perturbations. Using real experimental data, I show that the integration of TFBS and PPI data with mRNA expression profiles leads to significantly more accurate networks than those inferred from expression profiles alone. Additionally, the performance of the proposed algorithm is compared with a series of least absolute shrinkage and selection operator (LASSO) regression-based network inference methods that can also incorporate prior knowledge in the inference framework. The results of this comparison suggest that BVS can outperform LASSO regression-based method in some circumstances.
Affiliation(s)
- Tapesh Santra
- Systems Biology Ireland, University College Dublin, Dublin, Ireland
|
36
|
Nygård S, Reitan T, Clancy T, Nygaard V, Bjørnstad J, Skrbic B, Tønnessen T, Christensen G, Hovig E. Identifying pathogenic processes by integrating microarray data with prior knowledge. BMC Bioinformatics 2014; 15:115. [PMID: 24758699 PMCID: PMC4006456 DOI: 10.1186/1471-2105-15-115] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 11/14/2013] [Accepted: 04/09/2014] [Indexed: 11/30/2022] Open
Abstract
BACKGROUND It is of great importance to identify molecular processes and pathways that are involved in disease etiology. Although there has been an extensive use of various high-throughput methods for this task, pathogenic pathways are still not completely understood. Often the set of genes or proteins identified as altered in genome-wide screens show a poor overlap with canonical disease pathways. These findings are difficult to interpret, yet crucial in order to improve the understanding of the molecular processes underlying the disease progression. We present a novel method for identifying groups of connected molecules from a set of differentially expressed genes. These groups represent functional modules sharing common cellular function and involve signaling and regulatory events. Specifically, our method makes use of Bayesian statistics to identify groups of co-regulated genes based on the microarray data, where external information about molecular interactions and connections are used as priors in the group assignments. Markov chain Monte Carlo sampling is used to search for the most reliable grouping. RESULTS Simulation results showed that the method improved the ability of identifying correct groups compared to traditional clustering, especially for small sample sizes. Applied to a microarray heart failure dataset the method found one large cluster with several genes important for the structure of the extracellular matrix and a smaller group with many genes involved in carbohydrate metabolism. The method was also applied to a microarray dataset on melanoma cancer patients with or without metastasis, where the main cluster was dominated by genes related to keratinocyte differentiation. CONCLUSION Our method found clusters overlapping with known pathogenic processes, but also pointed to new connections extending beyond the classical pathways.
Affiliation(s)
- Ståle Nygård
- Bioinformatics Core Facility, Institute for Medical Informatics, Oslo University Hospital, Oslo, Norway
- Institute for Experimental Medical Research, Oslo University Hospital and University of Oslo, Oslo, Norway
- KG Jebsen Cardiac Research Centre and Center for Heart Failure Research, University of Oslo, Oslo, Norway
- Trond Reitan
- Center for Ecological and Evolutionary Synthesis, Department of Biology, University of Oslo, Oslo, Norway
- Trevor Clancy
- Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
- Vegard Nygaard
- Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
- Johannes Bjørnstad
- KG Jebsen Cardiac Research Centre and Center for Heart Failure Research, University of Oslo, Oslo, Norway
- Department of Cardiothoracic Surgery, Oslo University Hospital, Oslo, Norway
- Biljana Skrbic
- KG Jebsen Cardiac Research Centre and Center for Heart Failure Research, University of Oslo, Oslo, Norway
- Department of Cardiothoracic Surgery, Oslo University Hospital, Oslo, Norway
- Theis Tønnessen
- KG Jebsen Cardiac Research Centre and Center for Heart Failure Research, University of Oslo, Oslo, Norway
- Department of Cardiothoracic Surgery, Oslo University Hospital, Oslo, Norway
- Geir Christensen
- Institute for Experimental Medical Research, Oslo University Hospital and University of Oslo, Oslo, Norway
- KG Jebsen Cardiac Research Centre and Center for Heart Failure Research, University of Oslo, Oslo, Norway
- Eivind Hovig
- Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
- Institute for Medical Informatics, Oslo University Hospital, Oslo, Norway
- Department of Informatics, University of Oslo, Oslo, Norway
|
37
|
Levinson M, Zhou Q. A penalized Bayesian approach to predicting sparse protein-DNA binding landscapes. Bioinformatics 2014; 30:636-43. [PMID: 24115169 DOI: 10.1093/bioinformatics/btt585] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022]
Abstract
MOTIVATION Cellular processes are controlled, directly or indirectly, by the binding of hundreds of different DNA binding factors (DBFs) to the genome. One key to deeper understanding of the cell is discovering where, when and how strongly these DBFs bind to the DNA sequence. Direct measurement of DBF binding sites (BSs; e.g. through ChIP-Chip or ChIP-Seq experiments) is expensive, noisy and not available for every DBF in every cell type. Naive and most existing computational approaches to detecting which DBFs bind in a set of genomic regions of interest often perform poorly, due to the high false discovery rates and restrictive requirements for prior knowledge. RESULTS We develop SparScape, a penalized Bayesian method for identifying DBFs active in the considered regions and predicting a joint probabilistic binding landscape. Using a sparsity-inducing penalization, SparScape is able to select a small subset of DBFs with enriched BSs in a set of DNA sequences from a much larger candidate set. This substantially reduces the false positives in prediction of BSs. Analysis of ChIP-Seq data in mouse embryonic stem cells and simulated data show that SparScape dramatically outperforms the naive motif scanning method and the comparable computational approaches in terms of DBF identification and BS prediction. AVAILABILITY AND IMPLEMENTATION SparScape is implemented in C++ with OpenMP (optional at compilation) and is freely available at 'www.stat.ucla.edu/~zhou/Software.html' for academic use.
Affiliation(s)
- Matthew Levinson
- Department of Statistics, University of California, Los Angeles, CA 90095, USA
|
38
|
Transcription factor binding sites prediction based on modified nucleosomes. PLoS One 2014; 9:e89226. [PMID: 24586611 PMCID: PMC3931712 DOI: 10.1371/journal.pone.0089226] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 10/09/2013] [Accepted: 01/17/2014] [Indexed: 11/19/2022] Open
Abstract
In computational methods, position weight matrices (PWMs) are commonly applied for transcription factor binding site (TFBS) prediction. Although these matrices are more accurate than simple consensus sequences to predict actual binding sites, they usually produce a large number of false positive (FP) predictions and so are impoverished sources of information. Several studies have employed additional sources of information such as sequence conservation or the vicinity to transcription start sites to distinguish true binding regions from random ones. Recently, the spatial distribution of modified nucleosomes has been shown to be associated with different promoter architectures. These aligned patterns can facilitate DNA accessibility for transcription factors. We hypothesize that using data from these aligned and periodic patterns can improve the performance of binding region prediction. In this study, we propose two effective features, “modified nucleosomes neighboring” and “modified nucleosomes occupancy”, to decrease FP in binding site discovery. Based on these features, we designed a logistic regression classifier which estimates the probability of a region as a TFBS. Our model learned each feature based on Sp1 binding sites on Chromosome 1 and was tested on the other chromosomes in human CD4+ T cells. In this work, we investigated 21 histone modifications and found that only 8 out of 21 marks are strongly correlated with transcription factor binding regions. To prove that these features are not specific to Sp1, we combined the logistic regression classifier with the PWM, and created a new model to search TFBSs on the genome. We tested the model using transcription factors MAZ, PU.1 and ELF1 and compared the results to those using only the PWM. The results show that our model can predict transcription factor binding regions more successfully. The relative simplicity of the model and capability of integrating other features make it a superior method for TFBS prediction.
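To make the PWM-plus-classifier idea concrete, here is a minimal sketch rather than the authors' model. It scores every window of a sequence against a log-odds PWM and then combines that score with one extra feature through a logistic function. The count matrix, the uniform background, the pseudocount, the `nucleosome_feature` argument and all coefficients are hypothetical values chosen for illustration.

```python
import math

# Hypothetical count matrix for a 4-bp motif (one dict per motif position).
COUNTS = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]

def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Turn raw counts into log2-odds scores against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({base: math.log2((column[base] + pseudocount) / total / background)
                    for base in "ACGT"})
    return pwm

def scan(sequence, pwm):
    """Return (start, score) for every window of motif width in the sequence."""
    width = len(pwm)
    return [(start, sum(pwm[i][sequence[start + i]] for i in range(width)))
            for start in range(len(sequence) - width + 1)]

def binding_probability(pwm_score, nucleosome_feature,
                        w_score=1.0, w_feature=1.0, bias=-4.0):
    """Logistic combination of the PWM score with one extra feature (made-up weights)."""
    z = bias + w_score * pwm_score + w_feature * nucleosome_feature
    return 1.0 / (1.0 + math.exp(-z))

pwm = log_odds_pwm(COUNTS)
hits = scan("TTACGTAA", pwm)
best_start, best_score = max(hits, key=lambda hit: hit[1])
```

In this toy setup the same PWM hit receives a higher probability when the additional feature supports it, which is the general mechanism by which extra evidence can down-weight false positive motif matches.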
|
39
|
Abstract
MOTIVATION Several types of studies, including genome-wide association studies and RNA interference screens, strive to link genes to diseases. Although these approaches have had some success, genetic variants are often only present in a small subset of the population, and screens are noisy with low overlap between experiments in different labs. Neither provides a mechanistic model explaining how identified genes impact the disease of interest or the dynamics of the pathways those genes regulate. Such mechanistic models could be used to accurately predict downstream effects of knocking down pathway members and allow comprehensive exploration of the effects of targeting pairs or higher-order combinations of genes. RESULTS We developed methods to model the activation of signaling and dynamic regulatory networks involved in disease progression. Our model, SDREM, integrates static and time series data to link proteins and the pathways they regulate in these networks. SDREM uses prior information about proteins' likelihood of involvement in a disease (e.g. from screens) to improve the quality of the predicted signaling pathways. We used our algorithms to study the human immune response to H1N1 influenza infection. The resulting networks correctly identified many of the known pathways and transcriptional regulators of this disease. Furthermore, they accurately predict RNA interference effects and can be used to infer genetic interactions, greatly improving over other methods suggested for this task. Applying our method to the more pathogenic H5N1 influenza allowed us to identify several strain-specific targets of this infection. AVAILABILITY SDREM is available from http://sb.cs.cmu.edu/sdrem. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Anthony Gitter
- Computer Science Department and Lane Center for Computational Biology, School of Computer Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA
|
40
|
Yang TH, Wu WS. Inferring functional transcription factor-gene binding pairs by integrating transcription factor binding data with transcription factor knockout data. BMC Syst Biol 2013; 7 Suppl 6:S13. [PMID: 24565265 PMCID: PMC4029220 DOI: 10.1186/1752-0509-7-s6-s13] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 12/29/2022]
Abstract
Background: Chromatin immunoprecipitation (ChIP) experiments are now the most comprehensive experimental approaches for mapping the binding of transcription factors (TFs) to their target genes. However, ChIP data alone are insufficient for identifying functional binding target genes of TFs for two reasons. First, there is an inherently high false positive/negative rate in ChIP-chip or ChIP-seq experiments. Second, binding signals in the ChIP data do not necessarily imply functionality. Methods: It is known that ChIP-chip data and TF knockout (TFKO) data reveal complementary information on gene regulation. While ChIP-chip data can provide TF-gene binding pairs, TFKO data can provide TF-gene regulation pairs. Therefore, we propose a novel network approach for identifying functional TF-gene binding pairs by integrating the ChIP-chip data with the TFKO data. In our method, a TF-gene binding pair from the ChIP-chip data is regarded as functional if it also has a high-confidence curated TFKO TF-gene regulatory relation or a deduced hypostatic TF-gene regulatory relation. Results and conclusions: We first validated our method on a gathered ground truth set. Then we applied our method to the ChIP-chip data to identify functional TF-gene binding pairs. The biological significance of our identified functional TF-gene binding pairs was shown by assessing their functional enrichment, the prevalence of protein-protein interaction, and expression coherence. Our results outperformed the results of three existing methods across all measures. Our identified functional targets of TFs also showed statistical significance over the randomly assigned TF-gene pairs. We also showed that our method is dataset independent and can apply to ChIP-seq data and the E. coli genome. Finally, we provided an example showing the biological applicability of our notion.
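The core integration step described in this abstract can be sketched as a set operation. This is a simplification under stated assumptions: it keeps a binding pair only when the same TF-gene pair also appears among the regulation pairs, and it leaves out the deduced regulatory relations the abstract mentions. All TF and gene names are hypothetical.

```python
# Hypothetical TF-gene pairs; real inputs would come from ChIP-chip and TFKO datasets.
chip_binding_pairs = {("TF1", "geneA"), ("TF1", "geneB"), ("TF2", "geneC")}
tfko_regulation_pairs = {("TF1", "geneA"), ("TF2", "geneC"), ("TF2", "geneD")}

def functional_binding_pairs(binding, regulation):
    """Keep only binding pairs that are also supported by a regulatory relation."""
    return binding & regulation

functional = functional_binding_pairs(chip_binding_pairs, tfko_regulation_pairs)
# Fraction of binding pairs that survive the functional filter.
retained_fraction = len(functional) / len(chip_binding_pairs)
```

Here ("TF1", "geneB") is dropped because binding alone is not treated as evidence of function, and ("TF2", "geneD") is dropped because there is no binding evidence for it.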
|
41
|
Chen CC, Xiao S, Xie D, Cao X, Song CX, Wang T, He C, Zhong S. Understanding variation in transcription factor binding by modeling transcription factor genome-epigenome interactions. PLoS Comput Biol 2013; 9:e1003367. [PMID: 24339764 PMCID: PMC3854512 DOI: 10.1371/journal.pcbi.1003367] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 09/26/2012] [Accepted: 10/15/2013] [Indexed: 12/20/2022] Open
Abstract
Despite explosive growth in genomic datasets, the methods for studying epigenomic mechanisms of gene regulation remain primitive. Here we present a model-based approach to systematically analyze the epigenomic functions in modulating transcription factor-DNA binding. Based on the first principles of statistical mechanics, this model considers the interactions between epigenomic modifications and a cis-regulatory module, which contains multiple binding sites arranged in any configurations. We compiled a comprehensive epigenomic dataset in mouse embryonic stem (mES) cells, including DNA methylation (MeDIP-seq and MRE-seq), DNA hydroxymethylation (5-hmC-seq), and histone modifications (ChIP-seq). We discovered correlations of transcription factors (TFs) for specific combinations of epigenomic modifications, which we term epigenomic motifs. Epigenomic motifs explained why some TFs appeared to have different DNA binding motifs derived from in vivo (ChIP-seq) and in vitro experiments. Theoretical analyses suggested that the epigenome can modulate transcriptional noise and boost the cooperativity of weak TF binding sites. ChIP-seq data suggested that epigenomic boost of binding affinities in weak TF binding sites can function in mES cells. We showed in theory that the epigenome should suppress the TF binding differences on SNP-containing binding sites in two people. Using personal data, we identified strong associations between H3K4me2/H3K9ac and the degree of personal differences in NFκB binding in SNP-containing binding sites, which may explain why some SNPs introduce much smaller personal variations on TF binding than other SNPs. In summary, this model presents a powerful approach to analyze the functions of epigenomic modifications. This model was implemented into an open source program APEG (Affinity Prediction by Epigenome and Genome, http://systemsbio.ucsd.edu/apeg).
Affiliation(s)
- Chieh-Chun Chen
- Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Shu Xiao
- Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
- Dan Xie
- Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Xiaoyi Cao
- Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
- Chun-Xiao Song
- Department of Chemistry, University of Chicago, Chicago, Illinois, United States of America
- Ting Wang
- Department of Genetics, Washington University in St. Louis, St. Louis, Missouri, United States of America
- Chuan He
- Department of Chemistry, University of Chicago, Chicago, Illinois, United States of America
- Sheng Zhong
- Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
|
42
|
Lee H, Flaherty P, Ji HP. Systematic genomic identification of colorectal cancer genes delineating advanced from early clinical stage and metastasis. BMC Med Genomics 2013; 6:54. [PMID: 24308539 PMCID: PMC3907018 DOI: 10.1186/1755-8794-6-54] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Received: 09/19/2013] [Accepted: 11/27/2013] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND Colorectal cancer is the third leading cause of cancer deaths in the United States. The initial assessment of colorectal cancer involves clinical staging that takes into account the extent of primary tumor invasion, determining the number of lymph nodes with metastatic cancer and the identification of metastatic sites in other organs. Advanced clinical stage indicates metastatic cancer, either in regional lymph nodes or in distant organs. While the genomic and genetic basis of colorectal cancer has been elucidated to some degree, less is known about the identity of specific cancer genes that are associated with advanced clinical stage and metastasis. METHODS We compiled multiple genomic data types (mutations, copy number alterations, gene expression and methylation status) as well as clinical meta-data from The Cancer Genome Atlas (TCGA). We used an elastic-net regularized regression method on the combined genomic data to identify genetic aberrations and their associated cancer genes that are indicators of clinical stage. We ranked candidate genes by their regression coefficient and level of support from multiple assay modalities. RESULTS A fit of the elastic-net regularized regression to 197 samples and integrated analysis of four genomic platforms identified the set of top gene predictors of advanced clinical stage, including: WRN, SYK, DDX5 and ADRA2C. These genetic features were identified robustly in bootstrap resampling analysis. CONCLUSIONS We conducted an analysis integrating multiple genomic features including mutations, copy number alterations, gene expression and methylation. This integrated approach in which one considers all of these genomic features performs better than any individual genomic assay. We identified multiple genes that robustly delineate advanced clinical stage, suggesting their possible role in colorectal cancer metastatic progression.
Affiliation(s)
- HoJoon Lee
- Division of Oncology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Patrick Flaherty
- Department of Biomedical Engineering, Worcester Polytechnic Institute, Worcester, MA 01605, USA
- Hanlee P Ji
- Division of Oncology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Stanford Genome Technology Center, Stanford University, Palo Alto, CA 94304, USA
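For readers unfamiliar with the elastic-net regression mentioned in this abstract, the following is a small sketch of the technique, not the authors' pipeline. It fits an elastic net by cyclic coordinate descent without an intercept, then ranks features by absolute coefficient. The toy data, the feature names `gene_x` and `gene_y`, and the penalty settings `lam` and `alpha` are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink a value toward zero by threshold; values inside the band become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def elastic_net(X, y, lam=0.1, alpha=0.5, n_iter=200):
    """Minimise (1/2n)*||y - Xb||^2 + lam*(alpha*||b||_1 + (1 - alpha)/2*||b||_2^2)."""
    n, p = len(X), len(X[0])
    columns = [[row[j] for row in X] for j in range(p)]
    coef = [0.0] * p
    residual = list(y)  # equals y - X @ coef while coef is all zeros
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue
            # Correlation of feature j with the partial residual that excludes feature j.
            rho = sum(col[i] * (residual[i] + col[i] * coef[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam * alpha) / (z + lam * (1.0 - alpha))
            delta = new - coef[j]
            for i in range(n):
                residual[i] -= col[i] * delta
            coef[j] = new
    return coef

# Toy data: the response depends only on the first feature.
x0 = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
x1 = [1.0, -1.0, 1.0, 1.0, -1.0, 1.0]
X = [[a, b] for a, b in zip(x0, x1)]
y = [2.0 * a for a in x0]

coef = elastic_net(X, y)
feature_names = ["gene_x", "gene_y"]
ranked = sorted(zip(feature_names, coef), key=lambda item: abs(item[1]), reverse=True)
```

On this toy input the uninformative feature is set exactly to zero by the L1 part of the penalty, while the informative coefficient is shrunk slightly below its true value of 2.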
|
43
|
Zhong S, He X, Bar-Joseph Z. Predicting tissue specific transcription factor binding sites. BMC Genomics 2013; 14:796. [PMID: 24238150 PMCID: PMC3898213 DOI: 10.1186/1471-2164-14-796] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Received: 10/30/2013] [Accepted: 11/06/2013] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Studies of gene regulation often utilize genome-wide predictions of transcription factor (TF) binding sites. Most existing prediction methods are based on sequence information alone, ignoring biological contexts such as developmental stages and tissue types. Experimental methods to study in vivo binding, including ChIP-chip and ChIP-seq, can only study one transcription factor in a single cell type and under a specific condition in each experiment, and therefore cannot scale to determine the full set of regulatory interactions in mammalian transcriptional regulatory networks. RESULTS We developed a new computational approach, PIPES, for predicting tissue-specific TF binding. PIPES integrates in vitro protein binding microarrays (PBMs), sequence conservation and tissue-specific epigenetic (DNase I hypersensitivity) information. We demonstrate that PIPES improves over existing methods on distinguishing between in vivo bound and unbound sequences using ChIP-seq data for 11 mouse TFs. In addition, our predictions are in good agreement with current knowledge of tissue-specific TF regulation. CONCLUSIONS We provide a systematic map of computationally predicted tissue-specific binding targets for 284 mouse TFs across 55 tissue/cell types. Such comprehensive resource is useful for researchers studying gene regulation.
Affiliation(s)
- Ziv Bar-Joseph
- Lane Center for Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 15213, USA.
|
44
|
LASAGNA-Search: an integrated web tool for transcription factor binding site search and visualization. Biotechniques 2013; 54:141-53. [PMID: 23599922 DOI: 10.2144/000113999] [Citation(s) in RCA: 92] [Impact Index Per Article: 8.4] [Indexed: 11/23/2022] Open
Abstract
The release of ChIP-seq data from the ENCyclopedia Of DNA Elements (ENCODE) and Model Organism ENCyclopedia Of DNA Elements (modENCODE) projects has significantly increased the amount of transcription factor (TF) binding affinity information available to researchers. However, scientists still routinely use TF binding site (TFBS) search tools to scan unannotated sequences for TFBSs, particularly when searching for lesser-known TFs or TFs in organisms for which ChIP-seq data are unavailable. The sequence analysis often involves multiple steps such as TF model collection, promoter sequence retrieval, and visualization; thus, several different tools are required. We have developed a novel integrated web tool named LASAGNA-Search that allows users to perform TFBS searches without leaving the web site. LASAGNA-Search uses the LASAGNA (Length-Aware Site Alignment Guided by Nucleotide Association) algorithm for TFBS alignment. Important features of LASAGNA-Search include (i) acceptance of unaligned variable-length TFBSs, (ii) a collection of 1726 TF models, (iii) automatic promoter sequence retrieval, (iv) visualization in the UCSC Genome Browser, and (v) gene regulatory network inference and visualization based on binding specificities. LASAGNA-Search is freely available at http://biogrid.engr.uconn.edu/lasagna_search/.
|
45
|
Chen YC, Cheng JH, Tsai ZTY, Tsai HK, Chuang TJ. The impact of trans-regulation on the evolutionary rates of metazoan proteins. Nucleic Acids Res 2013; 41:6371-80. [PMID: 23658220 PMCID: PMC3711421 DOI: 10.1093/nar/gkt349] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 02/07/2013] [Revised: 04/10/2013] [Accepted: 04/14/2013] [Indexed: 11/13/2022] Open
Abstract
Transcription factors (TFs) and microRNAs (miRNAs) are two crucial trans-regulatory factors that coordinately control gene expression. Understanding the impacts of these two factors on the rate of protein sequence evolution is of great importance in evolutionary biology. While many biological factors associated with evolutionary rate variations have been studied, an evolutionary analysis that simultaneously accounts for TF and miRNA regulation across metazoans has not yet been carried out. Here, we provide a series of statistical analyses to assess the influences of TF and miRNA regulation on evolutionary rates across metazoans (human, mouse and fruit fly). Our results reveal that the negative correlations between trans-regulation and evolutionary rates hold well across metazoans, but the strength of TF regulation as a rate indicator becomes weak when the other confounding factors that may affect evolutionary rates are controlled. We show that miRNA regulation tends to be a more essential indicator of evolutionary rates than TF regulation, and the combination of TF and miRNA regulation has a significant dependent effect on protein evolutionary rates. We also show that trans-regulation (especially miRNA regulation) is much more important in human/mouse than in fruit fly in determining protein evolutionary rates, suggesting a considerable variation in rate determinants between vertebrates and invertebrates.
Affiliation(s)
- Yi-Ching Chen
- Institute of Information Science, Academia Sinica, Taipei 115, Taiwan, Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, Taipei 115, Taiwan and Genomic Research Center, Academia Sinica, Taipei 115, Taiwan
- Jen-Hao Cheng
- Institute of Information Science, Academia Sinica, Taipei 115, Taiwan, Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, Taipei 115, Taiwan and Genomic Research Center, Academia Sinica, Taipei 115, Taiwan
- Zing Tsung-Yeh Tsai
- Institute of Information Science, Academia Sinica, Taipei 115, Taiwan, Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, Taipei 115, Taiwan and Genomic Research Center, Academia Sinica, Taipei 115, Taiwan
- Huai-Kuang Tsai
- Institute of Information Science, Academia Sinica, Taipei 115, Taiwan, Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, Taipei 115, Taiwan and Genomic Research Center, Academia Sinica, Taipei 115, Taiwan
- Trees-Juen Chuang
- Institute of Information Science, Academia Sinica, Taipei 115, Taiwan, Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, Taipei 115, Taiwan and Genomic Research Center, Academia Sinica, Taipei 115, Taiwan
|
46
|
Abstract
BACKGROUND Although genome-wide association studies (GWAS) and subsequent meta-analyses have confirmed associations between the PTPN2 (protein tyrosine phosphatase, nonreceptor type 2) gene and Crohn's disease (CD), the potential causal variants remain unidentified. We aimed to dissect potential causal CD-associated PTPN2 variants, assess their functional significance, and relate PTPN2 protein expression with inflammation in CD. METHODS A 3-stage study was carried out. In stage 1, we genotyped tagging single nucleotide polymorphisms (tag-SNPs) in the PTPN2 gene in a sample of patients with CD (<20 years, n = 556) and controls (n = 602). In stage 2, we resequenced the putative promoter, target exons and introns in the PTPN2 gene, and examined associations of high-frequency variants with CD in the stage 1 cohort. In stage 3, we studied the relationship between PTPN2 protein expression and mucosal inflammation and carried out in silico analyses to study the functional characteristics of the PTPN2 CD-associated SNPs. RESULTS In stage 1, we observed associations between 5 intronic SNPs and CD, including rs1893217 (P = 2 × 10⁻⁴), the SNP that is in perfect linkage disequilibrium with the lead GWAS SNP rs2542151. Resequencing revealed 2 known promoter polymorphisms. No associations between these promoter SNPs and CD were evident. In silico analyses revealed that the 5 associated intronic SNPs influenced PTPN2 expression and binding to important transcription factors. PTPN2 protein was overexpressed in inflamed intestinal tissues of patients with CD. CONCLUSIONS Our findings suggest that noncoding variation in the PTPN2 gene may represent the causal variations influencing susceptibility for CD.
|
47
|
Lim JH, Iggo RD, Barker D. Models incorporating chromatin modification data identify functionally important p53 binding sites. Nucleic Acids Res 2013; 41:5582-93. [PMID: 23599002 PMCID: PMC3675478 DOI: 10.1093/nar/gkt260] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/14/2022] Open
Abstract
Genome-wide prediction of transcription factor binding sites is notoriously difficult. We have developed and applied a logistic regression approach for prediction of binding sites for the p53 transcription factor that incorporates sequence information and chromatin modification data. We tested this by comparison of predicted sites with known binding sites defined by chromatin immunoprecipitation (ChIP), by the location of predictions relative to genes, by the function of nearby genes and by analysis of gene expression data after p53 activation. We compared the predictions made by our novel model with predictions based only on matches to a sequence position weight matrix (PWM). In whole genome assays, the fraction of known sites identified by the two models was similar, suggesting that there was little to be gained from including chromatin modification data. In contrast, there were highly significant and biologically relevant differences between the two models in the location of the predicted binding sites relative to genes, in the function of nearby genes and in the responsiveness of nearby genes to p53 activation. We propose that these contradictory results can be explained by PWM and ChIP data reflecting primarily biophysical properties of protein–DNA interactions, whereas chromatin modification data capture biologically important functional information.
Affiliation(s)
- Ji-Hyun Lim
- Sir Harold Mitchell Building, School of Biology, University of St Andrews, St Andrews, Fife, KY16 9TH, UK
|
48
|
Nie Y, Liu H, Sun X. The patterns of histone modifications in the vicinity of transcription factor binding sites in human lymphoblastoid cell lines. PLoS One 2013; 8:e60002. [PMID: 23527292 PMCID: PMC3602107 DOI: 10.1371/journal.pone.0060002] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 07/12/2012] [Accepted: 02/25/2013] [Indexed: 01/12/2023] Open
Abstract
Transcription factor (TF) binding at specific DNA sequences is the fundamental step in transcriptional regulation and is highly dependent on the chromatin structure context, which may be affected by specific histone modifications and variants, known as histone marks. The lack of a global binding map for hundreds of TFs means that previous studies have focused mainly on histone marks at binding sites for several specific TFs. We therefore studied 11 histone marks around computationally-inferred and experimentally-determined TF binding sites (TFBSs), based on 164 and 34 TFs, respectively, in human lymphoblastoid cell lines. For H2A.Z, methylation of H3K4, and acetylation of H3K27 and H3K9, the mark patterns exhibited bimodal distributions and strong pairwise correlations in the 600-bp region around enriched TFBSs, suggesting that these marks mainly coexist within the two nucleosomes proximal to the TF sites. Competition between TFs and nucleosomes for access to DNA at most binding sites contributes to the bimodal distribution, which is a common feature of histone marks for TF binding. Mark H3K79me2 showed a unimodal distribution on one side of TFBSs and the signals extended up to 4000 bp, indicating a longer-distance pattern. Interestingly, H4K20me1, H3K27me3, H3K36me3 and H3K9me3, which were more diffuse and less enriched surrounding TFBSs, showed unimodal distributions around the enriched TFBSs, suggesting that some TFs may bind to nucleosomal DNA. In addition, asymmetrical distributions of H3K36me3 and H3K9me3 indicated that repressors might establish a repressive chromatin structure in one direction to repress gene expression. In conclusion, this study demonstrated the ranges of histone marks associated with TF binding, and the common features of these marks around the binding sites. These findings have epigenetic implications for future analysis of regulatory elements.
Affiliation(s)
- Yumin Nie
- State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University, Nanjing, China
- Hongde Liu
- State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University, Nanjing, China
- Xiao Sun
- State Key Laboratory of Bioelectronics, School of Biological Science & Medical Engineering, Southeast University, Nanjing, China
|
49
|
Won KJ, Zhang X, Wang T, Ding B, Raha D, Snyder M, Ren B, Wang W. Comparative annotation of functional regions in the human genome using epigenomic data. Nucleic Acids Res 2013; 41:4423-32. [PMID: 23482391 PMCID: PMC3632130 DOI: 10.1093/nar/gkt143] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Indexed: 12/17/2022] Open
Abstract
Epigenetic regulation is dynamic and cell-type dependent. The recently available epigenomic data in multiple cell types provide an unprecedented opportunity for a comparative study of the epigenetic landscape. We developed a machine-learning method called ChroModule to annotate the epigenetic states in eight ENCyclopedia Of DNA Elements cell types. The trained model successfully captured the characteristic histone-modification patterns associated with regulatory elements, such as promoters and enhancers, and showed superior performance on identifying enhancers compared with the state-of-the-art methods. In addition, given the fixed number of epigenetic states in the model, ChroModule allows straightforward illustration of epigenetic variability in multiple cell types. Using this feature, we found that invariable and variable epigenetic states across cell types correspond to housekeeping functions and stimulus response, respectively. In particular, we observed that enhancers, but not the other regulatory elements, dictate cell specificity, as similar cell types share common enhancers, and cell-type-specific enhancers are often bound by transcription factors playing critical roles in that cell type. More interestingly, we found that some genomic regions are dormant in one cell type but primed to become active in other cell types. These observations highlight the usefulness of ChroModule in comparative analysis and interpretation of multiple epigenomes.
Affiliation(s)
- Kyoung-Jae Won
- Department of Chemistry and Biochemistry, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0359, USA
|
50
|
Kim H, Gelenbe E. Reconstruction of large-scale gene regulatory networks using Bayesian model averaging. IEEE Trans Nanobioscience 2013; 11:259-65. [PMID: 22987132 DOI: 10.1109/tnb.2012.2214233] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 11/09/2022]
Abstract
Gene regulatory networks provide a systematic view of molecular interactions in a complex living system. However, constructing large-scale gene regulatory networks is one of the most challenging problems in systems biology. Also, large burst sets of biological data require a proper integration technique for reliable gene regulatory network construction. Here we present a new reverse engineering approach based on Bayesian model averaging which attempts to combine all the appropriate models describing interactions among genes. This Bayesian approach with a prior based on the Gibbs distribution provides an efficient means to integrate multiple sources of biological data. In a simulation study with a maximum of 2000 genes, our method shows better sensitivity than previous elastic-net and Gaussian graphical models, with a fixed specificity of 0.99. The study also shows that the proposed method outperforms the other standard methods for a DREAM dataset generated by nonlinear stochastic models. In brain tumor data analysis, three large-scale networks consisting of 4422 genes were built using the gene expression of non-tumor, low and high grade tumor mRNA expression samples, along with DNA-protein binding affinity information. We found that genes having a large variation of degree distribution among the three tumor networks are the ones that seem most involved in regulatory and developmental processes, which possibly gives a novel insight concerning conventional differentially expressed gene analysis.
Affiliation(s)
- Haseong Kim
- Intelligent Systems and Networks Group, Department of Electrical and Electronic Engineering, Imperial College London, London SW72AZ, UK.
|