1
|
Ambulkar PS, Waghmare JE, Verma Shivkumar P, Chaudhari AR, Gangane NM, Narang P, Pal AK. The association of testis-specific hTAF7L gene variants with idiopathic azoospermic and severe oligozoospermic male infertility. Andrologia 2022; 54:e14581. [PMID: 36068176 DOI: 10.1111/and.14581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2022] [Revised: 05/27/2022] [Accepted: 08/25/2022] [Indexed: 12/01/2022] Open
Abstract
Spermatogenesis is regulated by complex tissue specific gene expression in the testis to achieve normal male fertility. X-chromosome specific TATA binding protein (TBP)-associated factor 7L (hTAF7L) is one of the transcriptional regulator genes considered essential for spermatogenesis. The aim of this study was to evaluate the role of variants/mutations in the testis-specific hTAF7L gene in non-obstructive azoospermia and severe oligozoospermia male infertility. We studied 156 idiopathic non-obstructive azoospermic, severe oligozoospermic infertile males and 50 fertile proven controls. Infertile males and control subjects were genotyped for variants of the hTAF7L gene using polymerase chain reaction and a direct Sanger sequencing approach. The odds ratio evaluated the association of hTAF7L gene variants with idiopathic male infertility. The variants found in the hTAF7L gene were subjected to an in-silico analysis study. In infertile study subjects, we observed 11 single base pair nucleotide changes at various exons and three frameshift variants at exon 10 in the hTAF7L gene. We also found more than one variant in some non-obstructive azoospermia and severe oligozoospermia infertile males along with control subjects. All these variants changed the amino acid sequences in the hTAF7L gene. However, similar changes were also seen in fertile subjects, and the differences were not statistically significant. In-silico tools also predicted that the variants were likely to be benign. The variants in cDNA of the hTAF7L gene were typical SNPs. It is found that the hTAF7L gene is highly polymorphic and these missense variants are not directly associated with male infertility. However, we feel that more studies are needed to elucidate the role of multiple variants of the hTAF7L gene in the process of normal spermatogenesis.
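The odds-ratio comparison mentioned in this abstract can be illustrated with a minimal sketch. The carrier counts below are hypothetical placeholders, not data from the study; only the group sizes (156 infertile males, 50 controls) come from the abstract. The function name and the Woolf (log-scale) 95% confidence interval are assumptions chosen for illustration, not necessarily the authors' procedure.

```python
import math

def odds_ratio(case_carriers, case_total, control_carriers, control_total):
    """Return (odds ratio, 95% CI lower, 95% CI upper) for a 2x2 table."""
    a = case_carriers                      # cases carrying the variant
    b = case_total - case_carriers         # cases without the variant
    c = control_carriers                   # controls carrying the variant
    d = control_total - control_carriers   # controls without the variant
    # Haldane-Anscombe correction: add 0.5 to every cell if any cell is zero
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    estimate = (a * d) / (b * c)
    # Standard error of log(OR), used for the Woolf confidence interval
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - 1.96 * se)
    upper = math.exp(math.log(estimate) + 1.96 * se)
    return estimate, lower, upper

# Hypothetical carrier counts: 30 of 156 cases, 10 of 50 controls
estimate, lower, upper = odds_ratio(30, 156, 10, 50)
print(f"OR = {estimate:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

A confidence interval that spans 1, as in this made-up example, is the usual way a "not statistically significant" association shows up in this kind of analysis.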
Affiliation(s)
- Prafulla S Ambulkar
- Centre for Genetics & Genomics, Department of Anatomy, Mahatma Gandhi Institute of Medical Sciences, Wardha, Maharashtra, India
| | - Jwalant E Waghmare
- Centre for Genetics & Genomics, Department of Anatomy, Mahatma Gandhi Institute of Medical Sciences, Wardha, Maharashtra, India
| | - Poonam Verma Shivkumar
- Department of Obstetrics & Gynaecology, Mahatma Gandhi Institute of Medical Sciences, Wardha, Maharashtra, India
| | - Ajay R Chaudhari
- Department of Physiology, Mahatma Gandhi Institute of Medical Sciences, Wardha, Maharashtra, India
| | - Nitin M Gangane
- Centre for Genetics & Genomics, Department of Anatomy, Mahatma Gandhi Institute of Medical Sciences, Wardha, Maharashtra, India.,Department of Pathology, Mahatma Gandhi Institute of Medical Sciences, Wardha, Maharashtra, India
| | - Pratibha Narang
- Centre for Genetics & Genomics, Department of Anatomy, Mahatma Gandhi Institute of Medical Sciences, Wardha, Maharashtra, India.,Department of Microbiology, Mahatma Gandhi Institute of Medical Sciences, Wardha, Maharashtra, India
| | - Asoke K Pal
- Centre for Genetics & Genomics, Department of Anatomy, Mahatma Gandhi Institute of Medical Sciences, Wardha, Maharashtra, India
|
2
|
Angeles AK, Janke F, Bauer S, Christopoulos P, Riediger AL, Sültmann H. Liquid Biopsies beyond Mutation Calling: Genomic and Epigenomic Features of Cell-Free DNA in Cancer. Cancers (Basel) 2021; 13:5615. [PMID: 34830770 PMCID: PMC8616179 DOI: 10.3390/cancers13225615] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 10/13/2021] [Revised: 11/08/2021] [Accepted: 11/09/2021] [Indexed: 01/12/2023] Open
Abstract
Cell-free DNA (cfDNA) analysis using liquid biopsies is a non-invasive method to gain insights into the biology, therapy response, mechanisms of acquired resistance and therapy escape of various tumors. While it is well established that individual cancer treatment options can be adjusted by panel next-generation sequencing (NGS)-based evaluation of driver mutations in cfDNA, emerging research additionally explores the value of deep characterization of tumor cfDNA genomics and fragmentomics as well as nucleosome modifications (chromatin structure), and methylation patterns (epigenomics) for comprehensive and multi-modal assessment of cfDNA. These tools have the potential to improve disease monitoring, increase the sensitivity of minimal residual disease identification, and detection of cancers at earlier stages. Recent progress in emerging technologies of cfDNA analysis is summarized, the added potential clinical value is highlighted, strengths and limitations are identified and compared with conventional targeted NGS analysis, and current challenges and future directions are discussed.
Affiliation(s)
- Arlou Kristina Angeles
- Division of Cancer Genome Research, German Cancer Research Center (DKFZ) and German Cancer Consortium (DKTK), 69120 Heidelberg, Germany; (A.K.A.); (F.J.); (S.B.)
- National Center for Tumor Diseases (NCT), 69120 Heidelberg, Germany;
- Translational Lung Research Center, German Center for Lung Research (DZL) at Heidelberg University Hospital, 69120 Heidelberg, Germany
| | - Florian Janke
- Division of Cancer Genome Research, German Cancer Research Center (DKFZ) and German Cancer Consortium (DKTK), 69120 Heidelberg, Germany; (A.K.A.); (F.J.); (S.B.)
- National Center for Tumor Diseases (NCT), 69120 Heidelberg, Germany;
- Translational Lung Research Center, German Center for Lung Research (DZL) at Heidelberg University Hospital, 69120 Heidelberg, Germany
- Medical Faculty, Heidelberg University, 69120 Heidelberg, Germany
| | - Simone Bauer
- Division of Cancer Genome Research, German Cancer Research Center (DKFZ) and German Cancer Consortium (DKTK), 69120 Heidelberg, Germany; (A.K.A.); (F.J.); (S.B.)
- National Center for Tumor Diseases (NCT), 69120 Heidelberg, Germany;
- Translational Lung Research Center, German Center for Lung Research (DZL) at Heidelberg University Hospital, 69120 Heidelberg, Germany
| | - Petros Christopoulos
- National Center for Tumor Diseases (NCT), 69120 Heidelberg, Germany;
- Translational Lung Research Center, German Center for Lung Research (DZL) at Heidelberg University Hospital, 69120 Heidelberg, Germany
- Department of Oncology, Thoraxklinik at Heidelberg University Hospital, 69126 Heidelberg, Germany
| | - Anja Lisa Riediger
- Helmholtz Young Investigator Group, Multiparametric Methods for Early Detection of Prostate Cancer, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany;
- Department of Urology, Heidelberg University Hospital, 69120 Heidelberg, Germany
- Faculty of Biosciences, Heidelberg University, 69120 Heidelberg, Germany
| | - Holger Sültmann
- Division of Cancer Genome Research, German Cancer Research Center (DKFZ) and German Cancer Consortium (DKTK), 69120 Heidelberg, Germany; (A.K.A.); (F.J.); (S.B.)
- National Center for Tumor Diseases (NCT), 69120 Heidelberg, Germany;
- Translational Lung Research Center, German Center for Lung Research (DZL) at Heidelberg University Hospital, 69120 Heidelberg, Germany
|
3
|
Zhang J, Liu J, Lee D, Lou S, Chen Z, Gürsoy G, Gerstein M. DiNeR: a Differential graphical model for analysis of co-regulation Network Rewiring. BMC Bioinformatics 2020; 21:281. [PMID: 32615918 PMCID: PMC7333332 DOI: 10.1186/s12859-020-03605-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/06/2019] [Accepted: 06/15/2020] [Indexed: 01/30/2023] Open
Abstract
BACKGROUND During transcription, numerous transcription factors (TFs) bind to targets in a highly coordinated manner to control the gene expression. Alterations in groups of TF-binding profiles (i.e. "co-binding changes") can affect the co-regulating associations between TFs (i.e. "rewiring the co-regulator network"). This, in turn, can potentially drive downstream expression changes, phenotypic variation, and even disease. However, quantification of co-regulatory network rewiring has not been comprehensively studied. RESULTS To address this, we propose DiNeR, a computational method to directly construct a differential TF co-regulation network from paired disease-to-normal ChIP-seq data. Specifically, DiNeR uses a graphical model to capture the gained and lost edges in the co-regulation network. Then, it adopts a stability-based, sparsity-tuning criterion -- by sub-sampling the complete binding profiles to remove spurious edges -- to report only significant co-regulation alterations. Finally, DiNeR highlights hubs in the resultant differential network as key TFs associated with disease. We assembled genome-wide binding profiles of 104 TFs in the K562 and GM12878 cell lines, which loosely model the transition between normal and cancerous states in chronic myeloid leukemia (CML). In total, we identified 351 significantly altered TF co-regulation pairs. In particular, we found that the co-binding of the tumor suppressor BRCA1 and RNA polymerase II, a well-known transcriptional pair in healthy cells, was disrupted in tumors. Thus, DiNeR successfully extracted hub regulators and discovered well-known risk genes. CONCLUSIONS Our method DiNeR makes it possible to quantify changes in co-regulatory networks and identify alterations to TF co-binding patterns, highlighting key disease regulators.
Affiliation(s)
- Jing Zhang
- Department of Computer Science, University of California, Irvine, CA, 92617, USA
| | - Jason Liu
- Computational Biology and Bioinformatics Program, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | - Donghoon Lee
- Computational Biology and Bioinformatics Program, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | - Shaoke Lou
- Computational Biology and Bioinformatics Program, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | - Zhanlin Chen
- Department of Molecular Cellular and Developmental Biology, Yale University, New Haven, CT, 06520, USA
- Department of Computer Science, Yale University, New Haven, CT, 06520, USA
| | - Gamze Gürsoy
- Computational Biology and Bioinformatics Program, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | - Mark Gerstein
- Computational Biology and Bioinformatics Program, Yale University, New Haven, CT, 06520, USA.
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA.
- Department of Computer Science, Yale University, New Haven, CT, 06520, USA.
|
4
|
Hafner A, Kublo L, Tsabar M, Lahav G, Stewart-Ornstein J. Identification of universal and cell-type specific p53 DNA binding. BMC Mol Cell Biol 2020; 21:5. [PMID: 32070277 PMCID: PMC7027055 DOI: 10.1186/s12860-020-00251-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 11/04/2019] [Accepted: 02/11/2020] [Indexed: 01/09/2023] Open
Abstract
Background The tumor suppressor p53 is a major regulator of the DNA damage response and has been suggested to selectively bind and activate cell-type specific gene expression programs. However, recent studies and meta-analyses of genomic data propose largely uniform and condition-independent p53 binding and thus question the selective and cell-type dependent function of p53. Results To systematically assess the cell-type specificity of p53, we measured its association with DNA in 12 p53 wild-type cancer cell lines, from a range of epithelial lineages, in response to ionizing radiation. We found that the majority of bound sites were occupied across all cell lines; however, we also identified a subset of binding sites that were specific to one or a few cell lines. Unlike the shared p53-bound genome, which was not dependent on chromatin accessibility, the association of p53 with these atypical binding sites was well explained by chromatin accessibility and could be modulated by forcing cell state changes such as the epithelial-to-mesenchymal transition. Conclusions Our study reconciles previous conflicting views in the p53 field by demonstrating that although the majority of p53 DNA binding is conserved across cell types, there is a small set of cell line specific binding sites that depend on cell state.
Affiliation(s)
- Antonina Hafner
- Department of Systems Biology, Harvard Medical School, Boston, MA, 02115, USA. .,Department of Developmental Biology, Stanford University, Stanford, CA, 94305, USA.
| | - Lyubov Kublo
- University of Pittsburgh Medical Center (UPMC) Hillman Cancer Center, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15213, USA
| | - Michael Tsabar
- Department of Systems Biology, Harvard Medical School, Boston, MA, 02115, USA
| | - Galit Lahav
- Department of Systems Biology, Harvard Medical School, Boston, MA, 02115, USA
| | - Jacob Stewart-Ornstein
- University of Pittsburgh Medical Center (UPMC) Hillman Cancer Center, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15213, USA.,Department of Computational and Systems Biology, University of Pittsburgh Medical School, Pittsburgh, PA, 15260, USA
|
5
|
Wang Z, He W, Tang J, Guo F. Identification of Highest-Affinity Binding Sites of Yeast Transcription Factor Families. J Chem Inf Model 2020; 60:1876-1883. [DOI: 10.1021/acs.jcim.9b01012] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Indexed: 12/19/2022]
Affiliation(s)
- Zongyu Wang
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
| | - Wenying He
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
| | - Jijun Tang
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
- Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, P. R. China
- Department of Computer Science and Engineering, University of South Carolina, Columbia, South Carolina 29208, United States
| | - Fei Guo
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
|
6
|
Vervier K, Michaelson JJ. TiSAn: estimating tissue-specific effects of coding and non-coding variants. Bioinformatics 2019; 34:3061-3068. [PMID: 29912365 PMCID: PMC6137979 DOI: 10.1093/bioinformatics/bty301] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 11/08/2017] [Accepted: 04/16/2018] [Indexed: 02/06/2023] Open
Abstract
Motivation Model-based estimates of general deleteriousness, like CADD, DANN or PolyPhen, have become indispensable tools in the interpretation of genetic variants. However, these approaches say little about the tissues in which the effects of deleterious variants will be most meaningful. Tissue-specific annotations have been recently inferred for dozens of tissues/cell types from large collections of cross-tissue epigenomic data, and have demonstrated sensitivity in predicting affected tissues in complex traits. It remains unclear, however, whether including additional genome-scale data specific to the tissue of interest would appreciably improve functional annotations. Results Herein, we introduce TiSAn, a tool that integrates multiple genome-scale data sources, defined by expert knowledge. TiSAn uses machine learning to discriminate variants relevant to a tissue from those with no bearing on the function of that tissue. Predictions are made genome-wide, and can be used to contextualize and filter variants of interest in whole genome sequencing or genome-wide association studies. We demonstrate the accuracy and flexibility of TiSAn by producing predictive models for human heart and brain, and detecting tissue-relevant variations in large cohorts for autism spectrum disorder (TiSAn-brain) and coronary artery disease (TiSAn-heart). We find the multiomics TiSAn model is better able to prioritize genetic variants according to their tissue-specific action than the current state-of-the-art method, GenoSkyLine. Availability and implementation Software and vignettes are available at http://github.com/kevinVervier/TiSAn. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Kévin Vervier
- Department of Psychiatry, Carver College of Medicine, University of Iowa, Iowa City, IA, USA
| | - Jacob J Michaelson
- Department of Psychiatry, Carver College of Medicine, University of Iowa, Iowa City, IA, USA
|
7
|
Madsen JGS, Rauch A, Van Hauwaert EL, Schmidt SF, Winnefeld M, Mandrup S. Integrated analysis of motif activity and gene expression changes of transcription factors. Genome Res 2018; 28:243-255. [PMID: 29233921 PMCID: PMC5793788 DOI: 10.1101/gr.227231.117] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 07/06/2017] [Accepted: 12/01/2017] [Indexed: 01/01/2023]
Abstract
The ability to predict transcription factors based on sequence information in regulatory elements is a key step in systems-level investigation of transcriptional regulation. Here, we have developed a novel tool, IMAGE, for precise prediction of causal transcription factors based on transcriptome profiling and genome-wide maps of enhancer activity. High precision is obtained by combining a near-complete database of position weight matrices (PWMs), generated by compiling public databases and systematic prediction of PWMs for uncharacterized transcription factors, with a state-of-the-art method for PWM scoring and a novel machine learning strategy, based on both enhancers and promoters, to predict the contribution of motifs to transcriptional activity. We applied IMAGE to published data obtained during 3T3-L1 adipocyte differentiation and showed that IMAGE predicts causal transcriptional regulators of this process with higher confidence than existing methods. Furthermore, we generated genome-wide maps of enhancer activity and transcripts during human mesenchymal stem cell commitment and adipocyte differentiation and used IMAGE to identify positive and negative transcriptional regulators of this process. Collectively, our results demonstrate that IMAGE is a powerful and precise method for prediction of regulators of gene expression.
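As a generic illustration of what scoring a sequence against a position weight matrix involves, the following minimal sketch builds a log-odds PWM from a toy count matrix and scans a sequence for the best-scoring window. It is not the IMAGE scoring method; the function names, the toy counts, the pseudocount, and the uniform background are all assumptions made for the example.

```python
import math

BASES = "ACGT"

def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2 odds against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def best_hit(sequence, pwm):
    """Return (start, score) of the highest-scoring window, or None if the sequence is too short."""
    width = len(pwm)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Sum the per-position log-odds for the bases in this window
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if best is None or score > best[1]:
            best = (start, score)
    return best

# Toy motif (hypothetical): eight aligned sites that all read "TGA"
toy_counts = [
    {"A": 0, "C": 0, "G": 0, "T": 8},
    {"A": 0, "C": 0, "G": 8, "T": 0},
    {"A": 8, "C": 0, "G": 0, "T": 0},
]
pwm = log_odds_pwm(toy_counts)
print(best_hit("CCTGACC", pwm))
```

The sketch assumes the input sequence contains only the letters A, C, G and T; real scanners also handle ambiguous bases, the reverse strand, and score thresholds.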
Affiliation(s)
- Jesper Grud Skat Madsen
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, 5230 Odense, Denmark
| | - Alexander Rauch
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, 5230 Odense, Denmark
| | - Elvira Laila Van Hauwaert
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, 5230 Odense, Denmark
| | - Søren Fisker Schmidt
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, 5230 Odense, Denmark
| | - Marc Winnefeld
- Research and Development, Beiersdorf AG, 20245 Hamburg, Germany
| | - Susanne Mandrup
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, 5230 Odense, Denmark
|
8
|
Lee W, Park B, Han K. Sequence-based prediction of putative transcription factor binding sites in DNA sequences of any length. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2017; 15:1461-1469. [PMID: 29990126 DOI: 10.1109/tcbb.2017.2773075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
A transcription factor (TF) is a protein that regulates gene expression by binding to specific DNA sequences. Despite the recent advances in experimental techniques for identifying transcription factor binding sites (TFBS) in DNA sequences, a large number of TFBS are to be unveiled in many species. Several computational methods developed for predicting TFBS in DNA are tissue- or species-specific methods, so cannot be used without prior knowledge of tissue or species. Some computational methods are applicable to finding TFBS in short DNA sequences only. In this paper we propose a new learning method for predicting TFBS in DNA of any length using the composition, transition and distribution of nucleotides and amino acids in DNA and TF sequences. In independent testing of the method on datasets that were not used in training the method, its accuracy and MCC were as high as 81.84% and 0.634, respectively. The proposed method can be a useful aid for selecting potential TFBS in a large amount of DNA sequences before conducting biochemical experiments to empirically determine TFBS. The program and data sets are available at http://bclab.inha.ac.kr/TFbinding.
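The two evaluation metrics quoted in this abstract, accuracy and the Matthews correlation coefficient (MCC), are both computed from a binary confusion matrix. The sketch below shows the standard formulas; the counts are hypothetical and do not reproduce the paper's reported values, and the function name is an assumption for illustration.

```python
import math

def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (accuracy, MCC) from binary confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # MCC denominator: square root of the product of the four marginal totals
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when any marginal total is zero
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc

# Hypothetical counts: 40 true positives, 40 true negatives, 10 of each error type
acc, mcc = accuracy_and_mcc(tp=40, tn=40, fp=10, fn=10)
print(f"accuracy = {acc:.3f}, MCC = {mcc:.3f}")
```

MCC ranges from -1 to 1 and is less flattering than accuracy when classes are imbalanced, which is one reason binding-site predictors often report both.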
|
9
|
Ruffalo M, Stojanov P, Pillutla VK, Varma R, Bar-Joseph Z. Reconstructing cancer drug response networks using multitask learning. BMC Systems Biology 2017; 11:96. [PMID: 29017547 PMCID: PMC5635550 DOI: 10.1186/s12918-017-0471-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 03/21/2017] [Accepted: 10/02/2017] [Indexed: 01/03/2023]
Abstract
BACKGROUND Translating in vitro results to clinical tests is a major challenge in systems biology. Here we present a new Multi-Task learning framework which integrates thousands of cell line expression experiments to reconstruct drug specific response networks in cancer. RESULTS The reconstructed networks correctly identify several shared key proteins and pathways while simultaneously highlighting many cell type specific proteins. We used top proteins from each drug network to predict survival for patients prescribed the drug. CONCLUSIONS Predictions based on proteins from the in-vitro derived networks significantly outperformed predictions based on known cancer genes indicating that Multi-Task learning can indeed identify accurate drug response networks.
Affiliation(s)
- Matthew Ruffalo
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Petar Stojanov
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Venkata Krishna Pillutla
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Rohan Varma
- Electrical and Computer Engineering, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Ziv Bar-Joseph
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. .,Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA.
|
10
|
Kang Y, Liow HH, Maier EJ, Brent MR. NetProphet 2.0: mapping transcription factor networks by exploiting scalable data resources. Bioinformatics 2017; 34:249-257. [PMID: 28968736 PMCID: PMC5860202 DOI: 10.1093/bioinformatics/btx563] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 03/14/2017] [Revised: 03/14/2017] [Accepted: 09/11/2017] [Indexed: 11/15/2022] Open
Abstract
Motivation Cells process information, in part, through transcription factor (TF) networks, which control the rates at which individual genes produce their products. A TF network map is a graph that indicates which TFs bind and directly regulate each gene. Previous work has described network mapping algorithms that rely exclusively on gene expression data and ‘integrative’ algorithms that exploit a wide range of data sources including chromatin immunoprecipitation sequencing (ChIP-seq) of many TFs, genome-wide chromatin marks, and binding specificities for many TFs determined in vitro. However, such resources are available only for a few major model systems and cannot be easily replicated for new organisms or cell types. Results We present NetProphet 2.0, a ‘data light’ algorithm for TF network mapping, and show that it is more accurate at identifying direct targets of TFs than other, similarly data light algorithms. In particular, it improves on the accuracy of NetProphet 1.0, which used only gene expression data, by exploiting three principles. First, combining multiple approaches to network mapping from expression data can improve accuracy relative to the constituent approaches. Second, TFs with similar DNA binding domains bind similar sets of target genes. Third, even a noisy, preliminary network map can be used to infer DNA binding specificities from promoter sequences and these inferred specificities can be used to further improve the accuracy of the network map. Availability and implementation Source code and comprehensive documentation are freely available at https://github.com/yiming-kang/NetProphet_2.0. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yiming Kang
- Department of Computer Science and Engineering and Center for Genome Sciences and Systems Biology, Washington University, Saint Louis, MO, USA
| | - Hien-Haw Liow
- Department of Mathematics, Washington University, Saint Louis, MO, USA
| | - Ezekiel J Maier
- Department of Computer Science and Engineering and Center for Genome Sciences and Systems Biology, Washington University, Saint Louis, MO, USA
| | - Michael R Brent
- Department of Computer Science and Engineering and Center for Genome Sciences and Systems Biology, Washington University, Saint Louis, MO, USA
|
11
|
Co-regulation of microRNAs and transcription factors in cardiomyocyte specific differentiation of murine embryonic stem cells: An aspect from transcriptome analysis. Biochimica et Biophysica Acta - Gene Regulatory Mechanisms 2017; 1860:983-1001. [DOI: 10.1016/j.bbagrm.2017.07.009] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 02/16/2017] [Revised: 07/17/2017] [Accepted: 07/30/2017] [Indexed: 12/21/2022]
|
12
|
Wang Y, Ung MH, Xia T, Cheng W, Cheng C. Cancer cell line specific co-factors modulate the FOXM1 cistrome. Oncotarget 2017; 8:76498-76515. [PMID: 29100329 PMCID: PMC5652723 DOI: 10.18632/oncotarget.20405] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 04/12/2017] [Accepted: 08/14/2017] [Indexed: 12/11/2022] Open
Abstract
ChIP-seq has been commonly applied to identify genomic occupation of transcription factors (TFs) in a context-specific manner. It is generally assumed that a TF should have similar binding patterns in cells from the same or closely related tissues. Surprisingly, this assumption has not been carefully examined. To this end, we systematically compared the genomic binding of the cell cycle regulator FOXM1 in eight cell lines from seven different human tissues at the binding signal, peak, and target gene levels. We found that FOXM1 binding sites in the ER-positive breast cancer cell line MCF-7 are distinct from those in not only the non-breast cell lines but also MDA-MB-231, an ER-negative breast cancer cell line. However, binding sites in MDA-MB-231 and non-breast cell lines were highly consistent. The recruitment of estrogen receptor alpha (ERα) caused the unique FOXM1 binding patterns in MCF-7. Moreover, the activity of FOXM1 in MCF-7 reflects the regulatory functions of ERα, while in MDA-MB-231 and non-breast cell lines, FOXM1 activities regulate cell proliferation. Our results suggest that tissue similarity, in some specific contexts, does not hold precedence over TF-cofactor interactions in determining transcriptional states and that the genomic binding of a TF can be dramatically affected by a particular co-factor under certain conditions.
Affiliation(s)
- Yue Wang
- School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China.,Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
| | - Matthew H Ung
- Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
| | - Tian Xia
- School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
| | - Wenqing Cheng
- School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
| | - Chao Cheng
- Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA.,Norris Cotton Cancer Center, Geisel School of Medicine at Dartmouth, Lebanon, NH 03766, USA.,Department of Biomedical Data Sciences, Geisel School of Medicine at Dartmouth, Lebanon, NH 03766, USA
|
13
|
Ruffalo M, Bar-Joseph Z. Genome wide predictions of miRNA regulation by transcription factors. Bioinformatics 2017; 32:i746-i754. [PMID: 27587697 DOI: 10.1093/bioinformatics/btw452] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 01/28/2023] Open
Abstract
MOTIVATION Reconstructing regulatory networks from expression and interaction data is a major goal of systems biology. While much work has focused on trying to experimentally and computationally determine the set of transcription-factors (TFs) and microRNAs (miRNAs) that regulate genes in these networks, relatively little work has focused on inferring the regulation of miRNAs by TFs. Such regulation can play an important role in several biological processes including development and disease. The main challenge for predicting such interactions is the very small positive training set currently available. Another challenge is the fact that a large fraction of miRNAs are encoded within genes making it hard to determine the specific way in which they are regulated. RESULTS To enable genome wide predictions of TF-miRNA interactions, we extended semi-supervised machine-learning approaches to integrate a large set of different types of data including sequence, expression, ChIP-seq and epigenetic data. As we show, the methods we develop achieve good performance on both a labeled test set, and when analyzing general co-expression networks. We next analyze mRNA and miRNA cancer expression data, demonstrating the advantage of using the predicted set of interactions for identifying more coherent and relevant modules, genes, and miRNAs. The complete set of predictions is available on the supporting website and can be used by any method that combines miRNAs, genes, and TFs. AVAILABILITY AND IMPLEMENTATION Code and full set of predictions are available from the supporting website: http://cs.cmu.edu/~mruffalo/tf-mirna/ CONTACT zivbj@cs.cmu.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Matthew Ruffalo: Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA 15213
- Ziv Bar-Joseph: Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA 15213

14
Uygun S, Seddon AE, Azodi CB, Shiu SH. Predictive Models of Spatial Transcriptional Response to High Salinity. Plant Physiology 2017; 174:450-464. [PMID: 28373393 PMCID: PMC5411138 DOI: 10.1104/pp.16.01828] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 12/02/2016] [Accepted: 03/27/2017] [Indexed: 05/12/2023]
Abstract
Plants are exposed to a variety of environmental conditions, and their ability to respond to environmental variation depends on the proper regulation of gene expression in an organ-, tissue-, and cell type-specific manner. Although our knowledge of how stress responses are regulated is accumulating, a genome-wide model of how plant transcription factors (TFs) and cis-regulatory elements control spatially specific stress response has yet to emerge. Using Arabidopsis (Arabidopsis thaliana) as a model, we identified a set of 1,894 putative cis-regulatory elements (pCREs) that are associated with high-salinity (salt) up-regulated genes in the root or the shoot. We used these pCREs to develop computational models that can better predict salt up-regulated genes in the root and shoot compared with models based on known TF binding motifs. In addition, we incorporated TF binding sites identified via large-scale in vitro assays, chromatin accessibility, evolutionary conservation, and pCRE combinatorial relationships in machine learning models and found that only consideration of pCRE combinations led to better performance in salt up-regulation prediction in the root and shoot. Our results suggest that the plant organ transcriptional response to high salinity is regulated by a core set of pCREs and provide a genome-wide view of the cis-regulatory code of plant spatial transcriptional responses to environmental stress.
Affiliation(s)
- Sahra Uygun, Alexander E Seddon, Christina B Azodi, Shin-Han Shiu: Genetics Program (S.U., S.-H.S.), Department of Plant Biology (A.E.S., C.B.A., S.-H.S.), and Ecology, Evolutionary Biology, and Behavior Program (S.-H.S.), Michigan State University, East Lansing, Michigan 48824

15
Boeva V. Analysis of Genomic Sequence Motifs for Deciphering Transcription Factor Binding and Transcriptional Regulation in Eukaryotic Cells. Front Genet 2016; 7:24. [PMID: 26941778 PMCID: PMC4763482 DOI: 10.3389/fgene.2016.00024] [Citation(s) in RCA: 87] [Impact Index Per Article: 10.9] [Received: 10/30/2015] [Accepted: 02/05/2016] [Indexed: 12/27/2022]
Abstract
Eukaryotic genomes contain a variety of structured patterns: repetitive elements, binding sites of DNA and RNA associated proteins, splice sites, and so on. Often, these structured patterns can be formalized as motifs and described using a proper mathematical model such as position weight matrix and IUPAC consensus. Two key tasks are typically carried out for motifs in the context of the analysis of genomic sequences. These are: identification in a set of DNA regions of over-represented motifs from a particular motif database, and de novo discovery of over-represented motifs. Here we describe existing methodology to perform these two tasks for motifs characterizing transcription factor binding. When applied to the output of ChIP-seq and ChIP-exo experiments, or to promoter regions of co-modulated genes, motif analysis techniques allow for the prediction of transcription factor binding events and enable identification of transcriptional regulators and co-regulators. The usefulness of motif analysis is further exemplified in this review by how motif discovery improves peak calling in ChIP-seq and ChIP-exo experiments and, when coupled with information on gene expression, allows insights into physical mechanisms of transcriptional modulation.
Affiliation(s)
- Valentina Boeva: Centre de Recherche, Institut Curie, Paris, France; INSERM, U900, Paris, France; Mines ParisTech, Fontainebleau, France; PSL Research University, Paris, France; Department of Development, Reproduction and Cancer, Institut Cochin, Paris, France; INSERM, U1016, Paris, France; Centre National de la Recherche Scientifique UMR 8104, Paris, France; Université Paris Descartes UMR-S1016, Paris, France

16
Saint-André V, Federation AJ, Lin CY, Abraham BJ, Reddy J, Lee TI, Bradner JE, Young RA. Models of human core transcriptional regulatory circuitries. Genome Res 2016; 26:385-96. [PMID: 26843070 PMCID: PMC4772020 DOI: 10.1101/gr.197590.115] [Citation(s) in RCA: 178] [Impact Index Per Article: 22.3] [Received: 08/01/2015] [Accepted: 12/21/2015] [Indexed: 01/06/2023]
Abstract
A small set of core transcription factors (TFs) dominates control of the gene expression program in embryonic stem cells and other well-studied cellular models. These core TFs collectively regulate their own gene expression, thus forming an interconnected auto-regulatory loop that can be considered the core transcriptional regulatory circuitry (CRC) for that cell type. There is limited knowledge of core TFs, and thus models of core regulatory circuitry, for most cell types. We recently discovered that genes encoding known core TFs forming CRCs are driven by super-enhancers, which provides an opportunity to systematically predict CRCs in poorly studied cell types through super-enhancer mapping. Here, we use super-enhancer maps to generate CRC models for 75 human cell and tissue types. These core circuitry models should prove valuable for further investigating cell-type–specific transcriptional regulation in healthy and diseased cells.
Affiliation(s)
- Violaine Saint-André: Whitehead Institute for Biomedical Research, Cambridge, Massachusetts 02142, USA
- Alexander J Federation: Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA
- Charles Y Lin: Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA
- Brian J Abraham: Whitehead Institute for Biomedical Research, Cambridge, Massachusetts 02142, USA
- Jessica Reddy: Whitehead Institute for Biomedical Research, Cambridge, Massachusetts 02142, USA; Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Tong Ihn Lee: Whitehead Institute for Biomedical Research, Cambridge, Massachusetts 02142, USA
- James E Bradner: Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA; Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115, USA
- Richard A Young: Whitehead Institute for Biomedical Research, Cambridge, Massachusetts 02142, USA; Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

17
Kibet CK, Machanick P. Transcription factor motif quality assessment requires systematic comparative analysis. F1000Res 2015; 4:ISCB Comm J-1429. [PMID: 27092243 PMCID: PMC4821295 DOI: 10.12688/f1000research.7408.2] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Accepted: 02/29/2016] [Indexed: 11/22/2022]
Abstract
Transcription factor (TF) binding site prediction remains a challenge in gene regulatory research due to degeneracy and potential variability in binding sites in the genome. Dozens of algorithms designed to learn binding models (motifs) have generated many motifs available in research papers with a subset making it to databases like JASPAR, UniPROBE and Transfac. The presence of many versions of motifs from the various databases for a single TF and the lack of a standardized assessment technique makes it difficult for biologists to make an appropriate choice of binding model and for algorithm developers to benchmark, test and improve on their models. In this study, we review and evaluate the approaches in use, highlight differences and demonstrate the difficulty of defining a standardized motif assessment approach. We review scoring functions, motif length, test data and the type of performance metrics used in prior studies as some of the factors that influence the outcome of a motif assessment. We show that the scoring functions and statistics used in motif assessment influence ranking of motifs in a TF-specific manner. We also show that TF binding specificity can vary by source of genomic binding data. We also demonstrate that information content of a motif is not in isolation a measure of motif quality but is influenced by TF binding behaviour. We conclude that there is a need for an easy-to-use tool that presents all available evidence for a comparative analysis.
Affiliation(s)
- Caleb Kipkurui Kibet: Department of Computer Science and Research Unit in Bioinformatics (RUBi), Rhodes University, Grahamstown, South Africa
- Philip Machanick: Department of Computer Science and Research Unit in Bioinformatics (RUBi), Rhodes University, Grahamstown, South Africa

18
Kibet CK, Machanick P. Transcription factor motif quality assessment requires systematic comparative analysis. F1000Res 2015; 4:ISCB Comm J-1429. [PMID: 27092243 DOI: 10.12688/f1000research.7408.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Accepted: 11/19/2015] [Indexed: 03/26/2024]
Abstract
Transcription factor (TF) binding site prediction remains a challenge in gene regulatory research due to degeneracy and potential variability in binding sites in the genome. Dozens of algorithms designed to learn binding models (motifs) have generated many motifs available in research papers with a subset making it to databases like JASPAR, UniPROBE and Transfac. The presence of many versions of motifs from the various databases for a single TF and the lack of a standardized assessment technique makes it difficult for biologists to make an appropriate choice of binding model and for algorithm developers to benchmark, test and improve on their models. In this study, we review and evaluate the approaches in use, highlight differences and demonstrate the difficulty of defining a standardized motif assessment approach. We review scoring functions, motif length, test data and the type of performance metrics used in prior studies as some of the factors that influence the outcome of a motif assessment. We show that the scoring functions and statistics used in motif assessment influence ranking of motifs in a TF-specific manner. We also show that TF binding specificity can vary by source of genomic binding data. Finally, we demonstrate that information content of a motif is not in isolation a measure of motif quality but is influenced by TF binding behaviour. We conclude that there is a need for an easy-to-use tool that presents all available evidence for a comparative analysis.
Affiliation(s)
- Caleb Kipkurui Kibet: Department of Computer Science and Research Unit in Bioinformatics (RUBi), Rhodes University, Grahamstown, South Africa
- Philip Machanick: Department of Computer Science and Research Unit in Bioinformatics (RUBi), Rhodes University, Grahamstown, South Africa

19
Wise A, Bar-Joseph Z. SMARTS: reconstructing disease response networks from multiple individuals using time series gene expression data. Bioinformatics 2014; 31:1250-7. [PMID: 25480376 DOI: 10.1093/bioinformatics/btu800] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 09/06/2014] [Accepted: 11/26/2014] [Indexed: 02/02/2023]
Abstract
MOTIVATION: Current methods for reconstructing dynamic regulatory networks are focused on modeling a single response network using model organisms or cell lines. Unlike these models or cell lines, humans differ in their background expression profiles due to age, genetics and life factors. In addition, there are often differences in start and end times for time series human data and in the rate of progress based on the specific individual. Thus, new methods are required to integrate time series data from multiple individuals when modeling and constructing disease response networks. RESULTS: We developed Scalable Models for the Analysis of Regulation from Time Series (SMARTS), a method integrating static and time series data from multiple individuals to reconstruct condition-specific response networks in an unsupervised way. Using probabilistic graphical models, SMARTS iterates between reconstructing different regulatory networks and assigning individuals to these networks, taking into account varying individual start times and response rates. These models can be used to group different sets of patients and to identify transcription factors that differentiate the observed responses between these groups. We applied SMARTS to analyze human response to influenza and mouse brain development. In both cases, it was able to greatly improve baseline groupings while identifying key relevant TFs that differ between the groups. Several of these groupings and TFs are known to regulate the relevant processes while others represent novel hypotheses regarding immune response and development. AVAILABILITY AND IMPLEMENTATION: Software and Supplementary information are available at http://sb.cs.cmu.edu/smarts/. CONTACT: zivbj@cs.cmu.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Aaron Wise: Lane Center for Computational Biology and Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, USA
- Ziv Bar-Joseph: Lane Center for Computational Biology and Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, USA