1
Zhang F, Lee A, Freitas A, Herb J, Wang Z, Gupta S, Chen Z, Xu H. A transcription network underlies the dual genomic coordination of mitochondrial biogenesis. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.01.25.577217. [PMID: 38410491 PMCID: PMC10896348 DOI: 10.1101/2024.01.25.577217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/28/2024]
Abstract
Mitochondrial biogenesis requires the expression of genes encoded by both the nuclear and mitochondrial genomes. However, aside from a handful of transcription factors regulating specific subsets of mitochondrial genes, the overall architecture of the transcriptional control of mitochondrial biogenesis remains to be elucidated. The mechanisms coordinating these two genomes are largely unknown. We performed a targeted RNAi screen in developing eyes with reduced mitochondrial DNA content, anticipating a synergistic disruption of tissue development due to impaired mitochondrial biogenesis and mtDNA deficiency. Among 638 transcription factors annotated in the Drosophila genome, 77 were identified as potential regulators of mitochondrial biogenesis. Utilizing published ChIP-seq data of positive hits, we constructed a regulatory network revealing the logic of the transcriptional regulation of mitochondrial biogenesis. Multiple transcription factors in core layers had extensive connections, collectively governing the expression of nearly all mitochondrial genes, whereas factors sitting on the top layer may respond to cellular cues to modulate mitochondrial biogenesis through the underlying network. CG1603, a core component of the network, was found to be indispensable for the expression of most nuclear mitochondrial genes, including those required for mtDNA maintenance and gene expression, thus coordinating nuclear genome and mtDNA activities in mitochondrial biogenesis. Additional genetic analyses validated YL-1, a transcription factor upstream of CG1603 in the network, as a regulator controlling CG1603 expression and mitochondrial biogenesis.
Affiliation(s)
- Fan Zhang
  - National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Annie Lee
  - National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Anna Freitas
  - National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Jake Herb
  - National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Zongheng Wang
  - National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Snigdha Gupta
  - National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Zhe Chen
  - National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Hong Xu
  - National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA
2
Thi HV, Hoang TN, Le NQK, Chu DT. Application of data science and bioinformatics in RNA therapeutics. PROGRESS IN MOLECULAR BIOLOGY AND TRANSLATIONAL SCIENCE 2024; 203:83-97. [PMID: 38360007 DOI: 10.1016/bs.pmbts.2023.12.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/17/2024]
Abstract
Information technology (IT) now plays a significant role in daily life worldwide. The trajectory of data science and bioinformatics promises pioneering personalized therapies, reshaping medical landscapes and patient care. For RNA therapy to reach more patients, a comprehensive understanding of how data science and bioinformatics apply to it is essential. This chapter summarizes the application of data science and bioinformatics in RNA therapeutics. It discusses data science applications in RNA therapy, such as data integration and analytics, machine learning, and drug development, and covers aspects of bioinformatics such as RNA design and evaluation, drug delivery system simulation, and databases for personalized medicine. These insights shed light on existing evidence and open potential future directions, from which scientists can elevate RNA-based therapeutics into an era of tailored treatments and revolutionary healthcare.
Affiliation(s)
- Hue Vu Thi
  - Center for Biomedicine and Community Health, International School, Vietnam National University, Hanoi, Vietnam; Faculty of Applied Sciences, International School, Vietnam National University, Hanoi, Vietnam
- Thanh-Nhat Hoang
  - Center for Biomedicine and Community Health, International School, Vietnam National University, Hanoi, Vietnam
- Nguyen Quoc Khanh Le
  - Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan; AIBioMed Research Group, Taipei Medical University, Taipei, Taiwan
- Dinh-Toi Chu
  - Center for Biomedicine and Community Health, International School, Vietnam National University, Hanoi, Vietnam; Faculty of Applied Sciences, International School, Vietnam National University, Hanoi, Vietnam
3
Godini R, Fallahi H, Pocock R. The regulatory landscape of neurite development in Caenorhabditis elegans. Front Mol Neurosci 2022; 15:974208. [PMID: 36090252 PMCID: PMC9453034 DOI: 10.3389/fnmol.2022.974208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2022] [Accepted: 07/26/2022] [Indexed: 11/18/2022] Open
Abstract
Neuronal communication requires precise connectivity of neurite projections (axons and dendrites). Developing neurites express cell-surface receptors that interpret extracellular cues to enable correct guidance toward, and connection with, target cells. Spatiotemporal regulation of neurite guidance molecule expression by transcription factors (TFs) is critical for nervous system development and function. Here, we review how neurite development is regulated by TFs in the Caenorhabditis elegans nervous system. By collecting publicly available transcriptome and ChIP-sequencing data, we reveal gene expression dynamics during neurite development, providing insight into transcriptional mechanisms governing construction of the nervous system architecture.
Affiliation(s)
- Rasoul Godini
  - Development and Stem Cells Program, Department of Anatomy and Developmental Biology, Monash Biomedicine Discovery Institute, Monash University, Melbourne, VIC, Australia
  - *Correspondence: Rasoul Godini
- Hossein Fallahi
  - Department of Biology, School of Sciences, Razi University, Kermanshah, Iran
- Roger Pocock
  - Development and Stem Cells Program, Department of Anatomy and Developmental Biology, Monash Biomedicine Discovery Institute, Monash University, Melbourne, VIC, Australia
  - *Correspondence: Roger Pocock
4
McDonald JMC, Reed RD. Patterns of selection across gene regulatory networks. Semin Cell Dev Biol 2022; 145:60-67. [PMID: 35474149 DOI: 10.1016/j.semcdb.2022.03.029] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/25/2021] [Revised: 01/31/2022] [Accepted: 03/23/2022] [Indexed: 12/29/2022]
Abstract
Gene regulatory networks (GRNs) are the core engine of organismal development. If we would like to understand the origin and diversification of phenotypes, it is necessary to consider the structure of GRNs in order to reconstruct the links between genetic mutations and phenotypic change. Much of the progress in evolutionary developmental biology, however, has occurred without a nuanced consideration of the evolution of functional relationships between genes, especially in the context of their broader network interactions. Characterizing and comparing GRNs across traits and species in a more detailed way will allow us to determine how network position influences what genes drive adaptive evolution. In this perspective paper, we consider the architecture of developmental GRNs and how positive selection strength may vary across a GRN. We then propose several testable models for these patterns of selection and experimental approaches to test these models.
Affiliation(s)
- Jeanne M C McDonald
  - Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, NY, United States
- Robert D Reed
  - Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, NY, United States
5
Hegre SA, Samdal H, Klima A, Stovner EB, Nørsett KG, Liabakk NB, Olsen LC, Chawla K, Aas PA, Sætrom P. Joint changes in RNA, RNA polymerase II, and promoter activity through the cell cycle identify non-coding RNAs involved in proliferation. Sci Rep 2021; 11:18952. [PMID: 34556693 PMCID: PMC8460802 DOI: 10.1038/s41598-021-97909-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2021] [Accepted: 07/26/2021] [Indexed: 11/09/2022] Open
Abstract
Proper regulation of the cell cycle is necessary for normal growth and development of all organisms. Conversely, altered cell cycle regulation often underlies proliferative diseases such as cancer. Long non-coding RNAs (lncRNAs) are recognized as important regulators of gene expression and are often found dysregulated in diseases, including cancers. However, identifying lncRNAs with cell cycle functions is challenging due to their often low and cell-type specific expression. We present a highly effective method that analyses changes in promoter activity, transcription, and RNA levels for identifying genes enriched for cell cycle functions. Specifically, by combining RNA sequencing with ChIP sequencing through the cell cycle of synchronized human keratinocytes, we identified 1009 genes with cell cycle-dependent expression and correlated changes in RNA polymerase II occupancy or promoter activity as measured by histone 3 lysine 4 trimethylation (H3K4me3). These genes were highly enriched for genes with known cell cycle functions and included 57 lncRNAs. We selected four of these lncRNAs (SNHG26, EMSLR, ZFAS1, and EPB41L4A-AS1) for further experimental validation and found that knockdown of each of the four lncRNAs affected cell cycle phase distributions and reduced proliferation in multiple cell lines. These results show that many genes with cell cycle functions have concomitant cell cycle-dependent changes in promoter activity, transcription, and RNA levels and support that our multi-omics method is well suited for identifying lncRNAs involved in the cell cycle.
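The idea of "correlated changes" between RNA levels and RNA polymerase II occupancy can be illustrated with a small sketch. This is not the authors' pipeline: the use of Pearson correlation, the `pearson` helper, and the toy time series are all assumptions made for illustration, since the abstract does not state which statistic was used.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long series (0.0 if either is constant)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no cell cycle signal
    return cov / sqrt(var_x * var_y)


# Toy profiles for one hypothetical gene across five cell cycle time points.
rna_level = [1.0, 2.0, 4.0, 2.0, 1.0]
pol2_occupancy = [2.0, 4.0, 8.0, 4.0, 2.0]   # rises and falls with the RNA
flat_occupancy = [3.0, 3.0, 3.0, 3.0, 3.0]   # no change through the cycle

r_matched = pearson(rna_level, pol2_occupancy)
r_flat = pearson(rna_level, flat_occupancy)
```

In a screen of this kind, a gene would be retained only if its RNA profile varies through the cycle and its correlation with an occupancy or H3K4me3 profile exceeds some chosen cutoff; the cutoff itself is a design choice not given in the abstract.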
Affiliation(s)
- Siv Anita Hegre
  - Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology (NTNU), 7491, Trondheim, Norway
- Helle Samdal
  - Department of Computer Science, Norwegian University of Science and Technology (NTNU), 7491, Trondheim, Norway
- Antonin Klima
  - Department of Computer Science, Norwegian University of Science and Technology (NTNU), 7491, Trondheim, Norway
- Endre B Stovner
  - Department of Computer Science, Norwegian University of Science and Technology (NTNU), 7491, Trondheim, Norway; K.G. Jebsen Center for Genetic Epidemiology, Norwegian University of Science and Technology (NTNU), 7491, Trondheim, Norway
- Kristin G Nørsett
  - Department of Computer Science, Norwegian University of Science and Technology (NTNU), 7491, Trondheim, Norway; Department of Biomedical Laboratory Science, Norwegian University of Science and Technology (NTNU), 7491, Trondheim, Norway
- Nina Beate Liabakk
  - Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology (NTNU), 7491, Trondheim, Norway
- Lene Christin Olsen
  - Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology (NTNU), 7491, Trondheim, Norway; Bioinformatics Core Facility-BioCore, Norwegian University of Science and Technology (NTNU), 7491, Trondheim, Norway; The Central Norway Regional Health Authority, St. Olavs Hospital HF, Trondheim, Norway
- Konika Chawla
  - Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology (NTNU), 7491, Trondheim, Norway; Bioinformatics Core Facility-BioCore, Norwegian University of Science and Technology (NTNU), 7491, Trondheim, Norway
- Per Arne Aas
  - Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology (NTNU), 7491, Trondheim, Norway
- Pål Sætrom
  - Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology (NTNU), 7491, Trondheim, Norway; Department of Computer Science, Norwegian University of Science and Technology (NTNU), 7491, Trondheim, Norway; K.G. Jebsen Center for Genetic Epidemiology, Norwegian University of Science and Technology (NTNU), 7491, Trondheim, Norway; Bioinformatics Core Facility-BioCore, Norwegian University of Science and Technology (NTNU), 7491, Trondheim, Norway
6
Massa AT, Mousel MR, Herndon MK, Herndon DR, Murdoch BM, White SN. Genome-Wide Histone Modifications and CTCF Enrichment Predict Gene Expression in Sheep Macrophages. Front Genet 2021; 11:612031. [PMID: 33488675 PMCID: PMC7817998 DOI: 10.3389/fgene.2020.612031] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/30/2020] [Accepted: 11/30/2020] [Indexed: 12/14/2022] Open
Abstract
Alveolar macrophages function in innate and adaptive immunity, wound healing, and homeostasis in the lungs dependent on tissue-specific gene expression under epigenetic regulation. The functional diversity of tissue resident macrophages, despite their common myeloid lineage, highlights the need to study tissue-specific regulatory elements that control gene expression. Increasing evidence supports the hypothesis that subtle genetic changes alter sheep macrophage response to important production pathogens and zoonoses, for example, viruses like small ruminant lentiviruses and bacteria like Coxiella burnetii. Annotation of transcriptional regulatory elements will aid researchers in identifying genetic mutations of immunological consequence. Here we report the first genome-wide survey of regulatory elements in any sheep immune cell, utilizing alveolar macrophages. We assayed histone modifications and CTCF enrichment by chromatin immunoprecipitation with deep sequencing (ChIP-seq) in two sheep to determine cis-regulatory DNA elements and chromatin domain boundaries that control immunity-related gene expression. Histone modifications included H3K4me3 (denoting active promoters), H3K27ac (active enhancers), H3K4me1 (primed and distal enhancers), and H3K27me3 (broad silencers). In total, we identified 248,674 reproducible regulatory elements, which allowed assignment of putative biological function in macrophages to 12% of the sheep genome. Data exceeded the FAANG and ENCODE standards of 20 million and 45 million useable fragments for narrow and broad marks, respectively. Active elements showed consensus with RNA-seq data and were predictive of gene expression in alveolar macrophages from the publicly available Sheep Gene Expression Atlas. Silencer elements were not enriched for expressed genes, but rather for repressed developmental genes. CTCF enrichment enabled identification of 11,000 chromatin domains with mean size of 258 kb. To our knowledge, this is the first report to use immunoprecipitated CTCF to determine putative topological domains in sheep immune cells. Furthermore, these data will empower phenotype-associated mutation discovery since most causal variants are within regulatory elements.
Affiliation(s)
- Alisha T Massa
  - Department of Veterinary Microbiology and Pathology, Washington State University, Pullman, WA, United States
- Michelle R Mousel
  - Animal Disease Research Unit, Agricultural Research Service, United States Department of Agriculture, Pullman, WA, United States; Paul G. Allen School for Global Animal Health, Washington State University, Pullman, WA, United States
- Maria K Herndon
  - Department of Veterinary Microbiology and Pathology, Washington State University, Pullman, WA, United States
- David R Herndon
  - Animal Disease Research Unit, Agricultural Research Service, United States Department of Agriculture, Pullman, WA, United States
- Brenda M Murdoch
  - Department of Animal and Veterinary Science, University of Idaho, Moscow, ID, United States; Center for Reproductive Biology, Washington State University, Pullman, WA, United States
- Stephen N White
  - Department of Veterinary Microbiology and Pathology, Washington State University, Pullman, WA, United States; Animal Disease Research Unit, Agricultural Research Service, United States Department of Agriculture, Pullman, WA, United States; Center for Reproductive Biology, Washington State University, Pullman, WA, United States
7
Sharipov RN, Kondrakhin YV, Ryabova AS, Yevshin IS, Kolpakov FA. Assessment of transcriptional importance of cell line-specific features based on GTRD and FANTOM5 data. PLoS One 2020; 15:e0243332. [PMID: 33347457 PMCID: PMC7751965 DOI: 10.1371/journal.pone.0243332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2020] [Accepted: 11/19/2020] [Indexed: 11/18/2022] Open
Abstract
Creating a complete picture of the regulation of transcription seems to be an urgent task of modern biology. Regulation of transcription is a complex process carried out by transcription factors (TFs) and auxiliary proteins. Over the past decade, ChIP-Seq has become the most common experimental technology for studying genome-wide interactions between TFs and DNA. We assessed the transcriptional significance of cell line-specific features using regression analysis of ChIP-Seq datasets from the GTRD database and transcriptional start site (TSS) activities from the FANTOM5 expression atlas. For this purpose, we initially generated a large number of features that were defined as the presence or absence of TFs in different promoter regions around TSSs. Using feature selection and regression analysis, we identified sets of the most important TFs that affect expression activity of TSSs in human cell lines such as HepG2, K562 and HEK293. We demonstrated that some TFs can be classified as repressors or activators depending on their location relative to the TSS.
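The presence/absence features described in this abstract can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the window boundaries, the `binary_features` helper, and the toy peak coordinates are all assumptions, since the abstract does not specify the promoter regions used.

```python
from typing import Dict, List, Tuple

# Hypothetical promoter windows relative to the TSS, in bp (half-open intervals).
WINDOWS: List[Tuple[int, int]] = [(-1000, -200), (-200, 0), (0, 200)]


def binary_features(
    tss: int,
    strand: str,
    peaks: Dict[str, List[int]],
    windows: List[Tuple[int, int]] = WINDOWS,
) -> Dict[Tuple[str, Tuple[int, int]], int]:
    """Return 1 or 0 per (TF, window): does any peak summit of that TF fall in the window?"""
    sign = 1 if strand == "+" else -1
    features: Dict[Tuple[str, Tuple[int, int]], int] = {}
    for tf, summits in peaks.items():
        # Summit offsets relative to the TSS, oriented so that negative means upstream.
        offsets = [sign * (summit - tss) for summit in summits]
        for lo, hi in windows:
            features[(tf, (lo, hi))] = int(any(lo <= off < hi for off in offsets))
    return features


# Toy example: two made-up TFs with made-up summit positions near a TSS at 10,000.
toy_peaks = {"TF_A": [9500, 10150], "TF_B": [12000]}
feats_plus = binary_features(tss=10_000, strand="+", peaks=toy_peaks)
feats_minus = binary_features(tss=10_000, strand="-", peaks=toy_peaks)
```

Each resulting 0/1 column would then serve as one candidate predictor of TSS activity in a feature selection and regression step, which is where a TF's sign could differ between windows.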
Affiliation(s)
- Ruslan N. Sharipov
  - Laboratory of Bioinformatics, Federal Research Center for Information and Computational Technologies, Novosibirsk, Russian Federation
  - Specialized Educational Scientific Center, Novosibirsk State University, Novosibirsk, Russian Federation
  - BIOSOFT.RU, Ltd, Novosibirsk, Russian Federation
- Yury V. Kondrakhin
  - Laboratory of Bioinformatics, Federal Research Center for Information and Computational Technologies, Novosibirsk, Russian Federation
  - BIOSOFT.RU, Ltd, Novosibirsk, Russian Federation
- Anna S. Ryabova
  - Laboratory of Bioinformatics, Federal Research Center for Information and Computational Technologies, Novosibirsk, Russian Federation
  - BIOSOFT.RU, Ltd, Novosibirsk, Russian Federation
- Ivan S. Yevshin
  - Laboratory of Bioinformatics, Federal Research Center for Information and Computational Technologies, Novosibirsk, Russian Federation
  - BIOSOFT.RU, Ltd, Novosibirsk, Russian Federation
- Fedor A. Kolpakov
  - Laboratory of Bioinformatics, Federal Research Center for Information and Computational Technologies, Novosibirsk, Russian Federation
  - BIOSOFT.RU, Ltd, Novosibirsk, Russian Federation
8
Conboy K, Henshall DC, Brennan GP. Epigenetic principles underlying epileptogenesis and epilepsy syndromes. Neurobiol Dis 2020; 148:105179. [PMID: 33181318 DOI: 10.1016/j.nbd.2020.105179] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 08/02/2020] [Revised: 11/06/2020] [Accepted: 11/08/2020] [Indexed: 12/21/2022] Open
Abstract
Epilepsy is a network disorder driven by fundamental changes in the function of the cells which compose these networks. Driving this aberrant cellular function are large scale changes in gene expression and gene expression regulation. Recent studies have revealed rapid and persistent changes in epigenetic control of gene expression as a critical regulator of the epileptic transcriptome. Epigenetic-mediated gene output regulates many aspects of cellular physiology including neuronal structure, neurotransmitter assembly and abundance, protein abundance of ion channels and other critical neuronal processes. Thus, understanding the contribution of epigenetic-mediated gene regulation could illuminate novel regulatory mechanisms which may form the basis of novel therapeutic approaches to treat epilepsy. In this review we discuss the effects of epileptogenic brain insults on epigenetic regulation of gene expression, recent efforts to target epigenetic processes to block epileptogenesis and the prospects of an epigenetic-based therapy for epilepsy, and finally we discuss technological advancements which have facilitated the interrogation of the epigenome.
Affiliation(s)
- Karen Conboy
  - Department of Physiology and Medical Physics, RCSI University of Medicine and Health Sciences, Dublin, Ireland; FutureNeuro, the SFI Research Centre for Chronic and Rare Neurological Diseases, RCSI University of Medicine and Health Sciences, Dublin, Ireland
- David C Henshall
  - Department of Physiology and Medical Physics, RCSI University of Medicine and Health Sciences, Dublin, Ireland; FutureNeuro, the SFI Research Centre for Chronic and Rare Neurological Diseases, RCSI University of Medicine and Health Sciences, Dublin, Ireland
- Gary P Brennan
  - FutureNeuro, the SFI Research Centre for Chronic and Rare Neurological Diseases, RCSI University of Medicine and Health Sciences, Dublin, Ireland; School of Biomolecular and Biomedical Science, UCD Conway Institute, University College Dublin, Dublin, Ireland
9
Razaghi-Moghadam Z, Nikoloski Z. Supervised learning of gene-regulatory networks based on graph distance profiles of transcriptomics data. NPJ Syst Biol Appl 2020; 6:21. [PMID: 32606380 PMCID: PMC7327016 DOI: 10.1038/s41540-020-0140-1] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 07/08/2019] [Accepted: 06/09/2020] [Indexed: 02/07/2023] Open
Abstract
Characterisation of gene-regulatory network (GRN) interactions provides a stepping stone to understanding how genes affect cellular phenotypes. Yet, despite advances in profiling technologies, GRN reconstruction from gene expression data remains a pressing problem in systems biology. Here, we devise a supervised learning approach, GRADIS, which utilises a support vector machine to reconstruct GRNs based on distance profiles obtained from a graph representation of transcriptomics data. By employing the data from Escherichia coli and Saccharomyces cerevisiae as well as synthetic networks from the DREAM4 and DREAM5 network inference challenges, we demonstrate that our GRADIS approach outperforms the state-of-the-art supervised and unsupervised approaches. This holds when predictions about target genes for individual transcription factors as well as for the entire network are considered. We employ experimentally verified GRNs from E. coli and S. cerevisiae to validate the predictions and obtain further insights into the performance of the proposed approach. Our GRADIS approach offers the possibility for usage of other network-based representations of large-scale data, and can be readily extended to help the characterisation of other cellular networks, including protein–protein and protein–metabolite interactions.
Affiliation(s)
- Zahra Razaghi-Moghadam
  - Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476, Potsdam, Germany; Systems Biology and Mathematical Modeling group, Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476, Potsdam, Germany
- Zoran Nikoloski
  - Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476, Potsdam, Germany; Systems Biology and Mathematical Modeling group, Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476, Potsdam, Germany
10
Wang Z, Yin J, Zhou W, Bai J, Xie Y, Xu K, Zheng X, Xiao J, Zhou L, Qi X, Li Y, Li X, Xu J. Complex impact of DNA methylation on transcriptional dysregulation across 22 human cancer types. Nucleic Acids Res 2020; 48:2287-2302. [PMID: 32002550 PMCID: PMC7049702 DOI: 10.1093/nar/gkaa041] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 12/13/2019] [Accepted: 01/14/2020] [Indexed: 12/18/2022] Open
Abstract
Accumulating evidence has demonstrated that transcriptional regulation is affected by DNA methylation. Understanding how DNA methylation perturbs regulation between transcription factors (TFs) and their targets is crucial for understanding human diseases. However, the global landscape of DNA methylation-mediated transcriptional dysregulation (DMTD) across cancers has not been portrayed. Here, we systematically identified DMTD by integrative analysis of transcriptome, methylome and regulatome across 22 human cancer types. Our results revealed that transcriptional regulation was affected by DNA methylation, involving hundreds of methylation-sensitive TFs (MethTFs). In addition, pan-cancer MethTFs, the regulatory activity of which is generally affected by DNA methylation across cancers, exhibit dominant functional characteristics and regulate several cancer hallmarks. Moreover, pan-cancer MethTFs were found to be affected by DNA methylation in a complex pattern. Finally, we investigated the cooperation among MethTFs and identified a network module that consisted of 43 MethTFs with prognostic potential. In summary, we systematically dissected the transcriptional dysregulation mediated by DNA methylation across cancer types, and our results provide a valuable resource for both epigenetic and transcriptional regulation communities.
Affiliation(s)
- Zishan Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Jiaqi Yin
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Weiwei Zhou
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Jing Bai
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Yunjin Xie
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Kang Xu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Xiangyi Zheng
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Jun Xiao
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Li Zhou
  - Department of Nephrology, Affiliated Hospital of Chengde Medical College, Chengde, Hebei Province, China
- Xiaolin Qi
  - Key Laboratory of Tropical Translational Medicine of Ministry of Education, Hainan Medical University, Haikou, Hainan 571199, China
- Yongsheng Li
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China; Key Laboratory of Tropical Translational Medicine of Ministry of Education, Hainan Medical University, Haikou, Hainan 571199, China; College of Biomedical Information and Engineering, Hainan Medical University, Haikou, Hainan 570100, China
- Xia Li
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China; Key Laboratory of Tropical Translational Medicine of Ministry of Education, Hainan Medical University, Haikou, Hainan 571199, China; College of Biomedical Information and Engineering, Hainan Medical University, Haikou, Hainan 570100, China
- Juan Xu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China; Key Laboratory of Tropical Translational Medicine of Ministry of Education, Hainan Medical University, Haikou, Hainan 571199, China; College of Biomedical Information and Engineering, Hainan Medical University, Haikou, Hainan 570100, China
11
Klein HU, Schäfer M, Bennett DA, Schwender H, De Jager PL. Bayesian integrative analysis of epigenomic and transcriptomic data identifies Alzheimer's disease candidate genes and networks. PLoS Comput Biol 2020; 16:e1007771. [PMID: 32255787 PMCID: PMC7138305 DOI: 10.1371/journal.pcbi.1007771] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 07/01/2019] [Accepted: 03/03/2020] [Indexed: 12/28/2022] Open
Abstract
Biomedical research studies have generated large multi-omic datasets to study complex diseases like Alzheimer’s disease (AD). An important aim of these studies is the identification of candidate genes that demonstrate congruent disease-related alterations across the different data types measured by the study. We developed a new method to detect such candidate genes in large multi-omic case-control studies that measure multiple data types in the same set of samples. The method is based on a gene-centric integrative coefficient quantifying to what degree consistent differences are observed in the different data types. For statistical inference, a Bayesian hierarchical model is used to study the distribution of the integrative coefficient. The model employs a conditional autoregressive prior to integrate a functional gene network and to share information between genes known to be functionally related. We applied the method to an AD dataset consisting of histone acetylation, DNA methylation, and RNA transcription data from human cortical tissue samples of 233 subjects, and we detected 816 genes with consistent differences between persons with AD and controls. The findings were validated in protein data and in RNA transcription data from two independent AD studies. Finally, we found three subnetworks of jointly dysregulated genes within the functional gene network which capture three distinct biological processes: myeloid cell differentiation, protein phosphorylation and synaptic signaling. Further investigation of the myeloid network indicated an upregulation of this network in early stages of AD prior to accumulation of hyperphosphorylated tau and suggested that increased CSF1 transcription in astrocytes may contribute to microglial activation in AD. Thus, we developed a method that integrates multiple data types and external knowledge of gene function to detect candidate genes, applied the method to an AD dataset, and identified several disease-related genes and processes demonstrating the usefulness of the integrative approach.

Recent technological advances have led to a new generation of studies that interrogate multiple molecular levels in the same target tissue of a set of subjects, generating complex multi-omic datasets with which to study disease mechanism. These datasets of genetic, epigenomic, transcriptomic, and other data have the potential to reveal novel biological insights; however, integrative analyses remain challenging and require new computational methods. We developed an integrative Bayesian approach to detect genes with consistent differences between case and control samples across multiple data types. The method further integrates prior knowledge about gene function in the form of a gene functional similarity network to improve statistical inference by sharing information between related genes. We applied our method to an Alzheimer’s disease dataset of epigenomic and transcriptomic data and detected and then validated several novel and known candidate genes as well as three major disease-related biological processes. One of these processes reflected microglial activation and included the cytokine CSF1. Single-nucleus data revealed that CSF1 was primarily upregulated in astrocytes, implicating the involvement of this cell type in microglial activation. Hence, we demonstrated that integrative analysis approaches to multi-omic datasets can improve candidate gene detection and thereby generate new insights into complex diseases.
Affiliation(s)
- Hans-Ulrich Klein
- Center for Translational & Computational Neuroimmunology, Department of Neurology, Columbia University Irving Medical Center, New York, New York, United States of America
- Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Columbia University Irving Medical Center, New York, New York, United States of America
- Martin Schäfer
- Mathematical Institute, Heinrich Heine University, Düsseldorf, Germany
- David A. Bennett
- Rush Alzheimer’s Disease Center, Rush University Medical Center, Chicago, Illinois, United States of America
- Holger Schwender
- Mathematical Institute, Heinrich Heine University, Düsseldorf, Germany
- Philip L. De Jager
- Center for Translational & Computational Neuroimmunology, Department of Neurology, Columbia University Irving Medical Center, New York, New York, United States of America
- Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Columbia University Irving Medical Center, New York, New York, United States of America
|
12
|
van der Wijst MGP, de Vries DH, Groot HE, Trynka G, Hon CC, Bonder MJ, Stegle O, Nawijn MC, Idaghdour Y, van der Harst P, Ye CJ, Powell J, Theis FJ, Mahfouz A, Heinig M, Franke L. The single-cell eQTLGen consortium. eLife 2020; 9:e52155. [PMID: 32149610 PMCID: PMC7077978 DOI: 10.7554/elife.52155] [Citation(s) in RCA: 121] [Impact Index Per Article: 30.3] [Received: 09/24/2019] [Accepted: 03/03/2020] [Indexed: 12/17/2022]
Abstract
In recent years, functional genomics approaches combining genetic information with bulk RNA-sequencing data have identified the downstream expression effects of disease-associated genetic risk factors through so-called expression quantitative trait locus (eQTL) analysis. Single-cell RNA-sequencing creates enormous opportunities for mapping eQTLs across different cell types and in dynamic processes, many of which are obscured when using bulk methods. Rapid increase in throughput and reduction in cost per cell now allow this technology to be applied to large-scale population genetics studies. To fully leverage these emerging data resources, we have founded the single-cell eQTLGen consortium (sc-eQTLGen), aimed at pinpointing the cellular contexts in which disease-causing genetic variants affect gene expression. Here, we outline the goals, approach and potential utility of the sc-eQTLGen consortium. We also provide a set of study design considerations for future single-cell eQTL studies.
Affiliation(s)
- MGP van der Wijst
- Department of Genetics, Oncode Institute, University of Groningen, University Medical Center Groningen, Groningen, Netherlands
- DH de Vries
- Department of Genetics, Oncode Institute, University of Groningen, University Medical Center Groningen, Groningen, Netherlands
- HE Groot
- Department of Cardiology, University of Groningen, University Medical Center Groningen, Groningen, Netherlands
- G Trynka
- Wellcome Sanger Institute, Hinxton, United Kingdom
- Open Targets, Hinxton, United Kingdom
- CC Hon
- RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- MJ Bonder
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- O Stegle
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- MC Nawijn
- Department of Pathology and Medical Biology, GRIAC Research Institute, University of Groningen, University Medical Center Groningen, Groningen, Netherlands
- Y Idaghdour
- Program in Biology, Public Health Research Center, New York University Abu Dhabi, Abu Dhabi, United Arab Emirates
- P van der Harst
- Department of Cardiology, University of Groningen, University Medical Center Groningen, Groningen, Netherlands
- CJ Ye
- Institute for Human Genetics, Bakar Computational Health Sciences Institute, Bakar ImmunoX Initiative, Department of Medicine, Department of Bioengineering and Therapeutic Sciences, Department of Epidemiology and Biostatistics, Chan Zuckerberg Biohub, University of California San Francisco, San Francisco, United States
- J Powell
- Garvan-Weizmann Centre for Cellular Genomics, Garvan Institute, UNSW Cellular Genomics Futures Institute, University of New South Wales, Sydney, Australia
- FJ Theis
- Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
- Department of Mathematics, Technical University of Munich, Garching bei München, Germany
- A Mahfouz
- Leiden Computational Biology Center, Leiden University Medical Center, Leiden, Netherlands
- Delft Bioinformatics Lab, Delft University of Technology, Delft, Netherlands
- M Heinig
- Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
- Department of Informatics, Technical University of Munich, Garching bei München, Germany
- L Franke
- Department of Genetics, Oncode Institute, University of Groningen, University Medical Center Groningen, Groningen, Netherlands
|
13
|
Staunton PM, Miranda-CasoLuengo AA, Loftus BJ, Gormley IC. BINDER: computationally inferring a gene regulatory network for Mycobacterium abscessus. BMC Bioinformatics 2019; 20:466. [PMID: 31500560 PMCID: PMC6734328 DOI: 10.1186/s12859-019-3042-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 05/03/2019] [Accepted: 08/21/2019] [Indexed: 11/12/2022]
Abstract
BACKGROUND Although many of the genic features in Mycobacterium abscessus have been fully validated, a comprehensive understanding of the regulatory elements remains lacking. Moreover, there is little understanding of how the organism regulates its transcriptomic profile, enabling cells to survive in hostile environments. Here, to computationally infer the gene regulatory network for Mycobacterium abscessus we propose a novel statistical computational modelling approach: BayesIan gene regulatory Networks inferreD via gene coExpression and compaRative genomics (BINDER). In tandem with derived experimental coexpression data, the property of genomic conservation is exploited to probabilistically infer a gene regulatory network in Mycobacterium abscessus. Inference on regulatory interactions is conducted by combining 'primary' and 'auxiliary' data strata. The data forming the primary and auxiliary strata are derived from RNA-seq experiments and sequence information in the primary organism Mycobacterium abscessus as well as ChIP-seq data extracted from a related proxy organism Mycobacterium tuberculosis. The primary and auxiliary data are combined in a hierarchical Bayesian framework, informing the apposite bivariate likelihood function and prior distributions respectively. The inferred relationships provide insight into regulon groupings in Mycobacterium abscessus. RESULTS We implement BINDER on data relating to a collection of 167,280 regulator-target pairs resulting in the identification of 54 regulator-target pairs, across 5 transcription factors, for which there is strong probability of regulatory interaction. CONCLUSIONS The inferred regulatory interactions provide insight into, and a valuable resource for further studies of, transcriptional control in Mycobacterium abscessus, and in the family of Mycobacteriaceae more generally.
Further, the developed BINDER framework has broad applicability, useable in settings where computational inference of a gene regulatory network requires integration of data sources derived from both the primary organism of interest and from related proxy organisms.
Affiliation(s)
- Patrick M. Staunton
- School of Medicine, Conway Institute, University College Dublin, Dublin, Ireland
- Brendan J. Loftus
- School of Medicine, Conway Institute, University College Dublin, Dublin, Ireland
- Isobel Claire Gormley
- School of Mathematics and Statistics, Insight Centre for Data Analytics, University College Dublin, Dublin, Ireland
|
14
|
Rao X, Dixon RA. Co-expression networks for plant biology: why and how. Acta Biochim Biophys Sin (Shanghai) 2019; 51:981-988. [PMID: 31436787 DOI: 10.1093/abbs/gmz080] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Received: 04/08/2019] [Revised: 06/20/2019] [Accepted: 07/01/2019] [Indexed: 12/29/2022]
Abstract
Co-expression network analysis is one of the most powerful approaches for interpretation of large transcriptomic datasets. It enables characterization of modules of co-expressed genes that may share biological functional linkages. Such networks provide an initial way to explore functional associations from gene expression profiling and can be applied to various aspects of plant biology. This review presents the applications of co-expression network analysis in plant biology and addresses optimized strategies from the recent literature for performing co-expression analysis on plant biological systems. Additionally, we describe the combined interpretation of co-expression analysis with other genomic data to enhance the generation of biologically relevant information.
Affiliation(s)
- Xiaolan Rao
- BioDiscovery Institute and Department of Biological Sciences, University of North Texas, Denton, TX 76203, USA
- Richard A Dixon
- BioDiscovery Institute and Department of Biological Sciences, University of North Texas, Denton, TX 76203, USA
|
15
|
Chen X, Gu J, Wang X, Jung JG, Wang TL, Hilakivi-Clarke L, Clarke R, Xuan J. CRNET: an efficient sampling approach to infer functional regulatory networks by integrating large-scale ChIP-seq and time-course RNA-seq data. Bioinformatics 2019; 34:1733-1740. [PMID: 29280996 DOI: 10.1093/bioinformatics/btx827] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 12/08/2016] [Accepted: 12/20/2017] [Indexed: 12/28/2022]
Abstract
Motivation: NGS techniques have been widely applied in genetic and epigenetic studies. Multiple ChIP-seq and RNA-seq profiles can now be jointly used to infer functional regulatory networks (FRNs). However, existing methods suffer from either oversimplified assumptions on transcription factor (TF) regulation or slow convergence of sampling for FRN inference from large-scale ChIP-seq and time-course RNA-seq data. Results: We developed an efficient Bayesian integration method (CRNET) for FRN inference using a two-stage Gibbs sampler to iteratively estimate hidden TF activities and the posterior probabilities of binding events. A novel statistical measure that jointly considers regulation strength and regression error enables the sampling process of CRNET to converge quickly, thus making CRNET very efficient for large-scale FRN inference. Experiments on synthetic and benchmark data showed a significantly improved performance of CRNET when compared with existing methods. CRNET was applied to breast cancer data to identify FRNs functional at promoter or enhancer regions in breast cancer MCF-7 cells. Transcription factor MYC is predicted as a key functional factor in both promoter and enhancer FRNs. We experimentally validated the regulation effects of MYC on CRNET-predicted target genes using appropriate RNAi approaches in MCF-7 cells. Availability and implementation: R scripts of CRNET are available at http://www.cbil.ece.vt.edu/software.htm. Contact: xuan@vt.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xi Chen
- Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Jinghua Gu
- Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Xiao Wang
- Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Jin-Gyoung Jung
- Department of Pathology, Johns Hopkins Medical Institutions, Baltimore, MD 21231, USA
- Tian-Li Wang
- Department of Pathology, Johns Hopkins Medical Institutions, Baltimore, MD 21231, USA
- Leena Hilakivi-Clarke
- Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC 20057, USA
- Robert Clarke
- Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC 20057, USA
- Jianhua Xuan
- Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
|
16
|
Rioualen C, Charbonnier-Khamvongsa L, Collado-Vides J, van Helden J. Integrating Bacterial ChIP-seq and RNA-seq Data With SnakeChunks. Current Protocols in Bioinformatics 2019; 66:e72. [PMID: 30786165 PMCID: PMC7302399 DOI: 10.1002/cpbi.72] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/07/2022]
Abstract
Next-generation sequencing (NGS) is becoming a routine approach in most domains of the life sciences. To ensure reproducibility of results, there is a crucial need to improve the automation of NGS data processing and enable forthcoming studies relying on big datasets. Although user-friendly interfaces now exist, there remains a strong need for accessible solutions that allow experimental biologists to analyze and explore their results in an autonomous and flexible way. The protocols here describe a modular system that enables a user to compose and fine-tune workflows based on SnakeChunks, a library of rules for the Snakemake workflow engine. They are illustrated using a study combining ChIP-seq and RNA-seq to identify target genes of the global transcription factor FNR in Escherichia coli, which has the advantage that results can be compared with the most up-to-date collection of existing knowledge about transcriptional regulation in this model organism, extracted from the RegulonDB database. © 2019 by John Wiley & Sons, Inc.
Affiliation(s)
- Claire Rioualen
- Aix-Marseille University, INSERM, Laboratory of Theory and Approaches of Genome Complexity (TAGC), Marseille, France
- Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, México
- Lucie Charbonnier-Khamvongsa
- Aix-Marseille University, INSERM, Laboratory of Theory and Approaches of Genome Complexity (TAGC), Marseille, France
- Julio Collado-Vides
- Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, México
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts
- Jacques van Helden
- Aix-Marseille University, INSERM, Laboratory of Theory and Approaches of Genome Complexity (TAGC), Marseille, France
- Institut Français de Bioinformatique (IFB), UMS 3601-CNRS, Université Paris-Saclay, Orsay, France
|
17
|
Marsh JW, Hayward RJ, Shetty A, Mahurkar A, Humphrys MS, Myers GSA. Dual RNA-Seq of Chlamydia and Host Cells. Methods Mol Biol 2019; 2042:123-135. [PMID: 31385273 DOI: 10.1007/978-1-4939-9694-0_9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/17/2022]
Abstract
During the infection of a host cell by a bacterial pathogen, a cascading series of gene expression changes occurs as each organism manipulates or responds to the other via defense or survival strategies. Unraveling this complex interplay is key for our understanding of bacterial virulence and host response pathways for the development of novel therapeutics. Dual RNA sequencing (dual RNA-Seq) has recently been developed to simultaneously capture host and bacterial transcriptomes from an infected cell. Leveraging the sensitivity and resolution allowed by RNA-seq, dual RNA-Seq can be applied to any bacteria-eukaryotic host interaction. We pioneered dual RNA-Seq to simultaneously capture Chlamydia and host expression profiles during an in vitro infection as proof of principle. Here we provide a detailed laboratory protocol and bioinformatics analysis guidelines for dual RNA-seq experiments focusing on Chlamydia as the organism of interest.
Affiliation(s)
- James W Marsh
- The iThree Institute, University of Technology Sydney, Ultimo, NSW, Australia
- Department of Microbiome Science, Max Planck Institute for Developmental Biology, Tübingen, Germany
- Regan J Hayward
- The iThree Institute, University of Technology Sydney, Ultimo, NSW, Australia
- Amol Shetty
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Anup Mahurkar
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Michael S Humphrys
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Garry S A Myers
- The iThree Institute, University of Technology Sydney, Ultimo, NSW, Australia
|
18
|
Iuliano A, Occhipinti A, Angelini C, De Feis I, Liò P. Combining Pathway Identification and Breast Cancer Survival Prediction via Screening-Network Methods. Front Genet 2018; 9:206. [PMID: 29963073 PMCID: PMC6011013 DOI: 10.3389/fgene.2018.00206] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 03/05/2018] [Accepted: 05/24/2018] [Indexed: 12/30/2022]
Abstract
Breast cancer is one of the most common invasive tumors causing high mortality among women. It is characterized by high heterogeneity regarding its biological and clinical characteristics. Several high-throughput assays have been used to collect genome-wide information for many patients in large collaborative studies. This knowledge has improved our understanding of its biology and led to new methods of diagnosing and treating the disease. In particular, systems biology has become a valid approach to obtain better insights into breast cancer biological mechanisms. A crucial component of current research lies in identifying novel biomarkers that can be predictive for breast cancer patient prognosis on the basis of the molecular signature of the tumor sample. However, the high dimension and low sample size of the data greatly increase the difficulty of cancer survival analysis, demanding the development of ad hoc statistical methods. In this work, we propose novel screening-network methods that predict patient survival outcome by screening key survival-related genes, and we assess the capability of the proposed approaches using the METABRIC dataset. In particular, we first identify a subset of genes by using variable screening techniques on gene expression data. Then, we perform Cox regression analysis by incorporating network information associated with the selected subset of genes. The novelty of this work consists in the improved prediction of survival responses due to the different types of screenings (i.e., biomedical-driven, data-driven, and a combination of the two) before building the network-penalized model. Indeed, the combination of the two screening approaches allows us to use the available biological knowledge on breast cancer and complement it with additional information emerging from the data used for the analysis.
Moreover, we also illustrate how to extend the proposed approaches to integrate an additional omic layer, such as copy number aberrations, and we show that such strategies can further improve our prediction capabilities. In conclusion, our approaches can discriminate patients into high- and low-risk groups using few potential biomarkers and can therefore help clinicians provide more precise prognoses and facilitate the subsequent clinical management of patients at risk of disease.
Affiliation(s)
- Antonella Iuliano
- Istituto per le Applicazioni del Calcolo "Mauro Picone", Consiglio Nazionale delle Ricerche, Naples, Italy
- Telethon Institute of Genetics and Medicine, Pozzuoli, Italy
- Claudia Angelini
- Istituto per le Applicazioni del Calcolo "Mauro Picone", Consiglio Nazionale delle Ricerche, Naples, Italy
- Italia De Feis
- Istituto per le Applicazioni del Calcolo "Mauro Picone", Consiglio Nazionale delle Ricerche, Naples, Italy
- Pietro Liò
- Computer Laboratory, University of Cambridge, Cambridge, United Kingdom
|
19
|
Abstract
Transcription is regulated by transcription factor (TF) binding at promoters and distal regulatory elements and histone modifications that control the accessibility of these elements. Chromatin immunoprecipitation followed by sequencing (ChIP-seq) has become the standard assay for identifying genome-wide protein-DNA interactions in vitro and in vivo. As large-scale ChIP-seq data sets have been collected for different TFs and histone modifications, their potential to predict gene expression can be used to test hypotheses about the mechanisms of gene regulation. In addition, complementary functional genomics assays provide a global view of chromatin accessibility and long-range cis-regulatory interactions that are being combined with TF binding and histone remodeling to study the regulation of gene expression. Thus, ChIP-seq analysis is now widely integrated with other functional genomics assays to better understand gene regulatory mechanisms. In this review, we discuss advances and challenges in integrating ChIP-seq data to identify context-specific chromatin states associated with gene activity. We describe the overall computational design of integrating ChIP-seq data with other functional genomics assays. We also discuss the challenges of extending these methods to low-input ChIP-seq assays and related single-cell assays.
Affiliation(s)
- Ali Mortazavi
- Corresponding author: Ali Mortazavi, Department of Developmental and Cell Biology, 2300 Biological Sciences 3, University of California, Irvine, CA 92697, USA. Tel: (949)824-6762
|
20
|
Chung PJ, Jung H, Choi YD, Kim JK. Genome-wide analyses of direct target genes of four rice NAC-domain transcription factors involved in drought tolerance. BMC Genomics 2018; 19:40. [PMID: 29329517 PMCID: PMC5767043 DOI: 10.1186/s12864-017-4367-1] [Citation(s) in RCA: 51] [Impact Index Per Article: 8.5] [Received: 09/10/2017] [Accepted: 12/06/2017] [Indexed: 12/27/2022]
Abstract
BACKGROUND Plant stress responses and mechanisms determining tolerance are controlled by diverse sets of genes. Transcription factors (TFs) have been implicated in conferring drought tolerance under drought stress conditions, and the identification of their target genes can elucidate molecular regulatory networks that orchestrate tolerance mechanisms. RESULTS We generated transgenic rice plants overexpressing the 4 rice TFs, OsNAC5, 6, 9, and 10, under the control of the root-specific RCc3 promoter. We showed that they were tolerant to drought stress with reduced loss of grain yield under drought conditions compared with wild type plants. To understand the molecular mechanisms underlying this tolerance, we here performed chromatin immunoprecipitation (ChIP)-Seq and RNA-Seq analyses to identify the direct target genes of the OsNAC proteins using the RCc3:6MYC-OsNAC expressing roots. A total of 475 binding loci for the 4 OsNAC proteins were identified by cross-referencing their binding to promoter regions and the expression levels of the corresponding genes. The binding loci were distributed among the promoter regions of 391 target genes that were directly up-regulated by one of the OsNAC proteins in four RCc3:6MYC-OsNAC transgenic lines. Based on gene ontology (GO) analysis, the direct target genes were related to transmembrane/transporter activity, vesicle, plant hormones, carbohydrate metabolism, and TFs. The direct targets of each OsNAC range from 4.0-8.7% of the total number of up-regulated genes found in the RNA-Seq data sets. Thus, each OsNAC up-regulates a set of direct target genes that alter root system architecture in the RCc3:OsNAC plants to confer drought tolerance. Our results provide a valuable resource for functional dissection of the molecular mechanisms of drought tolerance. 
CONCLUSIONS Many of the target genes, including transmembrane/transporter, vesicle related, auxin/hormone related, carbohydrate metabolic processes, and transcription factor genes, that are up-regulated by OsNACs act as the cellular components which would alter the root architectures of RCc3:OsNACs for drought tolerance.
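The cross-referencing step summarized in this abstract (promoter binding from ChIP-Seq intersected with up-regulation from RNA-Seq) can be sketched as a set intersection. This is a minimal illustration only, not the authors' pipeline: the function names and the gene identifiers below are hypothetical, and peak calling, promoter annotation, and differential-expression testing are assumed to have been done upstream.

```python
def direct_targets(promoter_bound, upregulated):
    """Genes that are both promoter-bound (ChIP-Seq) and up-regulated (RNA-Seq)."""
    return set(promoter_bound) & set(upregulated)


def direct_target_fraction(promoter_bound, upregulated):
    """Share of up-regulated genes that are also promoter-bound."""
    up = set(upregulated)
    if not up:
        return 0.0  # no up-regulated genes, so the fraction is defined as zero
    return len(direct_targets(promoter_bound, up)) / len(up)


# Hypothetical toy identifiers, used only to exercise the functions.
chip_promoter_genes = {"geneA", "geneB", "geneC"}
rnaseq_upregulated = {"geneB", "geneC", "geneD", "geneE"}

targets = direct_targets(chip_promoter_genes, rnaseq_upregulated)
fraction = direct_target_fraction(chip_promoter_genes, rnaseq_upregulated)
```

The fraction corresponds to the kind of figure the abstract reports when it states that direct targets make up 4.0-8.7% of the up-regulated genes for each OsNAC.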
Affiliation(s)
- Pil Joong Chung
- Graduate School of International Agricultural Technology and Crop Biotechnology Institute/GreenBio Science & Technology, Seoul National University, Pyeongchang, 25354, South Korea
- Harin Jung
- Graduate School of International Agricultural Technology and Crop Biotechnology Institute/GreenBio Science & Technology, Seoul National University, Pyeongchang, 25354, South Korea
- Present address: NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, 117596, Singapore
- Yang Do Choi
- Graduate School of International Agricultural Technology and Crop Biotechnology Institute/GreenBio Science & Technology, Seoul National University, Pyeongchang, 25354, South Korea
- Department of Agricultural Biotechnology, Seoul National University, Seoul, 08826, South Korea
- Ju-Kon Kim
- Graduate School of International Agricultural Technology and Crop Biotechnology Institute/GreenBio Science & Technology, Seoul National University, Pyeongchang, 25354, South Korea
|
21
|
Jordán-Pla A, Visa N. Considerations on Experimental Design and Data Analysis of Chromatin Immunoprecipitation Experiments. Methods Mol Biol 2018; 1689:9-28. [PMID: 29027161 DOI: 10.1007/978-1-4939-7380-4_2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 01/03/2023]
Abstract
Arguably one of the most valuable techniques to study chromatin organization, ChIP is the method of choice to map the contacts established between proteins and genomic DNA. Ever since its inception, more than 30 years ago, ChIP has been constantly evolving, improving, and expanding its capabilities and reach. Despite its widespread use by many laboratories across a wide variety of disciplines, ChIP assays can sometimes be challenging to design, and are often sensitive to variations in practical implementation. In this chapter, we provide a general overview of the ChIP method and its most common variations, with a special focus on ChIP-seq. We try to address some of the most important aspects that need to be taken into account in order to design and perform experiments that generate the most reproducible, high-quality data. Some of the main topics covered include the use of properly characterized antibodies, alternatives to chromatin preparation, the need for proper controls, and some recommendations about ChIP-seq data analysis.
Affiliation(s)
- Antonio Jordán-Pla
- Department of Molecular Biosciences, The Wenner-Gren Institute, Stockholm University, Svante Arrhenius väg 20c, 10691, Stockholm, Sweden
- Neus Visa
- Department of Molecular Biosciences, The Wenner-Gren Institute, Stockholm University, Svante Arrhenius väg 20c, 10691, Stockholm, Sweden
|
22
|
Pan-Cancer Mutational and Transcriptional Analysis of the Integrator Complex. Int J Mol Sci 2017; 18:ijms18050936. [PMID: 28468258 PMCID: PMC5454849 DOI: 10.3390/ijms18050936] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 04/10/2017] [Revised: 04/20/2017] [Accepted: 04/23/2017] [Indexed: 12/28/2022]
Abstract
The integrator complex has been recently identified as a key regulator of RNA Polymerase II-mediated transcription, with many functions including the processing of small nuclear RNAs, the pause-release and elongation of polymerase during the transcription of protein coding genes, and the biogenesis of enhancer derived transcripts. Moreover, some of its components also play a role in genome maintenance. Thus, it is reasonable to hypothesize that their functional impairment or altered expression can contribute to malignancies. Indeed, several studies have described the mutations or transcriptional alteration of some Integrator genes in different cancers. Here, to draw a comprehensive pan-cancer picture of the genomic and transcriptomic alterations for the members of the complex, we reanalyzed public data from The Cancer Genome Atlas. Somatic mutations affecting Integrator subunit genes and their transcriptional profiles have been investigated in about 11,000 patients and 31 tumor types. A general heterogeneity in the mutation frequencies was observed, mostly depending on tumor type. Despite the fact that we could not establish them as cancer drivers, INTS7 and INTS8 genes were highly mutated in specific cancers. A transcriptome analysis of paired (normal and tumor) samples revealed that the transcription of INTS7, INTS8, and INTS13 is significantly altered in several cancers. Experimental validation performed on primary tumors confirmed these findings.
|
23
|
MapReduce Algorithms for Inferring Gene Regulatory Networks from Time-Series Microarray Data Using an Information-Theoretic Approach. BioMed Research International 2017; 2017:6261802. [PMID: 28243601 PMCID: PMC5294223 DOI: 10.1155/2017/6261802] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 06/07/2016] [Revised: 11/14/2016] [Accepted: 12/13/2016] [Indexed: 12/15/2022]
Abstract
Gene regulation is a series of processes that control gene expression and its extent. The connections among genes and their regulatory molecules, usually transcription factors, and a descriptive model of such connections are known as gene regulatory networks (GRNs). Elucidating GRNs is crucial to understand the inner workings of the cell and the complexity of gene interactions. To date, numerous algorithms have been developed to infer gene regulatory networks. However, as the number of identified genes increases and the complexity of their interactions is uncovered, networks and their regulatory mechanisms become cumbersome to test. Furthermore, prodding through experimental results requires an enormous amount of computation, resulting in slow data processing. Therefore, new approaches are needed to expeditiously analyze copious amounts of experimental data resulting from cellular GRNs. To meet this need, cloud computing is promising as reported in the literature. Here, we propose new MapReduce algorithms for inferring gene regulatory networks on a Hadoop cluster in a cloud environment. These algorithms employ an information-theoretic approach to infer GRNs using time-series microarray data. Experimental results show that our MapReduce program is much faster than an existing tool while achieving slightly better prediction accuracy than the existing tool.
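To make the information-theoretic idea in this abstract concrete, the sketch below scores gene pairs by the mutual information of their discretized expression profiles and keeps pairs above a threshold. It is a generic, single-machine illustration under stated assumptions, not the algorithm of the cited paper: the equal-width binning, the threshold, the function names, and the toy data are all hypothetical, and the `map` over gene pairs only mimics the map phase that a Hadoop job would distribute.

```python
import math
from collections import Counter
from itertools import combinations


def discretize(values, bins=3):
    """Equal-width binning of one expression profile (an assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def infer_edges(expression, threshold, bins=3):
    """Score every gene pair (map step) and keep high-scoring pairs (filter step)."""
    levels = {gene: discretize(profile, bins) for gene, profile in expression.items()}
    scored = map(
        lambda pair: (pair, mutual_information(levels[pair[0]], levels[pair[1]])),
        combinations(sorted(levels), 2),
    )
    return {pair: mi for pair, mi in scored if mi >= threshold}


# Hypothetical toy time series: g1 and g2 rise together, g3 oscillates.
toy_expression = {
    "g1": [1, 2, 3, 4, 5, 6],
    "g2": [2, 4, 6, 8, 10, 12],
    "g3": [5, 1, 5, 1, 5, 1],
}
edges = infer_edges(toy_expression, threshold=1.0)
```

Because each pair is scored independently, the pairwise scoring is the part that parallelizes naturally across workers, which is the property a MapReduce formulation relies on.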
|
24
|
Analysis of Co-Associated Transcription Factors via Ordered Adjacency Differences on Motif Distribution. Sci Rep 2017; 7:43597. [PMID: 28240320 PMCID: PMC5327392 DOI: 10.1038/srep43597] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 11/24/2016] [Accepted: 01/25/2017] [Indexed: 01/16/2023] Open
Abstract
Transcription factors (TFs), which bind to specific DNA sequences or motifs, are fundamental to the regulation of transcription. A gene is typically regulated by a combination of TFs in close proximity. Analysis of co-associated TFs (co-TFs) is an important problem in understanding the mechanism of transcriptional regulation. Recently, ChIP-seq mapping of TF binding has provided a large amount of experimental data for analyzing co-TFs. Several studies show that if two TFs are co-associated, the relative distance between them exhibits a peak-like distribution. To analyze co-TFs, we develop a novel method to evaluate the association between TFs. We design an adjacency score based on ordered differences, which can illustrate co-TF binding affinities for motif analysis. For all candidate motifs, we calculate the corresponding adjacency scores and then rank the motifs in descending order of score. From these lists, we can find co-TFs for candidate motifs. On ChIP-seq datasets, our method obtains the best AUC results on five datasets: 0.9432 for NMYC, 0.9109 for KLF4, 0.9006 for ZFX, 0.8892 for ESRRB, and 0.8920 for E2F1. Our method is stable on large-sample datasets, and its AUC on all datasets is above 0.8.
|
25
|
Abstract
Coronaviruses (CoV) comprise a large group of emerging human and animal pathogens, including the highly pathogenic severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East respiratory syndrome coronavirus (MERS-CoV) strains. The molecular mechanisms regulating emerging coronavirus pathogenesis are complex and include virus–host interactions associated with entry, replication, egress and innate immune control. Epigenetics research investigates the genetic and non-genetic factors that regulate phenotypic variation, usually caused by external and environmental factors that alter host expression patterns and performance without any change in the underlying genotype. Epigenetic modifications, such as histone modifications, DNA methylation, chromatin remodeling, and non-coding RNAs, function as important regulators that remodel host chromatin, altering host expression patterns and networks in a highly flexible manner. For most of the past two and a half decades, research has focused on the molecular mechanisms by which RNA viruses antagonize the signaling and sensing components that regulate induction of the host innate immune and antiviral defense programs upon infection. More recently, a growing body of evidence supports the hypothesis that viruses, even lytic RNA viruses that replicate in the cytoplasm, have developed intricate, highly evolved, and well-coordinated processes that are designed to regulate the host epigenome, and control host innate immune antiviral defense processes, thereby promoting robust virus replication and pathogenesis. In this article, we discuss the strategies that are used to evaluate the mechanisms by which viruses regulate the host epigenome, especially focusing on highly pathogenic respiratory RNA virus infections as a model. 
By combining measures of epigenome reorganization with RNA and proteomic datasets, we articulate a spatial-temporal data integration approach to identify regulatory genomic clusters and regions that play a crucial role in the host’s innate immune response, thereby defining a new viral antagonism mechanism following emerging coronavirus infection.
|
26
|
Klein H, Schäfer M. Integrative Analysis of Histone ChIP‐seq and RNA‐seq Data. Curr Protoc Hum Genet 2016; 90:20.3.1-20.3.16. [DOI: 10.1002/cphg.17] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/06/2022]
Affiliation(s)
- Hans‐Ulrich Klein
  - Program in Translational NeuroPsychiatric Genomics, Brigham and Women's Hospital, Boston, Massachusetts
  - Harvard Medical School, Boston, Massachusetts
  - Institute of Medical Informatics, University of Münster, Münster, Germany
- Martin Schäfer
  - Mathematical Institute, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
|
27
|
Gallagher JP, Grover CE, Hu G, Wendel JF. Insights into the Ecology and Evolution of Polyploid Plants through Network Analysis. Mol Ecol 2016; 25:2644-60. [PMID: 27027619 DOI: 10.1111/mec.13626] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 11/24/2015] [Revised: 03/09/2016] [Accepted: 03/22/2016] [Indexed: 12/18/2022]
Abstract
Polyploidy is a widespread phenomenon throughout eukaryotes, with important ecological and evolutionary consequences. Although genes operate as components of complex pathways and networks, polyploid changes in genes and gene expression have typically been evaluated as either individual genes or as a part of broad-scale analyses. Network analysis has been fruitful in associating genomic and other 'omic'-based changes with phenotype for many systems. In polyploid species, network analysis has the potential not only to facilitate a better understanding of the complex 'omic' underpinnings of phenotypic and ecological traits common to polyploidy, but also to provide novel insight into the interaction among duplicated genes and genomes. This adds perspective to the global patterns of expression (and other 'omic') change that accompany polyploidy and to the patterns of recruitment and/or loss of genes following polyploidization. While network analysis in polyploid species faces challenges common to other analyses of duplicated genomes, present technologies combined with thoughtful experimental design provide a powerful system to explore polyploid evolution. Here, we demonstrate the utility and potential of network analysis to questions pertaining to polyploidy with an example involving evolution of the transgressively superior cotton fibres found in polyploid Gossypium hirsutum. By combining network analysis with prior knowledge, we provide further insights into the role of profilins in fibre domestication and exemplify the potential for network analysis in polyploid species.
Affiliation(s)
- Joseph P Gallagher
  - Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA, 50011, USA
- Corrinne E Grover
  - Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA, 50011, USA
- Guanjing Hu
  - Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA, 50011, USA
- Jonathan F Wendel
  - Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA, 50011, USA
|
28
|
Abstract
Single-cell RNA-sequencing methods are now robust and economically practical and are becoming a powerful tool for high-throughput, high-resolution transcriptomic analysis of cell states and dynamics. Single-cell approaches circumvent the averaging artifacts associated with traditional bulk population data, yielding new insights into the cellular diversity underlying superficially homogeneous populations. Thus far, single-cell RNA-sequencing has already shown great effectiveness in unraveling complex cell populations, reconstructing developmental trajectories, and modeling transcriptional dynamics. Ongoing technical improvements to single-cell RNA-sequencing throughput and sensitivity, the development of more sophisticated analytical frameworks for single-cell data, and an increasing array of complementary single-cell assays all promise to expand the usefulness and potential applications of single-cell transcriptomic profiling.
Affiliation(s)
- Serena Liu
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Cole Trapnell
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
|
29
|
Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szcześniak MW, Gaffney DJ, Elo LL, Zhang X, Mortazavi A. A survey of best practices for RNA-seq data analysis. Genome Biol 2016; 17:13. [PMID: 26813401 PMCID: PMC4728800 DOI: 10.1186/s13059-016-0881-8] [Citation(s) in RCA: 1405] [Impact Index Per Article: 175.6] [Indexed: 12/27/2022] Open
Abstract
RNA-sequencing (RNA-seq) has a wide variety of applications, but no single analysis pipeline can be used in all cases. We review all of the major steps in RNA-seq data analysis, including experimental design, quality control, read alignment, quantification of gene and transcript levels, visualization, differential gene expression, alternative splicing, functional analysis, gene fusion detection and eQTL mapping. We highlight the challenges associated with each step. We discuss the analysis of small RNAs and the integration of RNA-seq with other functional genomics techniques. Finally, we discuss the outlook for novel technologies that are changing the state of the art in transcriptomics.
Affiliation(s)
- Ana Conesa
  - Institute for Food and Agricultural Sciences, Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, 32603, USA
  - Centro de Investigación Príncipe Felipe, Genomics of Gene Expression Laboratory, 46012, Valencia, Spain
- Pedro Madrigal
  - Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
  - Wellcome Trust-Medical Research Council Cambridge Stem Cell Institute, Anne McLaren Laboratory for Regenerative Medicine, Department of Surgery, University of Cambridge, Cambridge, CB2 0SZ, UK
- Sonia Tarazona
  - Centro de Investigación Príncipe Felipe, Genomics of Gene Expression Laboratory, 46012, Valencia, Spain
  - Department of Applied Statistics, Operations Research and Quality, Universidad Politécnica de Valencia, 46020, Valencia, Spain
- David Gomez-Cabrero
  - Unit of Computational Medicine, Department of Medicine, Karolinska Institutet, Karolinska University Hospital, 171 77, Stockholm, Sweden
  - Center for Molecular Medicine, Karolinska Institutet, 17177, Stockholm, Sweden
  - Unit of Clinical Epidemiology, Department of Medicine, Karolinska University Hospital, L8, 17176, Stockholm, Sweden
  - Science for Life Laboratory, 17121, Solna, Sweden
- Alejandra Cervera
  - Systems Biology Laboratory, Institute of Biomedicine and Genome-Scale Biology Research Program, University of Helsinki, 00014, Helsinki, Finland
- Andrew McPherson
  - School of Computing Science, Simon Fraser University, Burnaby, V5A 1S6, BC, Canada
- Michał Wojciech Szcześniak
  - Department of Bioinformatics, Institute of Molecular Biology and Biotechnology, Adam Mickiewicz University in Poznań, 61-614, Poznań, Poland
- Daniel J Gaffney
  - Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Laura L Elo
  - Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, FI-20520, Turku, Finland
- Xuegong Zhang
  - Key Lab of Bioinformatics/Bioinformatics Division, TNLIST and Department of Automation, Tsinghua University, Beijing, 100084, China
  - School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Ali Mortazavi
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, 92697-2300, USA
  - Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, 92697, USA
|
30
|
Veith N, Ziehr H, MacLeod RAF, Reamon-Buettner SM. Mechanisms underlying epigenetic and transcriptional heterogeneity in Chinese hamster ovary (CHO) cell lines. BMC Biotechnol 2016; 16:6. [PMID: 26800878 PMCID: PMC4722726 DOI: 10.1186/s12896-016-0238-0] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Received: 10/14/2015] [Accepted: 01/15/2016] [Indexed: 01/20/2023] Open
Abstract
BACKGROUND: Recombinant cell lines developed for therapeutic antibody production often suffer instability or lose recombinant protein expression during long-term culture. Heterogeneous gene expression among cell line subclones may result from epigenetic modifications of DNA or histones, the protein component of chromatin. We thus investigated, in such cell lines, DNA methylation and the chromatin environment along the human eukaryotic translation elongation factor 1 alpha 1 (EEF1A1) promoter in an antibody protein-expression vector which was integrated into the Chinese hamster ovary (CHO) cell line genome.
RESULTS: We analyzed four PT1-CHO cell lines which exhibited losses of protein expression at advanced passage number (>P35) growing in adherent conditions and in culture medium with 10% FCS. These cell lines exhibited different integration sites and transgene copy numbers as determined by fluorescence in situ hybridization (FISH) and quantitative PCR (qPCR), respectively. By qRT-PCR, we analyzed the recombinant mRNA expression and correlated it with DNA methylation and with results from various approaches interrogating the chromatin landscape along the EEF1A1 promoter region. Each PT1-CHO cell line displayed specific epigenetic signatures or chromatin marks correlating with recombinant mRNA expression. The cell line with the lowest recombinant mRNA expression (PT1-1) was characterized by the highest nucleosome occupancy and displayed the lowest enrichment for histone marks associated with active transcription. In contrast, the cell line with the highest recombinant mRNA expression (PT1-55) exhibited the highest numbers of formaldehyde-assisted isolation of regulatory elements (FAIRE)-enriched regions, and was marked by enrichment for histone modifications H3K9ac and H3K9me3. Another cell line with the second highest recombinant mRNA transcription and the most stable protein expression (PT1-7) had the highest enrichments of the histone variants H3.3 and H2A.Z, and the histone modification H3K9ac. A further cell line (PT1-30) scored the highest enrichments for the bivalent marks H3K4me3 and H3K27me3. Finally, DNA methylation made a contribution, but only in the culture medium with reduced FCS or in a different expression vector.
CONCLUSIONS: Our results suggest that the chromatin state along the EEF1A1 promoter region can help predict recombinant mRNA expression, and thus may assist in selecting desirable clones during cell line development for protein production.
Affiliation(s)
- Nathalie Veith
  - Pharmaceutical Biotechnology, Fraunhofer Institute for Toxicology and Experimental Medicine, Inhoffenstrasse 7, 38124, Braunschweig, Germany
- Holger Ziehr
  - Pharmaceutical Biotechnology, Fraunhofer Institute for Toxicology and Experimental Medicine, Inhoffenstrasse 7, 38124, Braunschweig, Germany
- Roderick A F MacLeod
  - Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures, Inhoffenstrasse 7B, 38124, Braunschweig, Germany
- Stella Marie Reamon-Buettner
  - Preclinical Pharmacology and In Vitro Toxicology, Fraunhofer Institute for Toxicology and Experimental Medicine, Nikolai-Fuchs Strasse 1, 30625, Hannover, Germany
|
31
|
Affiliation(s)
- Christine Nardini
  - Lazzari Bologna, Italy
  - Group of Clinical Genomic Networks, Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Shanghai, China
- Paolo Tieri
  - Consiglio Nazionale delle Ricerche, Istituto per le Applicazioni del Calcolo, Rome, Italy
|
32
|
Abstract
Despite the rapid accumulation of tumor-profiling data and transcription factor (TF) ChIP-seq profiles, efforts to integrate TF binding with tumor-profiling data to understand how TFs regulate tumor gene expression remain limited. To systematically search for cancer-associated TFs, we comprehensively integrated 686 ENCODE ChIP-seq profiles representing 150 TFs with 7484 TCGA tumor profiles in 18 cancer types. For efficient and accurate inference of gene regulatory rules across a large number and variety of datasets, we developed an algorithm, RABIT (regression analysis with background integration). In each tumor sample, RABIT tests whether the TF target genes from ChIP-seq show strong differential regulation after controlling for background effects from copy number alteration and DNA methylation. When multiple ChIP-seq profiles are available for a TF, RABIT prioritizes the most relevant ChIP-seq profile in each tumor. In each cancer type, RABIT further tests whether the TF expression and somatic mutation variations are correlated with differential expression patterns of its target genes across tumors. Our predicted TF impact on tumor gene expression is highly consistent with knowledge from cancer-related gene databases and reveals many previously unidentified aspects of transcriptional regulation in tumor progression. We also applied RABIT to RNA-binding protein motifs and found that some alternative splicing factors could affect tumor-specific gene expression by binding to target gene 3'UTR regions. Thus, RABIT (rabit.dfci.harvard.edu) is a general platform for predicting the oncogenic role of gene expression regulators.
|