1
Cheng JH, Zheng C, Yamada R, Okada D. Visualization of the landscape of the read alignment shape of ATAC-seq data using Hellinger distance metric. Genes Cells 2024; 29:5-16. [PMID: 37989133 DOI: 10.1111/gtc.13082] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/08/2023] [Revised: 10/25/2023] [Accepted: 10/28/2023] [Indexed: 11/23/2023]
Abstract
Assay for Transposase-Accessible Chromatin using high-throughput sequencing (ATAC-seq) is the popular technique using next-generation sequencing to measure chromatin accessibility and identify open chromatin regions. While read alignment shape information of next-generation sequencing data with intensity information has been used in various bioinformatics methods, few studies have focused on pure shape information alone. In this study, we investigated what types of ATAC-seq read alignment shapes are observed for the promoter region and whether the pure shape information was related or unrelated to other gene features. We introduced a novel concept and pipeline for handling the pure shape information of NGS data as probability distributions and quantifying their dissimilarities by information theory. Based on this concept, we demonstrate that the pure shape information of ATAC-seq data is correlated with chromatin openness and some gene characteristics. On the other hand, it is suggested that the pure information of ATAC-seq read alignment shape is unlikely to contain additional information to explain differences in RNA expression. Our study suggests that viewing the read alignment shape of NGS data as probability distributions enables us to capture the characteristics of the genome-wide landscape of such data in a non-parametric manner.
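The idea of treating a read alignment shape as a probability distribution and comparing shapes with the Hellinger distance can be illustrated with a minimal sketch. The function names and the toy coverage vectors below are assumptions made for illustration; the abstract does not describe the authors' actual pipeline, binning, or window size.

```python
import math


def to_distribution(coverage):
    """Normalize per-position read coverage so it sums to 1.

    Dividing by the total discards intensity and keeps only the shape.
    """
    total = sum(coverage)
    if total <= 0:
        raise ValueError("coverage must contain at least one read")
    return [c / total for c in coverage]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions of equal length."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


# Hypothetical coverage over five positions of a promoter window.
narrow = to_distribution([0, 1, 8, 1, 0])      # sharp peak
scaled = to_distribution([0, 10, 80, 10, 0])   # same shape, 10x the reads
broad = to_distribution([2, 2, 2, 2, 2])       # flat profile

print(hellinger(narrow, scaled))  # same shape, different depth
print(hellinger(narrow, broad))   # different shapes
```

Because each profile is normalized before comparison, two regions with the same shape but different read depth are at distance zero, which is the sense in which the comparison uses shape information alone. The distance is bounded between 0 and 1.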
Affiliation(s)
- Jian Hao Cheng
  - Center for Genomics Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Cheng Zheng
  - Center for Genomics Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Ryo Yamada
  - Center for Genomics Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Daigo Okada
  - Center for Genomics Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan
2
Yadav D, Patil-Takbhate B, Khandagale A, Bhawalkar J, Tripathy S, Khopkar-Kale P. Next-Generation sequencing transforming clinical practice and precision medicine. Clin Chim Acta 2023; 551:117568. [PMID: 37839516 DOI: 10.1016/j.cca.2023.117568] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2023] [Revised: 09/27/2023] [Accepted: 09/27/2023] [Indexed: 10/17/2023]
Abstract
Next-generation sequencing (NGS) has revolutionized the field of genomics and is rapidly transforming clinical diagnosis and precision medicine. This advanced sequencing technology enables the rapid and cost-effective analysis of large-scale genomic data, allowing comprehensive exploration of the genetic landscape of diseases. In clinical diagnosis, NGS has proven to be a powerful tool for identifying disease-causing variants, enabling accurate and early detection of genetic disorders. Additionally, NGS facilitates the identification of novel disease-associated genes and variants, aiding in the development of targeted therapies and personalized treatment strategies. NGS greatly benefits precision medicine by enhancing our understanding of disease mechanisms and enabling the identification of specific molecular markers for disease subtypes, thus enabling tailored medical interventions based on individual characteristics. Furthermore, NGS contributes to the development of non-invasive diagnostic approaches, such as liquid biopsies, which can monitor disease progression and treatment response. The potential of NGS in clinical diagnosis and precision medicine is vast, yet challenges persist in data analysis, interpretation, and protocol standardization. This review highlights NGS applications in disease diagnosis, prognosis, and personalized treatment strategies, while also addressing challenges and future prospects in fully harnessing genomic potential within clinical practice.
Affiliation(s)
- Deepali Yadav
  - Central Research Facility, Dr. D.Y Patil Medical College, Hospital & Research Centre, Dr. D. Y. Patil Vidyapeeth, Pimpri Pune 411018, India; Department of Biotechnology, Dr. D. Y. Patil Arts Science and Commerce College, Pimpri Pune 411018, India
- Bhagyashri Patil-Takbhate
  - Central Research Facility, Dr. D.Y Patil Medical College, Hospital & Research Centre, Dr. D. Y. Patil Vidyapeeth, Pimpri Pune 411018, India
- Anil Khandagale
  - Department of Biotechnology, Dr. D. Y. Patil Arts Science and Commerce College, Pimpri Pune 411018, India
- Jitendra Bhawalkar
  - Department of Community Medicine, Dr. D.Y Patil Medical College, Hospital & Research Centre, Dr. D. Y. Patil Vidyapeeth, Pimpri Pune 411018, India
- Srikanth Tripathy
  - Central Research Facility, Dr. D.Y Patil Medical College, Hospital & Research Centre, Dr. D. Y. Patil Vidyapeeth, Pimpri Pune 411018, India.
- Priyanka Khopkar-Kale
  - Central Research Facility, Dr. D.Y Patil Medical College, Hospital & Research Centre, Dr. D. Y. Patil Vidyapeeth, Pimpri Pune 411018, India.
3
Hecht V, Dong K, Rajesh S, Shpilker P, Wekhande S, Shoresh N. Analyzing histone ChIP-seq data with a bin-based probability of being signal. PLoS Comput Biol 2023; 19:e1011568. [PMID: 37862349 PMCID: PMC10619820 DOI: 10.1371/journal.pcbi.1011568] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Revised: 11/01/2023] [Accepted: 10/02/2023] [Indexed: 10/22/2023] Open
Abstract
Histone ChIP-seq is one of the primary methods for charting the cellular epigenomic landscape, the components of which play a critical regulatory role in gene expression. Analyzing the activity of regulatory elements across datasets and cell types can be challenging due to shifting peak positions and normalization artifacts resulting from, for example, differing read depths, ChIP efficiencies, and target sizes. Moreover, broad regions of enrichment seen in repressive histone marks often evade detection by commonly used peak callers. Here, we present a simple and versatile method for identifying enriched regions in ChIP-seq data that relies on estimating a gamma distribution fit to non-overlapping 5kB genomic bins to establish a global background. We use this distribution to assign a probability of being signal (PBS) between zero and one to each 5 kB bin. This approach, while lower in resolution than typical peak-calling methods, provides a straightforward way to identify enriched regions and compare enrichments among multiple datasets, by transforming the data to values that are universally normalized and can be readily visualized and integrated with downstream analysis methods. We demonstrate applications of PBS for both broad and narrow histone marks, and provide several illustrations of biological insights which can be gleaned by integrating PBS scores with downstream data types.
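A rough sketch of the bin-based idea: fit a gamma distribution to per-bin read counts and map each bin to a score between zero and one. The abstract does not say how the background fit is estimated or how the probability of being signal is derived from it, so everything specific here is an assumption: the method-of-moments fit, the use of the gamma CDF as a stand-in score, the function names, and the toy counts.

```python
import math


def fit_gamma_moments(values):
    """Method-of-moments gamma fit; returns (shape, scale).

    Assumption: the paper's estimator is not given in the abstract.
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if mean <= 0 or var <= 0:
        raise ValueError("need positive mean and variance")
    return mean * mean / var, var / mean


def gamma_cdf(x, shape, scale, tol=1e-12, max_iter=10000):
    """Gamma CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0 / shape
    total = term
    for n in range(1, max_iter):
        term *= z / (shape + n)
        total += term
        if term < tol * total:
            break
    value = total * math.exp(-z + shape * math.log(z) - math.lgamma(shape))
    return min(1.0, value)


# Hypothetical read counts in ten non-overlapping bins; one bin is enriched.
bins = [3, 4, 5, 4, 6, 5, 3, 4, 40, 5]
shape, scale = fit_gamma_moments(bins)

# Stand-in score: how far into the upper tail of the fitted gamma each bin lies.
scores = [gamma_cdf(b, shape, scale) for b in bins]
for count, score in zip(bins, scores):
    print(count, round(score, 3))
```

Fitting on all bins, enriched ones included, is a crude simplification; a real background model would need to limit the influence of signal bins. The sketch is only meant to show how a single fitted distribution puts bins from different datasets on a common zero-to-one scale.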
Affiliation(s)
- Vivian Hecht
  - Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Kevin Dong
  - Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Sreshtaa Rajesh
  - Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Polina Shpilker
  - Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Siddarth Wekhande
  - Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Noam Shoresh
  - Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
4
Hawkins-Hooker A, Visonà G, Narendra T, Rojas-Carulla M, Schölkopf B, Schweikert G. Getting personal with epigenetics: towards individual-specific epigenomic imputation with machine learning. Nat Commun 2023; 14:4750. [PMID: 37550323 PMCID: PMC10406842 DOI: 10.1038/s41467-023-40211-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2022] [Accepted: 07/18/2023] [Indexed: 08/09/2023] Open
Abstract
Epigenetic modifications are dynamic mechanisms involved in the regulation of gene expression. Unlike the DNA sequence, epigenetic patterns vary not only between individuals, but also between different cell types within an individual. Environmental factors, somatic mutations and ageing contribute to epigenetic changes that may constitute early hallmarks or causal factors of disease. Epigenetic modifications are reversible and thus promising therapeutic targets for precision medicine. However, mapping efforts to determine an individual's cell-type-specific epigenome are constrained by experimental costs and tissue accessibility. To address these challenges, we developed eDICE, an attention-based deep learning model that is trained to impute missing epigenomic tracks by conditioning on observed tracks. Using a recently published set of epigenomes from four individual donors, we show that transfer learning across individuals allows eDICE to successfully predict individual-specific epigenetic variation even in tissues that are unmapped in a given donor. These results highlight the potential of machine learning-based imputation methods to advance personalized epigenomics.
Affiliation(s)
- Alex Hawkins-Hooker
  - School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, UK.
  - Empirical Inference Department, Max-Planck Institute for Intelligent Systems, Max-Planck-Ring 4, Tübingen, 72076, Germany.
  - Centre for Artificial Intelligence, University College London, London, UK.
- Giovanni Visonà
  - Empirical Inference Department, Max-Planck Institute for Intelligent Systems, Max-Planck-Ring 4, Tübingen, 72076, Germany
- Tanmayee Narendra
  - School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, UK
  - Interfaculty Institute for Biomedical Informatics, University of Tübingen, Sand 13, Tübingen, 72076, Germany
- Mateo Rojas-Carulla
  - Empirical Inference Department, Max-Planck Institute for Intelligent Systems, Max-Planck-Ring 4, Tübingen, 72076, Germany
- Bernhard Schölkopf
  - Empirical Inference Department, Max-Planck Institute for Intelligent Systems, Max-Planck-Ring 4, Tübingen, 72076, Germany
- Gabriele Schweikert
  - School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, UK.
  - Interfaculty Institute for Biomedical Informatics, University of Tübingen, Sand 13, Tübingen, 72076, Germany.
5
Satam H, Joshi K, Mangrolia U, Waghoo S, Zaidi G, Rawool S, Thakare RP, Banday S, Mishra AK, Das G, Malonia SK. Next-Generation Sequencing Technology: Current Trends and Advancements. Biology 2023; 12:997. [PMID: 37508427 PMCID: PMC10376292 DOI: 10.3390/biology12070997] [Citation(s) in RCA: 93] [Impact Index Per Article: 93.0] [Received: 06/22/2023] [Revised: 07/09/2023] [Accepted: 07/11/2023] [Indexed: 07/30/2023]
Abstract
The advent of next-generation sequencing (NGS) has brought about a paradigm shift in genomics research, offering unparalleled capabilities for analyzing DNA and RNA molecules in a high-throughput and cost-effective manner. This transformative technology has swiftly propelled genomics advancements across diverse domains. NGS allows for the rapid sequencing of millions of DNA fragments simultaneously, providing comprehensive insights into genome structure, genetic variations, gene expression profiles, and epigenetic modifications. The versatility of NGS platforms has expanded the scope of genomics research, facilitating studies on rare genetic diseases, cancer genomics, microbiome analysis, infectious diseases, and population genetics. Moreover, NGS has enabled the development of targeted therapies, precision medicine approaches, and improved diagnostic methods. This review provides an insightful overview of the current trends and recent advancements in NGS technology, highlighting its potential impact on diverse areas of genomic research. Moreover, the review delves into the challenges encountered and future directions of NGS technology, including endeavors to enhance the accuracy and sensitivity of sequencing data, the development of novel algorithms for data analysis, and the pursuit of more efficient, scalable, and cost-effective solutions that lie ahead.
Affiliation(s)
- Heena Satam
  - miBiome Therapeutics, Mumbai 400102, India; (H.S.); (K.J.); (U.M.); (S.W.); (G.Z.); (S.R.)
- Kandarp Joshi
  - miBiome Therapeutics, Mumbai 400102, India; (H.S.); (K.J.); (U.M.); (S.W.); (G.Z.); (S.R.)
- Upasana Mangrolia
  - miBiome Therapeutics, Mumbai 400102, India; (H.S.); (K.J.); (U.M.); (S.W.); (G.Z.); (S.R.)
- Sanober Waghoo
  - miBiome Therapeutics, Mumbai 400102, India; (H.S.); (K.J.); (U.M.); (S.W.); (G.Z.); (S.R.)
- Gulnaz Zaidi
  - miBiome Therapeutics, Mumbai 400102, India; (H.S.); (K.J.); (U.M.); (S.W.); (G.Z.); (S.R.)
- Shravani Rawool
  - miBiome Therapeutics, Mumbai 400102, India; (H.S.); (K.J.); (U.M.); (S.W.); (G.Z.); (S.R.)
- Ritesh P. Thakare
  - Department of Molecular Cell and Cancer Biology, UMass Chan Medical School, Worcester, MA 01605, USA; (R.P.T.); (S.B.); (A.K.M.)
- Shahid Banday
  - Department of Molecular Cell and Cancer Biology, UMass Chan Medical School, Worcester, MA 01605, USA; (R.P.T.); (S.B.); (A.K.M.)
- Alok K. Mishra
  - Department of Molecular Cell and Cancer Biology, UMass Chan Medical School, Worcester, MA 01605, USA; (R.P.T.); (S.B.); (A.K.M.)
- Gautam Das
  - miBiome Therapeutics, Mumbai 400102, India; (H.S.); (K.J.); (U.M.); (S.W.); (G.Z.); (S.R.)
- Sunil K. Malonia
  - Department of Molecular Cell and Cancer Biology, UMass Chan Medical School, Worcester, MA 01605, USA; (R.P.T.); (S.B.); (A.K.M.)
6
Daunesse M, Legendre R, Varet H, Pain A, Chica C. ePeak: from replicated chromatin profiling data to epigenomic dynamics. NAR Genom Bioinform 2022; 4:lqac041. [PMID: 35664802 PMCID: PMC9154330 DOI: 10.1093/nargab/lqac041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2021] [Revised: 04/05/2022] [Accepted: 05/05/2022] [Indexed: 11/14/2022] Open
Abstract
We present ePeak, a Snakemake-based pipeline for the identification and quantification of reproducible peaks from raw ChIP-seq, CUT&RUN and CUT&Tag epigenomic profiling techniques. It also includes a statistical module to perform tailored differential marking and binding analysis with state of the art methods. ePeak streamlines critical steps like the quality assessment of the immunoprecipitation, spike-in calibration and the selection of reproducible peaks between replicates for both narrow and broad peaks. It generates complete reports for data quality control assessment and optimal interpretation of the results. We advocate for a differential analysis that accounts for the biological dynamics of each chromatin factor. Thus, ePeak provides linear and nonlinear methods for normalisation as well as conservative and stringent models for variance estimation and significance testing of the observed marking/binding differences. Using a published ChIP-seq dataset, we show that distinct populations of differentially marked/bound peaks can be identified. We study their dynamics in terms of read coverage and summit position, as well as the expression of the neighbouring genes. We propose that ePeak can be used to measure the richness of the epigenomic landscape underlying a biological process by identifying diverse regulatory regimes.
Affiliation(s)
- Maëlle Daunesse
  - Bioinformatics and Biostatistics Hub, Institut Pasteur, Université de Paris, Paris F-75015, France
- Rachel Legendre
  - Bioinformatics and Biostatistics Hub, Institut Pasteur, Université de Paris, Paris F-75015, France
- Hugo Varet
  - Bioinformatics and Biostatistics Hub, Institut Pasteur, Université de Paris, Paris F-75015, France
- Adrien Pain
  - Bioinformatics and Biostatistics Hub, Institut Pasteur, Université de Paris, Paris F-75015, France
- Claudia Chica
  - Bioinformatics and Biostatistics Hub, Institut Pasteur, Université de Paris, Paris F-75015, France
7
Taguchi YH, Turki T. Unsupervised tensor decomposition-based method to extract candidate transcription factors as histone modification bookmarks in post-mitotic transcriptional reactivation. PLoS One 2021; 16:e0251032. [PMID: 34032804 PMCID: PMC8148352 DOI: 10.1371/journal.pone.0251032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2021] [Accepted: 04/17/2021] [Indexed: 11/25/2022] Open
Abstract
The histone group added to a gene sequence must be removed during mitosis to halt transcription during the DNA replication stage of the cell cycle. However, the detailed mechanism of this transcription regulation remains unclear. In particular, it is not realistic to reconstruct all appropriate histone modifications throughout the genome from scratch after mitosis. Thus, it is reasonable to assume that there might be a type of “bookmark” that retains the positions of histone modifications, which can be readily restored after mitosis. We developed a novel computational approach comprising tensor decomposition (TD)-based unsupervised feature extraction (FE) to identify transcription factors (TFs) that bind to genes associated with reactivated histone modifications as candidate histone bookmarks. To the best of our knowledge, this is the first application of TD-based unsupervised FE to the cell division context and phases pertaining to the cell cycle in general. The candidate TFs identified with this approach were functionally related to cell division, suggesting the suitability of this method and the potential of the identified TFs as bookmarks for histone modification during mitosis.
Affiliation(s)
- Y-h. Taguchi
  - Department of Physics, Chuo University, Tokyo, Japan
- Turki Turki
  - Department of Computer Science, King Abdulaziz University, Jeddah, Saudi Arabia
8
Cauceglia JW, Nelson AC, Rubinstein ND, Kukreja S, Sasso LN, Beaufort JA, Rando OJ, Potts WK. Transitions in paternal social status predict patterns of offspring growth and metabolic transcription. Mol Ecol 2020; 29:624-638. [PMID: 31885115 DOI: 10.1111/mec.15346] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/15/2019] [Revised: 11/27/2019] [Accepted: 12/16/2019] [Indexed: 12/17/2022]
Abstract
One type of parental effect occurs when changes in parental phenotype or environment trigger changes to offspring phenotype. Such nongenetic parental effects can be precisely triggered in response to an environmental cue in time-locked fashion, or in other cases, persist for multiple generations after the cue has been removed, suggesting multiple timescales of action. For parental effects to serve as reliable signals of current environmental conditions, they should be reversible, such that when cues change, offspring phenotypes change in accordance. Social hierarchy is a prevalent feature of the environment, and current parental social status could signal the environment in which offspring will be born. Here, we sought to address parental effects of social status and their timescale of action in mice. We show that territorial competition in seminatural environments affects offspring growth. Although dominant males are not heavier than nondominant or control males, they produce faster growing offspring, particularly sons. The timing, effect-size, and sex-specificity of this association are modulated by maternal social experience. We show that a change in paternal social status is sufficient to modulate offspring weight: from one breeding cycle to the next, status-ascending males produce heavier sons than before, and status-descending males produce lighter sons than before. Current paternal status is also highly predictive of liver transcription in sons, including molecular pathways controlling oxidative phosphorylation and iron metabolism. These results are consistent with a parental effect of social experience, although alternative explanations are considered. In summary, changes in paternal social status are associated with changes in offspring growth and metabolism.
Affiliation(s)
- Joseph W Cauceglia
  - School of Biological Sciences, University of Utah, Salt Lake City, UT, USA
- Adam C Nelson
  - Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA, USA
- Shweta Kukreja
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA, USA
- Lynsey N Sasso
  - School of Biological Sciences, University of Utah, Salt Lake City, UT, USA
- John A Beaufort
  - School of Biological Sciences, University of Utah, Salt Lake City, UT, USA
- Oliver J Rando
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA, USA
- Wayne K Potts
  - School of Biological Sciences, University of Utah, Salt Lake City, UT, USA
9
Choudhary K, Lai YH, Tran EJ, Aviran S. dStruct: identifying differentially reactive regions from RNA structurome profiling data. Genome Biol 2019; 20:40. [PMID: 30791935 PMCID: PMC6385470 DOI: 10.1186/s13059-019-1641-3] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 08/22/2018] [Accepted: 01/24/2019] [Indexed: 12/16/2022] Open
Abstract
RNA biology is revolutionized by recent developments of diverse high-throughput technologies for transcriptome-wide profiling of molecular RNA structures. RNA structurome profiling data can be used to identify differentially structured regions between groups of samples. Existing methods are limited in scope to specific technologies and/or do not account for biological variation. Here, we present dStruct which is the first broadly applicable method for differential analysis accounting for biological variation in structurome profiling data. dStruct is compatible with diverse profiling technologies, is validated with experimental data and simulations, and outperforms existing methods.
Affiliation(s)
- Krishna Choudhary
  - Department of Biomedical Engineering and Genome Center, University of California, Davis, One Shields Avenue, Davis, 95616 CA USA
- Yu-Hsuan Lai
  - Department of Biochemistry, Purdue University, BCHM 305, 175 S. University Street, West Lafayette, 47907-2063 IN USA
- Elizabeth J. Tran
  - Department of Biochemistry, Purdue University, BCHM 305, 175 S. University Street, West Lafayette, 47907-2063 IN USA
  - Purdue University Center for Cancer Research, Purdue University, Hansen Life Sciences Research Building, Room 141, 201 S. University Street, West Lafayette, 47907-2064 IN USA
- Sharon Aviran
  - Department of Biomedical Engineering and Genome Center, University of California, Davis, One Shields Avenue, Davis, 95616 CA USA
10
Cremona MA, Xu H, Makova KD, Reimherr M, Chiaromonte F, Madrigal P. Functional data analysis for computational biology. Bioinformatics 2019; 35:3211-3213. [PMID: 30668667 DOI: 10.1093/bioinformatics/btz045] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 04/07/2018] [Revised: 01/01/2019] [Accepted: 01/17/2019] [Indexed: 12/25/2022] Open
Abstract
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Marzia A Cremona
  - Department of Statistics, The Pennsylvania State University, University Park, PA, USA
- Hongyan Xu
  - Department of Population Health Sciences, Medical College of Georgia, Augusta University, Augusta, GA, USA
- Kateryna D Makova
  - Department of Biology, The Pennsylvania State University, University Park, PA, USA
  - Center for Medical Genomics, The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
- Matthew Reimherr
  - Department of Statistics, The Pennsylvania State University, University Park, PA, USA
- Francesca Chiaromonte
  - Department of Statistics, The Pennsylvania State University, University Park, PA, USA
  - Institute of Economics, Sant'Anna School of Advanced Studies, EMbeDS Economics and Management in the era of Data Science, Pisa, Italy
- Pedro Madrigal
  - Wellcome Trust - MRC Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK
  - Department of Haematology, University of Cambridge, Cambridge, UK
11
PREDICTD PaRallel Epigenomics Data Imputation with Cloud-based Tensor Decomposition. Nat Commun 2018; 9:1402. [PMID: 29643364 PMCID: PMC5895786 DOI: 10.1038/s41467-018-03635-9] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 04/10/2017] [Accepted: 03/02/2018] [Indexed: 11/24/2022] Open
Abstract
The Encyclopedia of DNA Elements (ENCODE) and the Roadmap Epigenomics Project seek to characterize the epigenome in diverse cell types using assays that identify, for example, genomic regions with modified histones or accessible chromatin. These efforts have produced thousands of datasets but cannot possibly measure each epigenomic factor in all cell types. To address this, we present a method, PaRallel Epigenomics Data Imputation with Cloud-based Tensor Decomposition (PREDICTD), to computationally impute missing experiments. PREDICTD leverages an elegant model called “tensor decomposition” to impute many experiments simultaneously. Compared with the current state-of-the-art method, ChromImpute, PREDICTD produces lower overall mean squared error, and combining the two methods yields further improvement. We show that PREDICTD data captures enhancer activity at noncoding human accelerated regions. PREDICTD provides reference imputed data and open-source software for investigating new cell types, and demonstrates the utility of tensor decomposition and cloud computing, both promising technologies for bioinformatics.
Editorial summary: Assays to characterize the epigenome and interrogate chromatin state genome wide have so far been performed in a selected set of conditions. Here, Durham et al. develop a computational method based on tensor decomposition to impute missing experiments in collections of epigenomics experiments.
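The general idea of imputing missing experiments by tensor decomposition can be sketched on a toy example. This is not the PREDICTD implementation: the CP-style factorization, the plain stochastic gradient descent, the hyperparameters, and the tiny cell type × assay × position tensor below are all assumptions chosen to keep the example small.

```python
import random


def predict(factors, index):
    """Reconstruct one tensor entry from the three factor matrices."""
    cells, assays, positions = factors
    c, a, p = index
    return sum(x * y * z for x, y, z in zip(cells[c], assays[a], positions[p]))


def mse(factors, entries):
    """Mean squared error of the reconstruction over a dict of entries."""
    return sum((predict(factors, i) - v) ** 2 for i, v in entries.items()) / len(entries)


def train_cp(observed, dims, rank=2, lr=0.02, epochs=300, seed=0):
    """Fit factor matrices to the observed entries only, using SGD."""
    rng = random.Random(seed)
    factors = [[[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(d)]
               for d in dims]
    entries = list(observed.items())
    for _ in range(epochs):
        rng.shuffle(entries)
        for (c, a, p), value in entries:
            fc, fa, fp = factors[0][c], factors[1][a], factors[2][p]
            err = predict(factors, (c, a, p)) - value
            for k in range(rank):
                # Gradients are taken before any of the three values is updated.
                gc, ga, gp = fa[k] * fp[k], fc[k] * fp[k], fc[k] * fa[k]
                fc[k] -= lr * err * gc
                fa[k] -= lr * err * ga
                fp[k] -= lr * err * gp
    return factors


# Hypothetical 3 cell types x 2 assays x 4 positions tensor with rank-1 structure.
cell, assay, pos = [1.0, 0.5, 1.2], [1.0, 1.5], [0.5, 1.0, 0.8, 1.0]
full = {(c, a, p): cell[c] * assay[a] * pos[p]
        for c in range(3) for a in range(2) for p in range(4)}

# Pretend four entries were never measured.
held_out = {(0, 1, 2), (1, 0, 3), (2, 1, 0), (2, 0, 1)}
observed = {i: v for i, v in full.items() if i not in held_out}
missing = {i: full[i] for i in held_out}

dims = (3, 2, 4)
initial = train_cp(observed, dims, epochs=0)  # untrained factors, same seed
trained = train_cp(observed, dims)

print("training MSE before:", mse(initial, observed))
print("training MSE after: ", mse(trained, observed))
print("MSE on held-out entries:", mse(trained, missing))
```

The held-out entries play the role of unmeasured experiments: the factors are learned only from observed entries, and the missing values are then read off the reconstruction. Real epigenomic tensors are far larger and noisier, which is why the abstract pairs the model with cloud computing.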
12
Cartier J, Smith T, Thomson JP, Rose CM, Khulan B, Heger A, Meehan RR, Drake AJ. Investigation into the role of the germline epigenome in the transmission of glucocorticoid-programmed effects across generations. Genome Biol 2018; 19:50. [PMID: 29636086 PMCID: PMC5891941 DOI: 10.1186/s13059-018-1422-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 01/25/2018] [Accepted: 03/16/2018] [Indexed: 12/13/2022] Open
Abstract
Background: Early life exposure to adverse environments affects cardiovascular and metabolic systems in the offspring. These programmed effects are transmissible to a second generation through both male and female lines, suggesting germline transmission. We have previously shown that prenatal overexposure to the synthetic glucocorticoid dexamethasone (Dex) in rats reduces birth weight in the first generation (F1), a phenotype which is transmitted to a second generation (F2), particularly through the male line. We hypothesize that Dex exposure affects developing germ cells, resulting in transmissible alterations in DNA methylation, histone marks and/or small RNA in the male germline.
Results: We profile epigenetic marks in sperm from F1 Sprague Dawley rats expressing a germ cell-specific GFP transgene following Dex or vehicle treatment of the mothers, using methylated DNA immunoprecipitation sequencing, small RNA sequencing and chromatin immunoprecipitation sequencing for H3K4me3, H3K4me1, H3K27me3 and H3K9me3. Although effects on birth weight are transmitted to the F2 generation through the male line, no differences in DNA methylation, histone modifications or small RNA were detected between germ cells and sperm from Dex-exposed animals and controls.
Conclusions: Although the phenotype is transmitted to a second generation, we are unable to detect specific changes in DNA methylation, common histone modifications or small RNA profiles in sperm. Dex exposure is associated with more variable 5mC levels, particularly at non-promoter loci. Although this could be one mechanism contributing to the observed phenotype, other germline epigenetic modifications or non-epigenetic mechanisms may be responsible for the transmission of programmed effects across generations in this model.
Affiliation(s)
- Jessy Cartier
  - University/British Heart Foundation Centre for Cardiovascular Science, University of Edinburgh, The Queen's Medical Research Institute, 47 Little France Crescent, Edinburgh, EH16 4TJ, UK
- Thomas Smith
  - MRC Computational Genomics Analysis and Training Programme, University of Oxford, MRC WIMM Centre for Computational Biology, The Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, Headley Way, Oxford, OX3 9DS, UK
- John P Thomson
  - MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Crewe Road, Edinburgh, EH4 2XU, UK
- Catherine M Rose
  - University/British Heart Foundation Centre for Cardiovascular Science, University of Edinburgh, The Queen's Medical Research Institute, 47 Little France Crescent, Edinburgh, EH16 4TJ, UK
- Batbayar Khulan
  - University/British Heart Foundation Centre for Cardiovascular Science, University of Edinburgh, The Queen's Medical Research Institute, 47 Little France Crescent, Edinburgh, EH16 4TJ, UK
- Andreas Heger
  - MRC Computational Genomics Analysis and Training Programme, University of Oxford, MRC WIMM Centre for Computational Biology, The Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, Headley Way, Oxford, OX3 9DS, UK
- Richard R Meehan
  - MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Crewe Road, Edinburgh, EH4 2XU, UK
- Amanda J Drake
  - University/British Heart Foundation Centre for Cardiovascular Science, University of Edinburgh, The Queen's Medical Research Institute, 47 Little France Crescent, Edinburgh, EH16 4TJ, UK.
13
|
Stricker G, Engelhardt A, Schulz D, Schmid M, Tresch A, Gagneur J. GenoGAM: genome-wide generalized additive models for ChIP-Seq analysis. Bioinformatics 2018; 33:2258-2265. [PMID: 28369277 DOI: 10.1093/bioinformatics/btx150] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 04/22/2016] [Accepted: 03/20/2017] [Indexed: 11/14/2022] Open
Abstract
Motivation: Chromatin immunoprecipitation followed by deep sequencing (ChIP-Seq) is a widely used approach to study protein-DNA interactions. Often, the quantities of interest are the differential occupancies relative to controls, between genetic backgrounds, treatments, or combinations thereof. Current methods for differential occupancy of ChIP-Seq data rely, however, on binning or sliding window techniques, for which the choice of the window and bin sizes is subjective. Results: Here, we present GenoGAM (Genome-wide Generalized Additive Model), which brings the well-established and flexible generalized additive models framework to genomic applications using a data parallelism strategy. We model ChIP-Seq read count frequencies as products of smooth functions along chromosomes. Smoothing parameters are objectively estimated from the data by cross-validation, eliminating ad hoc binning and windowing needed by current approaches. GenoGAM provides base-level and region-level significance testing for full factorial designs. Application to a ChIP-Seq dataset in yeast showed increased sensitivity over existing differential occupancy methods while controlling for type I error rate. By analyzing a set of DNA methylation data and illustrating an extension to a peak caller, we further demonstrate the potential of GenoGAM as a generic statistical modeling tool for genome-wide assays. Availability and Implementation: Software is available from Bioconductor: https://www.bioconductor.org/packages/release/bioc/html/GenoGAM.html. Contact: gagneur@in.tum.de. Supplementary information: Supplementary information is available at Bioinformatics online.
Affiliation(s)
- Georg Stricker
  - Gene Center and Department of Biochemistry, Ludwig-Maximilians-Universität München, 80333 Munich, Germany; Department of Informatics, Technische Universität München, 85748 Garching, Germany
- Alexander Engelhardt
  - Gene Center and Department of Biochemistry, Ludwig-Maximilians-Universität München, 80333 Munich, Germany
- Daniel Schulz
  - Gene Center and Department of Biochemistry, Ludwig-Maximilians-Universität München, 80333 Munich, Germany
- Matthias Schmid
  - Institut für Medizinische Biometrie, Informatik und Epidemiologie, University Hospital Bonn, 53105 Bonn, Germany
- Achim Tresch
  - Institute for Genetics, University of Cologne, 50647 Cologne, Germany
- Julien Gagneur
  - Gene Center and Department of Biochemistry, Ludwig-Maximilians-Universität München, 80333 Munich, Germany; Department of Informatics, Technische Universität München, 85748 Garching, Germany
14
Lee Y, Park D, Iyer VR. The ATP-dependent chromatin remodeler Chd1 is recruited by transcription elongation factors and maintains H3K4me3/H3K36me3 domains at actively transcribed and spliced genes. Nucleic Acids Res 2017; 45:7180-7190. [PMID: 28460001 PMCID: PMC5499586 DOI: 10.1093/nar/gkx321] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 12/27/2016] [Revised: 04/09/2017] [Accepted: 04/14/2017] [Indexed: 12/20/2022] Open
Abstract
Chd1 (Chromodomain Helicase DNA Binding Protein 1) is a conserved ATP-dependent chromatin remodeler that maintains the nucleosomal structure of chromatin, but the determinants of its specificity and its impact on gene expression are not well defined. To identify the determinants of Chd1 binding specificity in the yeast genome, we investigated Chd1 occupancy in mutants of several candidate factors. We found that several components of the PAF1 transcription elongation complex contribute to Chd1 recruitment to highly transcribed genes and identified Spt4 as a factor that appears to negatively modulate Chd1 binding to chromatin. We discovered that CHD1 loss alters H3K4me3 and H3K36me3 patterns throughout the yeast genome. Interestingly, the aberrant histone H3 methylation patterns were predominantly observed within 1 kb from the transcription start site, where both histone H3 methylation marks co-occur. A reciprocal change between the two marks was obvious in the absence of Chd1, suggesting a role for CHD1 in establishing or maintaining the boundaries of these largely mutually exclusive histone marks. Strikingly, intron-containing genes were most susceptible to CHD1 loss and exhibited a high degree of histone H3 methylation changes. Intron retention was significantly lower in the absence of CHD1, suggesting that CHD1 function as a chromatin remodeler could indirectly affect RNA splicing.
Affiliation(s)
- Yaelim Lee
  - Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78712, USA
- Daechan Park
  - Center for Theragnosis, Biomedical Research Institute, Korea Institute of Science and Technology (KIST), Seoul 02792, Republic of Korea
- Vishwanath R. Iyer
  - Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78712, USA
15
Lukauskas S, Visintainer R, Sanguinetti G, Schweikert GB. DGW: an exploratory data analysis tool for clustering and visualisation of epigenomic marks. BMC Bioinformatics 2016; 17:447. [PMID: 28105912 PMCID: PMC5249015 DOI: 10.1186/s12859-016-1306-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/21/2022] Open
Abstract
Background: Functional genomic and epigenomic research relies fundamentally on sequencing-based methods like ChIP-seq for the detection of DNA-protein interactions. These techniques return large, high-dimensional data sets with visually complex structures, such as multi-modal peaks extended over large genomic regions. Current tools for visualisation and data exploration represent and leverage these complex features only to a limited extent. Results: We present DGW, an open source software package for simultaneous alignment and clustering of multiple epigenomic marks. DGW uses Dynamic Time Warping to adaptively rescale and align genomic distances, which allows regions of interest with similar shapes to be grouped, thereby capturing the structure of epigenomic marks. We demonstrate the effectiveness of the approach in a simulation study and on a real epigenomic data set from the ENCODE project. Conclusions: Our results show that DGW automatically recognises and aligns important genomic features such as transcription start sites and splicing sites from histone marks. DGW is available as an open source Python package.
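The Dynamic Time Warping step mentioned in this abstract can be illustrated with a minimal sketch. This is the textbook dynamic-programming recurrence with an absolute-difference local cost, not DGW's own implementation; the function name `dtw_distance` and the toy profiles are assumptions made for illustration only.

```python
import numpy as np


def dtw_distance(a, b):
    """Classic DTW cost between two 1-D coverage profiles (illustrative only)."""
    n, m = len(a), len(b)
    # cost[i, j] = cheapest alignment cost of a[:i] against b[:j].
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the best of: advance in a only, advance in b only, advance in both.
            cost[i, j] = local + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])
```

Because a position in one profile may be matched to several positions in the other, a peak and a stretched copy of the same peak, for example `[0, 1, 2, 1, 0]` and `[0, 0, 1, 2, 2, 1, 0]`, align at zero cost, which is the sense in which warping groups regions by shape rather than by length.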
Affiliation(s)
- Saulius Lukauskas
  - Department of Chemical Engineering, Imperial College London, London, SW7 2AZ, UK
- Guido Sanguinetti
  - School of Informatics, University of Edinburgh, 10 Crichton St, Edinburgh, EH8 9AB, Scotland
- Gabriele B Schweikert
  - School of Informatics, University of Edinburgh, 10 Crichton St, Edinburgh, EH8 9AB, Scotland
16
Qin Z, Li B, Conneely KN, Wu H, Hu M, Ayyala D, Park Y, Jin VX, Zhang F, Zhang H, Li L, Lin S. Statistical challenges in analyzing methylation and long-range chromosomal interaction data. Statistics in Biosciences 2016; 8:284-309. [PMID: 28008337 PMCID: PMC5167536 DOI: 10.1007/s12561-016-9145-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 11/23/2015] [Revised: 02/22/2016] [Accepted: 02/22/2016] [Indexed: 12/21/2022]
Abstract
With the rapid development of high throughput technologies such as array and next generation sequencing (NGS), genome-wide, nucleotide-resolution epigenomic data are increasingly available. In recent years, there has been particular interest in data on DNA methylation and 3-dimensional (3D) chromosomal organization, which are believed to hold keys to understand biological mechanisms, such as transcription regulation, that are closely linked to human health and diseases. However, small sample size, complicated correlation structure, substantial noise, biases, and uncertainties, all present difficulties for performing statistical inference. In this review, we present an overview of the new technologies that are frequently utilized in studying DNA methylation and 3D chromosomal organization. We focus on reviewing recent developments in statistical methodologies designed for better interrogating epigenomic data, pointing out statistical challenges facing the field whenever appropriate.
Affiliation(s)
- Zhaohui Qin
  - Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, USA
- Ben Li
  - Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, USA
- Karen N Conneely
  - Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA
- Hao Wu
  - Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, USA
- Ming Hu
  - Division of Biostatistics, Department of Population Health, New York University School of Medicine, New York, NY 10016, USA
- Deepak Ayyala
  - Department of Statistics, The Ohio State University, Columbus, OH 43210, USA
- Yongseok Park
  - Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Victor X Jin
  - Department of Molecular Medicine, The University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
- Fangyuan Zhang
  - Department of Mathematics & Statistics, Texas Tech University, Lubbock, TX 79409, USA
- Han Zhang
  - Department of Statistics, The Ohio State University, Columbus, OH 43210, USA
- Li Li
  - Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, USA
- Shili Lin
  - Department of Statistics, The Ohio State University, Columbus, OH 43210, USA
17
Vegas E, Oller JM, Reverter F. Inferring differentially expressed pathways using kernel maximum mean discrepancy-based test. BMC Bioinformatics 2016; 17 Suppl 5:205. [PMID: 27294256 PMCID: PMC4905616 DOI: 10.1186/s12859-016-1046-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 02/07/2023] Open
Abstract
Background: Pathway expression is multivariate in nature. Thus, from a statistical perspective, to detect differentially expressed pathways between two conditions, methods for inferring differences between mean vectors need to be applied. Maximum mean discrepancy (MMD) is a statistical test to determine whether two samples are from the same distribution, its implementation being greatly simplified using the kernel method. Results: An MMD-based test successfully detected the differential expression between two conditions, specifically the expression of a set of genes involved in certain fatty acid metabolic pathways. Furthermore, we exploited the ability of the kernel method to integrate data and successfully added hepatic fatty acid levels to the test procedure. Conclusion: MMD is a non-parametric test that acquires several advantages when combined with the kernelization of data: 1) the number of variables can be greater than the sample size; 2) omics data can be integrated; 3) it can be applied not only to vectors, but to strings, sequences and other common structured data types arising in molecular biology.
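As a rough illustration of the kernel two-sample idea described in this abstract, the sketch below computes a biased estimate of squared MMD with a Gaussian kernel and calibrates it by permutation. It is not the authors' code: the kernel choice, the bandwidth parameter `gamma`, the permutation scheme and all function names are assumptions made for the example.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq, 0.0))


def mmd2(x, y, gamma=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples x and y."""
    return (rbf_kernel(x, x, gamma).mean()
            + rbf_kernel(y, y, gamma).mean()
            - 2.0 * rbf_kernel(x, y, gamma).mean())


def mmd_permutation_test(x, y, gamma=1.0, n_perm=500, seed=0):
    """Return the observed statistic and a permutation p-value."""
    rng = np.random.default_rng(seed)
    observed = mmd2(x, y, gamma)
    pooled = np.vstack([x, y])
    n = len(x)
    exceed = 0
    for _ in range(n_perm):
        # Shuffle group labels and recompute the statistic under the null.
        idx = rng.permutation(len(pooled))
        if mmd2(pooled[idx[:n]], pooled[idx[n:]], gamma) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)
```

The statistic depends on the data only through kernel evaluations between samples, which is why the number of variables may exceed the sample size and why other kernels (for strings or sequences, say) could be substituted for `rbf_kernel`.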
Affiliation(s)
- Esteban Vegas
  - Department of Statistics, University of Barcelona, Diagonal, 643, Barcelona, 08028, Spain
- Josep M Oller
  - Department of Statistics, University of Barcelona, Diagonal, 643, Barcelona, 08028, Spain
- Ferran Reverter
  - Department of Statistics, University of Barcelona, Diagonal, 643, Barcelona, 08028, Spain; Center of Genomic Regulation, Parc de Recerca Biomedica de Barcelona, Dr. Aiguader, 88, Barcelona, 08003, Spain
18
Steinhauser S, Kurzawa N, Eils R, Herrmann C. A comprehensive comparison of tools for differential ChIP-seq analysis. Brief Bioinform 2016; 17:953-966. [PMID: 26764273 PMCID: PMC5142015 DOI: 10.1093/bib/bbv110] [Citation(s) in RCA: 52] [Impact Index Per Article: 6.5] [Received: 10/09/2015] [Revised: 11/21/2015] [Indexed: 11/13/2022] Open
Abstract
ChIP-seq has become a widely adopted genomic assay in recent years to determine binding sites for transcription factors or enrichments for specific histone modifications. Beside detection of enriched or bound regions, an important question is to determine differences between conditions. While this is a common analysis for gene expression, for which a large number of computational approaches have been validated, the same question for ChIP-seq is particularly challenging owing to the complexity of ChIP-seq data in terms of noisiness and variability. Many different tools have been developed and published in recent years. However, a comprehensive comparison and review of these tools is still missing. Here, we have reviewed 14 tools, which have been developed to determine differential enrichment between two conditions. They differ in their algorithmic setups, and also in the range of applicability. Hence, we have benchmarked these tools on real data sets for transcription factors and histone modifications, as well as on simulated data sets to quantitatively evaluate their performance. Overall, there is a great variety in the type of signal detected by these tools with a surprisingly low level of agreement. Depending on the type of analysis performed, the choice of method will crucially impact the outcome.
19
Cremona MA, Sangalli LM, Vantini S, Dellino GI, Pelicci PG, Secchi P, Riva L. Peak shape clustering reveals biological insights. BMC Bioinformatics 2015; 16:349. [PMID: 26511446 PMCID: PMC4625869 DOI: 10.1186/s12859-015-0787-6] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 05/22/2015] [Accepted: 10/20/2015] [Indexed: 11/29/2022] Open
Abstract
Background: ChIP-seq experiments are widely used to detect and study DNA-protein interactions, such as transcription factor binding and chromatin modifications. However, downstream analysis of ChIP-seq data is currently restricted to the evaluation of signal intensity and the detection of enriched regions (peaks) in the genome. Other features of peak shape are almost always neglected, despite the remarkable differences shown by ChIP-seq for different proteins, as well as by distinct regions in a single experiment. Results: We hypothesize that statistically significant differences in peak shape might have a functional role and a biological meaning. Thus, we design five indices able to summarize peak shapes and we employ multivariate clustering techniques to divide peaks into groups according to both their complexity and the intensity of their coverage function. In addition, our novel analysis pipeline employs a range of statistical and bioinformatics techniques to relate the obtained peak shapes to several independent genomic datasets, including other genome-wide protein-DNA maps and gene expression experiments. To clarify the meaning of peak shape, we apply our methodology to the study of the erythroid transcription factor GATA-1 in K562 cell line and in megakaryocytes. Conclusions: Our study demonstrates that ChIP-seq profiles include information regarding the binding of other proteins beside the one used for precipitation. In particular, peak shape provides new insights into cooperative transcriptional regulation and is correlated to gene expression. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0787-6) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Marzia A Cremona
  - MOX - Dipartimento di Matematica, Politecnico di Milano, Milan, Italy
- Laura M Sangalli
  - MOX - Dipartimento di Matematica, Politecnico di Milano, Milan, Italy
- Simone Vantini
  - MOX - Dipartimento di Matematica, Politecnico di Milano, Milan, Italy
- Gaetano I Dellino
  - Department of Experimental Oncology, European Institute of Oncology, Milan, Italy; Dipartimento di Scienze della salute, Università degli Studi di Milano, Milan, Italy
- Pier Giuseppe Pelicci
  - Department of Experimental Oncology, European Institute of Oncology, Milan, Italy; Dipartimento di Scienze della salute, Università degli Studi di Milano, Milan, Italy
- Piercesare Secchi
  - MOX - Dipartimento di Matematica, Politecnico di Milano, Milan, Italy
- Laura Riva
  - Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia, Milan, Italy
20
Lun ATL, Smyth GK. From reads to regions: a Bioconductor workflow to detect differential binding in ChIP-seq data. F1000Res 2015; 4:1080. [PMID: 26834993 DOI: 10.12688/f1000research.7016.1] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Accepted: 09/07/2015] [Indexed: 01/17/2023] Open
Abstract
Chromatin immunoprecipitation with massively parallel sequencing (ChIP-seq) is widely used to identify the genomic binding sites for protein of interest. Most conventional approaches to ChIP-seq data analysis involve the detection of the absolute presence (or absence) of a binding site. However, an alternative strategy is to identify changes in the binding intensity between two biological conditions, i.e., differential binding (DB). This may yield more relevant results than conventional analyses, as changes in binding can be associated with the biological difference being investigated. The aim of this article is to facilitate the implementation of DB analyses, by comprehensively describing a computational workflow for the detection of DB regions from ChIP-seq data. The workflow is based primarily on R software packages from the open-source Bioconductor project and covers all steps of the analysis pipeline, from alignment of read sequences to interpretation and visualization of putative DB regions. In particular, detection of DB regions will be conducted using the counts for sliding windows from the csaw package, with statistical modelling performed using methods in the edgeR package. Analyses will be demonstrated on real histone mark and transcription factor data sets. This will provide readers with practical usage examples that can be applied in their own studies.
Affiliation(s)
- Aaron T L Lun
  - The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia; Department of Medical Biology, The University of Melbourne, Melbourne, Australia
- Gordon K Smyth
  - The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia; Department of Mathematics and Statistics, The University of Melbourne, Melbourne, Australia
21
Lun ATL, Smyth GK. From reads to regions: a Bioconductor workflow to detect differential binding in ChIP-seq data. F1000Res 2015; 4:1080. [PMID: 26834993 PMCID: PMC4706055 DOI: 10.12688/f1000research.7016.2] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Accepted: 01/06/2016] [Indexed: 12/19/2022] Open
Abstract
Chromatin immunoprecipitation with massively parallel sequencing (ChIP-seq) is widely used to identify the genomic binding sites for protein of interest. Most conventional approaches to ChIP-seq data analysis involve the detection of the absolute presence (or absence) of a binding site. However, an alternative strategy is to identify changes in the binding intensity between two biological conditions, i.e., differential binding (DB). This may yield more relevant results than conventional analyses, as changes in binding can be associated with the biological difference being investigated. The aim of this article is to facilitate the implementation of DB analyses, by comprehensively describing a computational workflow for the detection of DB regions from ChIP-seq data. The workflow is based primarily on R software packages from the open-source Bioconductor project and covers all steps of the analysis pipeline, from alignment of read sequences to interpretation and visualization of putative DB regions. In particular, detection of DB regions will be conducted using the counts for sliding windows from the csaw package, with statistical modelling performed using methods in the edgeR package. Analyses will be demonstrated on real histone mark and transcription factor data sets. This will provide readers with practical usage examples that can be applied in their own studies.
Affiliation(s)
- Aaron T L Lun
  - The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia; Department of Medical Biology, The University of Melbourne, Melbourne, Australia
- Gordon K Smyth
  - The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia; Department of Mathematics and Statistics, The University of Melbourne, Melbourne, Australia
22
Quantitative genomic analysis of RecA protein binding during DNA double-strand break repair reveals RecBCD action in vivo. Proc Natl Acad Sci U S A 2015; 112:E4735-42. [PMID: 26261330 PMCID: PMC4553759 DOI: 10.1073/pnas.1424269112] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Indexed: 12/30/2022] Open
Abstract
Understanding molecular mechanisms in the context of living cells requires the development of new methods of in vivo biochemical analysis to complement established in vitro biochemistry. A critically important molecular mechanism is genetic recombination, required for the beneficial reassortment of genetic information and for DNA double-strand break repair (DSBR). Central to recombination is the RecA (Rad51) protein that assembles into a spiral filament on DNA and mediates genetic exchange. Here we have developed a method that combines chromatin immunoprecipitation with next-generation sequencing (ChIP-Seq) and mathematical modeling to quantify RecA protein binding during the active repair of a single DSB in the chromosome of Escherichia coli. We have used quantitative genomic analysis to infer the key in vivo molecular parameters governing RecA loading by the helicase/nuclease RecBCD at recombination hot-spots, known as Chi. Our genomic analysis has also revealed that DSBR at the lacZ locus causes a second RecBCD-mediated DSBR event to occur in the terminus region of the chromosome, over 1 Mb away.
23
Madrigal P, Krajewski P. Uncovering correlated variability in epigenomic datasets using the Karhunen-Loeve transform. BioData Min 2015; 8:20. [PMID: 26140054 PMCID: PMC4488123 DOI: 10.1186/s13040-015-0051-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 11/10/2014] [Accepted: 06/17/2015] [Indexed: 12/21/2022] Open
Abstract
Background: Larger variation exists in epigenomes than in genomes, as a single genome shapes the identity of multiple cell types. With the advent of next-generation sequencing, one of the key problems in computational epigenomics is the poor understanding of correlations and quantitative differences between large scale data sets. Results: Here we bring to genomics a scenario of functional principal component analysis, a finite Karhunen-Loève transform, and explicitly decompose the variation in the coverage profiles of 27 chromatin mark ChIP-seq datasets at transcription start sites for H1, one of the most used human embryonic stem cell lines. Using this approach we identify positive correlations between H3K4me3 and H3K36me3, as well as between H3K9ac and H3K36me3, so far undetected by the most commonly used Pearson correlation between read enrichment coverages. We uncover highly negative correlations between H2A.Z, H3K4me3, and several histone acetylation marks, but these occur only between principal components of first and second order. We also demonstrate that levels of gene expression correlate significantly with scores of components of order higher than one, demonstrating that transcriptional regulation by histone marks escapes simple one-to-one relationships. These correlations were higher in significance and magnitude in protein coding genes than in non-coding RNAs. Conclusions: In summary, we present a methodology to explore and uncover novel patterns of epigenomic variability and covariability in genomic data sets by using a functional eigenvalue decomposition of genomic data. R code is available at: http://github.com/pmb59/KLTepigenome.
Affiliation(s)
- Pedro Madrigal
  - Department of Biometry and Bioinformatics, Institute of Plant Genetics of the Polish Academy of Sciences, Strzeszyńska 34, Poznań, 60-479 Poland; Present address: Wellcome Trust-MRC Cambridge Stem Cell Institute, Anne McLaren Laboratory for Regenerative Medicine, Department of Surgery, University of Cambridge, West Forvie Building, Forvie Site, Robinson Way, Cambridge, CB2 0SZ UK; Present address: Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA UK
- Paweł Krajewski
  - Department of Biometry and Bioinformatics, Institute of Plant Genetics of the Polish Academy of Sciences, Strzeszyńska 34, Poznań, 60-479 Poland
24
Mayo TR, Schweikert G, Sanguinetti G. M3D: a kernel-based test for spatially correlated changes in methylation profiles. Bioinformatics 2014; 31:809-16. [PMID: 25398611 PMCID: PMC4380032 DOI: 10.1093/bioinformatics/btu749] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Indexed: 12/26/2022]
Abstract
Motivation: DNA methylation is an intensely studied epigenetic mark implicated in many biological processes of direct clinical relevance. Although sequencing-based technologies are increasingly allowing high-resolution measurements of DNA methylation, statistical modelling of such data is still challenging. In particular, statistical identification of differentially methylated regions across different conditions poses unresolved challenges in accounting for spatial correlations within the statistical testing procedure. Results: We propose a non-parametric, kernel-based method, M3D, to detect higher order changes in methylation profiles, such as shape, across pre-defined regions. The test statistic explicitly accounts for differences in coverage levels between samples, thus handling in a principled way a major confounder in the analysis of methylation data. Empirical tests on real and simulated datasets show an increased power compared to established methods, as well as considerable robustness with respect to coverage and replication levels. Availability and implementation: R/Bioconductor package M3D. Contact: G.Sanguinetti@ed.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tom R Mayo
  - IANC, School of Informatics, University of Edinburgh, Edinburgh EH8 9AB and Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh EH9 3JR, UK
- Gabriele Schweikert
  - IANC, School of Informatics, University of Edinburgh, Edinburgh EH8 9AB and Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh EH9 3JR, UK
- Guido Sanguinetti
  - IANC, School of Informatics, University of Edinburgh, Edinburgh EH8 9AB and Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh EH9 3JR, UK
25
Transcription factor binding predicts histone modifications in human cell lines. Proc Natl Acad Sci U S A 2014; 111:13367-72. [PMID: 25187560 DOI: 10.1073/pnas.1412081111] [Citation(s) in RCA: 84] [Impact Index Per Article: 8.4] [Indexed: 12/16/2022] Open
Abstract
Gene expression in higher organisms is thought to be regulated by a complex network of transcription factor binding and chromatin modifications, yet the relative importance of these two factors remains a matter of debate. Here, we show that a computational approach allows surprisingly accurate prediction of histone modifications solely from knowledge of transcription factor binding both at promoters and at potential distal regulatory elements. This accuracy significantly and substantially exceeds what could be achieved by using DNA sequence as an input feature. Remarkably, we show that transcription factor binding enables strikingly accurate predictions across different cell lines. Analysis of the relative importance of specific transcription factors as predictors of specific histone marks recapitulated known interactions between transcription factors and histone modifiers. Our results demonstrate that reported associations between histone marks and gene expression may be indirect effects caused by interactions between transcription factors and histone-modifying complexes.