2
Correcting nucleotide-specific biases in high-throughput sequencing data. BMC Bioinformatics 2017; 18:357. [PMID: 28764645 PMCID: PMC5540620 DOI: 10.1186/s12859-017-1766-x] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 02/09/2017] [Accepted: 07/19/2017] [Indexed: 01/07/2023] Open
Abstract
Background: High-throughput sequence (HTS) data exhibit position-specific nucleotide biases that obscure the intended signal and reduce the effectiveness of these data for downstream analyses. These biases are particularly evident in HTS assays for identifying regulatory regions in DNA (DNase-seq, ChIP-seq, FAIRE-seq, ATAC-seq). Biases may result from many experiment-specific factors, including selectivity of DNA restriction enzymes and fragmentation method, as well as sequencing technology-specific factors, such as choice of adapters/primers and sample amplification methods.
Results: We present a novel method to detect and correct position-specific nucleotide biases in HTS short read data. Our method calculates read-specific weights based on aligned reads to correct the over- or underrepresentation of position-specific nucleotide subsequences, both within and adjacent to the aligned read, relative to a baseline calculated in assay-specific enriched regions. Using HTS data from a variety of ChIP-seq, DNase-seq, FAIRE-seq, and ATAC-seq experiments, we show that our weight-adjusted reads reduce the position-specific nucleotide imbalance across reads and improve the utility of these data for downstream analyses, including identification and characterization of open chromatin peaks and transcription-factor binding sites.
Conclusions: A general-purpose method to characterize and correct position-specific nucleotide sequence biases fills the need to recognize and deal with, in a systematic manner, binding-site preference for the growing number of HTS-based epigenetic assays. As the breadth and impact of these biases are better understood, the availability of a standard toolkit to correct them will be important.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-017-1766-x) contains supplementary material, which is available to authorized users.
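The read-weighting idea described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single k-mer has already been extracted per read at one fixed offset, and the function names `kmer_frequencies` and `read_weights` are hypothetical. Each read is weighted by the ratio of the baseline k-mer frequency to the frequency observed across the reads, so over-represented k-mers are down-weighted and under-represented ones are up-weighted.

```python
from collections import Counter


def kmer_frequencies(kmers):
    """Return the relative frequency of each k-mer in a list of strings."""
    counts = Counter(kmers)
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def read_weights(read_kmers, baseline_kmers):
    """Assign one weight per read (hypothetical helper, illustration only).

    read_kmers:     the k-mer seen at a fixed offset in each aligned read
    baseline_kmers: k-mers sampled at the same offset from a baseline set
                    (assumed here to stand in for assay-specific enriched regions)
    """
    observed = kmer_frequencies(read_kmers)
    baseline = kmer_frequencies(baseline_kmers)
    # weight = expected (baseline) frequency / observed frequency;
    # k-mers never seen in the baseline get weight 0 in this sketch.
    return [baseline.get(kmer, 0.0) / observed[kmer] for kmer in read_kmers]


# Toy data: "AC" is over-represented among reads relative to the baseline.
weights = read_weights(["AC", "AC", "AC", "GT"], ["AC", "GT"])
for kmer, w in zip(["AC", "AC", "AC", "GT"], weights):
    print(kmer, round(w, 3))
```

In this toy case the three "AC" reads are down-weighted and the single "GT" read is up-weighted, so the weighted k-mer totals match the baseline proportions. A real correction would repeat this over many positions within and adjacent to each read and combine the resulting weights, which this sketch does not attempt.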
3
Dhingra P, Martinez-Fundichely A, Berger A, Huang FW, Forbes AN, Liu EM, Liu D, Sboner A, Tamayo P, Rickman DS, Rubin MA, Khurana E. Identification of novel prostate cancer drivers using RegNetDriver: a framework for integration of genetic and epigenetic alterations with tissue-specific regulatory network. Genome Biol 2017; 18:141. [PMID: 28750683 PMCID: PMC5530464 DOI: 10.1186/s13059-017-1266-3] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 10/30/2016] [Accepted: 06/27/2017] [Indexed: 11/22/2022] Open
Abstract
We report a novel computational method, RegNetDriver, to identify tumorigenic drivers using the combined effects of coding and non-coding single nucleotide variants, structural variants, and DNA methylation changes in the DNase I hypersensitivity-based regulatory network. Integration of multi-omics data from 521 prostate tumor samples indicated a stronger regulatory impact of structural variants, as they affect more transcription factor hubs in the tissue-specific network. Moreover, crosstalk between transcription factor hub expression modulated by structural variants and methylation levels likely leads to the differential expression of target genes. We report known prostate tumor regulatory drivers and nominate novel transcription factors (ERF, CREB3L1, and POU2F2), which are supported by functional validation.
Affiliation(s)
- Priyanka Dhingra
  - Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York, 10065, USA
  - Institute for Computational Biomedicine, Weill Cornell Medical College, New York, New York, 10021, USA
- Alexander Martinez-Fundichely
  - Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York, 10065, USA
  - Institute for Computational Biomedicine, Weill Cornell Medical College, New York, New York, 10021, USA
- Adeline Berger
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medical College, New York, New York, 10065, USA
- Franklin W Huang
  - Department of Medical Oncology, Dana-Farber Cancer Institute, 450 Brookline Avenue, Boston, MA, 02215, USA
  - Department of Medicine, Harvard Medical School, 25 Shattuck Street, Boston, MA, 02115, USA
  - Cancer Program, The Broad Institute of Harvard and MIT, 415 Main Street, Cambridge, MA, 02142, USA
- Andre Neil Forbes
  - Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York, 10065, USA
  - Institute for Computational Biomedicine, Weill Cornell Medical College, New York, New York, 10021, USA
- Eric Minwei Liu
  - Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York, 10065, USA
  - Institute for Computational Biomedicine, Weill Cornell Medical College, New York, New York, 10021, USA
- Deli Liu
  - Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York, 10065, USA
  - Department of Urology, Weill Cornell Medical College, New York, New York, 10065, USA
- Andrea Sboner
  - Institute for Computational Biomedicine, Weill Cornell Medical College, New York, New York, 10021, USA
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medical College, New York, New York, 10065, USA
  - Caryl and Israel Englander Institute for Precision Medicine, New York Presbyterian Hospital-Weill Cornell Medicine, New York, NY, 10065, USA
- Pablo Tamayo
  - Cancer Program, The Broad Institute of Harvard and MIT, 415 Main Street, Cambridge, MA, 02142, USA
  - Department of Medicine, University of California San Diego, La Jolla, California, USA
  - Moores Cancer Center, University of California San Diego, La Jolla, California, USA
- David S Rickman
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medical College, New York, New York, 10065, USA
  - Caryl and Israel Englander Institute for Precision Medicine, New York Presbyterian Hospital-Weill Cornell Medicine, New York, NY, 10065, USA
  - Meyer Cancer Center, Weill Cornell Medical College, New York, New York, 10065, USA
- Mark A Rubin
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medical College, New York, New York, 10065, USA
  - Caryl and Israel Englander Institute for Precision Medicine, New York Presbyterian Hospital-Weill Cornell Medicine, New York, NY, 10065, USA
  - Meyer Cancer Center, Weill Cornell Medical College, New York, New York, 10065, USA
- Ekta Khurana
  - Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York, 10065, USA
  - Institute for Computational Biomedicine, Weill Cornell Medical College, New York, New York, 10021, USA
  - Caryl and Israel Englander Institute for Precision Medicine, New York Presbyterian Hospital-Weill Cornell Medicine, New York, NY, 10065, USA
  - Meyer Cancer Center, Weill Cornell Medical College, New York, New York, 10065, USA
4
Fontaine F, Overman J, François M. Pharmacological manipulation of transcription factor protein-protein interactions: opportunities and obstacles. Cell Regeneration (London, England) 2015; 4:2. [PMID: 25848531 PMCID: PMC4365538 DOI: 10.1186/s13619-015-0015-x] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.8] [Received: 11/04/2014] [Accepted: 02/10/2015] [Indexed: 12/19/2022]
Abstract
Much research on transcription factor biology and the associated genetic pathways has been undertaken over the last 30 years, especially in the fields of developmental biology and cancer. Yet very little is known about the molecular modalities of the highly dynamic interactions between transcription factors, genomic DNA, and protein partners. Methodological breakthroughs such as RNA-seq (RNA sequencing), ChIP-seq (chromatin immunoprecipitation sequencing), RIME (rapid immunoprecipitation mass spectrometry of endogenous proteins), and single-molecule imaging will dramatically accelerate the discovery of their molecular mode of action in the next few years. From a pharmacological viewpoint, conventional methods used to target transcription factor activity with molecules mimicking endogenous ligands fail to achieve high specificity and are limited by the lack of newly identified molecular targets. Protein-protein interactions are likely to represent one of the next major classes of therapeutic targets. Transcription factors, known to act mostly via protein-protein interaction, may well be at the forefront of this type of drug development. One hurdle in this field remains the difficulty of collating structural data into meaningful information for rational drug design. Another hurdle is the lack of chemical libraries meeting the structural requirements of protein-protein interaction disruption. As more attempts at modulating transcription factor activity are undertaken, valuable knowledge will be accumulated on the modality of action required to modulate transcription and on how these findings can be applied to developing transcription factor drugs. Key discoveries will spawn new therapeutic approaches, not only against cancer but also for other indications, such as those with an inflammatory component, including neurodegenerative disorders, diabetes, and chronic liver and kidney diseases.
Affiliation(s)
- Frank Fontaine
  - Division of Genomics of Development and Diseases, Institute for Molecular Bioscience, The University of Queensland, 306 Carmody Road, St Lucia, QLD 4072 Australia
- Jeroen Overman
  - Division of Genomics of Development and Diseases, Institute for Molecular Bioscience, The University of Queensland, 306 Carmody Road, St Lucia, QLD 4072 Australia
- Mathias François
  - Division of Genomics of Development and Diseases, Institute for Molecular Bioscience, The University of Queensland, 306 Carmody Road, St Lucia, QLD 4072 Australia