Reference Citation Analysis

For: Hoffman MM, Buske OJ, Noble WS. The Genomedata format for storing large-scale functional genomics data. Bioinformatics 2010;26:1458-9. [PMID: 20435580 PMCID: PMC2872006 DOI: 10.1093/bioinformatics/btq164] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 12/02/2022]

Cited by Other Article(s)

Viner C, Ishak CA, Johnson J, Walker NJ, Shi H, Sjöberg-Herrera MK, Shen SY, Lardo SM, Adams DJ, Ferguson-Smith AC, De Carvalho DD, Hainer SJ, Bailey TL, Hoffman MM. Modeling methyl-sensitive transcription factor motifs with an expanded epigenetic alphabet. Genome Biol 2024;25:11. [PMID: 38191487 PMCID: PMC10773111 DOI: 10.1186/s13059-023-03070-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2023] [Accepted: 09/21/2023] [Indexed: 01/10/2024]

Daneshpajouh H, Chen B, Shokraneh N, Masoumi S, Wiese KC, Libbrecht MW. Continuous chromatin state feature annotation of the human epigenome. Bioinformatics 2022;38:3029-3036. [PMID: 35451453 PMCID: PMC9154241 DOI: 10.1093/bioinformatics/btac283] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2021] [Revised: 02/18/2022] [Accepted: 04/18/2022] [Indexed: 12/02/2022]

Abstract

Motivation

Segmentation and genome annotation (SAGA) algorithms are widely used to understand genome activity and gene regulation. These methods take as input a set of sequencing-based assays of epigenomic activity, such as ChIP-seq measurements of histone modification and transcription factor binding. They output an annotation of the genome that assigns a chromatin state label to each genomic position. Existing SAGA methods have several limitations caused by the discrete annotation framework: such annotations cannot easily represent varying strengths of genomic elements, and they cannot easily represent combinatorial elements that simultaneously exhibit multiple types of activity. To remedy these limitations, we propose an annotation strategy that outputs a vector of chromatin state features at each position rather than a single discrete label. Continuous modeling is common in other fields, such as in topic modeling of text documents. We propose a method, epigenome-ssm-nonneg, that uses a non-negative state space model to efficiently annotate the genome with chromatin state features. We also propose several measures of the quality of a chromatin state feature annotation, and we compare the performance of several alternative methods according to these quality measures.

Results

We show that chromatin state features from epigenome-ssm-nonneg are more useful for several downstream applications than both continuous and discrete alternatives, including their ability to identify expressed genes and enhancers. Therefore, we expect that these continuous chromatin state features will be valuable reference annotations to be used in visualization and downstream analysis.

Availability and implementation

Source code for epigenome-ssm is available at https://github.com/habibdanesh/epigenome-ssm and Zenodo (DOI: 10.5281/zenodo.6507585).

Supplementary information

Supplementary data are available at Bioinformatics online.

Kyoda K, Ho KHL, Tohsato Y, Itoga H, Onami S. BD5: An open HDF5-based data format to represent quantitative biological dynamics data. PLoS One 2020;15:e0237468. [PMID: 32785254 PMCID: PMC7423140 DOI: 10.1371/journal.pone.0237468] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/27/2020] [Accepted: 07/27/2020] [Indexed: 11/18/2022]

Nti-Addae Y, Matthews D, Ulat VJ, Syed R, Sempéré G, Pétel A, Renner J, Larmande P, Guignon V, Jones E, Robbins K. Benchmarking database systems for Genomic Selection implementation. Database: The Journal of Biological Databases and Curation 2020;2019:5566651. [PMID: 31508797 PMCID: PMC6737464 DOI: 10.1093/database/baz096] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 02/26/2019] [Revised: 05/29/2019] [Accepted: 07/01/2019] [Indexed: 01/07/2023]

Dronamraju R, Jha DK, Eser U, Adams AT, Dominguez D, Choudhury R, Chiang YC, Rathmell WK, Emanuele MJ, Churchman LS, Strahl BD. Set2 methyltransferase facilitates cell cycle progression by maintaining transcriptional fidelity. Nucleic Acids Res 2019;46:1331-1344. [PMID: 29294086 PMCID: PMC5814799 DOI: 10.1093/nar/gkx1276] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 07/12/2017] [Accepted: 12/18/2017] [Indexed: 12/14/2022]

Kumar R, Sobhy H, Stenberg P, Lizana L. Genome contact map explorer: a platform for the comparison, interactive visualization and analysis of genome contact maps. Nucleic Acids Res 2017;45:e152. [PMID: 28973466 PMCID: PMC5622372 DOI: 10.1093/nar/gkx644] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 12/25/2016] [Accepted: 07/19/2017] [Indexed: 12/23/2022]

Huang F, Shen J, Guo Q, Shi Y. eRFSVM: a hybrid classifier to predict enhancers-integrating random forests with support vector machines. Hereditas 2016;153:6. [PMID: 28096768 PMCID: PMC5226099 DOI: 10.1186/s41065-016-0012-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 05/15/2016] [Accepted: 06/16/2016] [Indexed: 01/03/2023]

Huy Hoang D, Sung WK. CWig: compressed representation of Wiggle/BedGraph format. Bioinformatics 2014;30:2543-50. [PMID: 24867943 DOI: 10.1093/bioinformatics/btu330] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/13/2022]

Dale RK, Matzat LH, Lei EP. metaseq: a Python package for integrative genome-wide analysis reveals relationships between chromatin insulators and associated nuclear mRNA. Nucleic Acids Res 2014;42:9158-70. [PMID: 25063299 DOI: 10.1093/nar/gku644] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Indexed: 11/14/2022]

Hoffman MM, Buske OJ, Wang J, Weng Z, Bilmes JA, Noble WS. Unsupervised pattern discovery in human chromatin structure through genomic segmentation. Nat Methods 2012;9:473-6. [PMID: 22426492 DOI: 10.1038/nmeth.1937] [Citation(s) in RCA: 395] [Impact Index Per Article: 32.9] [Received: 07/01/2011] [Accepted: 02/14/2012] [Indexed: 01/24/2023]

Identifying elemental genomic track types and representing them uniformly. BMC Bioinformatics 2011;12:494. [PMID: 22208806 PMCID: PMC3315820 DOI: 10.1186/1471-2105-12-494] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 05/11/2011] [Accepted: 12/30/2011] [Indexed: 11/10/2022]

Abstract

BACKGROUND

With the recent advances and availability of various high-throughput sequencing technologies, data on many molecular aspects, such as gene regulation, chromatin dynamics, and the three-dimensional organization of DNA, are rapidly being generated in an increasing number of laboratories. The variation in biological context, and the increasingly dispersed mode of data generation, imply a need for precise, interoperable and flexible representations of genomic features through formats that are easy to parse. A host of alternative formats are currently available and in use, complicating analysis and tool development. The issue of whether and how the multitude of formats reflects varying underlying characteristics of data has to our knowledge not previously been systematically treated.

RESULTS

We here identify intrinsic distinctions between genomic features, and argue that the distinctions imply that a certain variation in the representation of features as genomic tracks is warranted. Four core informational properties of tracks are discussed: gaps, lengths, values and interconnections. From this we delineate fifteen generic track types. Based on the track type distinctions, we characterize major existing representational formats and find that the track types are not adequately supported by any single format. We also find, in contrast to the XML formats, that none of the existing tabular formats are conveniently extendable to support all track types. We thus propose two unified formats for track data, an improved XML format, BioXSD 1.1, and a new tabular format, GTrack 1.0.

CONCLUSIONS

The defined track types are shown to capture relevant distinctions between genomic annotation tracks, resulting in varying representational needs and analysis possibilities. The proposed formats, GTrack 1.0 and BioXSD 1.1, cater to the identified track distinctions and emphasize preciseness, flexibility and parsing convenience.

Steinbiss S, Kurtz S. A new efficient data structure for storage and retrieval of multiple biosequences. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2011;9:330-344. [PMID: 22084150 DOI: 10.1109/tcbb.2011.146] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 05/31/2023]

Buske OJ, Hoffman MM, Ponts N, Le Roch KG, Noble WS. Exploratory analysis of genomic segmentations with Segtools. BMC Bioinformatics 2011;12:415. [PMID: 22029426 PMCID: PMC3224787 DOI: 10.1186/1471-2105-12-415] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 02/11/2011] [Accepted: 10/26/2011] [Indexed: 11/23/2022]