Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Li W, Bernaola-Galván P, Haghighi F, Grosse I. Applications of recursive segmentation to the analysis of DNA sequences. Comput Chem 2002;26:491-510. [PMID: 12144178 DOI: 10.1016/s0097-8485(02)00010-4] [Citation(s) in RCA: 64] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]

Cited by Other Article(s)

Brejová B, Gagie T, Herencsárová E, Vinař T. Maximum-scoring path sets on pangenome graphs of constant treewidth. FRONTIERS IN BIOINFORMATICS 2024;4:1391086. [PMID: 39011297 PMCID: PMC11246863 DOI: 10.3389/fbinf.2024.1391086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/24/2024] [Accepted: 06/03/2024] [Indexed: 07/17/2024] Open

Peters TJ, Buckley MJ, Chen Y, Smyth GK, Goodnow CC, Clark SJ. Calling differentially methylated regions from whole genome bisulphite sequencing with DMRcate. Nucleic Acids Res 2021;49:e109. [PMID: 34320181 PMCID: PMC8565305 DOI: 10.1093/nar/gkab637] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/07/2020] [Revised: 05/31/2021] [Accepted: 07/19/2021] [Indexed: 11/12/2022] Open

Li W, Freudenberg J, Freudenberg J. Alignment-free approaches for predicting novel Nuclear Mitochondrial Segments (NUMTs) in the human genome. Gene 2019;691:141-152. [PMID: 30630097 DOI: 10.1016/j.gene.2018.12.040] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2018] [Revised: 12/07/2018] [Accepted: 12/14/2018] [Indexed: 10/27/2022]

A model selection approach for multiple sequence segmentation and dimensionality reduction. J MULTIVARIATE ANAL 2018. [DOI: 10.1016/j.jmva.2018.05.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]

Singh VK, Krishnamachari A. Context based computational analysis and characterization of ARS consensus sequences (ACS) of Saccharomyces cerevisiae genome. GENOMICS DATA 2016;9:130-6. [PMID: 27508123 PMCID: PMC4971157 DOI: 10.1016/j.gdata.2016.07.005] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/26/2016] [Revised: 06/27/2016] [Accepted: 07/06/2016] [Indexed: 01/08/2023]

Suvorova YM, Korotkova MA, Korotkov EV. Study of the Paired Change Points in Bacterial Genes. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2014;11:955-964. [PMID: 26356866 DOI: 10.1109/tcbb.2014.2321154] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]

Algama M, Keith JM. Investigating genomic structure using changept: A Bayesian segmentation model. Comput Struct Biotechnol J 2014;10:107-15. [PMID: 25349679 PMCID: PMC4204429 DOI: 10.1016/j.csbj.2014.08.003] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/06/2023] Open

Detecting the borders between coding and non-coding DNA regions in prokaryotes based on recursive segmentation and nucleotide doublets statistics. BMC Genomics 2012;13 Suppl 8:S19. [PMID: 23282225 PMCID: PMC3535712 DOI: 10.1186/1471-2164-13-s8-s19] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

Azad RK, Li J. Interpreting genomic data via entropic dissection. Nucleic Acids Res 2012;41:e23. [PMID: 23036836 PMCID: PMC3592408 DOI: 10.1093/nar/gks917] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022] Open

Bickel PJ, Boley N, Brown JB, Huang H, Zhang NR. Subsampling methods for genomic inference. Ann Appl Stat 2010. [DOI: 10.1214/10-aoas363] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]

Zhang W, Wu W, Lin W, Zhou P, Dai L, Zhang Y, Huang J, Zhang D. Deciphering heterogeneity in pig genome assembly Sscrofa9 by isochore and isochore-like region analyses. PLoS One 2010;5:e13303. [PMID: 20948965 PMCID: PMC2952626 DOI: 10.1371/journal.pone.0013303] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2010] [Accepted: 09/15/2010] [Indexed: 11/18/2022] Open

A binary search approach to whole-genome data analysis. Proc Natl Acad Sci U S A 2010;107:16893-8. [PMID: 20833816 DOI: 10.1073/pnas.1011134107] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open

Hutter B, Paulsen M, Helms V. Identifying CpG islands by different computational techniques. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2010;13:153-64. [PMID: 19196100 DOI: 10.1089/omi.2008.0046] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]

Stuart PE, Nair RP, Hiremagalore R, Kullavanijaya P, Kullavanijaya P, Tejasvi T, Lim HW, Voorhees JJ, Elder JT. Comparison of MHC class I risk haplotypes in Thai and Caucasian psoriatics shows locus heterogeneity at PSORS1. Tissue Antigens 2010;76:387-97. [PMID: 20604894 DOI: 10.1111/j.1399-0039.2010.01526.x] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]

Elhaik E, Graur D, Josić K, Landan G. Identifying compositionally homogeneous and nonhomogeneous domains within the human genome using a novel segmentation algorithm. Nucleic Acids Res 2010;38:e158. [PMID: 20571085 PMCID: PMC2926622 DOI: 10.1093/nar/gkq532] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022] Open

Hackenberg M, Barturen G, Carpena P, Luque-Escamilla PL, Previti C, Oliver JL. Prediction of CpG-island function: CpG clustering vs. sliding-window methods. BMC Genomics 2010;11:327. [PMID: 20500903 PMCID: PMC2887419 DOI: 10.1186/1471-2164-11-327] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/26/2010] [Accepted: 05/26/2010] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Unmethylated stretches of CpG dinucleotides (CpG islands) are an outstanding property of mammalian genomes. Conventionally, these regions are detected by sliding-window approaches that use %G + C, the CpG observed/expected ratio, and length thresholds as their main parameters. More recently, clustering methods have been introduced that detect clusters of CpG dinucleotides directly, as a statistical property of the genome sequence.

RESULTS

We compare sliding-window to clustering (i.e. CpGcluster) predictions by applying new ways to detect putative functionality of CpG islands. Analyzing the co-localization with several genomic regions as a function of window size vs. statistical significance (p-value), CpGcluster shows a higher overlap with promoter regions and highly conserved elements, at the same time showing less overlap with Alu retrotransposons. The major difference in the prediction was found for short islands (CpG islets), often exclusively predicted by CpGcluster. Many of these islets seem to be functional, as they are unmethylated, highly conserved and/or located within the promoter region. Finally, we show that window-based islands can spuriously overlap several differentially regulated promoters as well as different methylation domains, which might indicate an erroneous merging of several CpG islands into a single, very long island. The shorter CpGcluster islands seem to be much more specific with respect to the overlap with alternative transcription start sites or the detection of homogeneous methylation domains.

CONCLUSIONS

The main difference between sliding-window approaches and clustering methods is the length of the predicted islands. Short islands, often differentially methylated, are almost exclusively predicted by CpGcluster. This suggests that CpGcluster may be the algorithm of choice to explore the function of these short, but putatively functional CpG islands.
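
The sliding-window criteria named in the Background above (%G + C, CpG observed/expected ratio, window length) can be illustrated with a minimal Python sketch. It is not the implementation of any tool compared in the abstract. The function names, the 200 bp window, and the 0.5 and 0.6 cut-offs are assumptions chosen for illustration (they are commonly quoted defaults, not values taken from this listing), and the observed/expected ratio is computed in the usual form (number of CpG × window length) / (number of C × number of G).

```python
def cpg_stats(window: str) -> tuple[float, float]:
    """Return (G+C fraction, CpG observed/expected ratio) for one window."""
    s = window.upper()
    n = len(s)
    c = s.count("C")
    g = s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Observed/expected is undefined without both C and G; report 0.0 then.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def passing_windows(seq: str, size: int = 200, step: int = 1,
                    min_gc: float = 0.5, min_oe: float = 0.6) -> list[int]:
    """Start offsets of windows that meet both thresholds (illustrative values)."""
    hits = []
    for start in range(0, len(seq) - size + 1, step):
        gc, oe = cpg_stats(seq[start:start + size])
        if gc >= min_gc and oe >= min_oe:
            hits.append(start)
    return hits


if __name__ == "__main__":
    toy = "CG" * 100 + "AT" * 100  # CpG-rich half followed by an AT-only half
    print(passing_windows(toy, size=200, step=100))
```

A full window-based finder would additionally merge overlapping passing windows into islands; that merging step is where, according to the Results above, several distinct islands may be joined into one long prediction.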

Pehkonen P, Wong G, Törönen P. Heuristic Bayesian segmentation for discovery of coexpressed genes within genomic regions. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2010;7:37-49. [PMID: 20150667 DOI: 10.1109/tcbb.2008.56] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/28/2023]

Elhaik E, Graur D, Josic K. Comparative testing of DNA segmentation algorithms using benchmark simulations. Mol Biol Evol 2009;27:1015-24. [PMID: 20018981 DOI: 10.1093/molbev/msp307] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Characterisation of inactivation domains and evolutionary strata in human X chromosome through Markov segmentation. PLoS One 2009;4:e7885. [PMID: 19946363 PMCID: PMC2776969 DOI: 10.1371/journal.pone.0007885] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2009] [Accepted: 10/09/2009] [Indexed: 11/19/2022] Open

Arvey AJ, Azad RK, Raval A, Lawrence JG. Detection of genomic islands via segmental genome heterogeneity. Nucleic Acids Res 2009;37:5255-66. [PMID: 19589805 PMCID: PMC2760805 DOI: 10.1093/nar/gkp576] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/03/2022] Open

Zhang Y. Relations between Shannon entropy and genome order index in segmenting DNA sequences. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2009;79:041918. [PMID: 19518267 DOI: 10.1103/physreve.79.041918] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/13/2008] [Revised: 03/14/2009] [Indexed: 05/27/2023]

Keith JM, Adams P, Stephen S, Mattick JS. Delineating slowly and rapidly evolving fractions of the Drosophila genome. J Comput Biol 2008;15:407-30. [PMID: 18435570 DOI: 10.1089/cmb.2007.0173] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/23/2023] Open

Abstract

Evolutionary conservation is an important indicator of function and a major component of bioinformatic methods to identify non-protein-coding genes. We present a new Bayesian method for segmenting pairwise alignments of eukaryotic genomes while simultaneously classifying segments into slowly and rapidly evolving fractions. We also describe an information criterion similar to the Akaike Information Criterion (AIC) for determining the number of classes. Working with pairwise alignments enables detection of differences in conservation patterns among closely related species. We analyzed three whole-genome and three partial-genome pairwise alignments among eight Drosophila species. Three distinct classes of conservation level were detected. Sequences comprising the most slowly evolving component were consistent across a range of species pairs, and constituted approximately 62-66% of the D. melanogaster genome. Almost all (>90%) of the aligned protein-coding sequence is in this fraction, suggesting much of it (comprising the majority of the Drosophila genome, including approximately 56% of non-protein-coding sequences) is functional. The size and content of the most rapidly evolving component was species dependent, and varied from 1.6% to 4.8%. This fraction is also enriched for protein-coding sequence (while containing significant amounts of non-protein-coding sequence), suggesting it is under positive selection. We also classified segments according to conservation and GC content simultaneously. This analysis identified numerous sub-classes of those identified on the basis of conservation alone, but was nevertheless consistent with that classification. Software, data, and results available at www.maths.qut.edu.au/-keithj/. Genomic segments comprising the conservation classes available in BED format.

Gao F, Zhang CT. Prediction of replication time zones at single nucleotide resolution in the human genome. FEBS Lett 2008;582:2441-4. [PMID: 18555015 DOI: 10.1016/j.febslet.2008.06.008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2008] [Revised: 06/03/2008] [Accepted: 06/04/2008] [Indexed: 10/22/2022]

Multipattern consensus regions in multiple aligned protein sequences and their segmentation. EURASIP JOURNAL ON BIOINFORMATICS & SYSTEMS BIOLOGY 2008:35809. [PMID: 18427583 DOI: 10.1155/bsb/2006/35809] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/23/2005] [Revised: 05/22/2006] [Accepted: 06/07/2006] [Indexed: 01/10/2023]

Zheng WX, Zhang CT. Biological Implications of Isochore Boundaries in the Human Genome. J Biomol Struct Dyn 2008;25:327-36. [DOI: 10.1080/07391102.2008.10507181] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]

Sequence segmentation. Methods Mol Biol 2008;452:207-29. [PMID: 18566767 DOI: 10.1007/978-1-60327-159-2_11] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/28/2023]

Haiminen N, Mannila H. Discovering isochores by least-squares optimal segmentation. Gene 2007;394:53-60. [PMID: 17389148 DOI: 10.1016/j.gene.2007.01.028] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/21/2006] [Revised: 01/16/2007] [Accepted: 01/22/2007] [Indexed: 10/23/2022]

Haiminen N, Mannila H, Terzi E. Comparing segmentations by applying randomization techniques. BMC Bioinformatics 2007;8:171. [PMID: 17521423 PMCID: PMC1904250 DOI: 10.1186/1471-2105-8-171] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2007] [Accepted: 05/23/2007] [Indexed: 11/25/2022] Open

Bock C, Walter J, Paulsen M, Lengauer T. CpG island mapping by epigenome prediction. PLoS Comput Biol 2007;3:e110. [PMID: 17559301 PMCID: PMC1892605 DOI: 10.1371/journal.pcbi.0030110] [Citation(s) in RCA: 129] [Impact Index Per Article: 7.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2006] [Accepted: 05/01/2007] [Indexed: 12/04/2022] Open

Abstract

CpG islands were originally identified by epigenetic and functional properties, namely, absence of DNA methylation and frequent promoter association. However, this concept was quickly replaced by simple DNA sequence criteria, which allowed for genome-wide annotation of CpG islands in the absence of large-scale epigenetic datasets. Although widely used, the current CpG island criteria incur significant disadvantages: (1) reliance on arbitrary threshold parameters that bear little biological justification, (2) failure to account for widespread heterogeneity among CpG islands, and (3) apparent lack of specificity when applied to the human genome. This study is driven by the idea that a quantitative score of “CpG island strength” that incorporates epigenetic and functional aspects can help resolve these issues. We construct an epigenome prediction pipeline that links the DNA sequence of CpG islands to their epigenetic states, including DNA methylation, histone modifications, and chromatin accessibility. By training support vector machines on epigenetic data for CpG islands on human Chromosomes 21 and 22, we identify informative DNA attributes that correlate with open versus compact chromatin structures. These DNA attributes are used to predict the epigenetic states of all CpG islands genome-wide. Combining predictions for multiple epigenetic features, we estimate the inherent CpG island strength for each CpG island in the human genome, i.e., its inherent tendency to exhibit an open and transcriptionally competent chromatin structure. We extensively validate our results on independent datasets, showing that the CpG island strength predictions are applicable and informative across different tissues and cell types, and we derive improved maps of predicted “bona fide” CpG islands. 
The mapping of CpG islands by epigenome prediction is conceptually superior to identifying CpG islands by widely used sequence criteria since it links CpG island detection to their characteristic epigenetic and functional states. And it is superior to purely experimental epigenome mapping for CpG island detection since it abstracts from specific properties that are limited to a single cell type or tissue. In addition, using computational epigenetics methods we could identify high correlation between the epigenome and characteristics of the DNA sequence, a finding which emphasizes the need for a better understanding of the mechanistic links between genome and epigenome.

A key challenge for bioinformatic research is the identification of regulatory regions in the human genome. Regulatory regions are DNA elements that control gene expression and thereby contribute to the organism's phenotype. An important class of regulatory regions consists of so-called CpG islands, which are characterized by frequent occurrence of the CG sequence pattern. CpG islands are strongly associated with open and transcriptionally competent chromatin structure, they play a critical role in gene regulation, and they are involved in the epigenetic causes of cancer. In this article we make several conceptual improvements to the definition and mapping of CpG islands. First, we show that the traditional distinction between CpG islands and non-CpG islands is too harsh, and instead we propose a quantitative measure of CpG island strength to gradually distinguish between stronger and weaker regulatory regions. Second, by genome-wide comparison of multiple epigenome datasets we identify high correlation between features of the genome's DNA sequence and the epigenome, indicating strong functional interdependence. Third, we develop and apply a novel method for predicting the strength of all CpG islands in the human genome, giving rise to an improved and more accurate CpG island mapping.

Thakur V, Azad RK, Ramaswamy R. Markov models of genome segmentation. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2007;75:011915. [PMID: 17358192 DOI: 10.1103/physreve.75.011915] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/02/2006] [Revised: 06/19/2006] [Indexed: 05/14/2023]

Fearnhead P, Sherlock C. An exact Gibbs sampler for the Markov-modulated Poisson process. J R Stat Soc Series B Stat Methodol 2006. [DOI: 10.1111/j.1467-9868.2006.00566.x] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]

Hackenberg M, Previti C, Luque-Escamilla PL, Carpena P, Martínez-Aroza J, Oliver JL. CpGcluster: a distance-based algorithm for CpG-island detection. BMC Bioinformatics 2006;7:446. [PMID: 17038168 PMCID: PMC1617122 DOI: 10.1186/1471-2105-7-446] [Citation(s) in RCA: 110] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2006] [Accepted: 10/12/2006] [Indexed: 01/09/2023] Open

Abstract

Background

Despite their involvement in the regulation of gene expression and their importance as genomic markers for promoter prediction, no objective standard exists for defining CpG islands (CGIs), since all current approaches rely on a large parameter space formed by the thresholds of length, CpG fraction and G+C content.

Results

Given the higher frequency of CpG dinucleotides at CGIs, as compared to bulk DNA, the distance distributions between neighboring CpGs should differ for bulk and island CpGs. A new algorithm (CpGcluster) is presented, based on the physical distance between neighboring CpGs on the chromosome and able to predict directly clusters of CpGs, while not depending on the subjective criteria mentioned above. By assigning a p-value to each of these clusters, the most statistically significant ones can be predicted as CGIs. CpGcluster was benchmarked against five other CGI finders by using a test sequence set assembled from an experimental CGI library. CpGcluster reached the highest overall accuracy values, while showing the lowest rate of false-positive predictions. Since a minimum-length threshold is not required, CpGcluster can find short but fully functional CGIs usually missed by other algorithms. The CGIs predicted by CpGcluster present the lowest degree of overlap with Alu retrotransposons and, simultaneously, the highest overlap with vertebrate Phylogenetic Conserved Elements (PhastCons). CpGcluster's CGIs overlapping with the Transcription Start Site (TSS) show the highest statistical significance, as compared to the islands in other genome locations, thus qualifying CpGcluster as a valuable tool in discriminating functional CGIs from the remaining islands in the bulk genome.

Conclusion

CpGcluster uses only integer arithmetic, thus being a fast and computationally efficient algorithm able to predict statistically significant clusters of CpG dinucleotides. Another outstanding feature is that all predicted CGIs start and end with a CpG dinucleotide, which should be appropriate for a genomic feature whose functionality is based precisely on CpG dinucleotides. The only search parameter in CpGcluster is the distance between two consecutive CpGs, in contrast to previous algorithms. Therefore, none of the main statistical properties of CpG islands (neither G+C content, CpG fraction nor length threshold) are needed as search parameters, which may lead to the high specificity and low overlap with spurious Alu elements observed for CpGcluster predictions.
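
The distance-based idea described in this abstract, grouping neighboring CpGs whose separation stays below a single distance parameter so that every cluster starts and ends with a CpG, can be sketched as follows. This is a simplified illustration, not the CpGcluster code. The names `cluster_cpgs`, `max_gap`, and `min_cpgs` are hypothetical, distance is taken here as the difference between CpG start coordinates (an assumption), and the p-value assignment mentioned in the Results is deliberately left out.

```python
import re


def cpg_positions(seq: str) -> list[int]:
    """0-based start coordinates of every CG dinucleotide."""
    return [m.start() for m in re.finditer("CG", seq.upper())]


def cluster_cpgs(seq: str, max_gap: int, min_cpgs: int = 2) -> list[tuple[int, int]]:
    """Group consecutive CpGs separated by at most max_gap bases.

    Returns half-open (start, end) intervals; each interval begins and
    ends with a CG dinucleotide. Uses only integer arithmetic.
    """
    clusters = []
    current: list[int] = []
    for pos in cpg_positions(seq):
        # A gap larger than max_gap closes the running cluster.
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_cpgs:
                clusters.append((current[0], current[-1] + 2))
            current = []
        current.append(pos)
    if len(current) >= min_cpgs:
        clusters.append((current[0], current[-1] + 2))
    return clusters


if __name__ == "__main__":
    toy = "CGACG" + "A" * 20 + "CGCG"
    for start, end in cluster_cpgs(toy, max_gap=5):
        print(start, end, toy[start:end])
```

Because no minimum-length threshold is applied, short clusters survive in this sketch, which mirrors the abstract's point that short islands need not be discarded by construction.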

Gao F, Zhang CT. GC-Profile: a web-based tool for visualizing and analyzing the variation of GC content in genomic sequences. Nucleic Acids Res 2006;34:W686-91. [PMID: 16845098 PMCID: PMC1538862 DOI: 10.1093/nar/gkl040] [Citation(s) in RCA: 114] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open

Tempel S, Giraud M, Lavenier D, Lerman IC, Valin AS, Couée I, Amrani AE, Nicolas J. Domain organization within repeated DNA sequences: application to the study of a family of transposable elements. Bioinformatics 2006;22:1948-54. [PMID: 16809391 DOI: 10.1093/bioinformatics/btl337] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Gao F, Zhang CT. Isochore structures in the chicken genome. FEBS J 2006;273:1637-48. [PMID: 16623701 DOI: 10.1111/j.1742-4658.2006.05178.x] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]

Nicorici D, Yli-Harja O, Astola J. Finding large domains of similarly expressed genes. A novel method using the MDL principle and the recursive segmentation procedure. IEEE ENGINEERING IN MEDICINE AND BIOLOGY MAGAZINE : THE QUARTERLY MAGAZINE OF THE ENGINEERING IN MEDICINE & BIOLOGY SOCIETY 2006;25:82-9. [PMID: 16485395 DOI: 10.1109/memb.2006.1578667] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/06/2023]

Zhang CT, Gao F, Zhang R. Segmentation algorithm for DNA sequences. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2005;72:041917. [PMID: 16383430 DOI: 10.1103/physreve.72.041917] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/07/2005] [Indexed: 05/05/2023]

Barral P J, Cantini L, Hasmy A, Jiménez J, Marcano A. Correlation between strand asymmetry and phylogeny in mitochondrial DNA. J Theor Biol 2005;236:422-6. [PMID: 15927203 DOI: 10.1016/j.jtbi.2005.03.022] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2004] [Revised: 03/17/2005] [Accepted: 03/17/2005] [Indexed: 11/25/2022]

Guéguen L. Sarment: Python modules for HMM analysis and partitioning of sequences. Bioinformatics 2005;21:3427-8. [PMID: 15947017 DOI: 10.1093/bioinformatics/bti533] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Luque-Escamilla PL, Martínez-Aroza J, Oliver JL, Gómez-Lopera JF, Román-Roldán R. Compositional searching of CpG islands in the human genome. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2005;71:061925. [PMID: 16089783 DOI: 10.1103/physreve.71.061925] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/21/2004] [Revised: 01/31/2005] [Indexed: 05/03/2023]

Cohen N, Dagan T, Stone L, Graur D. GC composition of the human genome: in search of isochores. Mol Biol Evol 2005;22:1260-72. [PMID: 15728737 DOI: 10.1093/molbev/msi115] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

Zhang CT, Zhang R. Isochore structures in the mouse genome. Genomics 2004;83:384-94. [PMID: 14962664 DOI: 10.1016/j.ygeno.2003.09.011] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2003] [Accepted: 09/04/2003] [Indexed: 10/26/2022]

Li W, Holste D. An unusual 500,000 bases long oscillation of guanine and cytosine content in human chromosome 21. Comput Biol Chem 2004;28:393-9. [PMID: 15556480 DOI: 10.1016/j.compbiolchem.2004.09.011] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/17/2004] [Revised: 09/30/2004] [Accepted: 09/30/2004] [Indexed: 01/09/2023]

Csurös M. Maximum-scoring segment sets. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2004;1:139-50. [PMID: 17051696 DOI: 10.1109/tcbb.2004.43] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/12/2023]

Bernaola-Galván P, Oliver JL, Carpena P, Clay O, Bernardi G. Quantifying intrachromosomal GC heterogeneity in prokaryotic genomes. Gene 2004;333:121-33. [PMID: 15177687 DOI: 10.1016/j.gene.2004.02.042] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2003] [Revised: 11/14/2003] [Accepted: 02/10/2004] [Indexed: 11/15/2022]

Zhang R, Zhang CT. Isochore Structures in the Genome of the Plant Arabidopsis thaliana. J Mol Evol 2004;59:227-38. [PMID: 15486696 DOI: 10.1007/s00239-004-2617-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/08/2003] [Accepted: 02/10/2004] [Indexed: 10/26/2022]

Krishnamachari A, moy Mandal V. Study of DNA binding sites using the Rényi parametric entropy measure. J Theor Biol 2004;227:429-36. [PMID: 15019509 DOI: 10.1016/j.jtbi.2003.11.026] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2003] [Revised: 11/06/2003] [Accepted: 11/17/2003] [Indexed: 10/26/2022]

Wen SY, Zhang CT. Identification of isochore boundaries in the human genome using the technique of wavelet multiresolution analysis. Biochem Biophys Res Commun 2004;311:215-22. [PMID: 14575716 DOI: 10.1016/j.bbrc.2003.09.198] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]

Zhang CT, Zhang R. An isochore map of the human genome based on the Z curve method. Gene 2003;317:127-35. [PMID: 14604800 DOI: 10.1016/s0378-1119(03)00665-6] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]

Current Awareness on Comparative and Functional Genomics. Comp Funct Genomics 2003. [PMCID: PMC2447381 DOI: 10.1002/cfg.226] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022] Open