Reference Citation Analysis

For: Schmitt AO, Herzel H. Estimating the entropy of DNA sequences. J Theor Biol 1997;188:369-77. [PMID: 9344742 DOI: 10.1006/jtbi.1997.0493] [Citation(s) in RCA: 111] [Impact Index Per Article: 4.1] [Indexed: 02/05/2023]

Cited by Other Article(s)

Hussein M, Andrade dos Ramos Z, Vink MA, Kroon P, Yu Z, Enjuanes L, Zuñiga S, Berkhout B, Herrera-Carrillo E. Efficient CRISPR-Cas13d-Based Antiviral Strategy to Combat SARS-CoV-2. Viruses 2023;15:v15030686. [PMID: 36992394 PMCID: PMC10051389 DOI: 10.3390/v15030686] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Revised: 02/27/2023] [Accepted: 02/28/2023] [Indexed: 03/08/2023] Open

Mesa-Rodríguez A, Gonzalez A, Estevez-Rams E, Valdes-Sosa PA. Cancer Segmentation by Entropic Analysis of Ordered Gene Expression Profiles. Entropy (Basel, Switzerland) 2022;24:1744. [PMID: 36554151 PMCID: PMC9777913 DOI: 10.3390/e24121744] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2022] [Revised: 11/24/2022] [Accepted: 11/24/2022] [Indexed: 06/17/2023]

Hussein M, Andrade dos Ramos Z, Berkhout B, Herrera-Carrillo E. In Silico Prediction and Selection of Target Sequences in the SARS-CoV-2 RNA Genome for an Antiviral Attack. Viruses 2022;14:v14020385. [PMID: 35215977 PMCID: PMC8880226 DOI: 10.3390/v14020385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2022] [Revised: 02/07/2022] [Accepted: 02/08/2022] [Indexed: 12/10/2022] Open

Antich A, Palacín C, Turon X, Wangensteen OS. DnoisE: distance denoising by entropy. An open-source parallelizable alternative for denoising sequence datasets. PeerJ 2022;10:e12758. [PMID: 35111399 PMCID: PMC8783565 DOI: 10.7717/peerj.12758] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/13/2021] [Accepted: 12/16/2021] [Indexed: 01/07/2023] Open

Bussi Y, Kapon R, Reich Z. Large-scale k-mer-based analysis of the informational properties of genomes, comparative genomics and taxonomy. PLoS One 2021;16:e0258693. [PMID: 34648558 PMCID: PMC8516232 DOI: 10.1371/journal.pone.0258693] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 04/30/2021] [Accepted: 10/02/2021] [Indexed: 12/24/2022] Open

Information Entropy in Chemistry: An Overview. Entropy 2021;23:e23101240. [PMID: 34681964 PMCID: PMC8534366 DOI: 10.3390/e23101240] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 08/12/2021] [Revised: 09/19/2021] [Accepted: 09/20/2021] [Indexed: 12/20/2022]

Pasookhush P, Usmani A, Suwannahong K, Palittapongarnpim P, Rukseree K, Ariyachaokun K, Buates S, Siripattanapipong S, Ajawatanawong P. Single-Strand Conformation Polymorphism Fingerprint Method for Dictyostelids. Front Microbiol 2021;12:708685. [PMID: 34512585 PMCID: PMC8431811 DOI: 10.3389/fmicb.2021.708685] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/14/2021] [Accepted: 07/22/2021] [Indexed: 11/13/2022] Open

Lewis RN, Soma M, de Kort SR, Gilman RT. Like Father Like Son: Cultural and Genetic Contributions to Song Inheritance in an Estrildid Finch. Front Psychol 2021;12:654198. [PMID: 34149539 PMCID: PMC8213215 DOI: 10.3389/fpsyg.2021.654198] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/15/2021] [Accepted: 05/05/2021] [Indexed: 11/25/2022] Open

Antich A, Palacin C, Wangensteen OS, Turon X. To denoise or to cluster, that is not the question: optimizing pipelines for COI metabarcoding and metaphylogeography. BMC Bioinformatics 2021;22:177. [PMID: 33820526 PMCID: PMC8020537 DOI: 10.1186/s12859-021-04115-6] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 01/09/2021] [Accepted: 03/30/2021] [Indexed: 01/04/2023] Open

Abstract

BACKGROUND

The recent blooming of metabarcoding applications to biodiversity studies comes with some relevant methodological debates. One such issue concerns the treatment of reads by denoising or by clustering methods, which have been wrongly presented as alternatives. It has also been suggested that denoised sequence variants should replace clusters as the basic unit of metabarcoding analyses, missing the fact that sequence clusters are a proxy for species-level entities, the basic unit in biodiversity studies. We argue here that methods developed and tested for ribosomal markers have been uncritically applied to highly variable markers such as cytochrome oxidase I (COI) without conceptual or operational (e.g., parameter setting) adjustment. COI has a naturally high intraspecies variability that should be assessed and reported, as it is a source of highly valuable information. We contend that denoising and clustering are not alternatives. Rather, they are complementary and both should be used together in COI metabarcoding pipelines.

RESULTS

Using a COI dataset from benthic marine communities, we compared two denoising procedures (based on the UNOISE3 and the DADA2 algorithms), set suitable parameters for denoising and clustering, and applied these steps in different orders. Our results indicated that the UNOISE3 algorithm preserved a higher intra-cluster variability. We introduce the program DnoisE to implement the UNOISE3 algorithm taking into account the natural variability (measured as entropy) of each codon position in protein-coding genes. This correction increased the number of sequences retained by 88%. The order of the steps (denoising and clustering) had little influence on the final outcome.

CONCLUSIONS

We highlight the need for combining denoising and clustering, with adequate choice of stringency parameters, in COI metabarcoding. We present a program that uses the coding properties of this marker to improve the denoising step. We recommend that researchers report their results in terms of both denoised sequences (a proxy for haplotypes) and clusters formed (a proxy for species), and avoid collapsing the sequences of the latter into a single representative. This will allow studies at the cluster level (ideally equating to species-level diversity) and at the intra-cluster level, and will ease additivity and comparability between studies.
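
The per-codon-position entropy mentioned in the Results can be illustrated with a minimal Python sketch. It assumes equal-length, in-frame sequences that start at codon position 1 and ignores read abundances; the function names `column_entropy` and `codon_position_entropy` are hypothetical, and this is not DnoisE's actual implementation.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (bits) of the nucleotides in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def codon_position_entropy(sequences):
    """Mean column entropy for codon positions 1, 2 and 3.

    Assumption: all sequences have equal length and start in frame.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    sums = [0.0, 0.0, 0.0]
    counts = [0, 0, 0]
    for i in range(length):
        pos = i % 3  # 0, 1, 2 correspond to codon positions 1, 2, 3
        sums[pos] += column_entropy([s[i] for s in sequences])
        counts[pos] += 1
    return [s / n if n else 0.0 for s, n in zip(sums, counts)]


# Toy alignment (made-up data): only third codon positions vary.
toy = ["ATGGCA", "ATAGCC", "ATGGCG", "ATAGCT"]
print(codon_position_entropy(toy))
```

In this toy alignment only the third codon position carries variation, which is the kind of position-dependent variability the abstract describes for protein-coding markers.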

Górski AZ, Piwowar M. Nucleotide spacing distribution analysis for human genome. Mamm Genome 2021;32:123-128. [PMID: 33723659 PMCID: PMC8012312 DOI: 10.1007/s00335-021-09865-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/15/2020] [Accepted: 03/02/2021] [Indexed: 11/30/2022]

Nykrynova M, Barton V, Sedlar K, Bezdicek M, Lengerova M, Skutkova H. Word Entropy-Based Approach to Detect Highly Variable Genetic Markers for Bacterial Genotyping. Front Microbiol 2021;12:631605. [PMID: 33613503 PMCID: PMC7886790 DOI: 10.3389/fmicb.2021.631605] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2020] [Accepted: 01/13/2021] [Indexed: 11/13/2022] Open

Markić I, Štula M, Zorić M, Stipaničev D. Entropy-Based Approach in Selection Exact String-Matching Algorithms. Entropy (Basel, Switzerland) 2020;23:E31. [PMID: 33379282 PMCID: PMC7824336 DOI: 10.3390/e23010031] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/14/2020] [Revised: 12/19/2020] [Accepted: 12/22/2020] [Indexed: 11/16/2022]

Abstract

The string-matching paradigm is applied in every branch of computer science and in science in general. The existence of a plethora of string-matching algorithms makes it hard to choose the best one for any particular case. Expressing, measuring, and testing algorithm efficiency is a challenging task with many potential pitfalls. Algorithm efficiency can be measured based on the usage of different resources. In software engineering, algorithmic productivity is a property of an algorithm execution identified with the computational resources the algorithm consumes. Resource usage in algorithm execution can be determined, and for maximum efficiency the goal is to minimize it. Guided by the fact that standard measures of algorithm efficiency, such as execution time, depend directly on the number of executed actions, and without addressing computer power consumption or memory use, which also depend on the algorithm type and the techniques used in algorithm development, we have developed a methodology that enables researchers to choose an efficient algorithm for a specific domain. The efficiency of string-searching algorithms is usually observed independently of the domain texts being searched. This research paper aims to present the idea that algorithm efficiency depends on the properties of the searched string and of the texts being searched, accompanied by a theoretical analysis of the proposed approach. In the proposed methodology, algorithm efficiency is expressed through the character comparison count metric, a formal quantitative measure independent of algorithm implementation subtleties and computer platform differences. The model is developed for a particular problem domain by using appropriate domain data (patterns and texts) and provides, for that domain, a ranking of algorithms according to the patterns' entropy. The proposed approach is limited to on-line exact string-matching problems and is based on information entropy for a search pattern. Meticulous empirical testing illustrates the implementation of the methodology and supports its soundness.
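
The two quantities the abstract relies on, the character comparison count and the entropy of the search pattern, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the naive matcher stands in for any exact string-matching algorithm, and the function names `pattern_entropy` and `naive_search_with_count` are hypothetical rather than taken from the paper.

```python
from collections import Counter
from math import log2


def pattern_entropy(pattern):
    """Shannon entropy (bits per character) of the pattern's characters."""
    counts = Counter(pattern)
    total = len(pattern)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def naive_search_with_count(text, pattern):
    """Naive exact matching; returns (match positions, character comparisons)."""
    comparisons = 0
    matches = []
    for start in range(len(text) - len(pattern) + 1):
        matched = True
        for offset, ch in enumerate(pattern):
            comparisons += 1  # one text-vs-pattern character comparison
            if text[start + offset] != ch:
                matched = False
                break
        if matched:
            matches.append(start)
    return matches, comparisons


positions, count = naive_search_with_count("aaab", "ab")
print(positions, count, pattern_entropy("ab"))
```

Because the count depends only on which characters are compared, it is unaffected by hardware or implementation details, which is the property the abstract attributes to this metric.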

Identification of Regulatory SNPs Associated with Vicine and Convicine Content of Vicia faba Based on Genotyping by Sequencing Data Using Deep Learning. Genes (Basel) 2020;11:genes11060614. [PMID: 32516876 PMCID: PMC7349281 DOI: 10.3390/genes11060614] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 04/29/2020] [Revised: 05/26/2020] [Accepted: 05/28/2020] [Indexed: 12/15/2022] Open

Sheng Q, Yu H, Oyebamiji O, Wang J, Chen D, Ness S, Zhao YY, Guo Y. AnnoGen: annotating genome-wide pragmatic features. Bioinformatics 2020;36:2899-2901. [PMID: 31930398 PMCID: PMC7203733 DOI: 10.1093/bioinformatics/btaa027] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/12/2019] [Revised: 12/19/2019] [Accepted: 01/08/2020] [Indexed: 11/13/2022] Open

Turon X, Antich A, Palacín C, Præbel K, Wangensteen OS. From metabarcoding to metaphylogeography: separating the wheat from the chaff. Ecological Applications: A Publication of the Ecological Society of America 2020;30:e02036. [PMID: 31709684 PMCID: PMC7078904 DOI: 10.1002/eap.2036] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 05/13/2019] [Revised: 07/31/2019] [Accepted: 10/03/2019] [Indexed: 05/31/2023]

Abstract

Metabarcoding is by now a well-established method for biodiversity assessment in terrestrial, freshwater, and marine environments. Metabarcoding data sets are usually used for α- and β-diversity estimates, that is, interspecies (or inter-MOTU [molecular operational taxonomic unit]) patterns. However, the use of hypervariable metabarcoding markers may provide an enormous amount of intraspecies (intra-MOTU) information, mostly untapped so far. The use of cytochrome oxidase (COI) amplicons is gaining momentum in metabarcoding studies targeting eukaryote richness. COI has long been the marker of choice in population genetics and phylogeographic studies. Therefore, COI metabarcoding data sets may be used to study intraspecies patterns and phylogeographic features for hundreds of species simultaneously, opening a new field that we suggest naming metaphylogeography. The main challenge for the implementation of this approach is the separation of erroneous sequences from true intra-MOTU variation. Here, we develop a cleaning protocol based on changes in entropy of the different codon positions of the COI sequence, together with co-occurrence patterns of sequences. Using a data set of community DNA from several benthic littoral communities in the Mediterranean and Atlantic seas, we first tested by simulation on a subset of sequences a two-step cleaning approach consisting of a denoising step followed by a minimal abundance filtering. The procedure was then applied to the whole data set. We obtained a total of 563 MOTUs that were usable for phylogeographic inference. We used semiquantitative rank data instead of read abundances to perform AMOVAs and haplotype networks. Genetic variability was mainly concentrated within samples, but with an important between-seas component as well. There were intergroup differences in the amount of variability between and within communities in each sea. For two species, the results could be compared with traditional Sanger sequence data available for the same zones, giving similar patterns. Our study shows that metabarcoding data can be used to infer intra- and interpopulation genetic variability of many species at a time, providing a new method with great potential for basic biogeography, connectivity and dispersal studies, and for the more applied fields of conservation genetics, invasion genetics, and design of protected areas.

Czech L, Barbera P, Stamatakis A. Methods for automatic reference trees and multilevel phylogenetic placement. Bioinformatics 2020;35:1151-1158. [PMID: 30169747 PMCID: PMC6449752 DOI: 10.1093/bioinformatics/bty767] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 04/12/2018] [Revised: 07/24/2018] [Accepted: 08/30/2018] [Indexed: 12/28/2022] Open

Waters NR, Abram F, Brennan F, Holmes A, Pritchard L. riboSeed: leveraging prokaryotic genomic architecture to assemble across ribosomal regions. Nucleic Acids Res 2019;46:e68. [PMID: 29608703 PMCID: PMC6009695 DOI: 10.1093/nar/gky212] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 11/24/2017] [Accepted: 03/12/2018] [Indexed: 11/12/2022] Open

Li J, Zhang L, Li H, Ping Y, Xu Q, Wang R, Tan R, Wang Z, Liu B, Wang Y. Integrated entropy-based approach for analyzing exons and introns in DNA sequences. BMC Bioinformatics 2019;20:283. [PMID: 31182012 PMCID: PMC6557737 DOI: 10.1186/s12859-019-2772-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/29/2022] Open

Kycia RA. Landauer's Principle as a Special Case of Galois Connection. Entropy 2018;20:e20120971. [PMID: 33266695 PMCID: PMC7512571 DOI: 10.3390/e20120971] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 10/14/2018] [Revised: 12/03/2018] [Accepted: 12/12/2018] [Indexed: 11/30/2022]

Hernández-Orozco S, Kiani NA, Zenil H. Algorithmically probable mutations reproduce aspects of evolution, such as convergence rate, genetic memory and modularity. Royal Society Open Science 2018;5:180399. [PMID: 30225028 PMCID: PMC6124114 DOI: 10.1098/rsos.180399] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 03/16/2018] [Accepted: 07/20/2018] [Indexed: 05/07/2023]

Barbosa VC. Information-theoretic signatures of biodiversity in the barcoding gene. J Theor Biol 2018;451:111-116. [PMID: 29750998 DOI: 10.1016/j.jtbi.2018.05.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2018] [Revised: 04/30/2018] [Accepted: 05/08/2018] [Indexed: 11/16/2022]

Skene KR. Thermodynamics, ecology and evolutionary biology: A bridge over troubled water or common ground? Acta Oecologica-International Journal of Ecology 2017. [DOI: 10.1016/j.actao.2017.10.010] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 12/13/2022]

Gebert D, Hewel C, Rosenkranz D. unitas: the universal tool for annotation of small RNAs. BMC Genomics 2017;18:644. [PMID: 28830358 PMCID: PMC5567656 DOI: 10.1186/s12864-017-4031-9] [Citation(s) in RCA: 75] [Impact Index Per Article: 10.7] [Received: 03/29/2017] [Accepted: 08/07/2017] [Indexed: 12/21/2022] Open

Kistler L, Ware R, Smith O, Collins M, Allaby RG. A new model for ancient DNA decay based on paleogenomic meta-analysis. Nucleic Acids Res 2017;45:6310-6320. [PMID: 28486705 PMCID: PMC5499742 DOI: 10.1093/nar/gkx361] [Citation(s) in RCA: 91] [Impact Index Per Article: 13.0] [Received: 03/21/2017] [Revised: 04/15/2017] [Accepted: 04/20/2017] [Indexed: 01/04/2023] Open

Genotypic Complexity of Fisher's Geometric Model. Genetics 2017;206:1049-1079. [PMID: 28450460 DOI: 10.1534/genetics.116.199497] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 12/22/2016] [Accepted: 04/15/2017] [Indexed: 01/30/2023] Open

A novel numerical mapping method based on entropy for digitizing DNA sequences. Neural Comput Appl 2017. [DOI: 10.1007/s00521-017-2871-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 10/20/2022]

Wu C, Yao S, Li X, Chen C, Hu X. Genome-Wide Prediction of DNA Methylation Using DNA Composition and Sequence Complexity in Human. Int J Mol Sci 2017;18:E420. [PMID: 28212312 PMCID: PMC5343954 DOI: 10.3390/ijms18020420] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 01/03/2017] [Revised: 02/03/2017] [Accepted: 02/08/2017] [Indexed: 02/02/2023] Open

Benjamin A, Keten S. Polymer Conjugation as a Strategy for Long-Range Order in Supramolecular Polymers. J Phys Chem B 2016;120:3425-33. [DOI: 10.1021/acs.jpcb.5b12547] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/30/2022]

Paci G, Cristadoro G, Monti B, Lenci M, Degli Esposti M, Castellani GC, Remondini D. Characterization of DNA methylation as a function of biological complexity via dinucleotide inter-distances. Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences 2016;374:rsta.2015.0227. [PMID: 26857665 DOI: 10.1098/rsta.2015.0227] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Accepted: 11/23/2015] [Indexed: 06/05/2023]

Characterizing Protease Specificity: How Many Substrates Do We Need? PLoS One 2015;10:e0142658. [PMID: 26559682 PMCID: PMC4641643 DOI: 10.1371/journal.pone.0142658] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 09/04/2015] [Accepted: 10/26/2015] [Indexed: 12/26/2022] Open

Lahens NF, Kavakli IH, Zhang R, Hayer K, Black MB, Dueck H, Pizarro A, Kim J, Irizarry R, Thomas RS, Grant GR, Hogenesch JB. IVT-seq reveals extreme bias in RNA sequencing. Genome Biol 2014;15:R86. [PMID: 24981968 PMCID: PMC4197826 DOI: 10.1186/gb-2014-15-6-r86] [Citation(s) in RCA: 105] [Impact Index Per Article: 10.5] [Received: 05/20/2014] [Accepted: 06/30/2014] [Indexed: 01/22/2023] Open

Clustering of giant virus-DNA based on variations in local entropy. Viruses 2014;6:2259-67. [PMID: 24887142 PMCID: PMC4074927 DOI: 10.3390/v6062259] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/05/2014] [Revised: 05/19/2014] [Accepted: 05/21/2014] [Indexed: 11/17/2022] Open

Vinga S. Information theory applications for biological sequence analysis. Brief Bioinform 2014;15:376-89. [PMID: 24058049 PMCID: PMC7109941 DOI: 10.1093/bib/bbt068] [Citation(s) in RCA: 67] [Impact Index Per Article: 6.7] [Received: 07/11/2013] [Accepted: 08/17/2013] [Indexed: 01/13/2023] Open

Hudson NJ, Porto-Neto LR, Kijas J, McWilliam S, Taft RJ, Reverter A. Information compression exploits patterns of genome composition to discriminate populations and highlight regions of evolutionary interest. BMC Bioinformatics 2014;15:66. [PMID: 24606587 PMCID: PMC4015654 DOI: 10.1186/1471-2105-15-66] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 12/13/2013] [Accepted: 02/26/2014] [Indexed: 11/20/2022] Open

Abstract

Background

Genomic information allows population relatedness to be inferred and selected genes to be identified. Single nucleotide polymorphism microarray (SNP-chip) data, a proxy for genome composition, contains patterns in allele order and proportion. These patterns can be quantified by compression efficiency (CE). In principle, the composition of an entire genome can be represented by a CE number quantifying allele representation and order.

Results

We applied a compression algorithm (DEFLATE) to genome-wide high-density SNP data from 4,155 human, 1,800 cattle, 1,222 sheep, 81 dog and 49 mouse samples. All human ethnic groups can be clustered by CE and the clusters recover phylogeography based on traditional fixation index (F_ST) analyses. CE analysis of other mammals results in segregation by breed or species, and is sensitive to admixture and past effective population size. This clustering is a consequence of individual patterns such as runs of homozygosity. Intriguingly, a related approach can also be used to identify genomic loci that show population-specific CE segregation. A high resolution CE ‘sliding window’ scan across the human genome, organised at the population level, revealed genes known to be under evolutionary pressure. These include SLC24A5 (European and Gujarati Indian skin pigmentation), HERC2 (European eye color), LCT (European and Maasai milk digestion) and EDAR (Asian hair thickness). We also identified a set of previously unidentified loci with high population-specific CE scores including the chromatin remodeler SCMH1 in Africans and EDA2R in Asians. Closer inspection reveals that these prioritised genomic regions do not correspond to simple runs of homozygosity but rather to compositionally complex regions that are shared by many individuals of a given population. Unlike F_ST, CE analyses do not require ab initio population comparisons and are amenable to the hemizygous X chromosome.

Conclusions

We conclude with a discussion of the implications of CE for a complex systems science view of genome evolution. CE allows one to clearly visualise the evolution of individual genomes and populations through a formal, mathematically-rigorous information space. Overall, CE makes a set of biological predictions, some of which are unique and await functional validation.
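
The idea of quantifying allele order and proportion by compression efficiency can be sketched with Python's standard `zlib` module, which wraps DEFLATE in a small header and checksum. The abstract does not give the exact CE formula, so the definition below (fraction of bytes saved) is an assumption, as are the 0/1/2 genotype coding, the made-up data, and the function name `compression_efficiency`.

```python
import random
import zlib


def compression_efficiency(data: bytes, level: int = 9) -> float:
    """Fraction of bytes saved by DEFLATE (via zlib); an assumed CE definition."""
    if not data:
        raise ValueError("data must be non-empty")
    compressed = zlib.compress(data, level)
    return 1.0 - len(compressed) / len(data)


# Hypothetical genotype strings coded 0/1/2 (copies of one allele).
rng = random.Random(0)
homozygous_run = b"0" * 5000                             # long run of homozygosity
mixed = bytes(rng.choice(b"012") for _ in range(5000))   # no pattern in allele order

print(compression_efficiency(homozygous_run))
print(compression_efficiency(mixed))
```

The highly ordered string compresses far better than the patternless one, which is the contrast that lets a single CE number reflect structure such as runs of homozygosity.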

Bakouche N, Vandenbroucke AT, Goubau P, Ruelle J. Study of the HIV-2 Env cytoplasmic tail variability and its impact on Tat, Rev and Nef. PLoS One 2013;8:e79129. [PMID: 24223892 PMCID: PMC3815105 DOI: 10.1371/journal.pone.0079129] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 06/05/2013] [Accepted: 09/18/2013] [Indexed: 11/24/2022] Open

Abstract

Background

The HIV-2 env’s 3’ end encodes the cytoplasmic tail (CT) of the Env protein. This genomic region also encodes the Rev, Tat and Nef proteins in overlapping reading frames. We studied the variability in the CT coding region in 46 clinical specimens and in 2 reference strains by sequencing and by culturing. The aims were to analyse the variability of Env CT and the evolution of proteins expressed from overlapping coding sequences.

Results

A 70% reduction of the length of the CT region affected the HIV-2 ROD and EHO strains in vitro due to a premature stop codon in the env gene. In clinical samples this was not observed, but the CT length varied due to insertions and deletions. We noted 3 conserved and 3 variable regions in the CT. The conserved regions were those containing residues involved in Env endocytosis, the potential HIV-2 CT region implicated in the NF-kB activation and the potential end of the lentiviral lytic peptide one. The variable regions were the potential HIV-2 Kennedy region, the potential lentiviral lytic peptide two and the beginning of the potential lentiviral lytic peptide one. A very hydrophobic region was coded downstream of the premature stop codon observed in vitro, suggesting a membrane-spanning region. Interestingly, the nucleotides that are responsible for the variability of the CT do not impact Rev and Nef. However, in the Kennedy-like coding region variability resulted only from nucleotide changes that impacted Env and Tat together.

Conclusion

The C-terminal parts of HIV-2 Env, Tat and Rev are subject to major length variations in both clinical samples and cultured strains. The HIV-2 Env CT contains variable and conserved regions. These regions do not affect the Rev and Nef amino acid composition, which evolves independently. In contrast, Tat co-evolves with the Env CT.

On the fractal geometry of DNA by the binary image analysis. Bull Math Biol 2013;75:1544-70. [PMID: 23760660 DOI: 10.1007/s11538-013-9859-9] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.0] [Received: 10/09/2012] [Accepted: 05/21/2013] [Indexed: 12/15/2022]

Skene KR. The energetics of ecological succession: A logistic model of entropic output. Ecol Modell 2013. [DOI: 10.1016/j.ecolmodel.2012.11.020] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Indexed: 10/27/2022]

Wei D, Jiang Q, Wei Y, Wang S. A novel hierarchical clustering algorithm for gene sequences. BMC Bioinformatics 2012;13:174. [PMID: 22823405 PMCID: PMC3443659 DOI: 10.1186/1471-2105-13-174] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.3] [Received: 11/05/2011] [Accepted: 06/30/2012] [Indexed: 11/10/2022] Open

Energetic loads and informational entropy during insect metamorphosis: measuring structural variability and self-organization. J Theor Biol 2011;286:1-12. [PMID: 21756920 DOI: 10.1016/j.jtbi.2011.06.029] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 12/13/2010] [Revised: 06/21/2011] [Accepted: 06/22/2011] [Indexed: 11/23/2022]

Koslicki D. Topological entropy of DNA sequences. Bioinformatics 2011;27:1061-7. [PMID: 21317142 DOI: 10.1093/bioinformatics/btr077] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Indexed: 11/13/2022]

Athanasopoulou L, Athanasopoulos S, Karamanos K, Almirantis Y. Scaling properties and fractality in the distribution of coding segments in eukaryotic genomes revealed through a block entropy approach. Physical Review. E, Statistical, Nonlinear, and Soft Matter Physics 2010;82:051917. [PMID: 21230510 DOI: 10.1103/physreve.82.051917] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 07/28/2010] [Revised: 09/19/2010] [Indexed: 05/30/2023]

Abstract

Statistical methods, including block entropy based approaches, have already been used in the study of long-range features of genomic sequences seen as symbol series, either considering the full alphabet of the four nucleotides or the binary purine or pyrimidine character set. Here we explore the alternation of short protein-coding segments with long noncoding spacers in entire chromosomes, focusing on the scaling properties of block entropy. In previous studies, it has been shown that the sizes of noncoding spacers follow power-law-like distributions in most chromosomes of eukaryotic organisms from distant taxa. We have developed a simple evolutionary model based on well-known molecular events (segmental duplications followed by elimination of most of the duplicated genes) which reproduces the observed linearity in log-log plots. The scaling properties of block entropy H(n) have been studied in several works. Their findings suggest that linearity in semilogarithmic scale characterizes symbol sequences which exhibit fractal properties and long-range order, while this linearity has been shown in the case of the logistic map at the Feigenbaum accumulation point. The present work starts with the observation that the block entropy of the Cantor-like binary symbol series scales in a similar way. Then, we perform the same analysis for the full set of human chromosomes and for several chromosomes of other eukaryotes. A similar but less extended linearity in semilogarithmic scale, indicating fractality, is observed, while randomly formed surrogate sequences clearly lack this type of scaling. Genomic sequences always present entropy values much lower than their random surrogates. Symbol sequences produced by the aforementioned evolutionary model follow the scaling found in genomic sequences, thus corroborating the conjecture that "segmental duplication-gene elimination" dynamics may have contributed to the observed long rangeness in the coding or noncoding alternation in genomes.

A Markov model of the Indus script. Proc Natl Acad Sci U S A 2009;106:13685-90. [PMID: 19666571 DOI: 10.1073/pnas.0906237106] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open

Jin NZ, Liu ZX, Qi YJ, Qiu WY. Repeat Sequences and Base Correlations in Human Y Chromosome Palindromes. CHINESE J CHEM PHYS 2009. [DOI: 10.1088/1674-0068/22/03/255-261] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]

Giancarlo R, Scaturro D, Utro F. Textual data compression in computational biology: a synopsis. Bioinformatics 2009;25:1575-86. [DOI: 10.1093/bioinformatics/btp117] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open

Brattico P. Shallow Reductionism and the Problem of Complexity in Psychology. THEORY & PSYCHOLOGY 2008. [DOI: 10.1177/0959354308091840] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]

Rocha LB, Adam RL, Leite NJ, Metze K, Rossi MA. Shannon's entropy and fractal dimension provide an objective account of bone tissue organization during calvarial bone regeneration. Microsc Res Tech 2008;71:619-25. [DOI: 10.1002/jemt.20598] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]

Legendre M, Verstrepen KJ. Using the SERV Applet to Detect Tandem Repeats in DNA Sequences and to Predict Their Variability. Cold Spring Harb Protoc 2008;2008:pdb.ip50. [PMID: 21356663 DOI: 10.1101/pdb.ip50] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]

Piqueira JRC, Serboncini FA, Monteiro LHA. Biological models: Measuring variability with classical and quantum information. J Theor Biol 2006;242:309-13. [PMID: 16603194 DOI: 10.1016/j.jtbi.2006.02.019] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2005] [Revised: 02/23/2006] [Accepted: 02/27/2006] [Indexed: 11/26/2022]

Larsabal E, Danchin A. Genomes are covered with ubiquitous 11 bp periodic patterns, the "class A flexible patterns". BMC Bioinformatics 2005;6:206. [PMID: 16120222 PMCID: PMC1242344 DOI: 10.1186/1471-2105-6-206] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2005] [Accepted: 08/24/2005] [Indexed: 11/17/2022] Open

Nikolaou C, Almirantis Y. “Word” Preference in the Genomic Text and Genome Evolution: Different Modes of n-tuplet Usage in Coding and Noncoding Sequences. J Mol Evol 2005;61:23-35. [PMID: 16059753 DOI: 10.1007/s00239-004-0209-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2004] [Accepted: 02/02/2005] [Indexed: 10/25/2022]