1
Gardes J, Maldivi C, Boisset D, Aubourg T, Demongeot J. An Unsupervised Classifier for Whole-Genome Phylogenies, the Maxwell© Tool. Int J Mol Sci 2023; 24:16278. [PMID: 38003468 PMCID: PMC10671764 DOI: 10.3390/ijms242216278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2023] [Revised: 10/20/2023] [Accepted: 11/02/2023] [Indexed: 11/26/2023]
Abstract
The development of phylogenetic trees based on RNA or DNA sequences generally requires a precise and limited choice of important RNAs, e.g., messenger RNAs of essential proteins or ribosomal RNAs (like 16S), but rarely complete genomes, making it possible to explain evolution and speciation. In this article, we propose revisiting a classic phylogeny of archaea from only the information on the succession of nucleotides of their entire genome. For this purpose, we use a new tool, the unsupervised classifier Maxwell, whose principle lies in the Burrows-Wheeler compression transform, and we show its efficiency in clustering whole archaeal genomes.
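The abstract names the Burrows-Wheeler transform as the principle behind Maxwell but gives no implementation details. As a rough illustration of the transform itself (not of the Maxwell tool), the naive sketch below sorts all rotations of a sequence and keeps the last column. The function names and the `$` sentinel are assumptions made for this example, and the quadratic approach is only practical for short strings; whole genomes would need a suffix-array-based construction.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform: last column of the sorted rotations."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel  # the sentinel marks the end of the original string
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, sentinel: str = "$") -> str:
    """Invert the transform by repeatedly prepending the last column and sorting."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(c + row for c, row in zip(last_column, table))
    # The row that ends with the sentinel is the original string plus sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


if __name__ == "__main__":
    transformed = bwt("ACGTACGTACGT")
    print(transformed)
    print(inverse_bwt(transformed))
```

The transform tends to group identical symbols that share a context, which is what makes the output easier to compress and, by extension, useful as a basis for compression-based similarity.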
Affiliation(s)
- Joël Gardes
  - Orange Labs, 38229 Meylan, France (J.G., C.M., D.B.)
- Denis Boisset
  - Orange Labs, 38229 Meylan, France (J.G., C.M., D.B.)
- Timothée Aubourg
  - Faculty of Medicine, Université Grenoble Alpes, AGEIS EA 7407 Tools for e-Gnosis Medical, 38700 La Tronche, France
- Jacques Demongeot
  - Faculty of Medicine, Université Grenoble Alpes, AGEIS EA 7407 Tools for e-Gnosis Medical, 38700 La Tronche, France
2
Cohen AR, Vitanyi PMB. The Cluster Structure Function. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023; 45:11309-11320. [PMID: 37018105 PMCID: PMC10525042 DOI: 10.1109/tpami.2023.3264690] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
For each partition of a data set into a given number of parts there is a partition such that every part is as much as possible a good model (an "algorithmic sufficient statistic") for the data in that part. Since this can be done for every number between one and the number of data items, the result is a function, the cluster structure function. It maps the number of parts of a partition to values related to the deficiencies of being good models by the parts. Such a function starts with a value at least zero for no partition of the data set and descends to zero for the partition of the data set into singleton parts. The optimal clustering is the one selected by analyzing the cluster structure function. The theory behind the method is expressed in algorithmic information theory (Kolmogorov complexity). In practice the Kolmogorov complexities involved are approximated by a concrete compressor. We give examples using real data sets: the MNIST handwritten digits and the segmentation of real cells as used in stem cell research.
3
Reznikova Z. Information Theory Opens New Dimensions in Experimental Studies of Animal Behaviour and Communication. Animals (Basel) 2023; 13:ani13071174. [PMID: 37048430 PMCID: PMC10093743 DOI: 10.3390/ani13071174] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/27/2023] [Revised: 03/15/2023] [Accepted: 03/24/2023] [Indexed: 03/29/2023]
Abstract
Over the last 40–50 years, ethology has become increasingly quantitative and computational. However, when analysing animal behavioural sequences, researchers often need help finding an adequate model to assess certain characteristics of these sequences while using a relatively small number of parameters. In this review, I demonstrate that the information theory approaches based on Shannon entropy and Kolmogorov complexity can furnish effective tools to analyse and compare animal natural behaviours. In addition to a comparative analysis of stereotypic behavioural sequences, information theory can provide ideas for particular experiments on sophisticated animal communications. In particular, it has made it possible to discover the existence of a developed symbolic “language” in leader-scouting ant species based on the ability of these ants to transfer abstract information about remote events.
4
Ramos G, Boratto L, Caleiro C. On the negative impact of social influence in recommender systems: A study of bribery in collaborative hybrid algorithms. Inf Process Manag 2020. [DOI: 10.1016/j.ipm.2019.102058] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 10/26/2022]
5
Machado JAT, Rocha-Neves JM, Andrade JP. Computational analysis of the SARS-CoV-2 and other viruses based on the Kolmogorov's complexity and Shannon's information theories. Nonlinear Dynamics 2020; 101:1731-1750. [PMID: 32836811 PMCID: PMC7335223 DOI: 10.1007/s11071-020-05771-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/26/2020] [Accepted: 06/14/2020] [Indexed: 05/06/2023]
Abstract
This paper examines the information of 133 RNA viruses available in public databases in the light of several mathematical and computational tools. First, the formal concepts of distance metrics, Kolmogorov complexity and Shannon information are recalled. Second, the computational tools currently available for detecting and visualizing patterns embedded in datasets, such as hierarchical clustering and multidimensional scaling, are discussed. The synergies of the joint application of the mathematical and computational resources are then used for exploring the RNA data, cross-evaluating the normalized compression distance, entropy and Jensen-Shannon divergence against representations in two and three dimensions. The results of these different perspectives shed further light on the relations between the distinct RNA viruses.
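The abstract lists entropy and Jensen-Shannon divergence among the measures but does not say how the underlying probability distributions are built. The sketch below assumes, purely for illustration, that each sequence is summarized by its k-mer frequencies; the function names and the choice of k are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def kmer_distribution(seq: str, k: int = 1) -> dict[str, float]:
    """Relative frequency of each overlapping k-mer in the sequence."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def shannon_entropy(p: dict[str, float]) -> float:
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)


def jensen_shannon_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """JSD(P, Q) = H(M) - (H(P) + H(Q)) / 2, with M the average of P and Q."""
    keys = set(p) | set(q)
    m = {key: 0.5 * (p.get(key, 0.0) + q.get(key, 0.0)) for key in keys}
    return shannon_entropy(m) - 0.5 * (shannon_entropy(p) + shannon_entropy(q))


if __name__ == "__main__":
    a = kmer_distribution("ACGUACGUAAGG", k=2)
    b = kmer_distribution("ACGUACGUACGG", k=2)
    print(shannon_entropy(a), jensen_shannon_divergence(a, b))
```

With base-2 logarithms the divergence is bounded between 0 (identical distributions) and 1 (disjoint supports), which makes it convenient as an entry in a pairwise dissimilarity matrix for hierarchical clustering or multidimensional scaling.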
Affiliation(s)
- J. A. Tenreiro Machado
  - Department of Electrical Engineering, Institute of Engineering, Polytechnic of Porto, Rua Dr. António Bernardino de Almeida, 431, 4249-015 Porto, Portugal
- João M. Rocha-Neves
  - Department of Biomedicine – Unity of Anatomy, Faculty of Medicine of University of Porto, Porto, Portugal
  - Department of Physiology and Surgery, Faculty of Medicine of University of Porto, Porto, Portugal
- José P. Andrade
  - Department of Biomedicine – Unity of Anatomy, Faculty of Medicine of University of Porto, Porto, Portugal
  - Center for Health Technology and Services Research (CINTESIS), Porto, Portugal
6
Alagoz C, Cohen AR, Frisch DR, Tunç B, Phatharodom S, Guez A. Spiral waves characterization: Implications for an automated cardiodynamic tissue characterization. Computer Methods and Programs in Biomedicine 2018; 161:15-24. [PMID: 29852958 DOI: 10.1016/j.cmpb.2018.04.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2017] [Revised: 02/25/2018] [Accepted: 04/04/2018] [Indexed: 06/08/2023]
Abstract
BACKGROUND AND OBJECTIVE: Spiral waves are phenomena observed in cardiac tissue, especially during fibrillatory activities. Spiral waves are revealed through in-vivo and in-vitro studies using high-density mapping that requires a special experimental setup. Also, in-silico spiral wave analysis and classification is performed using membrane potentials from the entire tissue. In this study, we report a characterization approach that identifies spiral wave behaviors using intracardiac electrogram (EGM) readings obtained with commonly used multipolar diagnostic catheters that perform localized but high-resolution readings. Specifically, the algorithm is designed to distinguish between stationary, meandering, and break-up rotors.

METHODS: The clustering and classification algorithms are tested on simulated data produced using a phenomenological 2D model of cardiac propagation. For EGM measurements, unipolar-bipolar EGM readings from various locations on tissue using two catheter types are modeled. The distance between spiral behaviors is assessed using normalized compression distance (NCD), an information-theoretic distance. NCD is a universal metric in the sense that it is based solely on the compressibility of the dataset and requires no feature extraction. We also introduce normalized FFT distance (NFFTD), where compressibility is replaced with an FFT parameter.

RESULTS: Overall, outstanding clustering performance was achieved across varying EGM reading configurations. We found that NCD distinguished the behaviors more effectively than NFFTD. We demonstrated that distinct spiral activity identification on a behaviorally heterogeneous tissue is also possible.

CONCLUSIONS: This report demonstrates a theoretical validation of clustering and classification approaches that provide an automated mapping from EGM signals to assessment of spiral wave behaviors and hence offers a potential mapping and analysis framework for cardiac tissue wavefront propagation patterns.
Affiliation(s)
- Celal Alagoz
  - ECE Department, Drexel University, Philadelphia, PA 19104, USA
- Andrew R Cohen
  - ECE Department, Drexel University, Philadelphia, PA 19104, USA
- Daniel R Frisch
  - Thomas Jefferson University Hospital, Philadelphia, PA 19107, USA
- Birkan Tunç
  - Center for Biomedical Image Computing and Analytics, Department of Radiology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Allon Guez
  - ECE Department, Drexel University, Philadelphia, PA 19104, USA
7
Comparison of Compression-Based Measures with Application to the Evolution of Primate Genomes. Entropy 2018; 20:e20060393. [PMID: 33265483 PMCID: PMC7512912 DOI: 10.3390/e20060393] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 03/03/2018] [Revised: 05/16/2018] [Accepted: 05/21/2018] [Indexed: 11/26/2022]
Abstract
An efficient DNA compressor furnishes an approximation to measure and compare information quantities present in, between and across DNA sequences, regardless of the characteristics of the sources. In this paper, we directly compare two information measures, the Normalized Compression Distance (NCD) and the Normalized Relative Compression (NRC). These measures answer different questions: the NCD measures how similar both strings are (in terms of information content), and the NRC (which, in general, is nonsymmetric) indicates the fraction of one of them that cannot be constructed using information from the other one. This leads to the problem of finding out which measure (or question) is more suitable for the answer we need. For computing both, we use a state-of-the-art DNA sequence compressor that we benchmark against some top compressors in different compression modes. Then, we apply the compressor to DNA sequences with different scales and natures, first using synthetic sequences and then real DNA sequences. The latter include mitochondrial DNA (mtDNA), messenger RNA (mRNA) and genomic DNA (gDNA) of seven primates. We provide several insights into evolutionary acceleration rates at different scales, namely, the observation and confirmation across the whole genomes of a higher variation rate of the mtDNA relative to the gDNA. We also show the importance of relative compression for localizing similar information regions using mtDNA.
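For orientation, the NCD is commonly defined as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed size. The sketch below uses Python's general-purpose `zlib` as a stand-in compressor, which is an assumption of this example and not the specialized DNA compressor used in the paper. The NRC is not sketched because it needs a compressor that can model one sequence exclusively from another, which `zlib` does not offer.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed input (maximum compression level)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation of both inputs
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    base = "".join(rng.choice("ACGT") for _ in range(5000))
    # A close relative: the same sequence with a substitution every 100 bases.
    relative = "".join(
        ("A" if ch != "A" else "C") if i % 100 == 0 else ch
        for i, ch in enumerate(base)
    )
    unrelated = "".join(rng.choice("ACGT") for _ in range(5000))
    print(ncd(base.encode(), relative.encode()))
    print(ncd(base.encode(), unrelated.encode()))
```

Because zlib only looks back over a 32 KiB window, this stand-in is only meaningful for short inputs; sequences at genome scale would call for a compressor with a much larger context.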
8
Coltuc D, Datcu M, Coltuc D. On the Use of Normalized Compression Distances for Image Similarity Detection. Entropy 2018; 20:e20020099. [PMID: 33265190 PMCID: PMC7512663 DOI: 10.3390/e20020099] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 12/02/2017] [Revised: 01/21/2018] [Accepted: 01/22/2018] [Indexed: 12/04/2022]
Abstract
This paper investigates the usefulness of the normalized compression distance (NCD) for image similarity detection. Instead of the direct NCD between images, the paper considers the correlation between NCD based feature vectors extracted for each image. The vectors are derived by computing the NCD between the original image and sequences of translated (rotated) versions. Feature vectors for simple transforms (circular translations on horizontal, vertical, diagonal directions and rotations around image center) and several standard compressors are generated and tested in a very simple experiment of similarity detection between the original image and two filtered versions (median and moving average). The promising vector configurations (geometric transform, lossless compressor) are further tested for similarity detection on the 24 images of the Kodak set subject to some common image processing. While the direct computation of NCD fails to detect image similarity even in the case of simple median and moving average filtering in 3 × 3 windows, for certain transforms and compressors, the proposed approach appears to provide robustness at similarity detection against smoothing, lossy compression, contrast enhancement, noise addition and some robustness against geometrical transforms (scaling, cropping and rotation).
Affiliation(s)
- Dinu Coltuc
  - Faculty of Electrical Engineering, Electronics and Information Technology, Valahia University of Targoviste, Târgoviște 130024, Romania
- Mihai Datcu
  - Remote Sensing Technology Institute, German Aerospace Center (DLR), Germany and Research Centre for Spatial Information, Politehnica University of Bucharest, București 060042, Romania
- Daniela Coltuc
  - Research Centre for Spatial Information, Politehnica University of Bucharest, București 060042, Romania
9
Joshi R, Mankowski W, Winter M, Saini JS, Blenkinsop TA, Stern JH, Temple S, Cohen AR. Automated Measurement of Cobblestone Morphology for Characterizing Stem Cell Derived Retinal Pigment Epithelial Cell Cultures. J Ocul Pharmacol Ther 2016; 32:331-9. [PMID: 27191513 DOI: 10.1089/jop.2015.0163] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 01/09/2023]
Abstract
PURPOSE: Assessing the morphologic properties of cells in microscopy images is an important task to evaluate cell health, identity, and purity. Typically, subjective visual assessments are accomplished by an experienced researcher. This subjective human step makes transfer of the evaluation process from the laboratory to the cell manufacturing facility difficult and time consuming.

METHODS: Automated image analysis can provide rapid, objective measurements of cultured cells, greatly aiding manufacturing, regulatory, and research goals. Automated algorithms for classifying images based on appearance characteristics typically either extract features from the image and use those features for classification or use the images directly as input to the classification algorithm. In this study we have developed both feature and nonfeature extraction methods for automatically measuring "cobblestone" structure in human retinal pigment epithelial (RPE) cell cultures.

RESULTS: A new approach using image compression combined with a Kolmogorov complexity-based distance metric enables robust classification of microscopy images of RPE cell cultures. The automated measurements corroborate determinations made by experienced cell biologists. We have also developed an approach for using steerable wavelet filters for extracting features to characterize the individual cellular junctions.

CONCLUSIONS: Two image analysis techniques enable robust and accurate characterization of the cobblestone morphology that is indicative of viable RPE cultures for therapeutic applications.
Affiliation(s)
- Rohini Joshi
  - Department of Electrical and Computer Engineering, Drexel University, Philadelphia, Pennsylvania
- Walter Mankowski
  - Department of Electrical and Computer Engineering, Drexel University, Philadelphia, Pennsylvania
- Mark Winter
  - Department of Electrical and Computer Engineering, Drexel University, Philadelphia, Pennsylvania
- Timothy A Blenkinsop
  - Developmental and Regenerative Biology, Mount Sinai Hospital, New York, New York
- Sally Temple
  - Neural Stem Cell Institute, Rensselaer, New York
- Andrew R Cohen
  - Department of Electrical and Computer Engineering, Drexel University, Philadelphia, Pennsylvania