1
Wąż P, Zorena K, Murawska A, Bielińska-Wąż D. Classification Maps: A New Mathematical Tool Supporting the Diagnosis of Age-Related Macular Degeneration. J Pers Med 2023; 13:1074. [PMID: 37511686 PMCID: PMC10381320 DOI: 10.3390/jpm13071074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Revised: 06/18/2023] [Accepted: 06/26/2023] [Indexed: 07/30/2023] Open
Abstract
OBJECTIVE: A new diagnostic graphical tool, classification maps, supporting the detection of Age-Related Macular Degeneration (AMD) has been constructed. METHODS: The classification maps are constructed using the ordinal regression model, in which the ordinal (dependent) variable is the degree of advancement of AMD. The other variables, such as CRT (Central Retinal Thickness), GCC (Ganglion Cell Complex), MPOD (Macular Pigment Optical Density), ETDRS (Early Treatment Diabetic Retinopathy Study), Snellen, and Age, have also been used in the analysis and are represented on the axes of the maps. RESULTS: In total, 132 eyes were examined and classified by AMD advancement level according to the four-point Age-Related Eye Disease Scale (AREDS): AREDS 1, AREDS 2, AREDS 3, and AREDS 4. These data were used to create two-dimensional classification maps for each of the four stages of AMD. CONCLUSIONS: The maps allow the patient's eyes to be classified into particular stages of AMD. The pairs of variables represented on the axes of the maps can be treated as diagnostic identifiers necessary for classification into particular stages of AMD.
Affiliation(s)
- Piotr Wąż
- Department of Nuclear Medicine, Medical University of Gdańsk, 80-210 Gdańsk, Poland
- Katarzyna Zorena
- Department of Immunobiology and Environment Microbiology, Medical University of Gdańsk, 80-210 Gdańsk, Poland
- Anna Murawska
- Department of Immunobiology and Environment Microbiology, Medical University of Gdańsk, 80-210 Gdańsk, Poland
- Dorota Bielińska-Wąż
- Department of Radiological Informatics and Statistics, Medical University of Gdańsk, 80-210 Gdańsk, Poland
2
4D-Dynamic Representation of DNA/RNA Sequences: Studies on Genetic Diversity of Echinococcus multilocularis in Red Foxes in Poland. Life (Basel, Switzerland) 2022; 12:life12060877. [PMID: 35743908 PMCID: PMC9227292 DOI: 10.3390/life12060877] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2022] [Revised: 05/20/2022] [Accepted: 06/08/2022] [Indexed: 11/17/2022]
Abstract
The 4D-Dynamic Representation of DNA/RNA Sequences, an alignment-free bioinformatics method recently developed by us, has been used to study the genetic diversity of Echinococcus multilocularis in red foxes in Poland. Sequences of three mitochondrial genes, i.e., NADH dehydrogenase subunit 2 (nad2), cytochrome b (cob), and cytochrome c oxidase subunit 1 (cox1), are analyzed. The sequences are represented by sets of material points in a 4D space, i.e., 4D-dynamic graphs. As a visualization of the sequences, projections of the graphs into 3D space are shown. The differences between 3D graphs corresponding to European, Asian, and American haplotypes are small. Numerical characteristics (sequence descriptors) applied in the studies can recognize the differences. The concept of creating descriptors of 4D-dynamic graphs has been borrowed from classical dynamics; these are coordinates of the centers of mass and moments of inertia of 4D-dynamic graphs. Based on these descriptors, classification maps are constructed. The concentrations of points in the maps indicate one Polish haplotype (EmPL9) of Asian origin.
3
Paul T, Vainio S, Roning J. Detection of intra-family coronavirus genome sequences through graphical representation and artificial neural network. Expert Systems with Applications 2022; 194:116559. [PMID: 35095217 PMCID: PMC8779865 DOI: 10.1016/j.eswa.2022.116559] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/20/2021] [Revised: 12/29/2021] [Accepted: 01/16/2022] [Indexed: 05/06/2023]
Abstract
In this study, chaos game representation (CGR) is introduced for investigating the pattern of genome sequences. It is an image representation of the genome for the overall visualization of the sequence. CGR is a mapping technique that assigns each sequence base to its respective position in the two-dimensional plane to portray the DNA sequence. Importantly, CGR provides a one-to-one mapping for individual nucleotides as well as for the sequence. A coordinate in the CGR plane identifies the corresponding base and its location in the original genome. Therefore, the whole nucleotide sequence (up to the current nucleotide) can be restored from a single point of the CGR. In this study, CGR coupled with an artificial neural network (ANN) is introduced as a new way to represent the genome and to classify intra-coronavirus sequences. A hierarchical clustering study was done to validate the approach, which was found to be more than 90% accurate when the result was compared with the phylogenetic tree of the corresponding genomes. Interestingly, the method makes the genome sequence significantly shorter (more than 99% compressed), saving data space while preserving the genome features.
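The reversibility claimed in the abstract can be illustrated with a minimal sketch of the standard CGR construction. The corner assignment, the starting point (1/2, 1/2), and the function names `cgr_points` and `cgr_decode` are assumptions made for illustration; the paper may use a different convention. Exact rational arithmetic is used so that decoding does not suffer from floating-point rounding.

```python
from fractions import Fraction

# Assumed corner convention (a common one); not taken from the paper.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def cgr_points(seq):
    """Return the CGR trajectory of a DNA string.

    Each point is the midpoint between the previous point and the
    corner assigned to the current base.
    """
    x, y = Fraction(1, 2), Fraction(1, 2)
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def cgr_decode(point, length):
    """Recover the last `length` bases from a single CGR point."""
    x, y = point
    half = Fraction(1, 2)
    bases = []
    for _ in range(length):
        # The quadrant containing the point identifies the most recent base.
        cx = 1 if x > half else 0
        cy = 1 if y > half else 0
        bases.append(next(b for b, c in CORNERS.items() if c == (cx, cy)))
        # Undo the midpoint step to obtain the previous point.
        x, y = 2 * x - cx, 2 * y - cy
    return "".join(reversed(bases))


if __name__ == "__main__":
    trajectory = cgr_points("GATTACA")
    print(cgr_decode(trajectory[-1], 7))
```

With floating-point coordinates the same decoding works only for short sequences, because each step halves the remaining precision.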
Affiliation(s)
- Tirthankar Paul
- InfoTech Oulu, Faculty of Information Technology and Electrical Engineering, Biomimetics and Intelligent Systems Group (BISG), University of Oulu, Oulu, Finland
- Seppo Vainio
- Infotech Oulu and Kvantum Institute, Faculty of Biochemistry and Molecular Medicine, Disease Networks, University of Oulu, Oulu, Finland
- Juha Roning
- InfoTech Oulu, Faculty of Information Technology and Electrical Engineering, Biomimetics and Intelligent Systems Group (BISG), University of Oulu, Oulu, Finland
4
Bielińska-Wąż D, Wąż P, Nandy A. Graphical Representations of Biological Sequences. Comb Chem High Throughput Screen 2022; 25:347-348. [PMID: 35038979 DOI: 10.2174/1386207325666220104221516] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Affiliation(s)
- Piotr Wąż
- Medical University of Gdańsk, 80-210 Gdańsk, Poland
- Ashesh Nandy
- Centre for Interdisciplinary Research and Education, Kolkata 700068, India
5
Bielińska-Wąż D, Wąż P, Panas D. Applications of 2D and 3D-Dynamic Representations of DNA/RNA Sequences for a description of genome sequences of viruses. Comb Chem High Throughput Screen 2021; 25:429-438. [PMID: 34348613 DOI: 10.2174/1386207324666210804120454] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2021] [Revised: 06/16/2021] [Accepted: 06/27/2021] [Indexed: 11/22/2022]
Abstract
The aim of the studies is to show that graphical bioinformatics methods are good tools for the description of genome sequences of viruses. A new approach to the identification of unknown virus strains is proposed. METHODS: Biological sequences have been represented graphically through 2D and 3D-Dynamic Representations of DNA/RNA Sequences, theoretical methods for the graphical representation of sequences developed by us earlier. In these approaches, some ideas of classical dynamics have been introduced to bioinformatics. The sequences are represented by sets of material points in 2D or 3D spaces. The distribution of the points in space is characteristic of the sequence. The numerical parameters (descriptors) characterizing the sequences correspond to the quantities typical for classical dynamics. RESULTS: Some applications of the theoretical methods have been reviewed briefly. 2D-dynamic graphs representing the complete genome sequences of SARS-CoV-2 are shown. CONCLUSION: It is proved that the 3D-Dynamic Representation of DNA/RNA Sequences, coupled with the random forest algorithm, successfully classifies the subtypes of influenza A virus strains.
Affiliation(s)
- Dorota Bielińska-Wąż
- Department of Radiological Informatics and Statistics, Medical University of Gdańsk, 80-210 Gdańsk, Poland
- Piotr Wąż
- Department of Nuclear Medicine, Medical University of Gdańsk, 80-210 Gdańsk, Poland
- Damian Panas
- Department of Radiological Informatics and Statistics, Medical University of Gdańsk, 80-210 Gdańsk, Poland
6
Bielińska-Wąż D, Wąż P. Non-standard bioinformatics characterization of SARS-CoV-2. Comput Biol Med 2021; 131:104247. [PMID: 33611129 PMCID: PMC7966820 DOI: 10.1016/j.compbiomed.2021.104247] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/20/2020] [Revised: 01/22/2021] [Accepted: 01/26/2021] [Indexed: 12/16/2022]
Abstract
A non-standard bioinformatics method, 4D-Dynamic Representation of DNA/RNA Sequences, aiming at an analysis of the information available in nucleotide databases, has been formulated. The sequences are represented by sets of "material points" in a 4D space, i.e., 4D-dynamic graphs. The graphs representing the sequences are treated as "rigid bodies" and characterized by values analogous to the ones used in classical dynamics. As the graphical representations of the sequences, the projections of the graphs into 2D and 3D spaces are used. The method has been applied to an analysis of the complete genome sequences of the 2019 novel coronavirus. As a result, 2D and 3D classification maps are obtained. The coordinate axes in the maps correspond to the values derived from the exact formulas characterizing the graphs: the coordinates of the centers of mass and the 4D moments of inertia. The points in the maps represent sequences, and their coordinates are used as the classifiers. The main result of this work has been derived from the 3D classification maps. The distribution of clusters of points that emerged in these maps supports the hypothesis that SARS-CoV-2 may have originated in bats and in pangolins. Pilot calculations for Zika virus sequence data prove that the proposed approach is also applicable to a description of the time evolution of genome sequences of viruses.
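The descriptors named in the abstract (centers of mass and moments of inertia of a point set in 4D) can be sketched as follows. This is only an illustration under stated assumptions: the step vectors in `STEPS`, the unit mass per visited point, and the function names are hypothetical and are not taken from the paper, and the inertia tensor is the usual rigid-body formula extended to four coordinates.

```python
from collections import Counter

# Hypothetical step vectors, one axis per base. The published method
# defines its own basis; this choice is for illustration only.
STEPS = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def dynamic_graph(seq):
    """Walk through 4D space and place a unit mass at each point reached."""
    pos = (0, 0, 0, 0)
    masses = Counter()
    for base in seq.upper():
        pos = tuple(p + s for p, s in zip(pos, STEPS[base]))
        masses[pos] += 1
    return masses


def center_of_mass(masses):
    """Mass-weighted mean position of the graph."""
    total = sum(masses.values())
    return tuple(
        sum(m * p[k] for p, m in masses.items()) / total for k in range(4)
    )


def inertia_tensor(masses):
    """Inertia tensor about the center of mass:
    I[j][k] = sum_i m_i * (|r_i|^2 * delta_jk - r_ij * r_ik).
    """
    com = center_of_mass(masses)
    tensor = [[0.0] * 4 for _ in range(4)]
    for p, m in masses.items():
        r = [p[k] - com[k] for k in range(4)]
        r2 = sum(c * c for c in r)
        for j in range(4):
            for k in range(4):
                tensor[j][k] += m * ((r2 if j == k else 0.0) - r[j] * r[k])
    return tensor


if __name__ == "__main__":
    graph = dynamic_graph("ACGTTGCA")
    print(center_of_mass(graph))
    print([row[i] for i, row in enumerate(inertia_tensor(graph))])
```

In a classification map of the kind described, each sequence would contribute one point whose coordinates are a few such descriptors, for example two center-of-mass coordinates or two principal moments.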
Affiliation(s)
- Dorota Bielińska-Wąż
- Department of Radiological Informatics and Statistics, Medical University of Gdańsk, 80-210 Gdańsk, Poland
- Piotr Wąż
- Department of Nuclear Medicine, Medical University of Gdańsk, 80-210 Gdańsk, Poland
7
Huang J, Dai Q, Yao Y, He PA. A Generalized Iterative Map for Analysis of Protein Sequences. Comb Chem High Throughput Screen 2020; 25:381-391. [PMID: 33045963 DOI: 10.2174/1386207323666201012142318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2020] [Revised: 07/30/2020] [Accepted: 08/09/2020] [Indexed: 11/22/2022]
Abstract
AIM AND OBJECTIVE: Comparing the similarity of biological sequences is an important task in bioinformatics. Methods for similarity comparison of biological sequences fall into two classes: sequence alignment methods and alignment-free methods. Graphical representation of biological sequences is a kind of alignment-free method, which constitutes a tool for analyzing and visualizing the sequences. In this article, a generalized iterative map of protein sequences is proposed for analyzing the similarities of biological sequences. MATERIALS AND METHODS: Based on the normalized physicochemical indexes of the 20 amino acids, each amino acid can be mapped to a point in 5D space. A generalized iterative function system was introduced to outline a generalized iterative map of protein sequences, which can not only reflect various physicochemical properties of amino acids but also incorporate different compression ratios for the components of the generalized iterative map. Several properties were proved to illustrate the advantage of the generalized iterative map. A mathematical description of the generalized iterative map was proposed to compare the similarities and dissimilarities of protein sequences. Based on this method, similarities/dissimilarities were compared among ND5 protein sequences, as well as ND6 protein sequences, of ten different species. RESULTS: By correlation analysis, the ClustalW results were compared with our similarity/dissimilarity results and with other graphical representation results to show the utility of our approach. The comparison shows that our approach correlates better with ClustalW for all species than other approaches, illustrating its effectiveness. CONCLUSION: Two examples show that our method not only performs well in the similarity/dissimilarity analysis of protein sequences but also does not require complex computation.
Affiliation(s)
- Jiahe Huang
- School of Science, Zhejiang Sci-Tech University, Hangzhou, China
- Qi Dai
- College of Life Science, Zhejiang Sci-Tech University, Hangzhou, China
- Yuhua Yao
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Ping-An He
- School of Science, Zhejiang Sci-Tech University, Hangzhou, China
8
Abo-Elkhier MM, Abd Elwahaab MA, Abo El Maaty MI. Measuring Similarity among Protein Sequences Using a New Descriptor. BioMed Research International 2019; 2019:2796971. [PMID: 31886192 PMCID: PMC6893242 DOI: 10.1155/2019/2796971] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/06/2019] [Revised: 09/03/2019] [Accepted: 10/28/2019] [Indexed: 12/01/2022]
Abstract
The comparison of protein sequences according to similarity is a fundamental aspect of today's biomedical research. With the development of sequencing technologies, the number of protein sequences in public databases is increasing exponentially. Well-known sequence comparison methods are alignment-based. They generally give excellent results when the sequences under study are closely related, but they are time-consuming. Herein, a new alignment-free method is introduced. Our technique depends on a new graphical representation and descriptor. The graphical representation is a simple way to visualize protein sequences. The descriptor compresses the primary sequence into a single vector composed of only two values. Our approach gives good results with both short and long sequences in little computation time. It is applied to nine beta globin, nine ND5 (NADH dehydrogenase subunit 5), and 24 spike protein sequences. Correlation and significance analyses are also introduced to compare our similarity/dissimilarity results with the results of other approaches and with sequence homology.
Affiliation(s)
- Mervat M. Abo-Elkhier
- Department of Engineering Mathematics and Physics, Faculty of Engineering, Mansoura University, Mansoura 35516, Egypt
- Marwa A. Abd Elwahaab
- Department of Engineering Mathematics and Physics, Faculty of Engineering, Mansoura University, Mansoura 35516, Egypt
- Moheb I. Abo El Maaty
- Department of Engineering Mathematics and Physics, Faculty of Engineering, Mansoura University, Mansoura 35516, Egypt
9
Spectral Analysis of Codons in the DNA Sequence of Fragile X Syndrome. J Med Syst 2019; 43:261. [DOI: 10.1007/s10916-019-1408-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2019] [Accepted: 06/26/2019] [Indexed: 11/25/2022]
10
Wąż PH. Meet Our Editorial Board Member. Comb Chem High Throughput Screen 2019. [DOI: 10.2174/138620732110190226170020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Affiliation(s)
- Piotr Henryk Wąż
- Department of Nuclear Medicine, Medical University of Gdansk, Tuwima 15, 80-210 Gdansk, Poland
11
Qi ZH, Li KC, Ma JL, Yao YH, Liu LY. Novel Method of 3-Dimensional Graphical Representation for Proteins and Its Application. Evol Bioinform Online 2018; 14:1176934318777755. [PMID: 29977111 PMCID: PMC6024350 DOI: 10.1177/1176934318777755] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 12/14/2017] [Accepted: 04/09/2018] [Indexed: 11/16/2022] Open
Abstract
In this article, we propose a 3-dimensional graphical representation of protein sequences based on 10 physicochemical properties of 20 amino acids and the BLOSUM62 matrix. It contains evolutionary information and provides intuitive visualization. To further analyze the similarity of proteins, we extract a specific vector from the graphical representation curve. The vector is used to calculate the similarity distance between 2 protein sequences. To prove the effectiveness of our approach, we apply it to 3 real data sets. The results are consistent with known evolutionary facts and show that our method is effective in phylogenetic analysis.
Affiliation(s)
- Zhao-Hui Qi
- School of Information Science and Technology, Shijiazhuang Tiedao University, Shijiazhuang, Republic of China
- Ke-Cheng Li
- School of Information Science and Technology, Shijiazhuang Tiedao University, Shijiazhuang, Republic of China
- Jin-Long Ma
- School of Information Science and Technology, Shijiazhuang Tiedao University, Shijiazhuang, Republic of China
- Yu-Hua Yao
- School of Mathematics and Statistics, Hainan Normal University, Haikou, Republic of China
- Ling-Yun Liu
- School of Information Science and Technology, Shijiazhuang Tiedao University, Shijiazhuang, Republic of China
12
Mo Z, Zhu W, Sun Y, Xiang Q, Zheng M, Chen M, Li Z. One novel representation of DNA sequence based on the global and local position information. Sci Rep 2018; 8:7592. [PMID: 29765099 PMCID: PMC5953932 DOI: 10.1038/s41598-018-26005-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 01/11/2018] [Accepted: 04/27/2018] [Indexed: 11/28/2022] Open
Abstract
A novel representation of DNA sequences combining the global and local position information of the original sequence has been proposed to distinguish different species. First, to exploit global information sufficiently, a graphical representation of the DNA sequence has been formulated according to the curve of the Fermat spiral. Then, to account for local characteristics of the DNA sequence, each point on the Fermat spiral curve has been assigned a mass based on the relationships among four neighboring nucleotides. In this paper, the normalized moments of inertia of the Fermat spiral curve, composed of the points with mass, have been calculated as the numerical description of the corresponding DNA sequence for the first exons of beta-globin genes. With the Euclidean distance as the measure between numerical descriptions, the similarity between species demonstrates the performance of the proposed method.
Affiliation(s)
- Zhiyi Mo
- School of Information and Electronic Engineering, Wuzhou University, Wuzhou, China
- Wen Zhu
- College of Computer Science and Electronic Engineering, Hunan University, Hunan, China
- Yi Sun
- College of Computer Science and Electronic Engineering, Hunan University, Hunan, China
- Qilin Xiang
- College of Computer Science and Electronic Engineering, Hunan University, Hunan, China
- Ming Zheng
- School of Information and Electronic Engineering, Wuzhou University, Wuzhou, China
- Min Chen
- College of Computer and Information Science, Hunan Institute of Technology, Hengyang, China
- Zejun Li
- College of Computer and Information Science, Hunan Institute of Technology, Hengyang, China
13
Computational Techniques for a Comprehensive Understanding of Different Genotype-Phenotype Factors in Biological Systems and Their Applications. Synth Biol (Oxf) 2018. [DOI: 10.1007/978-981-10-8693-9_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022] Open