1. Ghosh S, Pal J, Maji B, Cattani C, Bhattacharya DK. Choice of Metric Divergence in Genome Sequence Comparison. Protein J 2024; 43:259-273. [PMID: 38492188 DOI: 10.1007/s10930-024-10189-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/28/2024] [Indexed: 03/18/2024]
Abstract
The paper introduces a novel probability descriptor for genome sequence comparison, employing a generalized form of Jensen-Shannon divergence. This divergence metric stems from a one-parameter family whose parameter takes fractional values up to a maximum of one half. Using this metric as a distance measure, a distance matrix is computed for the new probability descriptor, and phylogenetic trees are built from it via the neighbor-joining method. Initial exploration sets the parameter at one half for various species. To assess the impact of parameter variation, trees are then drawn at different parameter values (one half, one fourth, one eighth). Measurement scales decrease as the parameter value increases, with higher similarity accuracy corresponding to lower scale values. Ultimately, the highest accuracy aligns with the maximum parameter value of one half. Comparative analyses against previous methods, evaluated via Symmetric Distance (SD) values and rationalized perception, consistently favor the present approach's results. Notably, outcomes at the maximum parameter value are the most accurate, validating the method's efficacy against earlier approaches.
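The abstract does not spell out the generalized divergence, so the following is only a minimal sketch of one common one-parameter form, the weighted Jensen-Shannon divergence, which reduces to the standard Jensen-Shannon divergence when the weight is one half. The function names and the parameter name `alpha` are assumptions made for illustration and may not match the paper's definition.

```python
import math


def shannon_entropy(p):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def weighted_js_divergence(p, q, alpha=0.5):
    """Weighted Jensen-Shannon divergence between two distributions.

    Assumed form: H(alpha*p + (1-alpha)*q) - alpha*H(p) - (1-alpha)*H(q).
    alpha = 0.5 gives the standard Jensen-Shannon divergence.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    mix = [alpha * a + (1 - alpha) * b for a, b in zip(p, q)]
    return (shannon_entropy(mix)
            - alpha * shannon_entropy(p)
            - (1 - alpha) * shannon_entropy(q))


if __name__ == "__main__":
    # Hypothetical nucleotide frequency vectors (A, C, G, T) for two genomes.
    p = [0.30, 0.20, 0.20, 0.30]
    q = [0.25, 0.25, 0.25, 0.25]
    for alpha in (0.5, 0.25, 0.125):
        print(alpha, weighted_js_divergence(p, q, alpha))
```

Pairwise values of such a divergence can fill a distance matrix, which is the kind of input a neighbor-joining routine expects.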
Affiliation(s)
- Soumen Ghosh
  - Information Technology, Narula Institute of Technology, Kolkata, West Bengal, India
- Jayanta Pal
  - Computer Science & Engineering, Narula Institute of Technology, Kolkata, West Bengal, India
- Bansibadan Maji
  - Electronics & Communication Engineering, National Institute of Technology, Durgapur, West Bengal, India
- Carlo Cattani
  - DEIM, University of Tuscia, Largo Dell'Universita, 01100, Viterbo, Italy
2. Wang T, Yu ZG, Li J. CGRWDL: alignment-free phylogeny reconstruction method for viruses based on chaos game representation weighted by dynamical language model. Front Microbiol 2024; 15:1339156. [PMID: 38572227 PMCID: PMC10987876 DOI: 10.3389/fmicb.2024.1339156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Accepted: 02/23/2024] [Indexed: 04/05/2024] [Open Access]
Abstract
Traditional alignment-based methods face serious challenges in genome sequence comparison and phylogeny reconstruction due to their high computational complexity. Here, we propose a new alignment-free method to analyze the phylogenetic relationships (classification) among species. In our method, the dynamical language (DL) model and the chaos game representation (CGR) method are used to characterize the frequency information and the context information of k-mers in a sequence, respectively. Then, for each DNA sequence or protein sequence in a dataset, our method converts the sequence into a feature vector that represents the sequence information based on CGR weighted by the DL model, and uses these vectors to infer phylogenetic relationships. We name our method CGRWDL. Its performance was tested on both DNA and protein sequences from 8 virus datasets by constructing phylogenetic trees. For each dataset, we compared the Robinson-Foulds (RF) distance between the reference tree and the phylogenetic tree constructed by CGRWDL with the corresponding distances for other advanced methods. The results show that the phylogenetic trees constructed by CGRWDL can accurately classify the viruses, and that the RF distances between these trees and the reference trees are smaller than those obtained with other methods.
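As a rough illustration of the chaos game representation mentioned in the abstract, the sketch below implements the classic CGR for DNA and a plain k-mer count grid derived from it. It does not include the dynamical language weighting that defines CGRWDL. The corner assignment and the names `cgr_points` and `fcgr` are assumptions chosen for this example.

```python
# Assumed corner layout for the unit square; other layouts are also in use.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return the CGR trajectory of a DNA sequence.

    Start at the centre of the unit square and, for each base,
    move halfway towards the corner assigned to that base.
    """
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            raise ValueError(f"unsupported base: {base!r}")
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def fcgr(seq, k):
    """Count CGR points on a 2^k x 2^k grid (one cell per k-mer).

    Points produced by the first k-1 bases are skipped because they
    do not yet correspond to a complete k-mer.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    for x, y in cgr_points(seq)[k - 1:]:
        col = min(int(x * size), size - 1)  # guard against rounding to 1.0
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid


if __name__ == "__main__":
    print(cgr_points("ACGT"))
    print(fcgr("ACGTACGT", 2))
```

Flattening the grid yields a fixed-length feature vector per sequence, which is the general shape of input that alignment-free distance methods work with.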
Affiliation(s)
- Ting Wang
  - National Center for Applied Mathematics in Hunan, Xiangtan University, Xiangtan, Hunan, China
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, Hunan, China
- Zu-Guo Yu
  - National Center for Applied Mathematics in Hunan, Xiangtan University, Xiangtan, Hunan, China
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, Hunan, China
- Jinyan Li
  - School of Computer Science and Control Engineering, Shenzhen Institute of Advanced Technology, Shenzhen, Guangdong, China
3. Pu F, Wang R, Yang X, Hu X, Wang J, Zhang L, Zhao Y, Zhang D, Liu Z, Liu J. Nucleotide and codon usage biases involved in the evolution of African swine fever virus: A comparative genomics analysis. J Basic Microbiol 2023; 63:499-518. [PMID: 36782108 DOI: 10.1002/jobm.202200624] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/28/2022] [Revised: 01/05/2023] [Accepted: 01/21/2023] [Indexed: 02/15/2023]
Abstract
Since African swine fever virus (ASFV) replication is closely related to its host's machinery, codon usage of the viral genome can be subject to selection pressures. A better understanding of codon usage can give new insights into viral evolution. We applied information entropy and revealed that the nucleotide usage pattern of ASFV is significantly associated with viral isolation factors (region and time), especially the usages of thymine and cytosine. Despite the domination of adenine and thymine in the viral genome, we found that mutation pressure alters the overall codon usage pattern of ASFV, followed by selective forces from natural selection. Moreover, the nucleotide skew index at the gene level indicates that nucleotide usages influencing synonymous codon bias of ASFV are significantly correlated with viral protein hydropathy. Finally, evolutionary plasticity is shown to contribute to the weakness in synonymous codons with A- or T-end serving as optimal codons of ASFV, suggesting that fine-tuning translation selection plays a role in synonymous codon usages of ASFV for adapting to the host. Taken together, ASFV is subject to evolutionary dynamics on nucleotide selections and synonymous codon usage, and our detailed analysis offers deeper insights into the genetic characteristics of this newly emerging virus around the world.
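The abstract does not give the exact entropy formulation used, so the sketch below shows only the basic idea under a simple assumption: the Shannon entropy of the overall nucleotide composition of a sequence. The function name `nucleotide_entropy` is hypothetical.

```python
import math
from collections import Counter


def nucleotide_entropy(seq):
    """Shannon entropy (bits) of the A/C/G/T composition of a sequence.

    Characters other than A, C, G and T are ignored. The value ranges
    from 0 (a single nucleotide) to 2 (all four equally frequent).
    """
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


if __name__ == "__main__":
    # Hypothetical fragments: an AT-rich one and a balanced one.
    print(nucleotide_entropy("ATATTTAAATTA"))
    print(nucleotide_entropy("ACGTACGTACGT"))
```

An AT-dominated genome gives a value below 2 bits, which is one way to quantify the kind of nucleotide usage skew the abstract describes.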
Affiliation(s)
- Feiyang Pu
  - Biomedical Research Center, Northwest Minzu University, Lanzhou, China; College of Life Science and Engineering, Northwest Minzu University, Lanzhou, Gansu, China
- Rui Wang
  - Viterbi School of Engineering, University of Southern California, Los Angeles, California, USA
- Xuanye Yang
  - Biomedical Research Center, Northwest Minzu University, Lanzhou, China; College of Life Science and Engineering, Northwest Minzu University, Lanzhou, Gansu, China
- Xinyan Hu
  - Biomedical Research Center, Northwest Minzu University, Lanzhou, China; College of Life Science and Engineering, Northwest Minzu University, Lanzhou, Gansu, China
- Jinqian Wang
  - Biomedical Research Center, Northwest Minzu University, Lanzhou, China; College of Life Science and Engineering, Northwest Minzu University, Lanzhou, Gansu, China
- Lijuan Zhang
  - College of Life Science and Engineering, Northwest Minzu University, Lanzhou, Gansu, China
- Yongqing Zhao
  - Biomedical Research Center, Northwest Minzu University, Lanzhou, China; College of Life Science and Engineering, Northwest Minzu University, Lanzhou, Gansu, China
- Derong Zhang
  - Biomedical Research Center, Northwest Minzu University, Lanzhou, China; College of Life Science and Engineering, Northwest Minzu University, Lanzhou, Gansu, China
- Zewen Liu
  - Biomedical Research Center, Northwest Minzu University, Lanzhou, China; College of Life Science and Engineering, Northwest Minzu University, Lanzhou, Gansu, China
- Junlin Liu
  - Biomedical Research Center, Northwest Minzu University, Lanzhou, China; College of Life Science and Engineering, Northwest Minzu University, Lanzhou, Gansu, China
4. Dey S, Das S, Bhattacharya DK. Biochemical Property Based Positional Matrix: A New Approach Towards Genome Sequence Comparison. J Mol Evol 2023; 91:93-131. [PMID: 36587178 PMCID: PMC9805373 DOI: 10.1007/s00239-022-10082-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2022] [Accepted: 12/01/2022] [Indexed: 01/01/2023]
Abstract
The growth of the genome sequence has become one of the emerging areas in the study of bioinformatics. It has led to an excessive demand for researchers to develop advanced methodologies for inferring evolutionary relationships among species. Alignment-free methods have proved more efficient in time and space than existing alignment-based methods for sequence analysis. In this study, a new alignment-free genome sequence comparison technique is proposed based on the biochemical properties of nucleotides. Each genome sequence can be distributed in four parameters to represent a 21-dimensional numerical descriptor using the Positional Matrix. To substantiate the proposed method, phylogenetic trees are constructed on the viral and mammalian datasets by applying the UPGMA/NJ clustering method. Further, the results of this method are compared with the results of the Feature Frequency Profiles method, the Positional Correlation Natural Vector method, the Graph-theoretic method, the Multiple Encoding Vector method, and the Fuzzy Integral Similarity method. In most cases, the present method is found to produce more accurate results than the prior methods. Its execution time is also comparatively small.
Affiliation(s)
- Sudeshna Dey
  - Computer Science and Engineering, Narula Institute of Technology, Kolkata, 700109 India (GRID: grid.440742.1; ISNI: 0000 0004 1799 6713)
- Subhram Das
  - Computer Science and Engineering, Narula Institute of Technology, Kolkata, 700109 India (GRID: grid.440742.1; ISNI: 0000 0004 1799 6713)
- D. K. Bhattacharya
  - Pure Mathematics, Calcutta University, Kolkata, 700019 India (GRID: grid.59056.3f; ISNI: 0000 0001 0664 9773)
5. Oyewole GJ, Thopil GA. Data clustering: application and trends. Artif Intell Rev 2022; 56:6439-6475. [PMID: 36466764 PMCID: PMC9702941 DOI: 10.1007/s10462-022-10325-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/04/2022] [Indexed: 11/28/2022]
Abstract
Clustering has primarily been used as an analytical technique to group unlabeled data for extracting meaningful information. The fact that no clustering algorithm can solve all clustering problems has resulted in the development of several clustering algorithms with diverse applications. We review data clustering, intending to underscore recent applications in selected industrial sectors and other notable concepts. In this paper, we begin by highlighting clustering components and discussing classification terminologies. Furthermore, specific and general applications of clustering are discussed. Notable concepts on clustering algorithms, emerging variants, measures of similarities/dissimilarities, issues surrounding clustering optimization, validation and data types are outlined. Suggestions are made to emphasize the continued interest in clustering techniques by both scholars and industry practitioners. Key findings in this review show that the size of data serves as a classification criterion and that, as data sizes for clustering become larger and more varied, determining the optimal number of clusters will require new feature extraction methods, validation indices and clustering techniques. In addition, clustering techniques have found growing use in key industry sectors linked to the sustainable development goals, such as manufacturing, transportation and logistics, energy, and healthcare, where clustering is more often integrated with other analytical techniques than used as a stand-alone technique.
Affiliation(s)
- Gbeminiyi John Oyewole
  - Department of Engineering and Technology Management, University of Pretoria, Pretoria, South Africa
- George Alex Thopil
  - Department of Engineering and Technology Management, University of Pretoria, Pretoria, South Africa
6. Wang X, Sun J, Lu L, Pu FY, Zhang DR, Xie FQ. Evolutionary dynamics of codon usages for peste des petits ruminants virus. Front Vet Sci 2022; 9:968034. [PMID: 36032280 PMCID: PMC9412750 DOI: 10.3389/fvets.2022.968034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2022] [Accepted: 07/25/2022] [Indexed: 11/13/2022] [Open Access]
Abstract
Peste des petits ruminants virus (PPRV) is an important agent of contagious, acute and febrile viral diseases in small ruminants, while knowledge of its evolutionary dynamics related to codon usage is still lacking. Herein, we adopted information entropy, the relative synonymous codon usage values, similarity indexes and the codon adaptation index to analyze the viral genetic features of 45 available whole genomes of PPRV. Some universal, lineage-specific, and gene-specific genetic features presented by synonymous codon usages of the six genes of PPRV that encode N, P, M, F, H and L proteins reflected evolutionary plasticity and independence. The high adaptation of PPRV to hosts at codon usages reflected high viral gene expression, but some synonymous codons that are rare in the hosts were selected in high frequencies in the viral genes. Another obvious genetic feature was that the synonymous codons containing CpG dinucleotides had weak tendencies to be selected in viral genes. The synonymous codon usage patterns of PPRV isolated during 2007–2008 and 2013–2014 in China displayed independent evolutionary pathways, although the overall codon usage patterns of these PPRV strains matched the universal codon usage patterns of lineage IV. According to the interplay between nucleotide and synonymous codon usages of the six genes of PPRV, the evolutionary dynamics including mutation pressure and natural selection determined the viral survival and fitness to its host.
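Relative synonymous codon usage (RSCU) is commonly defined as the observed count of a codon divided by the count expected if all synonymous codons for the same amino acid were used equally. The following is a minimal sketch of that common definition using the standard genetic code. The function name `rscu` and the handling of incomplete or ambiguous codons are assumptions, not details taken from the paper.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Compute RSCU values for a coding sequence read in frame.

    Stop codons and amino acids absent from the sequence are omitted.
    A trailing incomplete codon and codons with ambiguous bases are
    ignored (an assumption made for this sketch).
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group codons into synonymous families by amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, family in families.items():
        total = sum(counts[c] for c in family)
        if aa == "*" or total == 0:
            continue
        expected = total / len(family)  # equal usage within the family
        for codon in family:
            values[codon] = counts[codon] / expected
    return values


if __name__ == "__main__":
    # Hypothetical alanine-only fragment: GCT x2, GCC x1, GCA x1.
    print(rscu("GCTGCTGCCGCA"))
```

Values above 1 indicate codons used more often than expected under equal usage, and values below 1 indicate under-used codons. Single-codon amino acids (Met, Trp) always score 1 and are often excluded from downstream analysis.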
Affiliation(s)
- Xin Wang
  - School of Stomatology, Lanzhou University, Lanzhou, China
- Jing Sun
  - Geriatrics Department, The Second Hospital of Lanzhou University, Lanzhou, China
- Lei Lu
  - School of Stomatology, Lanzhou University, Lanzhou, China
- Fei-yang Pu
  - Center for Biomedical Research, Northwest Minzu University, Lanzhou, China
- De-rong Zhang
  - Center for Biomedical Research, Northwest Minzu University, Lanzhou, China
- Fu-qiang Xie
  - Maxillofacial Surgery Department, The Second Hospital of Lanzhou University, Lanzhou, China
  - *Correspondence: Fu-qiang Xie