1
Li W, Yang L, Qiu Y, Yuan Y, Li X, Meng Z. FFP: joint Fast Fourier transform and fractal dimension in amino acid property-aware phylogenetic analysis. BMC Bioinformatics 2022; 23:347. [PMID: 35986255 PMCID: PMC9392226 DOI: 10.1186/s12859-022-04889-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2022] [Accepted: 08/11/2022] [Indexed: 11/10/2022]
Abstract
Background
Amino acid property-aware phylogenetic analysis (APPA) refers to phylogenetic analysis based on amino acid property encoding; it is used to understand and infer evolutionary relationships between species from a molecular perspective. Fast Fourier transform (FFT) and Higuchi’s fractal dimension (HFD) perform well at describing the structural and complexity information of sequences for APPA. However, with the exponential growth of protein sequence data, developing a reliable APPA method for protein sequence analysis has become very important.
Results
Consequently, we propose a new method named FFP, which joins FFT and HFD. First, FFP encodes protein sequences on the basis of an important physicochemical property of amino acids, the dissociation constant, which determines the acidity and basicity of protein molecules. Second, FFT and HFD are used to generate feature vectors from the encoded sequences, after which a distance matrix describing the degree of similarity between species is calculated with the cosine function: the smaller the distance between two species, the more similar they are. Finally, the phylogenetic tree is constructed. When FFP is tested for phylogenetic analysis on four groups of protein sequences, its results are clearly better than those of the compared methods, with the highest accuracy exceeding 97%.
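The pipeline described above (encode, FFT plus HFD features, cosine distance matrix) can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the encoding table uses textbook alpha-carboxyl pK1 values as a stand-in for "the dissociation constant" (the paper's exact constants may differ); the fixed FFT length `n_fft`, the mean-centring, the zero-padding/truncation, and concatenating the magnitude spectrum with a single HFD value are all hypothetical design choices; and tree construction is not shown.

```python
import cmath
import math

# ASSUMPTION: illustrative alpha-carboxyl pK1 values from a standard textbook
# table; the FFP paper's exact dissociation constants may differ.
PK1 = {
    "G": 2.34, "A": 2.34, "P": 1.99, "V": 2.32, "L": 2.36, "I": 2.36,
    "M": 2.28, "F": 1.83, "Y": 2.20, "W": 2.38, "S": 2.21, "C": 1.96,
    "T": 2.11, "N": 2.02, "Q": 2.17, "K": 2.18, "H": 1.82, "R": 2.17,
    "D": 1.88, "E": 2.19,
}

def encode(seq):
    """Map a protein sequence to a numeric signal; unknown letters are skipped."""
    return [PK1[aa] for aa in seq.upper() if aa in PK1]

def fft(a):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def higuchi_fd(x, k_max=8):
    """Higuchi fractal dimension: least-squares slope of log L(k) vs log(1/k)."""
    n = len(x)
    pts = []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):
            count = (n - 1 - m) // k  # increments available from offset m
            if count < 1:
                continue
            dist = sum(abs(x[m + i * k] - x[m + (i - 1) * k]) for i in range(1, count + 1))
            lengths.append(dist * (n - 1) / (count * k) / k)  # normalised curve length
        if lengths and sum(lengths) > 0:
            pts.append((math.log(1.0 / k), math.log(sum(lengths) / len(lengths))))
    if len(pts) < 2:
        raise ValueError("signal too short or constant for Higuchi FD")
    mx = sum(p for p, _ in pts) / len(pts)
    my = sum(q for _, q in pts) / len(pts)
    cov = sum((p - mx) * (q - my) for p, q in pts)
    var = sum((p - mx) ** 2 for p, _ in pts)
    return cov / var

def feature_vector(seq, n_fft=64, k_max=8):
    """HYPOTHETICAL feature: FFT magnitudes of the centred signal plus its HFD."""
    x = encode(seq)
    mean = sum(x) / len(x)
    centred = [v - mean for v in x][:n_fft]      # truncate long sequences
    centred += [0.0] * (n_fft - len(centred))    # zero-pad short ones
    mags = [abs(c) for c in fft(centred)[1 : n_fft // 2 + 1]]  # skip the DC bin
    return mags + [higuchi_fd(x, k_max)]

def cosine_distance(u, v):
    """1 - cosine similarity; smaller means more similar."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def distance_matrix(seqs):
    feats = [feature_vector(s) for s in seqs]
    return [[cosine_distance(a, b) for b in feats] for a in feats]
```

For example, `distance_matrix(["ACDEFGHIKLMNPQRSTVWY", "MKTAYIAKQRQISFVKSHFSRQ", "GAVLIMFWPSTCYNQDEKRH"])` (toy, hypothetical sequences) returns a symmetric 3 x 3 matrix with zeros on the diagonal; a tree could then be built from it with a standard distance method such as UPGMA or neighbour-joining. Mean-centring is used so that the constant (DC) component does not dominate the cosine similarity.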
Conclusion
FFP achieves higher accuracy in APPA and multi-sequence alignment. It can also measure protein sequence similarity effectively, and it is hoped that it will play a role in APPA-related research.
2
Li W, Yang L, Meng Z, Qiu Y, Wang PSP, Li X. Phylogenetic Analysis: A Novel Method of Protein Sequence Similarity Analysis. INT J PATTERN RECOGN 2022. [DOI: 10.1142/s0218001422580071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Protein sequence similarity analysis (PSSA) is a significant task in bioinformatics that can reveal information about unknown sequences, such as protein structures and homology relationships. A protein sequence is the series of amino acids, each with rich physical and chemical properties, that makes up the primary structure of a protein. However, sequence similarity analysis and phylogenetic analysis of different species with complex amino acid sequences are challenging problems. In this paper, nine properties of amino acids were considered and the sequence was converted into numerical values by principal component analysis (PCA); a new feature vector representing the sequence was then constructed with the Haar wavelet transform and Higuchi fractal dimension (HFD); finally, the Spearman distance was used to calculate the distance matrix, from which the phylogenetic tree was constructed. Two representative protein datasets (9 ND5 (NADH dehydrogenase 5) sequences and 8 ND6 (NADH dehydrogenase 6) sequences) were selected for similarity analysis and phylogenetic analysis, and the results were compared with MEGA software and other existing methods. Extensive results show that the method outperforms the compared approaches and gives results consistent with known facts.
Affiliation(s)
- Wei Li
- School of Computer, Electronics and Information, Guangxi University, Nanning, P. R. China
- Lina Yang
- School of Computer, Electronics and Information, Guangxi University, Nanning, P. R. China
- Zuqiang Meng
- School of Computer, Electronics and Information, Guangxi University, Nanning, P. R. China
- Yu Qiu
- School of Computer, Electronics and Information, Guangxi University, Nanning, P. R. China
- Xichun Li
- Guangxi Normal University for Nationalities, Chongzuo 532200, China
3
Zheng Q, Chen T, Zhou W, Xie L, Su H. Gene prediction by the noise-assisted MEMD and wavelet transform for identifying the protein coding regions. Biocybern Biomed Eng 2021. [DOI: 10.1016/j.bbe.2020.12.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
4
Yang L, Wei P, Zhong C, Meng Z, Wang P, Tang YY. A Fractal Dimension and Empirical Mode Decomposition-Based Method for Protein Sequence Analysis. INT J PATTERN RECOGN 2019. [DOI: 10.1142/s0218001419400202] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/18/2022]
Abstract
In bioinformatics, the biological functions of proteins and their interactions can often be analyzed through the similarity of their sequences. In this paper, the authors combine fractal dimension, empirical mode decomposition (EMD), and a sliding window for protein sequence comparison. First, the protein sequence is characterized and digitized into a signal; the signal's characteristics are then obtained using EMD and fractal dimension. Each protein sequence signal is decomposed into intrinsic mode functions (IMFs), and the fractal dimension within a fixed-size window is computed for each IMF and for the original signal to extract the sequence's features. Experiments show that the features extracted by this hybrid method are superior to those obtained with EMD alone.
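The sliding-window fractal dimension step described above can be sketched as follows. Several assumptions apply: EMD itself is not implemented, so the caller must supply the IMFs from an EMD routine of their choice; the abstract only says "fractal dimension", so Higuchi's estimator is used here as a stand-in; and the window length, step, `k_max`, and the mean/standard-deviation summary per component (`hybrid_features`) are hypothetical choices, not the paper's.

```python
import math
import statistics

def higuchi_fd(x, k_max=4):
    """Higuchi fractal dimension (ASSUMED estimator); None if undefined."""
    n = len(x)
    pts = []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):
            count = (n - 1 - m) // k  # increments available from offset m
            if count < 1:
                continue
            dist = sum(abs(x[m + i * k] - x[m + (i - 1) * k]) for i in range(1, count + 1))
            lengths.append(dist * (n - 1) / (count * k) / k)
        if lengths and sum(lengths) > 0:
            pts.append((math.log(1.0 / k), math.log(sum(lengths) / len(lengths))))
    if len(pts) < 2:
        return None  # flat or too-short window: dimension undefined
    mx = sum(p for p, _ in pts) / len(pts)
    my = sum(q for _, q in pts) / len(pts)
    cov = sum((p - mx) * (q - my) for p, q in pts)
    var = sum((p - mx) ** 2 for p, _ in pts)
    return cov / var

def windowed_fd(signal, window=16, step=8, k_max=4):
    """FD of each fixed-length window; degenerate windows are skipped."""
    values = []
    for start in range(0, len(signal) - window + 1, step):
        fd = higuchi_fd(signal[start:start + window], k_max)
        if fd is not None:
            values.append(fd)
    return values

def hybrid_features(signal, imfs, window=16, step=8):
    """HYPOTHETICAL summary: mean and std of windowed FD for the signal and each IMF.

    `imfs` must come from an EMD routine supplied by the caller (not shown here).
    """
    features = []
    for component in [signal, *imfs]:
        vals = windowed_fd(component, window, step)
        if not vals:
            raise ValueError("no usable window; try a smaller window")
        features += [statistics.mean(vals), statistics.pstdev(vals)]
    return features
```

As a sanity check, a linear ramp gives a windowed dimension of 1 and a strictly alternating sequence gives 2, the two extremes of Higuchi's estimator for a one-dimensional curve. Summarising each component by mean and spread keeps the feature length fixed regardless of sequence length.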
Affiliation(s)
- Lina Yang
- School of Computer, Electronics and Information, Guangxi University, Nanning, Guangxi, P. R. China
- Pu Wei
- School of Computer, Electronics and Information, Guangxi University, Nanning, Guangxi, P. R. China
- Cheng Zhong
- School of Computer, Electronics and Information, Guangxi University, Nanning, Guangxi, P. R. China
- Zuqiang Meng
- School of Computer, Electronics and Information, Guangxi University, Nanning, Guangxi, P. R. China
- Patrick Wang
- Computer and Information Science, Northeastern University, Boston, USA
- Yuan Yan Tang
- Beijing Advanced Innovation Center for Big Data and Brain Computing (BDBC), Beihang University, Beijing, P. R. China
- Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Macau, P. R. China
5
Yu Y, Yang L, Liu Z, Zhu C. Gene essentiality prediction based on fractal features and machine learning. MOLECULAR BIOSYSTEMS 2017; 13:577-584. [PMID: 28145541 DOI: 10.1039/c6mb00806b] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 01/06/2023]
Abstract
Predicting bacterial essential genes using only fractal features.
Affiliation(s)
- Yongming Yu
- Department of Biomedical Engineering, Shandong University, Jinan, China
- Licai Yang
- Department of Biomedical Engineering, Shandong University, Jinan, China
- Zhiping Liu
- Department of Biomedical Engineering, Shandong University, Jinan, China
- Chuansheng Zhu
- Department of Hematology, Shandong University Affiliated Qianfoshan Hospital, Jinan, China