1
Abo-Elkhier MM, Abd Elwahaab MA, Abo El Maaty MI. Measuring Similarity among Protein Sequences Using a New Descriptor. BioMed Research International 2019; 2019:2796971. [PMID: 31886192 PMCID: PMC6893242 DOI: 10.1155/2019/2796971] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/06/2019] [Revised: 09/03/2019] [Accepted: 10/28/2019] [Indexed: 12/01/2022]
Abstract
Comparing protein sequences by similarity is a fundamental task in today's biomedical research. With advances in sequencing technologies, the number of protein sequences in public databases is growing exponentially. The best-known sequence comparison methods are alignment based. They generally give excellent results when the sequences under study are closely related, but they are time consuming. Herein, a new alignment-free method is introduced. Our technique depends on a new graphical representation and descriptor. The graphical representation is a simple way to visualize protein sequences. The descriptor compresses the primary sequence into a single vector composed of only two values. Our approach gives good results with both short and long sequences in little computation time. It is applied to nine beta globin, nine ND5 (NADH dehydrogenase subunit 5), and 24 spike protein sequences. Correlation and significance analyses are also introduced to compare our similarity/dissimilarity results with those of other approaches and with sequence homology.
Affiliation(s)
- Mervat M. Abo-Elkhier
  - Department of Engineering Mathematics and Physics, Faculty of Engineering, Mansoura University, Mansoura 35516, Egypt
- Marwa A. Abd Elwahaab
  - Department of Engineering Mathematics and Physics, Faculty of Engineering, Mansoura University, Mansoura 35516, Egypt
- Moheb I. Abo El Maaty
  - Department of Engineering Mathematics and Physics, Faculty of Engineering, Mansoura University, Mansoura 35516, Egypt
2
Yu JF, Qu A, Tang HC, Wang FH, Wang CL, Wang HM, Wang JH, Zhu HQ. A novel numerical model for protein sequences analysis based on spherical coordinates and multiple physicochemical properties of amino acids. Biopolymers 2019; 110:e23282. [PMID: 30977898 DOI: 10.1002/bip.23282] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/26/2018] [Revised: 03/28/2019] [Accepted: 03/28/2019] [Indexed: 01/25/2023]
Abstract
How to characterize short protein sequences to make an effective connection to their functions is an unsolved problem. Here we propose to map the physicochemical properties of each amino acid onto unit spheres so that each protein sequence can be represented quantitatively. We demonstrate the usefulness of this representation by applying it to the prediction of cell penetrating peptides. We show that its combination with traditional composition features yields the best performance across different datasets, among several methods compared. For the convenience of users, a web server has been established for automatic calculations of the proposed features at http://biophy.dzu.edu.cn/SNumD/.
Affiliation(s)
- Jia-Feng Yu
  - Shandong Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China; Department of Biomedical Engineering, College of Engineering, and Centre for Quantitative Biology, Peking University, Beijing, China
- Ang Qu
  - Shandong Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
- Hu-Cheng Tang
  - Shandong Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
- Fang-Hua Wang
  - Shandong Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
- Chun-Ling Wang
  - Shandong Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
- Hong-Mei Wang
  - Shandong Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
- Ji-Hua Wang
  - Shandong Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
- Huai-Qiu Zhu
  - Department of Biomedical Engineering, College of Engineering, and Centre for Quantitative Biology, Peking University, Beijing, China
3
Bielińska-Wąż D, Wąż P. Spectral-dynamic representation of DNA sequences. J Biomed Inform 2017; 72:1-7. [PMID: 28587890 DOI: 10.1016/j.jbi.2017.06.001] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 02/13/2017] [Revised: 05/03/2017] [Accepted: 06/01/2017] [Indexed: 11/25/2022]
Abstract
A graphical representation of DNA sequences in which the distribution of a particular base B=A,C,G,T is represented by a set of discrete lines has been formulated. The methodology of this approach has been borrowed from two areas of physics: spectroscopy and dynamics. Consequently, the set of discrete lines is referred to as the B-spectrum. Next, the B-spectrum is transformed to a rigid body composed of material points. In this way a dynamic representation of the DNA sequence has been obtained. The centers of mass of these rigid bodies, divided by their moments of inertia, have been taken as the descriptors of the spectra and, thus, of the DNA sequences. The performance of this method on a standard set of data commonly applied by authors introducing new approaches to bioinformatics (the first exons of β-globin genes of different species) proved to be very good.
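The descriptor named in this abstract (a center of mass divided by a moment of inertia) can be illustrated with a minimal Python sketch. The abstract does not say how the B-spectrum lines are converted into material points, so the sketch takes the points as given input. The function name `dynamic_descriptor`, the unit masses, the 2D setting, and the choice of the axis through the center of mass are all assumptions made for illustration, not the authors' definitions.

```python
def dynamic_descriptor(points, masses=None):
    """Return (cx / I, cy / I) for a planar set of material points.

    points : list of (x, y) tuples (how they are derived from a
             B-spectrum is NOT specified here; assumed given)
    masses : optional list of masses; unit masses are assumed by default
    I      : moment of inertia about the axis through the center of mass,
             perpendicular to the plane (an assumed choice of axis)
    """
    if not points:
        raise ValueError("at least one point is required")
    if masses is None:
        masses = [1.0] * len(points)
    if len(masses) != len(points):
        raise ValueError("points and masses must have the same length")

    total_mass = sum(masses)
    # Center of mass: mass-weighted mean of the coordinates.
    cx = sum(m * x for m, (x, _) in zip(masses, points)) / total_mass
    cy = sum(m * y for m, (_, y) in zip(masses, points)) / total_mass

    # Moment of inertia: mass-weighted squared distance from the center.
    inertia = sum(
        m * ((x - cx) ** 2 + (y - cy) ** 2)
        for m, (x, y) in zip(masses, points)
    )
    if inertia == 0:
        raise ValueError("moment of inertia is zero; descriptor undefined")

    return (cx / inertia, cy / inertia)


# Two unit masses at (0, 0) and (2, 0): center (1, 0), I = 2.
print(dynamic_descriptor([(0.0, 0.0), (2.0, 0.0)]))  # (0.5, 0.0)
```

Under these assumptions, two sequences could be compared by the distance between their descriptor vectors, one vector per base B.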
Affiliation(s)
- Dorota Bielińska-Wąż
  - Department of Radiological Informatics and Statistics, Medical University of Gdańsk, Tuwima 15, 80-210 Gdańsk, Poland.
- Piotr Wąż
  - Department of Nuclear Medicine, Medical University of Gdańsk, Tuwima 15, 80-210 Gdańsk, Poland.
4
Vector representation and its application of DNA sequences based on nucleotide triplet codons. J Mol Graph Model 2015; 62:150-156. [PMID: 26432013 DOI: 10.1016/j.jmgm.2015.09.011] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 04/03/2015] [Revised: 09/08/2015] [Accepted: 09/10/2015] [Indexed: 11/22/2022]
Abstract
Compared with a single nucleotide, a nucleotide triplet appears to contain more biological and genetic information, so it has been applied widely. We propose a new 3D-vector representation method for DNA sequences: the molecular weight of each nucleotide triplet is used to define the '0' molecular plane, a 3-D coordinate transformation then maps a DNA sequence onto a curve by coordinate accumulation, and the eigenvalues of the D/D matrix are extracted as a numerical characterization to describe, compare, and analyze the DNA sequences. It offers a new approach to the comparison of biological sequences and the reconstruction of phylogenetic trees.
5
Randić M. On the history of the connectivity index: from the connectivity index to the exact solution of the protein alignment problem. SAR and QSAR in Environmental Research 2015; 26:523-555. [PMID: 26336983 DOI: 10.1080/1062936x.2015.1076890] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 07/21/2015] [Accepted: 07/22/2015] [Indexed: 06/05/2023]
Abstract
We briefly review the history of the connectivity index from 1975 to date. We hope to throw some light on why this graph-theoretical molecular descriptor, unique by its design, continues to be of interest in QSAR and is widely used in structure-property and structure-activity studies. We will elaborate on its generalizations and the insights it offered on applications in Multiple Regression Analysis (MRA). Going beyond the connectivity index, we will outline several related advances in the development of molecular descriptors used in MRA, including molecular ID numbers (1986), the variable connectivity index (1991), orthogonal regression (1991), irrelevance of co-linearity of descriptors (1997), anti-connectivity (2006), and high discriminatory descriptors characterizing molecular similarity (2015). We will comment on beauty in QSAR and recent progress in searching for similarity of DNA, proteins and the proteome. This review reports on several results which are little known to the structure-property-activity community, the significance of which may surprise those unfamiliar with the application of discrete mathematics to chemistry. It tells the reader many little-known stories about the connectivity index, which may help the reader to better understand the meaning of this index. Readers are not required to be familiar with graph theory.
Affiliation(s)
- M Randić
  - National Institute of Chemistry, Ljubljana, Slovenia
6
Yu JF, Dou XH, Wang HB, Sun X, Zhao HY, Wang JH. A Novel Cylindrical Representation for Characterizing Intrinsic Properties of Protein Sequences. J Chem Inf Model 2015; 55:1261-70. [PMID: 25945398 DOI: 10.1021/ci500577m] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/28/2022]
Abstract
The composition and sequence order of amino acid residues are the two most important characteristics to describe a protein sequence. Graphical representations facilitate visualization of biological sequences and produce biologically useful numerical descriptors. In this paper, we propose a novel cylindrical representation by placing the 20 amino acid residue types in a circle and sequence positions along the z axis. This representation allows visualization of the composition and sequence order of amino acids at the same time. Ten numerical descriptors and one weighted numerical descriptor have been developed to quantitatively describe intrinsic properties of protein sequences on the basis of the cylindrical model. Their applications to similarity/dissimilarity analysis of nine ND5 proteins indicated that these numerical descriptors are more effective than several classical numerical matrices. Thus, the cylindrical representation obtained here provides a useful new tool for visualizing and characterizing protein sequences. An online server is available at http://biophy.dzu.edu.cn:8080/CNumD/input.jsp.
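The geometric idea in this abstract (20 residue types on a circle, sequence position along the z axis) can be sketched in a few lines of Python. The abstract does not state the order of the residue types on the circle, the radius, or the z spacing, so the alphabetical one-letter ordering, the unit radius, and the 1-based integer positions below are assumptions. The function name `cylindrical_coordinates` is hypothetical, and the paper's ten numerical descriptors are not reproduced here.

```python
import math

# Assumed ordering of the 20 standard residue types around the circle
# (alphabetical by one-letter code); the paper may use a different order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def cylindrical_coordinates(sequence, radius=1.0):
    """Map a protein sequence to 3D points on a cylinder.

    Each residue type gets a fixed angle on a circle of the given radius,
    and the residue's 1-based position in the sequence becomes its z value.
    """
    step = 2 * math.pi / len(AMINO_ACIDS)
    angle = {aa: k * step for k, aa in enumerate(AMINO_ACIDS)}

    points = []
    for position, residue in enumerate(sequence.upper(), start=1):
        if residue not in angle:
            raise ValueError(f"unsupported residue {residue!r} at position {position}")
        theta = angle[residue]
        points.append(
            (radius * math.cos(theta), radius * math.sin(theta), float(position))
        )
    return points


# Example with a short, made-up peptide.
for x, y, z in cylindrical_coordinates("ACDA"):
    print(f"{x:.3f} {y:.3f} {z:.0f}")
```

Plotting these points shows composition (which angles are occupied) and sequence order (height along z) in one picture, which is the property the abstract highlights.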
Affiliation(s)
- Jia-Feng Yu
  - Shandong Provincial Key Laboratory of Functional Macromolecular Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China; State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, China
- Xiang-Hua Dou
  - Shandong Provincial Key Laboratory of Functional Macromolecular Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
- Hong-Bo Wang
  - Shandong Provincial Key Laboratory of Functional Macromolecular Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
- Xiao Sun
  - State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, China
- Hui-Ying Zhao
  - Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, Brisbane, Queensland 4000, Australia
- Ji-Hua Wang
  - Shandong Provincial Key Laboratory of Functional Macromolecular Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China; College of Physics and Electronic Information, Dezhou University, Dezhou 253023, China