Reference Citation Analysis

For: Liu N, Wang T. Protein-based phylogenetic analysis by using hydropathy profile of amino acids. FEBS Lett 2006;580:5321-7. [PMID: 16979630 DOI: 10.1016/j.febslet.2006.08.086] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Received: 07/23/2006] [Revised: 08/26/2006] [Accepted: 08/28/2006] [Indexed: 11/23/2022]

Cited by Other Article(s)

Dixson JD, Azad RK. A novel predictor of ACE2-binding ability among betacoronaviruses. Evol Med Public Health 2021;9:360-373. [PMID: 34858595 PMCID: PMC8634463 DOI: 10.1093/emph/eoab032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2021] [Accepted: 10/05/2021] [Indexed: 11/21/2022]

Abstract

Background

Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has resulted in ~4.8 million deaths worldwide as of this writing. Almost all conceivable aspects of SARS-CoV-2 have been explored since the virus began spreading in the human population. Despite numerous proposals, it is still unclear how and when the virus gained the ability to efficiently bind to and infect human cells. In an effort to understand the evolution of the receptor binding domain (RBD) of the spike protein of SARS-CoV-2, and specifically how the ability of the RBD to bind to the human angiotensin-converting enzyme 2 (ACE2) receptor evolved in coronaviruses, we have applied an alignment-free technique to infer functional relatedness among betacoronaviruses. This technique, concurrently being optimized for identifying novel prions, was adapted to gain new insights into coronavirus evolution, specifically in the context of the ongoing COVID-19 pandemic. Novel methods for predicting the capacity of coronaviruses, in general, to infect human cells are urgently needed.

Methodology

The proposed method utilizes physicochemical properties of amino acids to develop fully dynamic waveform representations of proteins that encode both the amino acid content and the context of amino acids. These waveforms are then subjected to dynamic time warping (DTW) and distance evaluation to develop a distance metric that is relatively less sensitive to variation in sequence length and primary amino acid composition.
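The general idea of comparing property waveforms with DTW can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the waveforms are built, so this example assumes a plain per-residue Kyte-Doolittle hydropathy value as the waveform and a textbook DTW with absolute-difference cost. The function names `hydropathy_profile` and `dtw_distance` are hypothetical.

```python
# Kyte-Doolittle hydropathy scale (assumed here as the physicochemical property;
# the cited abstract does not say which properties or encoding the authors use).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq):
    """Map a protein sequence to a numeric waveform, one value per residue."""
    return [KD[aa] for aa in seq.upper()]


def dtw_distance(a, b):
    """Classic dynamic time warping distance with |x - y| as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] also matches an earlier b element
                cost[i][j - 1],      # b[j-1] also matches an earlier a element
                cost[i - 1][j - 1],  # one-to-one step
            )
    return cost[n][m]


if __name__ == "__main__":
    # Two toy sequences of different length (illustrative only).
    p = hydropathy_profile("MKVLAAGIV")
    q = hydropathy_profile("MKVLAGIV")
    print(dtw_distance(p, q))
```

Because DTW allows one point to align with several neighbors, a waveform and a locally stretched copy of it can have distance zero, which is one way such a metric becomes less sensitive to differences in sequence length.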

Results and Conclusions

Using our proposed method, we show that in contrast to alignment-based maximum likelihood (ML) and neighbor-joining (NJ) phylogenetic analyses, all bat betacoronavirus spike protein RBDs known to bind to the ACE2 receptor are found within a single physicochemical cluster. Further, other RBDs within that cluster are from pangolin coronaviruses, two of which have already been shown to bind to ACE2 while the others are suspected, yet unverified ACE2 binding domains. This finding is important because both severe acute respiratory syndrome coronavirus (SARS-CoV) and SARS-CoV-2 use the host ACE2 receptor for cell entry. Surveillance for coronaviruses belonging to this cluster could potentially guide efforts to stifle or curtail potential and/or early zoonotic outbreaks with their associated deaths and financial devastation.

Lay Summary

Robust methods for predicting human ACE2 receptor binding by the spike protein of coronaviruses are needed for the early detection of zoonotic coronaviruses and biosurveillance to prevent future outbreaks. Here we present a new waveform-based approach that utilizes the physicochemical properties of amino acids to determine the propensity of betacoronaviruses to infect humans. Comparison with the established phylogenetic methods demonstrates the usefulness of this new approach in the biosurveillance of coronaviruses.

Mapping sequence to feature vector using numerical representation of codons targeted to amino acids for alignment-free sequence analysis. Gene 2020;766:145096. [PMID: 32919006 DOI: 10.1016/j.gene.2020.145096] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/25/2020] [Revised: 08/16/2020] [Accepted: 08/24/2020] [Indexed: 12/17/2022]

Abstract

Phylogenetic analysis based on sequence similarity that agrees with real biological taxa is a major challenge. In this paper, we propose a novel alignment-free method, CoFASA (Codon Feature based Amino acid Sequence Analyser), for similarity analysis of nucleotide sequences. First, we assign numerical weights to the four nucleotides. We then calculate a score for each codon based on the numerical values of its constituent nucleotides, termed the degree of the codon. Accordingly, we obtain the degree of each amino acid from the degrees of the codons that encode it. Using the degrees of the twenty amino acids and their relative abundance within a given sequence, we generate a 20-dimensional feature vector for every coding DNA sequence or protein sequence. We use these features to perform phylogenetic analysis of the set of candidate sequences. We use multiple protein sequences derived from Beta-globin (BG), NADH dehydrogenase subunit 5 (ND5), Transferrins (TFs), Xylanases, and low identity (<40%) and high identity (⩾40%) protein sequences (encompassing 533 and 1064 protein families) for experimental assessment. We compare our results with sixteen (16) well-known methods, including both alignment-based and alignment-free methods. Various assessment indices are used for performance analysis, such as the Pearson correlation coefficient, RF (Robinson-Foulds) distance and ROC score. Compared with alignment-based methods (ClustalW, ClustalΩ, MAFFT, and MUSCLE), CoFASA shows very similar results. Further, CoFASA performs better than well-known alignment-free methods, including LZW-Kernel, jD2Stat, FFP, spaced, and AFKS-D2s, in predicting taxonomic relationships among candidate taxa. Overall, we observe that the features derived by CoFASA are very useful in separating the sequences according to their taxonomic labels. Our method is cost-effective and, at the same time, produces consistent and satisfactory outcomes.
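The steps described in this abstract (nucleotide weights, codon degree, amino acid degree, 20-dimensional feature) can be sketched roughly as follows. This is an illustration under stated assumptions, not CoFASA itself: the abstract gives neither the actual nucleotide weights nor the exact formulas, so the weights in `WEIGHT`, the use of a sum for the codon degree, the use of a mean over synonymous codons for the amino acid degree, and the product "degree times relative abundance" are all hypothetical choices. Only the standard genetic code table is taken as fact.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical nucleotide weights; the abstract does not state the real values.
WEIGHT = {"A": 1, "C": 2, "G": 3, "T": 4}


def codon_degree(codon):
    """Assumed codon score: sum of the weights of its three nucleotides."""
    return sum(WEIGHT[base] for base in codon)


def amino_degrees():
    """Assumed amino acid degree: mean degree of the codons encoding it."""
    groups = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":  # skip stop codons
            groups.setdefault(aa, []).append(codon_degree(codon))
    return {aa: sum(vals) / len(vals) for aa, vals in groups.items()}


AA_ORDER = sorted(amino_degrees())  # fixed order of the 20 amino acids


def feature_vector(protein):
    """20-dimensional feature: degree times relative abundance (an assumption)."""
    deg = amino_degrees()
    n = len(protein)
    return [deg[aa] * protein.count(aa) / n for aa in AA_ORDER]


if __name__ == "__main__":
    vec = feature_vector("MKVLAAGIVMK")  # toy sequence, illustrative only
    print(len(vec), vec)
```

Feature vectors of this kind have a fixed length regardless of sequence length, so pairwise distances (for example Euclidean) between them can be fed directly into a distance-based tree-building method without any alignment step.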

Saw AK, Tripathy BC, Nandi S. Alignment-free similarity analysis for protein sequences based on fuzzy integral. Sci Rep 2019;9:2775. [PMID: 30808983 PMCID: PMC6391537 DOI: 10.1038/s41598-019-39477-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 03/12/2018] [Accepted: 01/15/2019] [Indexed: 12/12/2022]

A new sequence based encoding for prediction of host-pathogen protein interactions. Comput Biol Chem 2018;78:170-177. [PMID: 30553999 DOI: 10.1016/j.compbiolchem.2018.12.001] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 12/23/2017] [Revised: 08/23/2018] [Accepted: 12/01/2018] [Indexed: 12/22/2022]

Skibinski DOF, Ghiselli F, Diz AP, Milani L, Mullins JGL. Structure-Related Differences between Cytochrome Oxidase I Proteins in a Stable Heteroplasmic Mitochondrial System. Genome Biol Evol 2018;9:3265-3281. [PMID: 29149282 PMCID: PMC5726481 DOI: 10.1093/gbe/evx235] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Accepted: 11/13/2017] [Indexed: 12/27/2022]

Huang YA, You ZH, Chen X, Chan K, Luo X. Sequence-based prediction of protein-protein interactions using weighted sparse representation model combined with global encoding. BMC Bioinformatics 2016;17:184. [PMID: 27112932 PMCID: PMC4845433 DOI: 10.1186/s12859-016-1035-4] [Citation(s) in RCA: 83] [Impact Index Per Article: 10.4] [Received: 07/21/2015] [Accepted: 04/12/2016] [Indexed: 12/22/2022]

Abstract

BACKGROUND

Proteins are important molecules that participate, typically in pairs, in virtually every aspect of cellular function within an organism. Although high-throughput technologies have generated considerable data on protein-protein interactions (PPIs) for various species, the experimental methods are both time-consuming and expensive. In addition, they are usually associated with high rates of both false positive and false negative results. Accordingly, a number of computational approaches have been developed to effectively and accurately predict protein interactions. However, most of these methods typically perform worse when other biological data sources (e.g., protein structure information, protein domains, or gene neighborhood information) are not available. Therefore, there is an urgent need to develop effective computational methods for predicting PPIs using protein sequence information alone.

RESULTS

In this study, we present a novel computational model combining a weighted sparse representation based classifier (WSRC) and global encoding (GE) of amino acid sequences. Two kinds of protein descriptors, composition and transition, are extracted to represent each protein sequence. On the basis of this feature representation, a novel weighted sparse representation based classifier is introduced to predict the protein interaction class. When the proposed method was evaluated with the PPI data of S. cerevisiae, Human and H. pylori, it achieved high prediction accuracies of 96.82%, 97.66% and 92.83%, respectively. Extensive experiments were performed for cross-species PPI prediction and the prediction accuracies were also very promising.

CONCLUSIONS

To further evaluate the performance of the proposed method, we then compared its performance with the method based on support vector machine (SVM). The results show that the proposed method achieved a significant improvement. Thus, the proposed method is a very efficient method to predict PPIs and may be a useful supplementary tool for future proteomics studies.

Fan M, Zheng B, Li L. A novel Multi-Agent Ada-Boost algorithm for predicting protein structural class with the information of protein secondary structure. J Bioinform Comput Biol 2015;13:1550022. [DOI: 10.1142/s0219720015500225] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]

DV-curve representation of protein sequences and its application. Comput Math Methods Med 2014;2014:203871. [PMID: 24899916 PMCID: PMC4034481 DOI: 10.1155/2014/203871] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 01/07/2014] [Revised: 03/10/2014] [Accepted: 04/03/2014] [Indexed: 11/17/2022]

Wang J, Li Y, Liu X, Dai Q, Yao Y, He P. High-accuracy prediction of protein structural classes using PseAA structural properties and secondary structural patterns. Biochimie 2014;101:104-12. [PMID: 24412731 DOI: 10.1016/j.biochi.2013.12.021] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 07/22/2013] [Accepted: 12/30/2013] [Indexed: 10/25/2022]

Xie XL, Zheng LF, Yu Y, Liang LP, Guo MC, Song J, Yuan ZF. Protein sequence analysis based on hydropathy profile of amino acids. J Zhejiang Univ Sci B 2012;13:152-8. [PMID: 22302429 PMCID: PMC3274743 DOI: 10.1631/jzus.b1100052] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/11/2022]

Liu L, Li D, Bai F. A relative Lempel-Ziv complexity: Application to comparing biological sequences. Chem Phys Lett 2012;530:107-112. [PMID: 32226089 PMCID: PMC7094452 DOI: 10.1016/j.cplett.2012.01.061] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 11/19/2011] [Accepted: 01/24/2012] [Indexed: 11/17/2022]

Dai Q, Wu L, Li L. Improving protein structural class prediction using novel combined sequence information and predicted secondary structural features. J Comput Chem 2011;32:3393-8. [DOI: 10.1002/jcc.21918] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 02/16/2011] [Revised: 06/29/2011] [Accepted: 07/25/2011] [Indexed: 11/07/2022]

Chang G, Wang T. Phylogenetic analysis of protein sequences based on distribution of length about common sub-string. Protein J 2011;30:167-72. [PMID: 21461804 PMCID: PMC7088358 DOI: 10.1007/s10930-011-9318-0] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 11/19/2022]

Zhang S, Wang T. A Complexity-based Method to Compare RNA Secondary Structures and its Application. J Biomol Struct Dyn 2010;28:247-58. [PMID: 20645657 DOI: 10.1080/07391102.2010.10507357] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 02/02/2023]

Albayrak A, Otu HH, Sezerman UO. Clustering of protein families into functional subtypes using Relative Complexity Measure with reduced amino acid alphabets. BMC Bioinformatics 2010;11:428. [PMID: 20718947 PMCID: PMC2936399 DOI: 10.1186/1471-2105-11-428] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Received: 01/21/2010] [Accepted: 08/18/2010] [Indexed: 11/30/2022]

Liu L, Wang T. Comparison of TOPS strings based on LZ complexity. J Theor Biol 2008;251:159-66. [PMID: 18166201 DOI: 10.1016/j.jtbi.2007.11.016] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 07/24/2007] [Revised: 11/13/2007] [Accepted: 11/13/2007] [Indexed: 10/22/2022]