1
Dixson JD, Azad RK. A novel predictor of ACE2-binding ability among betacoronaviruses. Evol Med Public Health 2021; 9:360-373. [PMID: 34858595 PMCID: PMC8634463 DOI: 10.1093/emph/eoab032] [Received: 08/10/2021] [Accepted: 10/05/2021]
Abstract
Background Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has resulted in ~4.8 million deaths worldwide as of this writing. Almost every conceivable aspect of SARS-CoV-2 has been explored since the virus began spreading in the human population. Despite numerous proposals, it is still unclear how and when the virus gained the ability to efficiently bind to and infect human cells. To understand the evolution of the receptor binding domain (RBD) of the SARS-CoV-2 spike protein, and specifically how the ability of the RBD to bind the human angiotensin-converting enzyme 2 (ACE2) receptor evolved in coronaviruses, we applied an alignment-free technique to infer functional relatedness among betacoronaviruses. This technique, which is concurrently being optimized to identify novel prions, was adapted to gain new insights into coronavirus evolution in the context of the ongoing COVID-19 pandemic. Novel methods for predicting the capacity of coronaviruses in general to infect human cells are urgently needed. Methodology The proposed method uses the physicochemical properties of amino acids to build fully dynamic waveform representations of proteins that encode both the amino acid content and the context of each amino acid. These waveforms are then subjected to dynamic time warping (DTW) and distance evaluation to derive a distance metric that is less sensitive to variation in sequence length and primary amino acid composition. Results and Conclusions Using the proposed method, we show that, in contrast to alignment-based maximum likelihood (ML) and neighbor-joining (NJ) phylogenetic analyses, all bat betacoronavirus spike protein RBDs known to bind the ACE2 receptor fall within a single physicochemical cluster.
Further, the other RBDs within that cluster are from pangolin coronaviruses, two of which have already been shown to bind ACE2, while the others are suspected but unverified ACE2-binding domains. This finding is important because both severe acute respiratory syndrome coronavirus (SARS-CoV) and SARS-CoV-2 use the host ACE2 receptor for cell entry. Surveillance for coronaviruses belonging to this cluster could help stifle or curtail potential or early zoonotic outbreaks and the deaths and financial devastation associated with them. Lay Summary Robust methods for predicting binding of the coronavirus spike protein to the human ACE2 receptor are needed for the early detection of zoonotic coronaviruses and for biosurveillance to prevent future outbreaks. Here we present a new waveform-based approach that uses the physicochemical properties of amino acids to determine the propensity of betacoronaviruses to infect humans. Comparison with established phylogenetic methods demonstrates the usefulness of this approach for coronavirus biosurveillance.
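The waveform-plus-DTW comparison described above can be illustrated with a minimal Python sketch. It assumes a single-property waveform (each residue replaced by its Kyte-Doolittle hydropathy value); the authors' waveforms combine several physicochemical properties and encode residue context, which this sketch does not reproduce. The names `waveform` and `dtw_distance` are hypothetical, and the recurrence is the textbook O(nm) DTW with an absolute-difference local cost.

```python
# Kyte-Doolittle hydropathy values. Assumption: one property stands in for the
# multi-property waveform used by the authors.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def waveform(seq):
    """Map a protein sequence to a 1-D numeric signal (hypothetical encoding)."""
    return [KD[aa] for aa in seq.upper()]


def dtw_distance(a, b):
    """Classic dynamic time warping with absolute-difference local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # advance in a only
                                 cost[i][j - 1],      # advance in b only
                                 cost[i - 1][j - 1])  # advance in both
    return cost[n][m]


if __name__ == "__main__":
    s1 = "MKTAYIAKQR"
    s2 = "MKTAYIIAKQR"  # same sequence with one duplicated residue
    s3 = "DDEEKKRRDD"
    print(dtw_distance(waveform(s1), waveform(s2)))  # 0.0
    print(dtw_distance(waveform(s1), waveform(s3)))
```

Because the extra isoleucine in `s2` can be warped onto the existing one at zero cost, the first distance is 0.0, which illustrates why a DTW-based metric is less sensitive to length variation than a position-by-position comparison.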
Affiliation(s)
- Jamie D Dixson
- Department of Biological Sciences and BioDiscovery Institute, University of North Texas, Denton, TX 76203, USA
- Rajeev K Azad
- Department of Biological Sciences and BioDiscovery Institute, University of North Texas, Denton, TX 76203, USA
- Department of Mathematics, University of North Texas, Denton, TX 76203, USA
2
Mapping sequence to feature vector using numerical representation of codons targeted to amino acids for alignment-free sequence analysis. Gene 2020; 766:145096. [PMID: 32919006 DOI: 10.1016/j.gene.2020.145096] [Received: 04/25/2020] [Revised: 08/16/2020] [Accepted: 08/24/2020]
Abstract
Phylogenetic analysis based on sequence similarity that reflects real biological taxa remains a major challenge. In this paper, we propose a novel alignment-free method, CoFASA (Codon Feature based Amino acid Sequence Analyser), for similarity analysis of nucleotide sequences. First, we assign numerical weights to the four nucleotides. We then calculate a score for each codon, termed the degree of the codon, from the numerical values of its constituent nucleotides. From these, we obtain the degree of each amino acid based on the degrees of the codons that encode it. Using the degrees of the twenty amino acids and their relative abundance within a given sequence, we generate a 20-dimensional feature vector for every coding DNA or protein sequence, and we use these features to perform phylogenetic analysis of the candidate sequences. For experimental assessment, we use protein sequences of Beta-globin (BG), NADH dehydrogenase subunit 5 (ND5), Transferrins (TFs) and Xylanases, together with low-identity (<40%) and high-identity (⩾40%) protein sequence sets (encompassing 533 and 1064 protein families). We compare our results with sixteen (16) well-known alignment-based and alignment-free methods, using assessment indices such as the Pearson correlation coefficient, the RF (Robinson-Foulds) distance and the ROC score. Compared with the alignment-based methods ClustalW, ClustalΩ, MAFFT and MUSCLE, CoFASA produces very similar results. Further, CoFASA outperforms well-known alignment-free methods, including LZW-Kernel, jD2Stat, FFP, spaced and AFKS-D2s, in predicting the taxonomic relationships among candidate taxa. Overall, the features derived by CoFASA are useful for separating sequences according to their taxonomic labels.
The method is cost-effective while producing consistent and satisfactory outcomes.
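The CoFASA pipeline (nucleotide weights, codon degrees, amino acid degrees, 20-dimensional features) can be sketched as follows. The nucleotide weights (A=1, C=2, G=3, T=4), the position-weighted codon score and the averaging over synonymous codons are all hypothetical choices for illustration; the abstract does not give the actual values or formulas. Features are compared with a plain Euclidean distance.

```python
import math
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (first base slowest).
AA_TABLE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA_TABLE)}
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical nucleotide weights; the paper assigns its own values.
NT_WEIGHT = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def codon_degree(codon):
    """Hypothetical codon score: position-weighted sum of nucleotide weights."""
    return sum((pos + 1) * NT_WEIGHT[nt] for pos, nt in enumerate(codon))


def amino_acid_degrees():
    """Average codon degree over all codons encoding each amino acid."""
    scores = {aa: [] for aa in AMINO_ACIDS}
    for codon, aa in CODON_TO_AA.items():
        if aa != "*":  # skip stop codons
            scores[aa].append(codon_degree(codon))
    return {aa: sum(v) / len(v) for aa, v in scores.items()}


AA_DEGREE = amino_acid_degrees()


def feature_vector(protein):
    """20-dimensional vector: amino acid degree times relative abundance."""
    protein = protein.upper()
    n = len(protein)
    return [AA_DEGREE[aa] * protein.count(aa) / n for aa in AMINO_ACIDS]


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

A coding DNA sequence would first be translated with `CODON_TO_AA` before calling `feature_vector`; the resulting pairwise distances can then be passed to any tree-building routine.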
3
Saw AK, Tripathy BC, Nandi S. Alignment-free similarity analysis for protein sequences based on fuzzy integral. Sci Rep 2019; 9:2775. [PMID: 30808983 PMCID: PMC6391537 DOI: 10.1038/s41598-019-39477-8] [Received: 03/12/2018] [Accepted: 01/15/2019]
Abstract
Sequence comparison is an essential part of modern molecular biology research. In this study, we estimated the parameters of a Markov chain from the frequencies of occurrence of all possible amino acid pairs in each protein sequence, without alignment. These estimated Markov chain parameters were used to calculate the similarity between two protein sequences with a fuzzy integral algorithm. For validation, we compared our results with both an alignment-based method (ClustalW) and alignment-free methods on six benchmark datasets. The results indicate that our algorithm achieves better clustering performance for protein sequence comparison.
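A minimal sketch of the first step, estimating first-order Markov chain parameters from adjacent amino acid pair counts, is shown below. The fuzzy-integral similarity itself is not reproduced; `matrix_distance` is a hypothetical stand-in (mean absolute difference between transition matrices) that only shows how two estimated chains could be compared.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def transition_matrix(seq, pseudocount=0.0):
    """Estimate P(b | a) from counts of adjacent amino acid pairs (a, b)."""
    k = len(AMINO_ACIDS)
    counts = [[pseudocount] * k for _ in range(k)]
    for a, b in zip(seq, seq[1:]):
        counts[INDEX[a]][INDEX[b]] += 1
    probs = []
    for row in counts:
        total = sum(row)
        # Residues never followed by another residue keep an all-zero row.
        probs.append([c / total if total else 0.0 for c in row])
    return probs


def matrix_distance(p, q):
    """Stand-in comparison (mean absolute difference); NOT the fuzzy integral."""
    k = len(p)
    return sum(abs(p[i][j] - q[i][j]) for i in range(k) for j in range(k)) / (k * k)
```

Sequences must be uppercase and use the 20 standard letters; a small positive `pseudocount` avoids zero probabilities for unseen pairs in short sequences.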
Affiliation(s)
- Ajay Kumar Saw
- Institute of Advanced Study in Science and Technology, Mathematical Sciences Division, Guwahati, 781035, India
- Soumyadeep Nandi
- Institute of Advanced Study in Science and Technology, Life Science Division, Guwahati, 781035, India.
4
A new sequence based encoding for prediction of host-pathogen protein interactions. Comput Biol Chem 2018; 78:170-177. [PMID: 30553999 DOI: 10.1016/j.compbiolchem.2018.12.001] [Received: 12/23/2017] [Revised: 08/23/2018] [Accepted: 12/01/2018]
Abstract
Pathogen-host interactions are central to understanding the infection process at the molecular level, where pathogen proteins physically bind to human proteins to manipulate critical biological processes in the host cell. Data scarcity and data unavailability are two major problems for computational prediction of pathogen-host interactions. A computational method that predicts these interactions with high accuracy from protein sequences alone is therefore of great importance, because it can overcome these problems. In this study, we propose a novel and robust sequence-based feature extraction method, named Location Based Encoding, to predict pathogen-host interactions with machine learning algorithms. We use Bacillus anthracis and Yersinia pestis data sets as the pathogen organisms and human proteins as the host model, and compare our method with sequence-based protein encoding methods widely used in the literature, namely amino acid composition, amino acid pair and conjoint triad. We combine these encodings with decision tree (Random Forest, J48), statistical (Bayesian Networks, Naive Bayes) and instance-based (kNN) classifiers to predict pathogen-host interactions, and conduct several experiments to evaluate the effectiveness of our method. The best results across all experiments are obtained with the RF classifier in terms of F1, accuracy, MCC and AUC.
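Two of the baseline encodings named above, amino acid composition and amino acid pair (dipeptide) composition, are simple enough to sketch directly. This is not the proposed Location Based Encoding, whose details the abstract does not give; concatenating the pathogen and host vectors into one classifier input in the hypothetical `pair_features` is an assumption about how a protein pair is presented to the classifiers.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aac(seq):
    """Amino acid composition: 20 relative frequencies."""
    n = len(seq)
    return [seq.count(aa) / n for aa in AMINO_ACIDS]


def aap(seq):
    """Amino acid pair composition: 400 relative frequencies of
    overlapping adjacent residue pairs."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for p in pairs:
        counts[p] += 1
    return [counts[d] / len(pairs) for d in DIPEPTIDES]


def pair_features(pathogen_seq, host_seq):
    """Hypothetical pair encoding: both partners' vectors concatenated (840 values)."""
    return aac(pathogen_seq) + aap(pathogen_seq) + aac(host_seq) + aap(host_seq)
```

The resulting vectors can be fed to any of the classifier families listed in the abstract.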
5
Skibinski DOF, Ghiselli F, Diz AP, Milani L, Mullins JGL. Structure-Related Differences between Cytochrome Oxidase I Proteins in a Stable Heteroplasmic Mitochondrial System. Genome Biol Evol 2018; 9:3265-3281. [PMID: 29149282 PMCID: PMC5726481 DOI: 10.1093/gbe/evx235] [Accepted: 11/13/2017]
Abstract
Many bivalve species have two types of mitochondrial DNA passed independently through the female line (F genome) and the male line (M genome). Here we study the cytochrome oxidase I protein in such bivalve species and provide evidence for differences between the F and M proteins in amino acid property values, particularly those relating to hydrophobicity and helicity. The magnitude of these differences varies between regions of the protein, and the change from the ancestor is most marked in the M protein. The observed changes occur in parallel and in the same direction in the different species studied. Two possible causes are considered: first, relaxation of purifying selection with drift, and second, positive selection. These may operate in different ways in different regions of the protein. Many different amino acid substitutions each contribute a small amount to the observed variation, but substitutions involving alanine and serine have a quantitatively large effect. Some of these substitutions are potential targets for phosphorylation, and some are close to residues of functional importance in the catalytic mechanism. We propose that the observed changes in the F and M proteins might contribute to functional differences between them relating to ATP production and mitochondrial membrane potential, with implications for sperm function.
Affiliation(s)
- David O F Skibinski
- Institute of Life Science, Swansea University Medical School, United Kingdom
- Fabrizio Ghiselli
- Department of Biological, Geological, and Environmental Sciences, University of Bologna, Italy
- Angel P Diz
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Spain
- Liliana Milani
- Department of Biological, Geological, and Environmental Sciences, University of Bologna, Italy
6
Huang YA, You ZH, Chen X, Chan K, Luo X. Sequence-based prediction of protein-protein interactions using weighted sparse representation model combined with global encoding. BMC Bioinformatics 2016; 17:184. [PMID: 27112932 PMCID: PMC4845433 DOI: 10.1186/s12859-016-1035-4] [Received: 07/21/2015] [Accepted: 04/12/2016]
Abstract
BACKGROUND Proteins are important molecules that participate in virtually every aspect of cellular function within an organism, often by interacting in pairs. Although high-throughput technologies have generated considerable protein-protein interaction (PPI) data for various species, experimental methods are both time-consuming and expensive. In addition, they are usually associated with high rates of both false positive and false negative results. Accordingly, a number of computational approaches have been developed to predict protein interactions effectively and accurately. However, most of these methods perform worse when other biological data sources (e.g., protein structure information, protein domains, or gene neighborhood information) are not available. It is therefore urgent to develop effective computational methods for predicting PPIs solely from protein sequence information. RESULTS In this study, we present a novel computational model combining a weighted sparse representation based classifier (WSRC) and global encoding (GE) of amino acid sequences. Two kinds of protein descriptors, composition and transition, are extracted to represent each protein sequence. On the basis of this feature representation, the weighted sparse representation based classifier is introduced to predict the protein interaction class. When evaluated on the PPI data of S. cerevisiae, Human and H. pylori, the proposed method achieved high prediction accuracies of 96.82%, 97.66% and 92.83%, respectively. Extensive experiments on cross-species PPI prediction also yielded very promising accuracies. CONCLUSIONS To further evaluate the proposed method, we compared its performance with that of a method based on a support vector machine (SVM). The results show that the proposed method achieved a significant improvement.
Thus, the proposed method is an efficient way to predict PPIs and may be a useful supplementary tool for future proteomics studies.
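Composition and transition descriptors of the kind mentioned above can be sketched over a reduced alphabet. The three-class hydrophobicity grouping below is the one commonly used in composition-transition-distribution descriptors and is an assumption here; the global encoding in the paper may partition the amino acids differently, and the weighted sparse representation classifier is not shown.

```python
from itertools import combinations

# Assumed three-class hydrophobicity partition (polar / neutral / hydrophobic).
CLASSES = {
    "polar": set("RKEDQN"),
    "neutral": set("GASTPHY"),
    "hydrophobic": set("CLVIMFW"),
}
LABELS = list(CLASSES)
CLASS_OF = {aa: label for label, members in CLASSES.items() for aa in members}


def composition(seq):
    """Fraction of residues falling in each class."""
    classes = [CLASS_OF[aa] for aa in seq.upper()]
    return [classes.count(c) / len(classes) for c in LABELS]


def transition(seq):
    """For each unordered class pair, the fraction of adjacent residue pairs
    that switch between those two classes (in either direction)."""
    classes = [CLASS_OF[aa] for aa in seq.upper()]
    steps = list(zip(classes, classes[1:]))
    return [sum({x, y} == {c1, c2} for x, y in steps) / len(steps)
            for c1, c2 in combinations(LABELS, 2)]


def ctd_descriptor(seq):
    """Three composition values followed by three transition values."""
    return composition(seq) + transition(seq)
```

Sequences need at least two residues so that `transition` has adjacent pairs to count.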
Affiliation(s)
- Yu-An Huang
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong 518060 China
- Zhu-Hong You
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116 China
- Xing Chen
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190 China
- Keith Chan
- Department of Computing, Hong Kong Polytechnic University, Kowloon, Hong Kong 999077 China
- Xin Luo
- Department of Computing, Hong Kong Polytechnic University, Kowloon, Hong Kong 999077 China
7
Fan M, Zheng B, Li L. A novel Multi-Agent Ada-Boost algorithm for predicting protein structural class with the information of protein secondary structure. J Bioinform Comput Biol 2015; 13:1550022. [DOI: 10.1142/s0219720015500225]
Abstract
Knowledge of the structural class of a given protein is important for understanding its folding patterns. Despite considerable effort, predicting protein structural class solely from protein sequence remains challenging. Feature extraction and classification are the main problems in such prediction, and in this research we extend our earlier work on both aspects. For feature extraction, we propose a scheme that calculates word frequency and word position from sequences of amino acids, reduced amino acids and secondary structure. For accurate classification of protein structural class, we developed a novel Multi-Agent Ada-Boost (MA-Ada) method by integrating the features of a multi-agent system into the Ada-Boost algorithm. Extensive experiments were conducted to test and compare the proposed method on four low-homology benchmark datasets. The method achieved classification accuracies of 88.5%, 96.0%, 88.4% and 85.5%, respectively, which are much better than those of existing methods. The source code and dataset are available on request.
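Word frequency and word position features can be sketched for any alphabet, for example secondary-structure strings over H (helix), E (strand) and C (coil). Summarizing position as the mean start position scaled to [0, 1] is a hypothetical choice, since the abstract does not specify how word position is encoded; the MA-Ada classifier is not shown.

```python
from itertools import product


def word_features(seq, alphabet, k=2):
    """Return [frequency, mean normalized start position] for every
    k-letter word over `alphabet`, in lexicographic order of `alphabet`."""
    n_words = len(seq) - k + 1
    if n_words <= 0:
        raise ValueError("sequence is shorter than the word length k")
    positions = {}
    for i in range(n_words):
        positions.setdefault(seq[i:i + k], []).append(i)
    feats = []
    for word in ("".join(p) for p in product(alphabet, repeat=k)):
        pos = positions.get(word, [])
        freq = len(pos) / n_words
        # Mean start position scaled to [0, 1]; 0.0 when the word is absent.
        mean_pos = (sum(pos) / len(pos)) / max(n_words - 1, 1) if pos else 0.0
        feats.extend([freq, mean_pos])
    return feats
```

The same function applies to amino acid or reduced-alphabet strings by passing a different `alphabet`; the vector length grows as 2 x |alphabet|^k.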
Affiliation(s)
- Ming Fan
- Institute of Biomedical Engineering and Instrumentation, Hangzhou Dianzi University, Hangzhou 310018, China
- Bin Zheng
- Hunan Mechanical and Electrical Polytechnic, Chang Sha 410151, China
- Lihua Li
- Institute of Biomedical Engineering and Instrumentation, Hangzhou Dianzi University, Hangzhou 310018, China
8
DV-curve representation of protein sequences and its application. Comput Math Methods Med 2014; 2014:203871. [PMID: 24899916 PMCID: PMC4034481 DOI: 10.1155/2014/203871] [Received: 01/07/2014] [Revised: 03/10/2014] [Accepted: 04/03/2014]
Abstract
Based on the detailed hydrophobic-hydrophilic (HP) model of amino acids, we propose a dual-vector curve (DV-curve) representation of protein sequences, which uses two vectors to represent each letter of the protein sequence alphabet. This graphical representation not only avoids degeneracy but also provides good visualization regardless of sequence length, and it reflects the length of the protein sequence. We then transform the 2D graphical representation into a numerical characterization that facilitates quantitative comparison of protein sequences. The utility of this approach is illustrated by two examples: a similarity/dissimilarity comparison among different ND6 protein sequences based on their DV-curves, and a phylogenetic analysis of coronaviruses based on their spike proteins.
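The general idea of an HP-based 2-D curve and its numerical characterization can be sketched as follows. The step vectors, the hydrophobic set (residues with positive Kyte-Doolittle hydropathy) and the length-scaled centroid descriptor are hypothetical choices; this is not the DV-curve construction of the paper, which represents each letter with two vectors.

```python
import math

# Assumption: residues with positive Kyte-Doolittle hydropathy count as H.
HYDROPHOBIC = set("ACFILMV")
STEP = {"H": (1.0, 1.0), "P": (1.0, -1.0)}  # hypothetical step vectors


def hp_curve(seq):
    """Cumulative 2-D walk; x always advances, so no point is revisited."""
    x = y = 0.0
    points = []
    for aa in seq.upper():
        dx, dy = STEP["H" if aa in HYDROPHOBIC else "P"]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


def hp_descriptor(seq):
    """Curve centroid scaled by sequence length, as a 2-number summary."""
    pts = hp_curve(seq)
    n = len(pts)
    return (sum(p[0] for p in pts) / n / n, sum(p[1] for p in pts) / n / n)


def hp_distance(s1, s2):
    """Euclidean distance between the two descriptors."""
    return math.dist(hp_descriptor(s1), hp_descriptor(s2))
```

Because x increases by one at every residue, the walk cannot fold back on itself, which is the simplest way such curves avoid the overlapping-path degeneracy mentioned in the abstract.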
9
Wang J, Li Y, Liu X, Dai Q, Yao Y, He P. High-accuracy prediction of protein structural classes using PseAA structural properties and secondary structural patterns. Biochimie 2014; 101:104-12. [PMID: 24412731 DOI: 10.1016/j.biochi.2013.12.021] [Received: 07/22/2013] [Accepted: 12/30/2013]
Abstract
Since the introduction of PseAAs and functional domains, promising results have been achieved in protein structural class prediction, but challenges remain in representing PseAA structural correlations and structural domains. This paper proposes a high-accuracy prediction method using novel PseAA structural properties and secondary structural patterns, which reflect the long-range and local structural properties of the PseAAs and certain compact structural domains. The proposed method was tested against competing prediction methods in four experiments, and the results indicate that it achieved the best performance. Its overall accuracies on the datasets 25PDB, D640, FC699 and 1189 are 88.8%, 90.9%, 96.4% and 87.4%, which are 4.5%, 7.6%, 2% and 3.9% higher than those of the existing best-performing method. These findings can guide the development of more powerful methods for protein structural class prediction. The software and supplementary material are freely available at http://bioinfo.zstu.edu.cn/PseAA-SSP.
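PseAA refers to pseudo amino acid composition. Below is a minimal sketch of the standard type-1 form (20 composition terms plus lambda sequence-order correlation factors), assuming a single standardized property (Kyte-Doolittle hydropathy) and illustrative values lambda = 3 and weight w = 0.05. The paper's structural-property variant and its secondary structural patterns are not reproduced.

```python
import statistics

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Kyte-Doolittle hydropathy (assumed single property for the correlation terms).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Standardize the property over the 20 amino acids (zero mean, unit variance).
_MU = statistics.mean(list(KD.values()))
_SD = statistics.pstdev(list(KD.values()))
H = {aa: (v - _MU) / _SD for aa, v in KD.items()}


def pseaac(seq, lam=3, w=0.05):
    """Type-1 pseudo amino acid composition: 20 + lam values summing to 1."""
    seq = seq.upper()
    n = len(seq)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    freqs = [seq.count(aa) / n for aa in AMINO_ACIDS]
    # theta_j: mean squared property difference between residues j apart.
    thetas = [
        sum((H[seq[i + j]] - H[seq[i]]) ** 2 for i in range(n - j)) / (n - j)
        for j in range(1, lam + 1)
    ]
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]
```

The first 20 values carry composition and the last `lam` values carry sequence-order information; `w` controls how much weight the order terms receive.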
Affiliation(s)
- Junru Wang
- College of Mechanical Engineering and Automation, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
- Yan Li
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
- Xiaoqing Liu
- College of Sciences, Hangzhou Dianzi University, Hangzhou 310018, People's Republic of China
- Qi Dai
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
- Yuhua Yao
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
- Pingan He
- College of Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
10
Xie XL, Zheng LF, Yu Y, Liang LP, Guo MC, Song J, Yuan ZF. Protein sequence analysis based on hydropathy profile of amino acids. J Zhejiang Univ Sci B 2012; 13:152-8. [PMID: 22302429 PMCID: PMC3274743 DOI: 10.1631/jzus.b1100052]
Abstract
Biological sequence comparison is a fundamental task in computational biology. Based on the hydropathy profile of amino acids, a protein sequence is rewritten as a string over a three-letter alphabet. Three curves were defined to describe this reduced sequence, and a new method for analyzing the similarity/dissimilarity of protein sequences was proposed based on the conditional probability of the protein sequence. Finally, the ND6 (NADH dehydrogenase subunit 6) protein sequences of eight species were used as an example to illustrate the new approach. The results demonstrated that the method is convenient and efficient.
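The hydropathy-based reduction and a conditional-probability comparison can be sketched as follows. The three-class assignment (hydrophobic h, neutral n, hydrophilic p) is hypothetical, and comparing the 3 x 3 conditional probability tables with a Euclidean distance is a stand-in; the three curves and the exact similarity measure of the paper are not reproduced.

```python
import math

# Hypothetical hydropathy classes: h = hydrophobic, n = neutral, p = hydrophilic.
HYDROPATHY_CLASS = {
    **dict.fromkeys("ACFILMV", "h"),
    **dict.fromkeys("GSTW", "n"),
    **dict.fromkeys("DEHKNPQRY", "p"),
}
LETTERS = "hnp"


def reduce_seq(seq):
    """Rewrite a protein sequence over the three-letter hydropathy alphabet."""
    return "".join(HYDROPATHY_CLASS[aa] for aa in seq.upper())


def conditional_probs(seq):
    """P(next letter | current letter), flattened row by row into 9 values."""
    r = reduce_seq(seq)
    counts = {(a, b): 0 for a in LETTERS for b in LETTERS}
    for a, b in zip(r, r[1:]):
        counts[(a, b)] += 1
    out = []
    for a in LETTERS:
        row_total = sum(counts[(a, b)] for b in LETTERS)
        # Letters that never precede another residue get an all-zero row.
        out.extend(counts[(a, b)] / row_total if row_total else 0.0 for b in LETTERS)
    return out


def dissimilarity(s1, s2):
    """Stand-in measure: Euclidean distance between the two probability tables."""
    return math.dist(conditional_probs(s1), conditional_probs(s2))
```

Reducing to three letters shrinks the table from 400 to 9 entries, which keeps the estimates stable even for short sequences such as ND6.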
Affiliation(s)
- Xiao-li Xie
- College of Sciences, Northwest A&F University, Yangling 712100, China.
11
Liu L, Li D, Bai F. A relative Lempel-Ziv complexity: Application to comparing biological sequences. Chem Phys Lett 2012; 530:107-112. [PMID: 32226089 PMCID: PMC7094452 DOI: 10.1016/j.cplett.2012.01.061] [Received: 11/19/2011] [Accepted: 01/24/2012]
Abstract
One of the main tasks in biological sequence analysis is sequence comparison. Numerous efficient methods have been developed for this purpose, and traditional sequence comparison is based on sequence alignment. In this report, we propose a novel alignment-free method based on the relative Lempel-Ziv complexity to compare biological sequences. Vertebrate transferrin sequences and spike protein sequences are prepared and tested to evaluate the validity of the method, and we use the method to build phylogenetic trees for both groups of sequences. The results demonstrate that the method is powerful and efficient.
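Lempel-Ziv complexity can be computed by counting the phrases of the LZ76 exhaustive-history parsing. The sketch below pairs it with a normalized LZ distance, d = max(c(SQ) - c(S), c(QS) - c(Q)) / max(c(SQ), c(QS)), used here as a stand-in; the relative Lempel-Ziv complexity defined in the paper may differ. The function names are hypothetical.

```python
def lz76_complexity(s):
    """Number of phrases in the Lempel-Ziv (1976) exhaustive-history parsing."""
    i, c, n = 0, 0, len(s)
    while i < n:
        # Extend the phrase while it can still be copied from earlier text
        # (the search window excludes the phrase's own last character).
        length = 1
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        c += 1
        i += length
    return c


def lz_distance(s, q):
    """Normalized LZ distance in [0, 1]; small when q adds little new to s."""
    c_s, c_q = lz76_complexity(s), lz76_complexity(q)
    c_sq, c_qs = lz76_complexity(s + q), lz76_complexity(q + s)
    return max(c_sq - c_s, c_qs - c_q) / max(c_sq, c_qs)
```

For the classic example `"0001101001000101"` the parsing is 0 · 001 · 10 · 100 · 1000 · 101, so the complexity is 6. A pairwise `lz_distance` matrix can be passed to neighbor joining or UPGMA to build a tree.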
Affiliation(s)
- Liwei Liu
- College of Science, Dalian Jiaotong University, Dalian 116028, PR China
- Dongbo Li
- Department of Otolaryngology, Affiliated Xinhua Hospital of Dalian University, Dalian 116021, PR China
- Fenglan Bai
- College of Science, Dalian Jiaotong University, Dalian 116028, PR China
12
Dai Q, Wu L, Li L. Improving protein structural class prediction using novel combined sequence information and predicted secondary structural features. J Comput Chem 2011; 32:3393-8. [DOI: 10.1002/jcc.21918] [Received: 02/16/2011] [Revised: 06/29/2011] [Accepted: 07/25/2011]
13
Abstract
To date, various approaches to phylogenetic analysis have been developed, and almost all of them focus on analyzing nucleic acid sequences or protein primary sequences. In this paper, we propose a new sequence distance for efficient reconstruction of phylogenetic trees based on the length distribution of common subsequences shared by two sequences. We describe several applications of this method, which not only show its validity but also suggest a number of novel phylogenetic insights.
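A distance built from the lengths of substrings shared by two sequences can be sketched as follows. For each start position in one sequence, the code records the longest prefix found anywhere in the other sequence (the idea behind average-common-substring methods); the inverse-mean-length distance is a hypothetical summary of that distribution, not the paper's formula.

```python
from collections import Counter


def match_lengths(a, b):
    """For each start position i in `a`, the length of the longest
    prefix of a[i:] that occurs anywhere in `b`."""
    lengths = []
    for i in range(len(a)):
        k = 0
        while i + k < len(a) and a[i:i + k + 1] in b:
            k += 1
        lengths.append(k)
    return lengths


def length_distribution(a, b):
    """Histogram of match lengths: {length: number of start positions}."""
    return Counter(match_lengths(a, b))


def common_substring_distance(a, b):
    """Hypothetical symmetric distance: inverse of the mean match length."""
    la, lb = match_lengths(a, b), match_lengths(b, a)
    mean_len = (sum(la) / len(la) + sum(lb) / len(lb)) / 2
    return float("inf") if mean_len == 0 else 1.0 / mean_len
```

The naive substring search is at least quadratic; for genome-scale data, suffix trees or suffix arrays would compute the same match lengths much faster.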
Affiliation(s)
- Guisong Chang
- School of Mathematical Sciences, Dalian University of Technology, 116024 Dalian, People's Republic of China.
14
Zhang S, Wang T. A Complexity-based Method to Compare RNA Secondary Structures and its Application. J Biomol Struct Dyn 2010; 28:247-58. [PMID: 20645657 DOI: 10.1080/07391102.2010.10507357]
15
Albayrak A, Otu HH, Sezerman UO. Clustering of protein families into functional subtypes using Relative Complexity Measure with reduced amino acid alphabets. BMC Bioinformatics 2010; 11:428. [PMID: 20718947 PMCID: PMC2936399 DOI: 10.1186/1471-2105-11-428] [Received: 01/21/2010] [Accepted: 08/18/2010]
Abstract
Background Phylogenetic analysis can be used to divide a protein family into subfamilies in the absence of experimental information. Most phylogenetic analysis methods use a multiple alignment of sequences and are based on an evolutionary model. However, multiple alignment is not a fully automated procedure: it requires human intervention to maintain alignment integrity and to produce phylogenies consistent with the functional splits in the underlying sequences. To address this problem, we propose using the alignment-free Relative Complexity Measure (RCM) combined with reduced amino acid alphabets to cluster protein families into functional subtypes purely on sequence criteria. We also compared the method with an alignment-based approach to test the quality of the clustering. Results We demonstrate the robustness of RCM with reduced alphabets in clustering protein sequences into families in a simulated dataset and seven well-characterized protein datasets. On the protein datasets, crotonases, mandelate racemases, nucleotidyl cyclases and glycoside hydrolase family 2 were clustered into subfamilies with 100% accuracy, whereas acyl transferase domains, haloacid dehalogenases and vicinal oxygen chelates were assigned to subfamilies with 97.2%, 96.9% and 92.2% accuracy, respectively. Conclusions The combination of methods in this paper is useful for clustering protein families into subtypes based solely on protein sequence information. The method is also flexible and computationally fast because it does not require multiple alignment of sequences.
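Reduced-alphabet clustering can be sketched by recoding sequences into a small alphabet and then computing pairwise complexity-based distances. The five-group alphabet below is hypothetical, and the normalized compression distance computed with `zlib` is a stand-in for the LZ-based Relative Complexity Measure, not the RCM itself.

```python
import zlib

# Hypothetical five-group reduced alphabet; each group maps to one symbol.
GROUPS = {
    "a": "AVLIMC",  # aliphatic / hydrophobic
    "r": "FWYH",    # aromatic
    "p": "STNQGP",  # small / polar
    "+": "KR",      # positively charged
    "-": "DE",      # negatively charged
}
REDUCE = {aa: code for code, members in GROUPS.items() for aa in members}


def reduce_alphabet(seq):
    """Recode a protein sequence into the reduced alphabet."""
    return "".join(REDUCE[aa] for aa in seq.upper())


def compressed_size(s):
    return len(zlib.compress(s.encode("ascii"), 9))


def ncd(x, y):
    """Normalized compression distance (stand-in for the LZ-based RCM)."""
    cx, cy, cxy = compressed_size(x), compressed_size(y), compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def pairwise_ncd(seqs):
    """Distance matrix over reduced-alphabet versions of `seqs`."""
    reduced = [reduce_alphabet(s) for s in seqs]
    return [[ncd(x, y) for y in reduced] for x in reduced]
```

The resulting matrix can be passed to any hierarchical clustering routine; coarser alphabets merge chemically similar residues, which is intended to make distances reflect functional rather than exact-residue similarity.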
Affiliation(s)
- Aydin Albayrak
- Biological Sciences and Bioengineering, Sabanci University, Orhanli, Tuzla, Istanbul, Turkey
16
Liu L, Wang T. Comparison of TOPS strings based on LZ complexity. J Theor Biol 2008; 251:159-66. [PMID: 18166201 DOI: 10.1016/j.jtbi.2007.11.016] [Received: 07/24/2007] [Revised: 11/13/2007] [Accepted: 11/13/2007]