1
|
Proteomic Tools for the Analysis of Cytoskeleton Proteins. Methods Mol Biol 2021; 2364:363-425. [PMID: 34542864 DOI: 10.1007/978-1-0716-1661-1_19] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Abstract
Proteomic analyses have become an essential part of the toolkit of the molecular biologist, given the widespread availability of genomic data and open source or freely accessible bioinformatics software. Tools are available for detecting homologous sequences, recognizing functional domains, and modeling the three-dimensional structure for any given protein sequence, as well as for predicting interactions with other proteins or macromolecules. Although a wealth of structural and functional information is available for many cytoskeletal proteins, with representatives spanning all of the major subfamilies, the majority of cytoskeletal proteins remain partially or totally uncharacterized. Moreover, bioinformatics tools provide a means for studying the effects of synthetic mutations or naturally occurring variants of these cytoskeletal proteins. This chapter discusses various freely available proteomic analysis tools, with a focus on in silico prediction of protein structure and function. The selected tools are notable for providing an easily accessible interface for the novice while retaining advanced functionality for more experienced computational biologists.
|
2
|
Gabler F, Nam S, Till S, Mirdita M, Steinegger M, Söding J, Lupas AN, Alva V. Protein Sequence Analysis Using the MPI Bioinformatics Toolkit. Curr Protoc Bioinformatics 2020; 72:e108. [DOI: 10.1002/cpbi.108] [Citation(s) in RCA: 189] [Impact Index Per Article: 47.3] [Indexed: 12/17/2022]
Affiliation(s)
- Felix Gabler
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen, Germany
- Seung‐Zin Nam
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen, Germany
- Sebastian Till
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen, Germany
- Milot Mirdita
- Quantitative Biology and Bioinformatics, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
- Martin Steinegger
- Quantitative Biology and Bioinformatics, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
- Present address: Department of Biology, Seoul National University, Seoul, South Korea
- Johannes Söding
- Quantitative Biology and Bioinformatics, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
- Andrei N. Lupas
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen, Germany
- Vikram Alva
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen, Germany
|
3
|
Link AJ, Niu X, Weaver CM, Jennings JL, Duncan DT, McAfee KJ, Sammons M, Gerbasi VR, Farley AR, Fleischer TC, Browne CM, Samir P, Galassie A, Boone B. Targeted Identification of Protein Interactions in Eukaryotic mRNA Translation. Proteomics 2020; 20:e1900177. [PMID: 32027465 DOI: 10.1002/pmic.201900177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2019] [Revised: 12/13/2019] [Indexed: 11/09/2022]
Abstract
To identify protein-protein interactions and phosphorylated amino acid sites in eukaryotic mRNA translation, replicate TAP-MudPIT and control experiments are performed targeting Saccharomyces cerevisiae genes previously implicated in eukaryotic mRNA translation by their genetic and/or functional roles in translation initiation, elongation, termination, or interactions with ribosomal complexes. Replicate tandem affinity purifications of each targeted yeast TAP-tagged mRNA translation protein coupled with multidimensional liquid chromatography and tandem mass spectrometry analysis are used to identify and quantify copurifying proteins. To improve sensitivity and minimize spurious, nonspecific interactions, a novel cross-validation approach is employed to identify the most statistically significant protein-protein interactions. Using experimental and computational strategies discussed herein, the previously described protein composition of the canonical eukaryotic mRNA translation initiation, elongation, and termination complexes is calculated. In addition, statistically significant unpublished protein interactions and phosphorylation sites for S. cerevisiae's mRNA translation proteins and complexes are identified.
Affiliation(s)
- Andrew J Link
- Department of Pathology, Microbiology and Immunology, Vanderbilt University School of Medicine, Nashville, TN, 37232, USA; Department of Biochemistry, Vanderbilt University, Nashville, TN, 37232, USA; Department of Chemistry, Vanderbilt University, Nashville, TN, 37232, USA
- Xinnan Niu
- Department of Pathology, Microbiology and Immunology, Vanderbilt University School of Medicine, Nashville, TN, 37232, USA
- Connie M Weaver
- Department of Pathology, Microbiology and Immunology, Vanderbilt University School of Medicine, Nashville, TN, 37232, USA
- Jennifer L Jennings
- Department of Pathology, Microbiology and Immunology, Vanderbilt University School of Medicine, Nashville, TN, 37232, USA
- Dexter T Duncan
- Department of Pathology, Microbiology and Immunology, Vanderbilt University School of Medicine, Nashville, TN, 37232, USA
- K Jill McAfee
- Department of Pathology, Microbiology and Immunology, Vanderbilt University School of Medicine, Nashville, TN, 37232, USA
- Morgan Sammons
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, 37232, USA
- Vince R Gerbasi
- Department of Pathology, Microbiology and Immunology, Vanderbilt University School of Medicine, Nashville, TN, 37232, USA
- Adam R Farley
- Department of Biochemistry, Vanderbilt University, Nashville, TN, 37232, USA
- Tracey C Fleischer
- Department of Pathology, Microbiology and Immunology, Vanderbilt University School of Medicine, Nashville, TN, 37232, USA
- Parimal Samir
- Department of Biochemistry, Vanderbilt University, Nashville, TN, 37232, USA
- Allison Galassie
- Department of Chemistry, Vanderbilt University, Nashville, TN, 37232, USA
- Braden Boone
- Department of Bioinformatics, Vanderbilt University School of Medicine, Nashville, TN, 37232, USA
|
4
|
Bruno A, Costantino G, Sartori L, Radi M. The In Silico Drug Discovery Toolbox: Applications in Lead Discovery and Optimization. Curr Med Chem 2019; 26:3838-3873. [PMID: 29110597 DOI: 10.2174/0929867324666171107101035] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 07/07/2017] [Revised: 09/27/2017] [Accepted: 09/28/2017] [Indexed: 01/04/2023]
Abstract
BACKGROUND: Discovery and development of a new drug is a long-lasting and expensive journey that takes around 20 years from starting idea to approval and marketing of new medication. Although R&D expenditures have been constantly increasing in the last few years, the number of new drugs introduced into the market has been steadily declining. This is mainly due to preclinical and clinical safety issues, which still represent about 40% of drug discontinuation. To cope with this issue, a number of in silico techniques are currently being used for an early stage evaluation/prediction of potential safety issues, which can increase the drug-discovery success rate and reduce costs associated with the development of a new drug. METHODS: In the present review, we will analyse the early steps of the drug-discovery pipeline, describing the sequence of steps from disease selection to lead optimization and focusing on the most common in silico tools used to assess attrition risks and build a mitigation plan. RESULTS: A comprehensive list of widely used in silico tools, databases, and public initiatives that can be effectively implemented and used in the drug discovery pipeline has been provided. A few examples of how these tools can solve problems and how they may increase the success rate of a drug discovery and development program have also been provided. Finally, selected examples where the application of in silico tools had effectively contributed to the development of marketed drugs or clinical candidates will be given. CONCLUSION: The in silico toolbox finds great application in every step of early drug discovery: (i) target identification and validation; (ii) hit identification; (iii) hit-to-lead; and (iv) lead optimization. Each of these steps has been described in detail, providing a useful overview of the role played by in silico tools in the decision-making process to speed up the discovery of new drugs.
Affiliation(s)
- Agostino Bruno
- Experimental Therapeutics Unit, IFOM - The FIRC Institute for Molecular Oncology Foundation, Via Adamello 16 - 20139 Milano, Italy
- Gabriele Costantino
- Dipartimento di Scienze degli Alimenti e del Farmaco, Universita degli Studi di Parma, Viale delle Scienze, 27/A, 43124 Parma, Italy
- Luca Sartori
- Experimental Therapeutics Unit, IFOM - The FIRC Institute for Molecular Oncology Foundation, Via Adamello 16 - 20139 Milano, Italy
- Marco Radi
- Dipartimento di Scienze degli Alimenti e del Farmaco, Universita degli Studi di Parma, Viale delle Scienze, 27/A, 43124 Parma, Italy
|
5
|
Liu B, Chen J, Guo M, Wang X. Protein Remote Homology Detection and Fold Recognition Based on Sequence-Order Frequency Matrix. IEEE/ACM Trans Comput Biol Bioinform 2019; 16:292-300. [PMID: 29990004 DOI: 10.1109/tcbb.2017.2765331] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 06/08/2023]
Abstract
Protein remote homology detection and fold recognition are two critical tasks for the studies of protein structures and functions. Currently, the profile-based methods achieve the state-of-the-art performance in these fields. However, the widely used sequence profiles, like position-specific frequency matrix (PSFM) and position-specific scoring matrix (PSSM), ignore the sequence-order effects along protein sequence. In this study, we have proposed a novel profile, called sequence-order frequency matrix (SOFM), to extract the sequence-order information of neighboring residues from multiple sequence alignment (MSA). Combined with two profile feature extraction approaches, top-n-grams and the Smith-Waterman algorithm, the SOFMs are applied to protein remote homology detection and fold recognition, and two predictors called SOFM-Top and SOFM-SW are proposed. Experimental results show that SOFM contains more information content than other profiles, and these two predictors outperform other state-of-the-art methods. It is anticipated that SOFM will become a very useful profile in the studies of protein structures and functions.
|
6
|
Abstract
Recent technological advances in sequencing and high-throughput DNA cloning have resulted in the generation of vast quantities of biological sequence data. Ideally the functions of individual genes and proteins predicted by these methods should be assessed experimentally within the context of a defined hypothesis. However, if no hypothesis is known a priori, or the number of sequences to be assessed is large, bioinformatics techniques may be useful in predicting function. This chapter proposes a pipeline of freely available Web-based tools to analyze protein-coding DNA and peptide sequences of unknown function. Accumulated information obtained during each step of the pipeline is used to build a testable hypothesis of function. The following methods are described in detail: 1. Annotation of gene function through protein domain detection (SMART and Pfam). 2. Sequence similarity methods for homolog detection (BLAST and DELTA-BLAST). 3. Comparing sequences to whole genome data.
Affiliation(s)
- Tom C Giles
- School of Veterinary Medicine and Science, University of Nottingham, Sutton Bonington Campus, Leicestershire, LE12 5RD, UK
- Advanced Data Analysis Centre, University of Nottingham, Leicestershire, LE12 5RD, UK
- Richard D Emes
- School of Veterinary Medicine and Science, University of Nottingham, Sutton Bonington Campus, Leicestershire, LE12 5RD, UK.
- Advanced Data Analysis Centre, University of Nottingham, Leicestershire, LE12 5RD, UK.
|
7
|
Chen J, Guo M, Wang X, Liu B. A comprehensive review and comparison of different computational methods for protein remote homology detection. Brief Bioinform 2016; 19:231-244. [DOI: 10.1093/bib/bbw108] [Citation(s) in RCA: 81] [Impact Index Per Article: 10.1] [Received: 08/03/2016] [Indexed: 01/02/2023] Open
Affiliation(s)
- Junjie Chen
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Mingyue Guo
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Xiaolong Wang
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Bin Liu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
|
8
|
Scarpati M, Heavner ME, Wiech E, Singh S. Proteomic Tools for the Analysis of Cytoskeleton Proteins. Methods Mol Biol 2016; 1365:385-413. [PMID: 26498799 DOI: 10.1007/978-1-4939-3124-8_23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
Proteomic analyses have become an essential part of the toolkit of the molecular biologist, given the widespread availability of genomic data and open source or freely accessible bioinformatics software. Tools are available for detecting homologous sequences, recognizing functional domains, and modeling the three-dimensional structure for any given protein sequence. Although a wealth of structural and functional information is available for a large number of cytoskeletal proteins, with representatives spanning all of the major subfamilies, the majority of cytoskeletal proteins remain partially or totally uncharacterized. Moreover, bioinformatics tools provide a means for studying the effects of synthetic mutations or naturally occurring variants of these cytoskeletal proteins. This chapter discusses various freely available proteomic analysis tools, with a focus on in silico prediction of protein structure and function. The selected tools are notable for providing an easily accessible interface for the novice, while retaining advanced functionality for more experienced computational biologists.
Affiliation(s)
- Michael Scarpati
- Biology Program, The Graduate Center, City University of New York, New York, NY, USA
- Mary Ellen Heavner
- Biochemistry Program, The Graduate Center, City University of New York, New York, NY, USA
- Eliza Wiech
- Biology Program, The Graduate Center, City University of New York, New York, NY, USA
- Shaneen Singh
- Biochemistry Program, The Graduate Center, City University of New York, New York, NY, USA.
- Department of Biology, Brooklyn College, City University of New York, 209 Ingersoll Hall Extension, 2900 Bedford Ave., Brooklyn, NY, 11210, USA.
- Biology Program, The Graduate Center, City University of New York, New York, NY, USA.
|
9
|
Liu B, Chen J, Wang X. Protein remote homology detection by combining Chou’s distance-pair pseudo amino acid composition and principal component analysis. Mol Genet Genomics 2015; 290:1919-31. [DOI: 10.1007/s00438-015-1044-4] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.8] [Received: 03/05/2015] [Accepted: 04/06/2015] [Indexed: 02/07/2023]
|
10
|
Joseph AP, de Brevern AG. From local structure to a global framework: recognition of protein folds. J R Soc Interface 2014; 11:20131147. [PMID: 24740960 DOI: 10.1098/rsif.2013.1147] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 12/21/2022] Open
Abstract
Protein folding has been a major area of research for many years. Nonetheless, the mechanisms leading to the formation of an active biological fold are still not fully understood. The huge amount of available sequence and structural information provides hints to identify the putative fold for a given sequence. Indeed, protein structures prefer a limited number of local backbone conformations, some being characterized by preferences for certain amino acids. These preferences largely depend on the local structural environment. The prediction of local backbone conformations has become an important factor in correctly identifying the global protein fold. Here, we review the developments in the field of local structure prediction and especially their implication in protein fold recognition.
Affiliation(s)
- Agnel Praveen Joseph
- Science and Technology Facilities Council, Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK
|
11
|
Liu B, Xu J, Zou Q, Xu R, Wang X, Chen Q. Using distances between Top-n-gram and residue pairs for protein remote homology detection. BMC Bioinformatics 2014; 15 Suppl 2:S3. [PMID: 24564580 PMCID: PMC4015815 DOI: 10.1186/1471-2105-15-s2-s3] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Indexed: 11/22/2022] Open
Abstract
Background: Protein remote homology detection is one of the central problems in bioinformatics, which is important for both basic research and practical application. Currently, discriminative methods based on Support Vector Machines (SVMs) achieve the state-of-the-art performance. Exploring feature vectors incorporating the position information of amino acids or other protein building blocks is a key step to improve the performance of the SVM-based methods. Results: Two new methods for protein remote homology detection were proposed, called SVM-DR and SVM-DT. SVM-DR is a sequence-based method, in which the feature vector representation for protein is based on the distances between residue pairs. SVM-DT is a profile-based method, which considers the distances between Top-n-gram pairs. Top-n-gram can be viewed as a profile-based building block of proteins, which is calculated from the frequency profiles. These two methods are position-dependent approaches incorporating the sequence-order information of protein sequences. Various experiments were conducted on a benchmark dataset containing 54 families and 23 superfamilies. Experimental results showed that these two new methods are very promising. Compared with the position-independent methods, the performance improvement is obvious. Furthermore, the proposed methods can also provide useful insights for studying the features of protein families. Conclusion: The better performance of the proposed methods demonstrates that the position-dependent approaches are efficient for protein remote homology detection. Another advantage of our methods arises from the explicit feature space representation, which can be used to analyze the characteristic features of protein families. The source code of SVM-DT and SVM-DR is available at http://bioinformatics.hitsz.edu.cn/DistanceSVM/index.jsp
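As a rough illustration of the residue-pair distance idea behind SVM-DR, the sketch below counts how often each ordered amino acid pair occurs at separations of 1 to `max_dist` and normalizes the counts into a fixed-length vector. The function name `residue_pair_distance_features`, the `max_dist` parameter, and the normalization are assumptions made for this example; the abstract does not give the authors' exact feature definition.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_pair_distance_features(seq, max_dist=3):
    """Return normalized counts of ordered residue pairs at distances 1..max_dist.

    Hypothetical helper for illustration only, not the published SVM-DR code.
    The vector has 400 * max_dist entries (20 x 20 pairs per distance).
    """
    # Map each ordered pair such as ('A', 'C') to a column index 0..399.
    index = {pair: i for i, pair in enumerate(product(AMINO_ACIDS, repeat=2))}
    n_pairs = len(index)
    counts = [0.0] * (n_pairs * max_dist)
    for d in range(1, max_dist + 1):
        for i in range(len(seq) - d):
            pair = (seq[i], seq[i + d])
            if pair in index:  # skip non-standard residue codes such as 'X'
                counts[(d - 1) * n_pairs + index[pair]] += 1
    total = sum(counts)
    # Normalize so that sequences of different lengths are comparable.
    return [c / total for c in counts] if total else counts


features = residue_pair_distance_features("ACA", max_dist=2)
print(len(features))
```

A vector of this kind has the same length for every sequence, so it could be passed to any standard SVM implementation; the choice of `max_dist` controls how much sequence-order information is retained.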
|
12
|
Liu B, Zhang D, Xu R, Xu J, Wang X, Chen Q, Dong Q, Chou KC. Combining evolutionary information extracted from frequency profiles with sequence-based kernels for protein remote homology detection. Bioinformatics 2013; 30:472-9. [PMID: 24318998 PMCID: PMC7537947 DOI: 10.1093/bioinformatics/btt709] [Citation(s) in RCA: 250] [Impact Index Per Article: 22.7] [Indexed: 11/26/2022]
Abstract
Motivation: Owing to its importance in both basic research (such as molecular evolution and protein attribute prediction) and practical application (such as timely modeling the 3D structures of proteins targeted for drug development), protein remote homology detection has attracted a great deal of interest. It is intriguing to note that the profile-based approach is promising and holds high potential in this regard. To further improve protein remote homology detection, a key step is how to find an optimal means to extract the evolutionary information into the profiles. Results: Here, we propose a novel approach, the so-called profile-based protein representation, to extract the evolutionary information via the frequency profiles. The latter can be calculated from the multiple sequence alignments generated by PSI-BLAST. Three top performing sequence-based kernels (SVM-Ngram, SVM-pairwise and SVM-LA) were combined with the profile-based protein representation. Various tests were conducted on a SCOP benchmark dataset that contains 54 families and 23 superfamilies. The results showed that the new approach is promising, and can obviously improve the performance of the three kernels. Furthermore, our approach can also provide useful insights for studying the features of proteins in various families. It has not escaped our notice that the current approach can be easily combined with the existing sequence-based methods so as to improve their performance as well. Availability and implementation: For users’ convenience, the source code of generating the profile-based proteins and the multiple kernel learning was also provided at http://bioinformatics.hitsz.edu.cn/main/∼binliu/remote/ Contact: bliu@insun.hit.edu.cn or bliu@gordonlifescience.org Supplementary information: Supplementary data are available at Bioinformatics online.
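To make the notion of a frequency profile concrete, the following sketch computes per-column amino acid frequencies from an already aligned set of sequences and then reads off the most frequent residue in each column. The function names `frequency_profile` and `most_frequent_residues`, the gap handling, and the toy alignment are assumptions for illustration; the abstract derives its alignments from PSI-BLAST and does not spell out how the profile-based representation is built.

```python
from collections import Counter


def frequency_profile(alignment):
    """Return one {residue: frequency} dict per alignment column.

    Hypothetical helper: gaps ('-') are ignored when computing frequencies.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != "-")
        total = sum(counts.values())
        profile.append({res: n / total for res, n in counts.items()} if total else {})
    return profile


def most_frequent_residues(profile):
    """Collapse a profile into a string of the most frequent residue per column."""
    return "".join(max(col, key=col.get) if col else "-" for col in profile)


toy_alignment = ["ACD", "ACE", "A-E"]  # illustrative data, not from the paper
profile = frequency_profile(toy_alignment)
print(most_frequent_residues(profile))
```

A string derived from the profile in this way could then be fed to an ordinary sequence-based kernel, which is the general pattern the abstract describes.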
Affiliation(s)
- Bin Liu
- School of Computer Science and Technology and Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China, Shanghai Key Laboratory of Intelligent Information Processing, Shanghai 200433, China, Gordon Life Science Institute, Belmont, MA 02478, USA, School of Computer, Shenyang Aerospace University, Shenyang, Liaoning, China, School of Computer Science, Fudan University, Shanghai 200433, China and Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah 21589, Saudi Arabia
|
13
|
Mistry J, Finn RD, Eddy SR, Bateman A, Punta M. Challenges in homology search: HMMER3 and convergent evolution of coiled-coil regions. Nucleic Acids Res 2013; 41:e121. [PMID: 23598997 PMCID: PMC3695513 DOI: 10.1093/nar/gkt263] [Citation(s) in RCA: 956] [Impact Index Per Article: 86.9] [Indexed: 12/31/2022] Open
Abstract
Detection of protein homology via sequence similarity has important applications in biology, from protein structure and function prediction to reconstruction of phylogenies. Although current methods for aligning protein sequences are powerful, challenges remain, including problems with homologous overextension of alignments and with regions under convergent evolution. Here, we test the ability of the profile hidden Markov model method HMMER3 to correctly assign homologous sequences to >13,000 manually curated families from the Pfam database. We identify problem families using protein regions that match two or more Pfam families not currently annotated as related in Pfam. We find that HMMER3 E-value estimates seem to be less accurate for families that feature periodic patterns of compositional bias, such as the ones typically observed in coiled-coils. These results support the continued use of manually curated inclusion thresholds in the Pfam database, especially on the subset of families that have been identified as problematic in experiments such as these. They also highlight the need for developing new methods that can correct for this particular type of compositional bias.
Affiliation(s)
- Jaina Mistry
- EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
|
14
|
Gullotto D, Nolassi MS, Bernini A, Spiga O, Niccolai N. Probing the protein space for extending the detection of weak homology folds. J Theor Biol 2013; 320:152-8. [DOI: 10.1016/j.jtbi.2012.12.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/16/2012] [Revised: 11/03/2012] [Accepted: 12/05/2012] [Indexed: 12/19/2022]
|
15
|
Liu B, Wang X, Chen Q, Dong Q, Lan X. Using amino acid physicochemical distance transformation for fast protein remote homology detection. PLoS One 2012; 7:e46633. [PMID: 23029559 PMCID: PMC3460876 DOI: 10.1371/journal.pone.0046633] [Citation(s) in RCA: 81] [Impact Index Per Article: 6.8] [Received: 04/18/2012] [Accepted: 09/03/2012] [Indexed: 11/18/2022] Open
Abstract
Protein remote homology detection is one of the most important problems in bioinformatics. Discriminative methods such as support vector machines (SVM) have shown superior performance. However, the performance of SVM-based methods depends on the vector representations of the protein sequences. Prior works have demonstrated that sequence-order effects are relevant for discrimination, but little work has explored how to incorporate the sequence-order information along with the amino acid physicochemical properties into the prediction. In order to incorporate the sequence-order effects into the protein remote homology detection, the physicochemical distance transformation (PDT) method is proposed. Each protein sequence is converted into a series of numbers by using the physicochemical property scores in the amino acid index (AAIndex), and then the sequence is converted into a fixed length vector by PDT. The sequence-order information can be efficiently included into the feature vector with little computational cost by this approach. Finally, the feature vectors are input into a support vector machine classifier to detect the protein remote homologies. Our experiments on a well-known benchmark show the proposed method SVM-PDT achieves superior or comparable performance with current state-of-the-art methods and its computational cost is considerably superior to those of other methods. When the evolutionary information extracted from the frequency profiles is combined with the PDT method, the profile-based PDT approach can improve the performance by 3.4% and 11.4% in terms of ROC score and ROC50 score respectively. The local sequence-order information of the protein can be efficiently captured by the proposed PDT and the physicochemical properties extracted from the amino acid index are incorporated into the prediction. The physicochemical distance transformation provides a general framework, which would be a valuable tool for protein-level study.
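The two-step conversion described above (residues to property scores, then scores to a fixed-length vector) can be sketched as follows. This is a minimal illustration that assumes the distance transformation is the mean squared difference between property values of residues separated by a given lag; the function name `pdt_features`, the `max_lag` parameter, and the toy property table are hypothetical, and real use would substitute normalized AAIndex entries.

```python
def pdt_features(seq, property_tables, max_lag=2):
    """Return one value per (property, lag): mean squared property difference.

    Hypothetical sketch of a physicochemical distance transformation;
    the output length is len(property_tables) * max_lag for any sequence.
    """
    features = []
    for table in property_tables:
        # Step 1: convert the sequence into a series of property scores.
        values = [table[res] for res in seq if res in table]
        # Step 2: summarize residue pairs separated by each lag.
        for lag in range(1, max_lag + 1):
            n = len(values) - lag
            if n <= 0:
                features.append(0.0)  # sequence too short for this lag
                continue
            squared = sum((values[i] - values[i + lag]) ** 2 for i in range(n))
            features.append(squared / n)
    return features


# Toy property table for illustration only (not an AAIndex entry).
toy_property = {"A": 1.0, "C": 0.0, "D": -1.0}
print(pdt_features("ACD", [toy_property], max_lag=2))
```

Because the cost is linear in sequence length for each property and lag, a transformation of this shape is cheap to compute, which is consistent with the low computational cost the abstract reports.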
Affiliation(s)
- Bin Liu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, People's Republic of China.
|
16
|
Arrigoni A, Grillo B, Vitriolo A, De Gioia L, Papaleo E. C-terminal acidic domain of ubiquitin-conjugating enzymes: A multi-functional conserved intrinsically disordered domain in family 3 of E2 enzymes. J Struct Biol 2012; 178:245-59. [DOI: 10.1016/j.jsb.2012.04.003] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 10/22/2011] [Revised: 04/01/2012] [Accepted: 04/03/2012] [Indexed: 11/30/2022]
|
17
|
Abstract
The wealth of available protein structural data provides unprecedented opportunity to study and better understand the underlying principles of protein folding and protein structure evolution. A key to achieving this lies in the ability to analyse these data and to organize them in a coherent classification scheme. Over the past years several protein classifications have been developed that aim to group proteins based on their structural relationships. Some of these classification schemes explore the concept of structural neighbourhood (structural continuum), whereas others utilize the notion of protein evolution and thus provide a discrete rather than continuum view of protein structure space. This chapter presents a strategy for classification of proteins with known three-dimensional structure. Steps in the classification process along with basic definitions are introduced. Examples illustrating some fundamental concepts of protein folding and evolution with a special focus on the exceptions to them are presented.
|
18
|
Jaroszewski L, Li Z, Cai XH, Weber C, Godzik A. FFAS server: novel features and applications. Nucleic Acids Res 2011; 39:W38-44. [PMID: 21715387 PMCID: PMC3125803 DOI: 10.1093/nar/gkr441] [Citation(s) in RCA: 120] [Impact Index Per Article: 9.2] [Indexed: 11/28/2022] Open
Abstract
The Fold and Function Assignment System (FFAS) server [Jaroszewski et al. (2005) FFAS03: a server for profile–profile sequence alignments. Nucleic Acids Research, 33, W284–W288] implements the algorithm for protein profile–profile alignment introduced originally in [Rychlewski et al. (2000) Comparison of sequence profiles. Strategies for structural predictions using sequence information. Protein Science: a Publication of the Protein Society, 9, 232–241]. Here, we present updates, changes and novel functionality added to the server since 2005 and discuss its new applications. The sequence database used to calculate sequence profiles was enriched by adding sets of publicly available metagenomic sequences. The profile of a user’s protein can now be compared with ∼20 additional profile databases, including several complete proteomes, human proteins involved in genetic diseases and a database of microbial virulence factors. A newly developed interface uses a system of tabs, allowing the user to navigate multiple results pages, and also includes novel functionality, such as a dotplot graph viewer, modeling tools, an improved 3D alignment viewer and links to the database of structural similarities. The FFAS server was also optimized for speed: running times were reduced by an order of magnitude. The FFAS server, http://ffas.godziklab.org, has no log-in requirement, albeit there is an option to register and store results in individual, password-protected directories. Source code and Linux executables for the FFAS program are available for download from the FFAS server.
Affiliation(s)
- Lukasz Jaroszewski
- Bioinformatics and Systems Biology Program, Sanford Burnham Medical Research Institute, 10901 N. Torrey Pines Road, La Jolla, CA 92037, USA
|
19
|
Tarrío R, Ayala FJ, Rodríguez-Trelles F. The Vein Patterning 1 (VEP1) gene family laterally spread through an ecological network. PLoS One 2011; 6:e22279. [PMID: 21818306 PMCID: PMC3144213 DOI: 10.1371/journal.pone.0022279] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 03/04/2011] [Accepted: 06/18/2011] [Indexed: 11/23/2022] Open
Abstract
Lateral gene transfer (LGT) is a major evolutionary mechanism in prokaryotes. Knowledge about LGT in eukaryotes, particularly multicellular ones, has only recently started to accumulate. A widespread assumption sees the gene as the unit of LGT, largely because little is yet known about how LGT chances are affected by structural/functional features at the subgenic level. Here we trace the evolutionary trajectory of VEin Patterning 1 (VEP1), a novel gene family known to be essential for plant development and defense. At the subgenic level VEP1 encodes a dinucleotide-binding Rossmann-fold domain, in common with members of the short-chain dehydrogenase/reductase (SDR) protein family. We found: i) VEP1 likely originated in an aerobic, mesophilic and chemoorganotrophic α-proteobacterium, and was laterally propagated through nets of ecological interactions, including multiple LGTs between phylogenetically distant green plant/fungi-associated bacteria, and five independent LGTs to eukaryotes. Of these five transfers, three are ancient LGTs, implicating an ancestral fungus, the last common ancestor of land plants and an ancestral trebouxiophyte green alga, and two are recent LGTs to modern embryophytes. ii) VEP1's rampant LGT behavior was enabled by the robustness and broad utility of the dinucleotide-binding Rossmann-fold, which provided a platform for the evolution of two unprecedented departures from the canonical SDR catalytic triad. iii) The fate of VEP1 in eukaryotes has differed between lineages: it is ubiquitous and highly conserved in land plants, whereas fungi underwent multiple losses. iv) VEP1-harboring bacteria include non-phytopathogenic and phytopathogenic symbionts, which are non-randomly distributed with respect to the type of VEP1 gene they harbor.
Our findings suggest that VEP1 may have been instrumental for the evolutionary transition of green plants to land, and point to an LGT-mediated ‘Trojan Horse’ mechanism for the evolution of bacterial pathogenesis against plants. VEP1 may serve as a tool for revealing microbial interactions in plant/fungi-associated environments.
Affiliation(s)
- Rosa Tarrío
- Universidad de Santiago de Compostela, CIBERER, Genome Medicine Group, Santiago de Compostela, Spain
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, California, United States of America
- Francisco J. Ayala
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, California, United States of America
- Francisco Rodríguez-Trelles
- Grup de Biologia Evolutiva, Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Barcelona, Spain
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, California, United States of America
|
20
|
Söding J, Remmert M. Protein sequence comparison and fold recognition: progress and good-practice benchmarking. Curr Opin Struct Biol 2011; 21:404-11. [PMID: 21458982 DOI: 10.1016/j.sbi.2011.03.005] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.2] [Received: 01/31/2011] [Revised: 03/01/2011] [Accepted: 03/09/2011] [Indexed: 11/26/2022]
Abstract
Protein sequence comparison methods have grown increasingly sensitive during the last decade and can often identify distantly related proteins sharing a common ancestor some 3 billion years ago. Although cellular function is not conserved so long, molecular functions and structures of protein domains often are. In combination with a domain-centered approach to function and structure prediction, modern remote homology detection methods have a great and largely underexploited potential for elucidating protein functions and evolution. Advances during the last few years include nonlinear scoring functions combining various sequence features, the use of sequence context information, and powerful new software packages. Since progress depends on realistically assessing new and existing methods and published benchmarks are often hard to compare, we propose 10 rules of good-practice benchmarking.
Affiliation(s)
- Johannes Söding
- Gene Center and Center for Integrated Protein Science, Ludwig-Maximilians-Universität München, Feodor-Lynen-Strasse 25, Munich, Germany.
|
21
|
Martin T, Lu SW, van Tilbeurgh H, Ripoll DR, Dixelius C, Turgeon BG, Debuchy R. Tracing the origin of the fungal α1 domain places its ancestor in the HMG-box superfamily: implication for fungal mating-type evolution. PLoS One 2010; 5:e15199. [PMID: 21170349 PMCID: PMC2999568 DOI: 10.1371/journal.pone.0015199] [Citation(s) in RCA: 75] [Impact Index Per Article: 5.4] [Received: 08/08/2010] [Accepted: 10/29/2010] [Indexed: 11/19/2022] Open
Abstract
Background: Fungal mating types in self-incompatible Pezizomycotina are specified by one of two alternate sequences occupying the same locus on corresponding chromosomes. One sequence is characterized by a gene encoding an HMG protein, while the hallmark of the other is a gene encoding a protein with an α1 domain showing similarity to the Matα1p protein of Saccharomyces cerevisiae. DNA-binding HMG proteins are ubiquitous and well characterized. In contrast, α1 domain proteins have limited distribution and their evolutionary origin is obscure, precluding a complete understanding of mating-type evolution in Ascomycota. Although much work has focused on the role of the S. cerevisiae Matα1p protein as a transcription factor, it has not yet been placed in any of the large families of sequence-specific DNA-binding proteins.
Methodology/Principal Findings: We present sequence comparisons, phylogenetic analyses, and in silico predictions of secondary and tertiary structures, which support our hypothesis that the α1 domain is related to the HMG domain. We have also characterized a new conserved motif in α1 proteins of Pezizomycotina. This motif is immediately adjacent to and downstream of the α1 domain and consists of a core sequence Y-[LMIF]-x(3)-G-[WL] embedded in a larger conserved motif.
Conclusions/Significance: Our data suggest that extant α1-box genes originated from an ancestral HMG gene, which confirms the current model of mating-type evolution within the fungal kingdom. We propose to incorporate α1 proteins in a new subclass of HMG proteins termed MATα_HMG.
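The core motif quoted in the abstract, Y-[LMIF]-x(3)-G-[WL], is written in PROSITE-style notation. As a minimal sketch of how such a pattern could be scanned for in a protein sequence, the snippet below translates it into a Python regular expression. The function names and the test sequence are hypothetical and chosen only for illustration; the translator handles just the elements that appear in this motif and is not the authors' tooling.

```python
import re

# Core motif as quoted in the abstract (PROSITE-style notation).
PROSITE_CORE = "Y-[LMIF]-x(3)-G-[WL]"


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex.

    Only the elements used in this motif are handled: literal residues,
    [..] residue sets, x for any residue, and (n) repeat counts.
    """
    parts = []
    for element in pattern.split("-"):
        element = element.replace("x", ".")               # x = any residue
        element = re.sub(r"\((\d+)\)", r"{\1}", element)  # (3) -> {3}
        parts.append(element)
    return "".join(parts)


def find_core_motif(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for each motif occurrence."""
    regex = re.compile(prosite_to_regex(PROSITE_CORE))
    return [(m.start(), m.group()) for m in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical sequence fragment, not taken from any real α1 protein.
    print(prosite_to_regex(PROSITE_CORE))
    print(find_core_motif("MKAYLAAAGWPP"))
```

A regex scan of this kind finds only exact matches to the core; the abstract notes that the core sits inside a larger conserved motif, which would need a profile-based method to capture.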
Affiliation(s)
- Tom Martin
- Department of Plant Biology and Forest Genetics, Uppsala Biocenter, Swedish University of Agricultural Sciences (SLU), Uppsala, Sweden
- Shun-Wen Lu
- Department of Plant Pathology and Plant-Microbe Biology, Cornell University, Ithaca, New York, United States of America
- Herman van Tilbeurgh
- Univ Paris-Sud, Institut de Biochimie et de Biophysique Moléculaire et Cellulaire, UMR8619 Univ Paris-Sud CNRS, Orsay, France
- Daniel R. Ripoll
- Department of Plant Pathology and Plant-Microbe Biology, Cornell University, Ithaca, New York, United States of America
- Christina Dixelius
- Department of Plant Biology and Forest Genetics, Uppsala Biocenter, Swedish University of Agricultural Sciences (SLU), Uppsala, Sweden
- B. Gillian Turgeon
- Department of Plant Pathology and Plant-Microbe Biology, Cornell University, Ithaca, New York, United States of America
- Robert Debuchy
- Univ Paris-Sud, Institut de Génétique et Microbiologie, UMR8621 Univ Paris-Sud CNRS, Orsay, France
- CNRS, Institut de Génétique et Microbiologie, UMR8621 Univ Paris-Sud CNRS, Orsay, France
|
22
|
Considering scores between unrelated proteins in the search database improves profile comparison. BMC Bioinformatics 2009; 10:399. [PMID: 19961610 PMCID: PMC3087343 DOI: 10.1186/1471-2105-10-399] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/13/2009] [Accepted: 12/04/2009] [Indexed: 12/02/2022] Open
Abstract
Background: Profile-based comparison of multiple sequence alignments is a powerful methodology for the detection of remote protein sequence similarity, which is essential for the inference and analysis of protein structure, function, and evolution. Accurate estimation of the statistical significance of detected profile similarities is essential for further development of this methodology. Here we analyze a novel approach to estimating the statistical significance of profile similarity: the explicit consideration of background score distributions for each database template (subject).
Results: Using a simple scheme to combine and analytically approximate query- and subject-based distributions, we show that (i) inclusion of background distributions for the subjects increases the quality of homology detection; (ii) this increase is higher when the distributions are based on the scores to all known non-homologs of the subject rather than a small calibration subset of the database representatives; and (iii) the score distributions built from all known non-homologs of the subject make the dominant contribution to the improved performance, while adding the calibration distribution of the query has a negligible additional effect.
Conclusion: The construction of distributions based on the complete sets of non-homologs for each subject is particularly relevant in the setting of structure prediction, where the database consists of proteins with solved 3D structure (PDB, SCOP, CATH, etc.) and therefore structural relationships between proteins are known. These results point to a potential new direction in the development of more powerful methods for remote homology detection.
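The idea of judging a hit against a background distribution specific to each database template can be illustrated with a toy calculation. The sketch below is not the authors' scheme, which combines and analytically approximates query- and subject-based distributions. It swaps in a plain empirical tail estimate, and the function name and score values are hypothetical.

```python
def empirical_p_value(score: float, background_scores: list[float]) -> float:
    """Estimate how often a subject's background scores reach `score`.

    `background_scores` stands for the scores between one database
    template (subject) and its known non-homologs. A +1 pseudocount
    keeps the estimate away from exactly zero.
    """
    exceed = sum(1 for s in background_scores if s >= score)
    return (exceed + 1) / (len(background_scores) + 1)


if __name__ == "__main__":
    # Hypothetical background scores for a single subject profile.
    background = [10.0, 12.0, 15.0, 20.0, 30.0]
    print(empirical_p_value(25.0, background))
    print(empirical_p_value(40.0, background))
```

With only a handful of background scores the estimate is coarse. The abstract's point is that a larger background, drawn from all known non-homologs of the subject, makes such subject-based estimates more useful than a small calibration subset.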
|