1
Manapkyzy D, Joldybayeva B, Ishchenko AA, Matkarimov BT, Zharkov DO, Taipakova S, Saparbaev MK. Enhanced thermal stability enables human mismatch-specific thymine-DNA glycosylase to catalyse futile DNA repair. PLoS One 2024; 19:e0304818. [PMID: 39423202 PMCID: PMC11488719 DOI: 10.1371/journal.pone.0304818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Accepted: 08/19/2024] [Indexed: 10/21/2024] [Open access]
Abstract
Human thymine-DNA glycosylase (TDG) excises T mispaired with G in a CpG context to initiate the base excision repair (BER) pathway. TDG is also involved in epigenetic regulation of gene expression by participating in active DNA demethylation. Here we demonstrate that, under extended incubation times, full-length TDG (TDGFL), but neither its isolated catalytic domain (TDGcat) nor the methyl-CpG binding domain-containing protein 4 (MBD4) DNA glycosylase, exhibits significant excision activity towards T and C in regular non-damaged DNA duplexes in TpG/CpA and CpG/CpG contexts. The time course of cleavage product accumulation under single-turnover conditions shows that the apparent rate constant for TDGFL-catalysed excision of T from T•A base pairs (0.0014-0.0069 min⁻¹) is 85-330-fold lower than that for excision of T from T•G mispairs (0.47-0.61 min⁻¹). Unexpectedly, TDGFL, but not TDGcat, exhibits prolonged enzyme survival at 37°C when incubated in the presence of equimolar concentrations of a non-specific DNA duplex, suggesting that the disordered N- and C-terminal domains of TDG can interact with DNA and stabilize the overall conformation of the protein. Notably, TDGFL was able to excise 5-hydroxymethylcytosine (5hmC), but not 5-methylcytosine, residues from duplex DNA with an efficiency that could be physiologically relevant in post-mitotic cells. Our findings demonstrate that, under the experimental conditions used, TDG catalyses sequence context-dependent removal of T, C and 5hmC residues from regular DNA duplexes. We propose that in vivo the TDG-initiated futile DNA BER may lead to formation of persistent single-strand breaks in non-methylated or hydroxymethylated chromatin regions.
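A minimal sketch of how an apparent single-turnover rate constant, like those quoted in this abstract, can be estimated from a product time course. It assumes the product fraction follows a full-amplitude single exponential, P(t) = 1 - exp(-k_obs * t), and fits it by log-linearization with a line through the origin; the function names and the synthetic time course are hypothetical and this is not the authors' fitting procedure. The half-life ln 2 / k_obs illustrates why extended incubation is needed before T•A excision becomes visible.

```python
import math


def fit_single_turnover_k(times, fractions):
    """Least-squares k_obs assuming P(t) = 1 - exp(-k_obs * t).

    Uses ln(1 - P) = -k_obs * t and fits a line through the origin.
    Every product fraction must lie in [0, 1).
    """
    num = 0.0
    den = 0.0
    for t, p in zip(times, fractions):
        if not 0.0 <= p < 1.0:
            raise ValueError("product fraction must be in [0, 1)")
        num += t * -math.log(1.0 - p)
        den += t * t
    return num / den


def half_life(k_obs):
    """Time to convert half of the substrate: t1/2 = ln 2 / k_obs."""
    return math.log(2) / k_obs


# Hypothetical noise-free time course (minutes) generated with k_obs = 0.005 min^-1.
times = [30, 60, 120, 240, 480]
fractions = [1 - math.exp(-0.005 * t) for t in times]
k = fit_single_turnover_k(times, fractions)
print(f"fitted k_obs = {k:.4f} min^-1")

# Half-lives implied by the rate constants reported in the abstract.
for k_reported in (0.0014, 0.0069, 0.47, 0.61):
    print(f"k = {k_reported} min^-1 -> t1/2 = {half_life(k_reported):.1f} min")
```

With the reported constants, t1/2 comes out at roughly 100-500 min for T•A pairs versus about 1.1-1.5 min for T•G mispairs. Real data with incomplete conversion would usually call for a fitted amplitude as well, which this sketch deliberately omits.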
Affiliation(s)
- Diana Manapkyzy
- Department of Molecular Biology and Genetics, Faculty of Biology and Biotechnology, al-Farabi Kazakh National University, Almaty, Kazakhstan
- Scientific Research Institute of Biology and Biotechnology Problems, al-Farabi Kazakh National University, Almaty, Kazakhstan
- Botagoz Joldybayeva
- Department of Molecular Biology and Genetics, Faculty of Biology and Biotechnology, al-Farabi Kazakh National University, Almaty, Kazakhstan
- Scientific Research Institute of Biology and Biotechnology Problems, al-Farabi Kazakh National University, Almaty, Kazakhstan
- Alexander A. Ishchenko
- Group «Mechanisms of DNA Repair and Carcinogenesis», CNRS UMR9019, Université Paris-Saclay, Gustave Roussy Cancer Campus, Villejuif Cedex, France
- Dmitry O. Zharkov
- SB RAS Institute of Chemical Biology and Fundamental Medicine, Novosibirsk, Russia
- Department of Natural Sciences, Novosibirsk State University, Novosibirsk, Russia
- Sabira Taipakova
- Department of Molecular Biology and Genetics, Faculty of Biology and Biotechnology, al-Farabi Kazakh National University, Almaty, Kazakhstan
- Scientific Research Institute of Biology and Biotechnology Problems, al-Farabi Kazakh National University, Almaty, Kazakhstan
- National Laboratory Astana, Nazarbayev University, Astana, Kazakhstan
- Murat K. Saparbaev
- Group «Mechanisms of DNA Repair and Carcinogenesis», CNRS UMR9019, Université Paris-Saclay, Gustave Roussy Cancer Campus, Villejuif Cedex, France
2
Hussain W. sAMP-PFPDeep: Improving accuracy of short antimicrobial peptides prediction using three different sequence encodings and deep neural networks. Brief Bioinform 2021; 23:6445107. [PMID: 34849586 DOI: 10.1093/bib/bbab487] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 08/18/2021] [Revised: 10/06/2021] [Accepted: 10/23/2021] [Indexed: 12/15/2022] [Open access]
Abstract
Short antimicrobial peptides (sAMPs) belong to a significant repertoire of antimicrobial agents and are known to possess enhanced antimicrobial activity, higher stability and lower toxicity to human cells, and to be less complex than other large biological drugs. Because these molecules are so important, a prediction method for sAMPs (sequence length ≤ 30 residues) is proposed herein for accurate and efficient prediction of sAMPs in place of laborious and costly experimental approaches. The benchmark dataset was collected from a recently reported study, and sequences were converted into three-channel images comprising information on the position, frequency and sum of 12 physicochemical features as the first, second and third channels, respectively. Two image-based deep neural networks (DNNs), i.e. RESNET-50 and VGG-16, were trained and evaluated using various metrics, and a comparative analysis with previous techniques was also performed. sAMP-PFPDeep was further validated using molecular docking-based analysis. The results showed that VGG-16 was more accurate, with 98.30% training accuracy and 87.37% testing accuracy for predicting sAMPs, compared with RESNET-50 at 96.14% training accuracy and 83.87% testing accuracy. The comparative analysis also revealed that both models outperformed previously reported state-of-the-art methods. Based on these results, we conclude that sAMP-PFPDeep can help identify antimicrobial peptides with promising accuracy and efficiency. It can help biologists and scientists identify antimicrobial peptides, further aiding computer-aided drug design and discovery as well as virtual screening protocols against various pathologies. sAMP-PFPDeep is available at (https://github.com/WaqarHusain/sAMP-PFPDeep).
Affiliation(s)
- Waqar Hussain
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore-54770, Pakistan
3
Jernigan R, Jia K, Ren Z, Zhou W. Large-scale multiple inference of collective dependence with applications to protein function. Ann Appl Stat 2021; 15:902-924. [DOI: 10.1214/20-aoas1431] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Robert Jernigan
- Department of Biochemistry, Biophysics, and Molecular Biology, Program of Bioinformatics and Computational Biology, Iowa State University
- Kejue Jia
- Department of Biochemistry, Biophysics, and Molecular Biology, Program of Bioinformatics and Computational Biology, Iowa State University
- Zhao Ren
- Department of Statistics, University of Pittsburgh
- Wen Zhou
- Department of Statistics, Colorado State University
4
Mosa AI, Urbanowicz RA, AbouHaidar MG, Tavis JE, Ball JK, Feld JJ. A bivalent HCV peptide vaccine elicits pan-genotypic neutralizing antibodies in mice. Vaccine 2020; 38:6864-6867. [PMID: 32900542 DOI: 10.1016/j.vaccine.2020.08.066] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/17/2020] [Revised: 08/19/2020] [Accepted: 08/25/2020] [Indexed: 11/17/2022]
Abstract
Vaccine development for antigenically variable pathogens has faltered because extreme genetic diversity precludes induction of broadly neutralizing antibodies (nAB) with classical vaccines. Here, using the most variable epitope of any known human pathogen (HVR1 of HCV), we describe a novel approach capable of eliciting broadly neutralizing antibodies targeting highly variable epitopes. Our proof-of-concept vaccine elicited pan-genotypic nAB against HCV variants differing from the immunogen sequences by more than 70% at the amino acid level. These findings suggest broadly nAB to highly variable pathogens can be elicited by vaccines designed to target physicochemically conserved residues within hypervariable epitopes.
Affiliation(s)
- Alexander I Mosa
- Department of Cell and Systems Biology, University of Toronto, Canada.
- Richard A Urbanowicz
- Wolfson Centre for Global Virus Infections, University of Nottingham, UK; School of Life Sciences, University of Nottingham, UK
- John E Tavis
- Department of Molecular Microbiology and Immunology, Saint Louis University School of Medicine, United States
- Jonathan K Ball
- Wolfson Centre for Global Virus Infections, University of Nottingham, UK; School of Life Sciences, University of Nottingham, UK
- Jordan J Feld
- Toronto Centre for Liver Disease, Toronto General Hospital, Sandra Rotman Centre for Global Health, University of Toronto, Canada
5
Jing X, Dong Q, Lu R, Dong Q. Protein Inter-Residue Contacts Prediction: Methods, Performances and Applications. Curr Bioinform 2019. [DOI: 10.2174/1574893613666181109130430] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 11/22/2022]
Abstract
Background: Protein inter-residue contact prediction plays an important role in protein structure and function research. As a low-dimensional representation of protein tertiary structure, inter-residue contacts can greatly help de novo protein structure prediction methods reduce the conformational search space. Over the past two decades, various methods have been developed for protein inter-residue contact prediction. Objective: We provide a comprehensive and systematic review of protein inter-residue contact prediction methods. Results: Protein inter-residue contact prediction methods are roughly classified into five categories: correlated mutation methods, machine-learning methods, fusion methods, template-based methods and 3D model-based methods. In this paper, we first describe the common definition of protein inter-residue contacts and show their typical applications. Then, we present a comprehensive review of the three main categories of protein inter-residue contact prediction: correlated mutation methods, machine-learning methods and fusion methods. In addition, we analyze the constraints of each category. Furthermore, we compare several representative methods on the CASP11 dataset and discuss their performance in detail. Conclusion: Correlated mutation methods achieve better performance for long-range contacts, while machine-learning methods perform well for short-range contacts. Fusion methods can take advantage of both machine-learning and correlated mutation methods. Employing more effective fusion strategies could help further improve the performance of fusion methods.
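The review compares methods separately on short- and long-range contacts. A minimal sketch of that bookkeeping, assuming a convention widely used in CASP assessments: two residues are in contact when their C-beta atoms (C-alpha for glycine) lie within 8 Å, and sequence separations of 6-11, 12-23 and 24 or more count as short, medium and long range. The review's own definition may differ in detail, and the function names, coordinates and (score, i, j) prediction format are hypothetical.

```python
import math


def contacts(coords, cutoff=8.0, min_sep=6):
    """Return the set of (i, j) pairs, i < j, closer than cutoff (in Å).

    coords holds one (x, y, z) tuple per residue, e.g. the C-beta atom
    (C-alpha for glycine). Pairs closer than min_sep in sequence are skipped.
    """
    found = set()
    for i in range(len(coords)):
        for j in range(i + min_sep, len(coords)):
            if math.dist(coords[i], coords[j]) < cutoff:
                found.add((i, j))
    return found


def contact_range(i, j):
    """Classify a residue pair by its sequence separation."""
    sep = abs(i - j)
    if sep >= 24:
        return "long"
    if sep >= 12:
        return "medium"
    if sep >= 6:
        return "short"
    return "local"


def top_k_precision(predicted, true_contacts, k, wanted_range=None):
    """Fraction of the k highest-scoring predictions that are true contacts.

    predicted is a list of (score, i, j); wanted_range optionally restricts
    the evaluation to "short", "medium" or "long" pairs.
    """
    pool = [(s, min(i, j), max(i, j)) for s, i, j in predicted
            if wanted_range is None or contact_range(i, j) == wanted_range]
    ranked = sorted(pool, reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for _, i, j in ranked if (i, j) in true_contacts)
    return hits / len(ranked)


# Toy chain: residue 7 folds back next to residue 0; all others are far apart.
coords = [(0.0, 0.0, 0.0)] + [(100.0 * n, 0.0, 0.0) for n in range(1, 7)] + [(1.0, 0.0, 0.0)]
true_set = contacts(coords)  # {(0, 7)}
```

Reporting precision per range, rather than pooled, is what makes the review's observation (correlated mutation methods stronger at long range, machine learning at short range) measurable.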
Affiliation(s)
- Xiaoyang Jing
- School of Computer Science, Fudan University, Shanghai, China
- Qimin Dong
- Vocational and Technical Education Center of Linxi County, Chifeng, Inner Mongolia, China
- Ruqian Lu
- School of Computer Science, Fudan University, Shanghai, China
- Qiwen Dong
- Faculty of Education, East China Normal University, Shanghai, China
6
Endutkin AV, Koptelov SS, Popov AV, Torgasheva NA, Lomzov AA, Tsygankova AR, Skiba TV, Afonnikov DA, Zharkov DO. Residue coevolution reveals functionally important intramolecular interactions in formamidopyrimidine-DNA glycosylase. DNA Repair (Amst) 2018; 69:24-33. [DOI: 10.1016/j.dnarep.2018.07.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/19/2018] [Revised: 07/04/2018] [Accepted: 07/04/2018] [Indexed: 10/28/2022]
7
Barlowe S, Coan HB, Youker RT. SubVis: an interactive R package for exploring the effects of multiple substitution matrices on pairwise sequence alignment. PeerJ 2017; 5:e3492. [PMID: 28674656 PMCID: PMC5490468 DOI: 10.7717/peerj.3492] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/16/2017] [Accepted: 05/27/2017] [Indexed: 01/13/2023] [Open access]
Abstract
Understanding how proteins mutate is critical to solving a host of biological problems. Mutations occur when an amino acid is substituted for another in a protein sequence. The set of likelihoods for amino acid substitutions is stored in a matrix and input to alignment algorithms. The quality of the resulting alignment is used to assess the similarity of two or more sequences and can vary according to assumptions modeled by the substitution matrix. Substitution strategies with minor parameter variations are often grouped together in families. For example, the BLOSUM and PAM matrix families are commonly used because they provide a standard, predefined way of modeling substitutions. However, researchers often do not know if a given matrix family or any individual matrix within a family is the most suitable. Furthermore, predefined matrix families may inaccurately reflect a particular hypothesis that a researcher wishes to model or otherwise result in unsatisfactory alignments. In these cases, the ability to compare the effects of one or more custom matrices may be needed. This laborious process is often performed manually because the ability to simultaneously load multiple matrices and then compare their effects on alignments is not readily available in current software tools. This paper presents SubVis, an interactive R package for loading and applying multiple substitution matrices to pairwise alignments. Users can simultaneously explore alignments resulting from multiple predefined and custom substitution matrices. SubVis utilizes several of the alignment functions found in R, a common language among protein scientists. Functions are tied together with the Shiny platform which allows the modification of input parameters. Information regarding alignment quality and individual amino acid substitutions is displayed with the JavaScript language which provides interactive visualizations for revealing both high-level and low-level alignment information.
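To make concrete how the choice of substitution matrix changes an alignment score, here is a minimal sketch that computes a global (Needleman-Wunsch) alignment score with a linear gap penalty under two hand-built matrices. SubVis itself is an R package built on R alignment functions and Shiny; this Python sketch does not reproduce its interface. The matrix values, the gap penalty of -4 and the residue groupings are hypothetical toy choices, not BLOSUM or PAM entries.

```python
def make_matrix(alphabet, match, mismatch, similar=(), similar_score=0):
    """Build a symmetric toy substitution matrix keyed by (residue, residue)."""
    similar = {frozenset(pair) for pair in similar}
    matrix = {}
    for x in alphabet:
        for y in alphabet:
            if x == y:
                matrix[(x, y)] = match
            elif frozenset((x, y)) in similar:
                matrix[(x, y)] = similar_score
            else:
                matrix[(x, y)] = mismatch
    return matrix


def nw_score(a, b, matrix, gap=-4):
    """Optimal global alignment score (Needleman-Wunsch, linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = i * gap  # leading gaps in b
    for j in range(1, cols):
        dp[0][j] = j * gap  # leading gaps in a
    for i in range(1, rows):
        for j in range(1, cols):
            dp[i][j] = max(
                dp[i - 1][j - 1] + matrix[(a[i - 1], b[j - 1])],  # substitution
                dp[i - 1][j] + gap,  # gap in b
                dp[i][j - 1] + gap,  # gap in a
            )
    return dp[-1][-1]


alphabet = "KRILVE"
# Strict matrix: every non-identical pair is penalised equally.
strict = make_matrix(alphabet, match=5, mismatch=-4)
# Lenient matrix: K/R and I/L/V are treated as similar substitutions.
lenient = make_matrix(alphabet, match=5, mismatch=-4,
                      similar=[("K", "R"), ("I", "L"), ("I", "V"), ("L", "V")],
                      similar_score=2)

for name, m in (("strict", strict), ("lenient", lenient)):
    print(name, nw_score("KLVE", "RIVE", m))
```

The same pair of sequences scores very differently under the two hypotheses about which substitutions are acceptable, which is the kind of side-by-side comparison the abstract describes.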
Affiliation(s)
- Scott Barlowe
- Department of Mathematics and Computer Science, Western Carolina University, Cullowhee, NC, United States of America
- Heather B Coan
- Department of Biology, Western Carolina University, Cullowhee, NC, United States of America
- Robert T Youker
- Department of Biology, Western Carolina University, Cullowhee, NC, United States of America
8
Origins and evolution of WUSCHEL-related homeobox protein family in plant kingdom. ScientificWorldJournal 2014; 2014:534140. [PMID: 24511289 PMCID: PMC3913392 DOI: 10.1155/2014/534140] [Citation(s) in RCA: 70] [Impact Index Per Article: 6.4] [Received: 08/09/2013] [Accepted: 09/19/2013] [Indexed: 12/24/2022] [Open access]
Abstract
WUSCHEL-related homeobox (WOX) is a large group of transcription factors found specifically in plants. WOX members contain the conserved homeodomain essential for plant development by regulating cell division and differentiation. However, the evolutionary relationship of WOX members in the plant kingdom remains to be elucidated. In this study, we retrieved 350 WOX members from 50 species across the plant kingdom. Linkage analysis of WOX protein sequences demonstrated that amino acid residues 141-145 and 153-160 located in the homeodomain are possibly associated with the function of WOXs during evolution. These 350 members were grouped into 3 clades: the first clade represents the conserved WOXs from the lower plant algae to higher plants; the second clade has members from vascular plant species; the third clade has members only from spermatophyte species. Furthermore, among the members of Arabidopsis thaliana and Oryza sativa, we observed ubiquitous expression of genes in the first clade and diversified expression patterns of WOX genes in distinct organs in the second and third clades. This work provides insight into the origin and evolutionary process of WOXs, facilitating their functional investigation in the future.
9
Addington TA, Mertz RW, Siegel JB, Thompson JM, Fisher AJ, Filkov V, Fleischman NM, Suen AA, Zhang C, Toney MD. Janus: prediction and ranking of mutations required for functional interconversion of enzymes. J Mol Biol 2013; 425:1378-89. [PMID: 23396064 DOI: 10.1016/j.jmb.2013.01.034] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 10/10/2012] [Revised: 01/27/2013] [Accepted: 01/30/2013] [Indexed: 10/27/2022]
Abstract
Identification of residues responsible for functional specificity in enzymes is a challenging and important problem in protein chemistry. Active-site residues are generally easy to identify, but residues outside the active site are also important to catalysis and their identities and roles are more difficult to determine. We report a method based on analysis of multiple sequence alignments, embodied in our program Janus, for predicting mutations required to interconvert structurally related but functionally distinct enzymes. Conversion of aspartate aminotransferase into tyrosine aminotransferase is demonstrated and compared to previous efforts. Incorporation of 35 predicted mutations resulted in an enzyme with the desired substrate specificity but low catalytic activity. A single round of DNA back-shuffling with wild-type aspartate aminotransferase on this variant generated mutants with tyrosine aminotransferase activities better than those previously realized from rational design or directed evolution. Methods such as this, coupled with computational modeling, may prove invaluable in furthering our understanding of enzyme catalysis and engineering.
10
Lashin SA, Suslov VV, Matushkin YG. Theories of biological evolution from the viewpoint of the modern systemic biology. Russ J Genet 2012. [DOI: 10.1134/s1022795412030064] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/23/2022]
11
Hughes AL. Amino acid sequence coevolution in the insect bursicon ligand-receptor system. Mol Phylogenet Evol 2012; 63:617-24. [PMID: 22373512 DOI: 10.1016/j.ympev.2012.02.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/19/2011] [Revised: 02/01/2012] [Accepted: 02/07/2012] [Indexed: 11/24/2022]
Abstract
The pattern of amino acid residue replacement in the components of the bursicon signaling system (involving the BURSα/BURSβ heterodimer and its receptor BURSrec) was reconstructed across a phylogeny of 17 insect species, in order to test for the co-occurrence of replacements at sets of individual sites. Sets of three or more branches with perfectly concordant changes occurred to a greater extent than expected by chance, given the observed level of amino acid change. The latter sites (SPC sites) were found to have distinctive characteristics: (1) the mean number of changes was significantly lower at SPC sites than that at other sites with multiple changes; (2) SPC sites had a significantly greater tendency toward parallel amino acid changes than other sites with multiple changes, but no greater tendency toward convergent changes; and (3) parallel changes tended to involve relatively similar amino acids, as indicated by relatively low mean chemical distances. The results implicated functional constraint, permitting only a limited subset of amino acids in a given site, as a major factor in causing both parallel amino acid replacement and coordinated amino acid changes in different sites of the same protein and of interacting proteins in this system.
Affiliation(s)
- Austin L Hughes
- Department of Biological Sciences, University of South Carolina, Columbia, SC 29205, USA.
12
Sukumar N, Krein MP, Embrechts MJ. Predictive cheminformatics in drug discovery: statistical modeling for analysis of micro-array and gene expression data. Methods Mol Biol 2012; 910:165-94. [PMID: 22821597 DOI: 10.1007/978-1-61779-965-5_9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 01/23/2023]
Abstract
The vast amounts of chemical and biological data available through robotic high-throughput assays and micro-array technologies require computational techniques for visualization, analysis, and predictive modeling. Predictive cheminformatics and bioinformatics employ statistical methods to mine these data for hidden correlations and to retrieve molecules or genes with desirable biological activity from large databases, for the purpose of drug development. While many statistical methods are commonly employed and widely accessible, their proper use requires due consideration of data representation and preprocessing, model validation and domain of applicability estimation, similarity assessment, the nature of the structure-activity landscape, and model interpretation. This chapter reviews these considerations in light of the current state of the art in statistical modeling and summarizes best practices in predictive cheminformatics.
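Similarity assessment and domain-of-applicability estimation, two of the considerations listed in this abstract, are often approached through fingerprint similarity. A minimal sketch, assuming binary fingerprints represented as sets of on-bit indices: the Tanimoto coefficient |A ∩ B| / |A ∪ B|, plus a simple nearest-neighbour rule that treats a query as inside the model's domain when its highest similarity to any training compound reaches a threshold. The threshold of 0.5, the function names and the toy fingerprints are hypothetical; the chapter does not prescribe this particular rule.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 1.0  # two empty fingerprints are treated as identical
    return len(fp_a & fp_b) / union


def in_domain(query_fp, training_fps, threshold=0.5):
    """Hypothetical nearest-neighbour applicability check.

    Returns (inside, best) where best is the highest Tanimoto similarity
    between the query and any training fingerprint.
    """
    best = max((tanimoto(query_fp, fp) for fp in training_fps), default=0.0)
    return best >= threshold, best


# Toy training fingerprints (sets of on-bit positions).
training = [{1, 2, 3, 8}, {4, 5, 6}, {2, 3, 9}]
print(in_domain({2, 3, 4}, training))
print(in_domain({10, 11}, training))
```

A query that shares no on-bits with any training compound gets a best similarity of 0, so predictions for it would be flagged as extrapolation under this rule.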
Affiliation(s)
- N Sukumar
- Rensselaer Exploratory Center for Cheminformatics Research and Department of Chemistry and Chemical Biology, Rensselaer Polytechnic Institute, Troy, NY, USA.
13
Henriksen SB, Mortensen RJ, Geertz-Hansen HM, Neves-Petersen MT, Arnason O, Söring J, Petersen SB. Hyperdimensional analysis of amino acid pair distributions in proteins. PLoS One 2011; 6:e25638. [PMID: 22174733 PMCID: PMC3235099 DOI: 10.1371/journal.pone.0025638] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 04/26/2011] [Accepted: 09/08/2011] [Indexed: 01/06/2023] [Open access]
Abstract
Our manuscript presents a novel approach to protein structure analysis. We have organized an 8-dimensional data cube with protein 3D-structural information from 8706 high-resolution non-redundant protein chains with the aim of identifying packing rules at the amino acid pair level. The cube contains information about amino acid type, solvent accessibility, spatial and sequence distance, secondary structure and sequence length. We are able to pose structural queries to the data cube using the program ProPack. The response is a 1D, 2D or 3D graph. Although the response is statistical in nature, the user can obtain an instant list of all PDB structures in which such a pair is found. The user may select a particular structure, which is displayed with the pair in question highlighted. The user may pose millions of different queries and receives each answer within a few seconds. To demonstrate the capabilities of the data cube as well as the programs, we have selected well-known structural features, disulphide bridges and salt bridges, to illustrate how the queries are posed and how answers are given. Motifs involving cysteines such as disulphide bridges, zinc fingers and iron-sulfur clusters are clearly identified and differentiated. ProPack also reveals that whereas pairs of Lys residues virtually never appear in close spatial proximity, pairs of Arg are abundant and appear at close spatial distance, contrary both to the belief that electrostatic repulsion would prevent this juxtaposition and to the perception of Arg-Lys as a conservative mutation. The presented programs can find and visualize novel packing preferences in protein structures, allowing the user to unravel correlations between pairs of amino acids. The new tools allow the user to view statistical information and instantly visualize the structures that underpin it, which is far from trivial with most other software tools for protein structure analysis.
Affiliation(s)
- Svend B. Henriksen
- NanoBiotechnology Group, Department of Physics and Nanotechnology, Aalborg University, Aalborg, Denmark
- Rasmus J. Mortensen
- NanoBiotechnology Group, Department of Physics and Nanotechnology, Aalborg University, Aalborg, Denmark
- Henrik M. Geertz-Hansen
- NanoBiotechnology Group, Department of Physics and Nanotechnology, Aalborg University, Aalborg, Denmark
- Maria Teresa Neves-Petersen
- International Iberian Nanotechnol Lab (INL), Braga, Portugal
- Nanobiotechnology Group, Department of Biotechnology, Chemistry and Environmental Sciences, University of Aalborg, Aalborg, Denmark
- Omar Arnason
- NanoBiotechnology Group, Department of Physics and Nanotechnology, Aalborg University, Aalborg, Denmark
- Jón Söring
- NanoBiotechnology Group, Department of Physics and Nanotechnology, Aalborg University, Aalborg, Denmark
- Steffen B. Petersen
- Nanobiotechnology Group, Department of Health Science and Technology, Aalborg University, Aalborg, Denmark
- The Institute for Lasers, Photonics and Biophotonics, University at Buffalo, The State University of New York, Buffalo, New York, United States of America
14
Sreekumar J, ter Braak CJF, van Ham RCHJ, van Dijk ADJ. Correlated mutations via regularized multinomial regression. BMC Bioinformatics 2011; 12:444. [PMID: 22082126 PMCID: PMC3247924 DOI: 10.1186/1471-2105-12-444] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 06/01/2011] [Accepted: 11/14/2011] [Indexed: 11/13/2022] [Open access]
Abstract
Background: In addition to sequence conservation, protein multiple sequence alignments contain evolutionary signal in the form of correlated variation among amino acid positions. This signal indicates positions in the sequence that influence each other, and can be applied for the prediction of intra- or intermolecular contacts. Although various approaches exist for the detection of such correlated mutations, in general these methods utilize only pairwise correlations. Hence, they tend to conflate direct and indirect dependencies. Results: We propose RMRCM, a method for Regularized Multinomial Regression in order to obtain Correlated Mutations from protein multiple sequence alignments. Importantly, our method is not restricted to pairwise (column-column) comparisons only, but takes into account the network nature of relationships between protein residues in order to predict residue-residue contacts. The use of regularization ensures that the number of predicted links between columns in the multiple sequence alignment remains limited, preventing overprediction. Using simulated datasets we analyzed the performance of our approach in predicting residue-residue contacts, and studied how it is influenced by various types of noise. For various biological datasets, validation with protein structure data indicates a good performance of the proposed algorithm for the prediction of residue-residue contacts, in comparison to previous results. RMRCM can also be applied to predict interactions (in addition to only predicting interaction sites or contact sites), as demonstrated by predicting PDZ-peptide interactions. Conclusions: A novel method is presented, which uses regularized multinomial regression in order to obtain correlated mutations from protein multiple sequence alignments. Availability: R-code of our implementation is available via http://www.ab.wur.nl/rmrcm
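RMRCM is contrasted in this abstract with methods that score alignment columns only in pairs. For reference, a minimal sketch of one such pairwise score: mutual information (MI, in nats) between two columns of a multiple sequence alignment, computed from raw frequencies with no pseudocounts, gap handling or sequence weighting (all of which practical tools usually add). This is the kind of baseline that, as the abstract notes, tends to conflate direct and indirect dependencies; it is not RMRCM, whose R implementation is distributed separately. The toy alignment and function names are hypothetical.

```python
import math
from collections import Counter


def column(msa, k):
    """Residues at alignment position k across all sequences."""
    return [seq[k] for seq in msa]


def mutual_information(col_a, col_b):
    """Mutual information (nats) between two alignment columns from raw frequencies."""
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (x, y), count in freq_ab.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((freq_a[x] / n) * (freq_b[y] / n)))
    return mi


def rank_pairs(msa, min_sep=1):
    """All column pairs (i, j), i < j, ranked by decreasing MI as (score, i, j)."""
    length = len(msa[0])
    scores = []
    for i in range(length):
        for j in range(i + min_sep, length):
            scores.append((mutual_information(column(msa, i), column(msa, j)), i, j))
    return sorted(scores, reverse=True)


# Toy alignment: columns 0 and 1 co-vary perfectly (A<->K, G<->R);
# column 2 varies independently of both.
msa = ["AKD", "AKE", "GRD", "GRE"]
for score, i, j in rank_pairs(msa):
    print(f"columns {i}-{j}: MI = {score:.3f}")
```

Perfectly co-varying binary columns give MI = ln 2, while the independent column scores 0; in real alignments, chains of such couplings produce the indirect signal that network-aware methods try to remove.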
Affiliation(s)
- Janardanan Sreekumar
- Central Tuber Crops Research Institute, Thiruvananthapuram-695017, Kerala, India
15
Fuzzy clustering of physicochemical and biochemical properties of amino acids. Amino Acids 2011; 43:583-94. [PMID: 21993537 PMCID: PMC3397137 DOI: 10.1007/s00726-011-1106-9] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.6] [Received: 05/25/2011] [Accepted: 09/23/2011] [Indexed: 12/03/2022]
Abstract
In this article, we categorize presently available experimental and theoretical knowledge of various physicochemical and biochemical features of amino acids, as collected in the AAindex database of 544 known amino acid (AA) indices. Previously, 402 of these indices were categorized into six groups using a hierarchical clustering technique, and 142 were left unclustered. However, owing to the increasing diversity of the database, these indices overlap, so a crisp clustering method may not provide optimal results. Moreover, in various large-scale bioinformatics analyses of whole proteomes, the proper selection of amino acid indices representing their biological significance is crucial for efficient and error-prone encoding of the short functional sequence motifs. In most cases, researchers perform exhaustive manual selection of the most informative indices. These two facts motivated us to analyse the widely used AA indices. The main goal of this article is twofold. First, we present a novel method of partitioning the bioinformatics data using consensus fuzzy clustering, where the recently proposed fuzzy clustering techniques are exploited. Second, we prepare three high-quality subsets of all available indices. The superiority of the consensus fuzzy clustering method is demonstrated quantitatively, visually and statistically by comparing it with the previously proposed hierarchical clustering results. The processed AAindex1 database, supplementary material and the software are available at http://sysbio.icm.edu.pl/aaindex/.
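The article builds a consensus from several fuzzy clustering techniques, and that consensus procedure is not reproduced here. As background, a minimal sketch of standard fuzzy c-means (Bezdek's algorithm), in which every point receives a graded membership in each cluster and the fuzzifier m > 1 controls how soft the partition is. In the article's setting each amino acid index could be treated as a point of 20 values (one per amino acid); the two-dimensional toy points, random seed and iteration limits below are hypothetical.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy c-means.

    Returns (centers, memberships) where memberships[i][k] is the degree to
    which point i belongs to cluster k; each membership row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalised so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(max_iter):
        # Centre update: mean of all points weighted by membership ** m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            sw = sum(w)
            centers.append(tuple(sum(w[i] * points[i][d] for i in range(n)) / sw
                                 for d in range(dim)))
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)).
        new_u = []
        for i in range(n):
            dists = [math.dist(points[i], centers[k]) for k in range(c)]
            if 0.0 in dists:
                # A point sitting exactly on a centre belongs fully to it.
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(row)
                new_u.append([v / total for v in row])
            else:
                new_u.append([1.0 / sum((dists[k] / dists[j]) ** (2.0 / (m - 1.0))
                                        for j in range(c))
                              for k in range(c)])
        shift = max(abs(new_u[i][k] - u[i][k]) for i in range(n) for k in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
centers, memberships = fuzzy_c_means(points, c=2)
for p, row in zip(points, memberships):
    print(p, [round(v, 3) for v in row])
```

Unlike a crisp (hierarchical) assignment, an index lying between groups would receive intermediate memberships here, which is the property the abstract invokes for overlapping AA indices.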
16
van Dijk ADJ, van Ham RCHJ. Conserved and variable correlated mutations in the plant MADS protein network. BMC Genomics 2010; 11:607. [PMID: 20979667 PMCID: PMC3017862 DOI: 10.1186/1471-2164-11-607] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 03/18/2010] [Accepted: 10/28/2010] [Indexed: 11/29/2022] [Open access]
Abstract
BACKGROUND Plant MADS domain proteins are involved in a variety of developmental processes for which their ability to form various interactions is a key requisite. However, not much is known about the structure of these proteins or their complexes, whereas such knowledge would be valuable for a better understanding of their function. Here, we analyze those proteins and the complexes they form using a correlated mutation approach in combination with available structural, bioinformatics and experimental data. RESULTS Correlated mutations are affected by several types of noise, which is difficult to disentangle from the real signal. In our analysis of the MADS domain proteins, we apply for the first time a correlated mutation analysis to a family of interacting proteins. This provides a unique way to investigate the amount of signal that is present in correlated mutations because it allows direct comparison of mutations in various family members and assessing their conservation. We show that correlated mutations in general are conserved within the various family members, and if not, the variability at the respective positions is less in the proteins in which the correlated mutation does not occur. Also, intermolecular correlated mutation signals for interacting pairs of proteins display clear overlap with other bioinformatics data, which is not the case for non-interacting protein pairs, an observation which validates the intermolecular correlated mutations. Having validated the correlated mutation results, we apply them to infer the structural organization of the MADS domain proteins. CONCLUSION Our analysis enables understanding of the structural organization of the MADS domain proteins, including support for predicted helices based on correlated mutation patterns, and evidence for a specific interaction site in those proteins.
Affiliation(s)
- Aalt DJ van Dijk
- Applied Bioinformatics, PRI, Wageningen UR, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
- Roeland CHJ van Ham
- Applied Bioinformatics, PRI, Wageningen UR, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
17
Yano T, Nobusawa E, Nagy A, Nakajima S, Nakajima K. Effects of single-point amino acid substitutions on the structure and function neuraminidase proteins in influenza A virus. Microbiol Immunol 2008; 52:216-23. [PMID: 18426396 DOI: 10.1111/j.1348-0421.2008.00034.x]
Abstract
In order to clarify the effect of amino acid substitutions on the structure and function of the neuraminidase (NA) protein of influenza A virus, we introduced single-point amino acid substitutions into the NA protein of the A/Tokyo/3/67 (H2N2) strain using PCR-based random mutation. The rate of tolerated random single amino acid substitutions in the NA protein was 47%. Rates of tolerated substitutions for the stalk and for the surface and inner portions of the head region of the NA protein were 79, 54, and 19%, respectively. Deleterious changes, such as those causing the NA protein to be arrested in the Golgi/endoplasmic reticulum, were scattered throughout the protein. On the other hand, the proportion of mutations with which the NA protein lost neuraminidase activity but was still transported to the cell surface decreased in proportion to the distance from the structural center of the enzyme active site. To investigate the effect of accumulated amino acid substitutions on the structural character of the N2 NA protein during evolution, the same amino acid substitutions were introduced by site-directed mutagenesis at 23 homologous positions on N2 proteins of A/Tokyo/3/67, A/Bangkok/15/85 (H3N2), and A/Mie/1/2004 (H3N2). The results showed a shift, or discordance, in tolerance at some of the positions. An increase in discordance was correlated with the interval in years between virus strains, and the discordance rate was estimated to be 0.6-0.7% per year.
Affiliation(s)
- Takuya Yano
- Department of Virology, Medical School, Nagoya City University, Nagoya, Japan
18
Abstract
Hepatitis C virus is a genetically heterogeneous RNA virus that is a major cause of liver disease worldwide. Here, we show that, despite its extensive heterogeneity, the evolution of hepatitis C virus is primarily shaped by negative selection and that numerous coordinated substitutions in the polyprotein can be organized into a scale-free network whose degree of connections between sites follows a power-law distribution. This network shares all major properties with many complex biological and technological networks. The topological structure and hierarchical organization of this network suggest that a small number of amino acid sites exert extensive impact on hepatitis C virus evolution. Nonstructural proteins are enriched for negatively selected sites of high centrality, whereas structural proteins are enriched for positively selected sites located in the periphery of the network. The complex network of coordinated substitutions is an emergent property of genetic systems with implications for evolution, vaccine research, and drug development. In addition to such properties as polymorphism or strength of selection, the epistatic connectivity mapped in the network is important for typing individual sites, proteins, or entire genetic systems. The network topology may help devise molecular intervention strategies for disrupting viral functions or impeding compensatory changes for vaccine escape or drug resistance mutations. Also, it may be used to find new therapeutic targets, as suggested in this study for the NS4A protein, which plays an important role in the network.
19
Demenkov PS, Aman EE, Ivanisenko VA. Prediction of the changes in thermodynamic stability of proteins caused by single amino acid substitutions. Biophysics (Nagoya-shi) 2008. [DOI: 10.1134/s0006350906070104]
20
Godoy VG, Jarosz DF, Simon SM, Abyzov A, Ilyin V, Walker GC. UmuD and RecA directly modulate the mutagenic potential of the Y family DNA polymerase DinB. Mol Cell 2008; 28:1058-70. [PMID: 18158902 DOI: 10.1016/j.molcel.2007.10.025] [Received: 05/01/2007] [Revised: 07/23/2007] [Accepted: 10/18/2007]
Abstract
DinB is the only translesion Y family DNA polymerase conserved among bacteria, archaea, and eukaryotes. DinB and its orthologs possess a specialized lesion bypass function but also display potentially deleterious -1 frameshift mutagenic phenotypes when overproduced. We show that the DNA damage-inducible proteins UmuD(2) and RecA act in concert to modulate this mutagenic activity. Structural modeling suggests that the relatively open active site of DinB is enclosed by interaction with these proteins, thereby preventing the template bulging responsible for -1 frameshift mutagenesis. Intriguingly, residues that define the UmuD(2)-interacting surface on DinB statistically covary throughout evolution, suggesting a driving force for the maintenance of a regulatory protein-protein interaction at this site. Together, these observations indicate that proteins like RecA and UmuD(2) may be responsible for managing the mutagenic potential of DinB orthologs throughout evolution.
Affiliation(s)
- Veronica G Godoy
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
21
Sherbakov DY, Triboy TI. Effect of co-evolving amino acid residues on topology of phylogenetic trees. Biochemistry (Mosc) 2007; 72:1363-7. [PMID: 18205620 DOI: 10.1134/s0006297907120103]
Abstract
The presence in proteins of amino acid residues that change in concert during evolution is associated with maintaining the spatial structure and functions of the protein. As with morphological features, correlated substitutions may cause homoplasies, i.e. the independent evolution of identical non-homologous adaptations. Our data obtained on model phylogenetic trees and the corresponding sets of sequences show that the presence of correlated substitutions distorts the results of phylogenetic reconstruction. A method for accounting for co-evolving amino acid residues in phylogenetic analysis is proposed: only a single site from each group of correlated amino acid positions is retained, and the other positions are excluded from further phylogenetic analysis. Simulations show that replacing, on average, 8% of the variable positions in a pair of model sequences with coordinately evolving amino acid residues can change the tree topology. Removing such residues from the sequences before phylogenetic analysis restores the correct topology.
Affiliation(s)
- D Yu Sherbakov
- Limnological Institute, Siberian Branch of the Russian Academy of Sciences, Irkutsk, 664033, Russia.
22
Kawashima S, Pokarowski P, Pokarowska M, Kolinski A, Katayama T, Kanehisa M. AAindex: amino acid index database, progress report 2008. Nucleic Acids Res 2007; 36:D202-5. [PMID: 17998252 PMCID: PMC2238890 DOI: 10.1093/nar/gkm998]
Abstract
AAindex is a database of numerical indices representing various physicochemical and biochemical properties of amino acids and pairs of amino acids. We have added a collection of protein contact potentials to the AAindex as a new section. AAindex therefore now consists of three sections: AAindex1 for the amino acid index of 20 numerical values, AAindex2 for the amino acid substitution matrix and AAindex3 for the statistical protein contact potentials. All data are derived from published literature. The database can be accessed through the DBGET/LinkDB system at GenomeNet (http://www.genome.jp/dbget-bin/www_bfind?aaindex) or downloaded by anonymous FTP (ftp://ftp.genome.jp/pub/db/community/aaindex/).
Affiliation(s)
- Shuichi Kawashima
- Laboratory of Genome Database, Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokane-dai Minato-ku Tokyo 108-8639, Japan.
23
Naithani S, Chookajorn T, Ripoll DR, Nasrallah JB. Structural modules for receptor dimerization in the S-locus receptor kinase extracellular domain. Proc Natl Acad Sci U S A 2007; 104:12211-6. [PMID: 17609367 PMCID: PMC1924578 DOI: 10.1073/pnas.0705186104]
Abstract
The highly polymorphic S-locus receptor kinase (SRK) is the stigma determinant of specificity in the self-incompatibility response of the Brassicaceae. SRK spans the plasma membrane of stigma epidermal cells, and it is activated in an allele-specific manner on binding of its extracellular region (eSRK) to its cognate pollen coat-localized S-locus cysteine-rich (SCR) ligand. SRK, like several other receptor kinases, forms dimers in the absence of ligand. To identify domains in SRK that mediate ligand-independent dimerization, we assayed eSRK for self-interaction in yeast. We show that SRK dimerization is mediated by two regions in eSRK, primarily by a C-terminal region inferred by homology modeling/fold recognition techniques to assume a PAN_APPLE-like structure, and secondarily by a region containing a signature sequence of the S-domain gene family, which might assume an EGF-like structure. We also show that eSRK exhibits a marked preference for homodimerization over heterodimerization with other eSRK variants and that this preference is mediated by a small, highly variable region within the PAN_APPLE domain. Thus, the extensive polymorphism exhibited by the eSRK not only determines differential affinity toward the SCR ligand, as has been assumed thus far, but also underlies a previously unrecognized allelic specificity in SRK dimerization. We propose that preference for SRK homodimerization explains the codominance exhibited by a majority of SRKs in the typically heterozygous stigmas of self-incompatible plants, whereas an increased propensity for heterodimerization combined with reduced affinity of heterodimers for cognate SCRs might underlie the dominant-recessive or mutual weakening relationships exhibited by some SRK allelic pairs.
Affiliation(s)
- Daniel R. Ripoll
- Computational Biology Service Unit, Cornell Theory Center, Cornell University, Ithaca, NY 14853
- June B. Nasrallah
- *Department of Plant Biology and
- To whom correspondence should be addressed. E-mail:
24
Stern A, Doron-Faigenboim A, Erez E, Martz E, Bacharach E, Pupko T. Selecton 2007: advanced models for detecting positive and purifying selection using a Bayesian inference approach. Nucleic Acids Res 2007; 35:W506-11. [PMID: 17586822 PMCID: PMC1933148 DOI: 10.1093/nar/gkm382]
Abstract
Biologically significant sites in a protein may be identified by contrasting the rates of synonymous (Ks) and non-synonymous (Ka) substitutions. This enables the inference of site-specific positive Darwinian selection and purifying selection. We present here Selecton version 2.2 (http://selecton.bioinfo.tau.ac.il), a web server which automatically calculates the ratio between Ka and Ks (ω) at each site of the protein. This ratio is graphically displayed on each site using a color-coding scheme, indicating either positive selection, purifying selection or lack of selection. Selecton implements an assembly of different evolutionary models, which allow for statistical testing of the hypothesis that a protein has undergone positive selection. Specifically, the recently developed mechanistic-empirical model is introduced, which takes into account the physicochemical properties of amino acids. Advanced options were introduced to allow maximal fine tuning of the server to the user's specific needs, including calculation of statistical support of the ω values, an advanced graphic display of the protein's 3-dimensional structure, use of different genetic codes and inputting of a pre-built phylogenetic tree. Selecton version 2.2 is an effective, user-friendly and freely available web server which implements up-to-date methods for computing site-specific selection forces, and the visualization of these forces on the protein's sequence and structure.
Affiliation(s)
- Adi Stern
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv 69978, Israel and Department of Microbiology, University of Massachusetts, Amherst, MA 01003, USA
- Adi Doron-Faigenboim
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv 69978, Israel and Department of Microbiology, University of Massachusetts, Amherst, MA 01003, USA
- Elana Erez
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv 69978, Israel and Department of Microbiology, University of Massachusetts, Amherst, MA 01003, USA
- Eric Martz
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv 69978, Israel and Department of Microbiology, University of Massachusetts, Amherst, MA 01003, USA
- Eran Bacharach
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv 69978, Israel and Department of Microbiology, University of Massachusetts, Amherst, MA 01003, USA
- Tal Pupko
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv 69978, Israel and Department of Microbiology, University of Massachusetts, Amherst, MA 01003, USA
- *To whom correspondence should be addressed. 972-3-640-7693, 972-3-642-2046
25
Eyal E, Pietrokovski S, Bahar I. Rapid assessment of correlated amino acids from pair-to-pair (P2P) substitution matrices. Bioinformatics 2007; 23:1837-9. [PMID: 17496318 DOI: 10.1093/bioinformatics/btm256]
Abstract
Identification of correlated amino acids in proteins has been a topic of broad interest in view of its functional implications and importance in protein design. A new set of pair-to-pair (P2P) substitution matrices for amino acids was recently introduced as a useful tool for inferring information on such correlated sites. We present a website developed for automated application of these matrices for analysis of query sequences. The site offers options for graphical analysis of correlations, as well as visualization of correlated amino acids on representative, structurally characterized, members of the examined family of sequences. AVAILABILITY: http://www.ccbb.pitt.edu/p2p.
Affiliation(s)
- Eran Eyal
- Department of Computational Biology, School of Medicine, University of Pittsburgh, 3501 Fifth Avenue, Pittsburgh, PA 15213, USA.
26
Berezovsky IN, Zeldovich KB, Shakhnovich EI. Positive and negative design in stability and thermal adaptation of natural proteins. PLoS Comput Biol 2007; 3:e52. [PMID: 17381236 PMCID: PMC1829478 DOI: 10.1371/journal.pcbi.0030052] [Received: 11/17/2006] [Accepted: 01/31/2007]
Abstract
The aim of this work is to elucidate how physical principles of protein design are reflected in natural sequences that evolved in response to the thermal conditions of the environment. Using an exactly solvable lattice model, we design sequences with selected thermal properties. Compositional analysis of designed model sequences and natural proteomes reveals a specific trend in amino acid compositions in response to the requirement of stability at elevated environmental temperature: the increase of fractions of hydrophobic and charged amino acid residues at the expense of polar ones. We show that this “from both ends of the hydrophobicity scale” trend is due to positive (to stabilize the native state) and negative (to destabilize misfolded states) components of protein design. Negative design strengthens specific repulsive non-native interactions that appear in misfolded structures. A pressure to preserve specific repulsive interactions in non-native conformations may result in correlated mutations between amino acids that are far apart in the native state but may be in contact in misfolded conformations. Such correlated mutations are indeed found in TIM barrel and other proteins.
What mechanisms does Nature use in her quest for thermophilic proteins? It is known that stability of a protein is mainly determined by the energy gap, or the difference in energy, between native state and a set of incorrectly folded (misfolded) conformations. Here we show that Nature makes thermophilic proteins by widening this gap from both ends. The energy of the native state of a protein is decreased by selecting strongly attractive amino acids at positions that are in contact in the native state (positive design). Simultaneously, energies of the misfolded conformations are increased by selection of strongly repulsive amino acids at positions that are distant in native structure; however, these amino acids will interact repulsively in the misfolded conformations (negative design).
These fundamental principles of protein design are manifested in the “from both ends of the hydrophobicity scale” trend observed in thermophilic adaptation, whereby proteomes of thermophilic proteins are enriched in extreme amino acids—hydrophobic and charged—at the expense of polar ones. Hydrophobic amino acids contribute mostly to the positive design, while charged amino acids that repel each other in non-native conformations of proteins contribute to negative design. Our results provide guidance in rational design of proteins with selected thermal properties.
Affiliation(s)
- Igor N Berezovsky
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Konstantin B Zeldovich
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Eugene I Shakhnovich
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts, United States of America
- * To whom correspondence should be addressed. E-mail:
27
Staats M, van Baarlen P, Schouten A, van Kan JAL, Bakker FT. Positive selection in phytotoxic protein-encoding genes of Botrytis species. Fungal Genet Biol 2006; 44:52-63. [PMID: 16935013 DOI: 10.1016/j.fgb.2006.07.003] [Received: 04/05/2006] [Revised: 07/05/2006] [Accepted: 07/15/2006]
Abstract
Evolutionary patterns of sequence divergence were analyzed in genes from the fungal genus Botrytis (Ascomycota), encoding phytotoxic proteins homologous to a necrosis and ethylene-inducing protein from Fusarium oxysporum. Fragments of two paralogous genes (designated NEP1 and NEP2) were amplified from all known Botrytis species and sequenced. NEP1 sequences of two Botrytis species contain premature stop codons, indicating that they may be non-functional. Both paralogs of all species encode proteins with a remarkably similar predicted secondary structure; however, they contain different types of post-translational modification motifs, which are conserved across the genus. While both NEP genes are, overall, under purifying selection, we identified a number of amino acids under positive selection based on inference using maximum likelihood models. Positively selected amino acids in NEP1 were not under selection in corresponding positions in NEP2. The biological significance of positively selected residues and the role of NEP proteins in pathogenesis remain to be resolved.
Affiliation(s)
- Martijn Staats
- Wageningen University, Laboratory of Phytopathology, Wageningen, The Netherlands
28
Zhao F, Qin S. Evolutionary analysis of phycobiliproteins: implications for their structural and functional relationships. J Mol Evol 2006; 63:330-40. [PMID: 16830096 DOI: 10.1007/s00239-005-0026-2] [Received: 01/29/2005] [Accepted: 09/01/2005]
Abstract
Phycobiliproteins, together with linker polypeptides and various chromophores, are basic building blocks of phycobilisomes, a supramolecular complex with a light-harvesting function in cyanobacteria and red algae. Previous studies suggest that the different types of phycobiliproteins and the linker polypeptides originated from the same ancestor. Here we retrieve the phycobilisome-related genes from well-annotated and even unfinished cyanobacterial genomes and find that many sites with elevated d(N)/d(S) ratios in different phycobiliprotein lineages are located in the chromophore-binding domain and the helical hairpin domains (X and Y). Covariation analyses also reveal that these sites are significantly correlated, showing strong evidence of the functional-structural importance of interactions among these residues. The potential selective pressure driving the diversification of phycobiliproteins may be related to the formation of the phycobiliprotein-chromophore microenvironment and to subunit interactions. The sites and genes identified here could provide targets for further research on the structural-functional role of these residues and energy transfer through the chromophores.
Affiliation(s)
- Fangqing Zhao
- Institute of Oceanology, Chinese Academy of Sciences, Qingdao, China
29
Wang P, Yan B, Guo JT, Hicks C, Xu Y. Structural genomics analysis of alternative splicing and application to isoform structure modeling. Proc Natl Acad Sci U S A 2005; 102:18920-5. [PMID: 16354838 PMCID: PMC1323168 DOI: 10.1073/pnas.0506770102]
Abstract
Alternative splicing is a sophisticated nuclear process that regulates gene expression. It represents an important mechanism for enhancing the functional diversity of proteins. Our current knowledge of alternatively spliced variants is derived mainly from mRNA transcripts, and very little is known about their protein tertiary structures. We carried out a large-scale analysis of known alternatively spliced variants at both protein sequence and structure levels and have shown that threading is, in general, a viable approach for modeling structures of alternatively spliced variants. An examination of alternative splicing at the protein sequence level revealed that the size of splicing events follows a power-law distribution and the majority of splicing isoforms harbor only one or two alterations. We examined alternative splicing in the context of protein 3D structures and found that the boundaries of alternative splicing events generally fall in coil regions and at exposed residues, and that the majority of the sequences involved in splicing are located on the surface of proteins. In light of these findings, we then proceeded to demonstrate that threading represents a useful tool for structure prediction of alternative splicing isoforms and addressed the fold stability issue of threading-based structure prediction by molecular dynamics simulation. Our analysis and the insights gained have helped to establish a viable method for structure prediction of alternatively spliced isoforms at the genome scale.
Affiliation(s)
- Peng Wang
- Department of Biochemistry and Molecular Biology and Institute of Bioinformatics, University of Georgia, Athens, GA 30622, USA
30
Fox-Walsh KL, Dou Y, Lam BJ, Hung SP, Baldi PF, Hertel KJ. The architecture of pre-mRNAs affects mechanisms of splice-site pairing. Proc Natl Acad Sci U S A 2005; 102:16176-81. [PMID: 16260721 PMCID: PMC1283478 DOI: 10.1073/pnas.0508489102]
Abstract
The exon/intron architecture of genes determines whether components of the spliceosome recognize splice sites across the intron or across the exon. Using in vitro splicing assays, we demonstrate that splice-site recognition across introns ceases when intron size is between 200 and 250 nucleotides. Beyond this threshold, splice sites are recognized across the exon. Splice-site recognition across the intron is significantly more efficient than splice-site recognition across the exon, resulting in enhanced inclusion of exons with weak splice sites. Thus, intron size can profoundly influence the likelihood that an exon is constitutively or alternatively spliced. An EST-based alternative-splicing database was used to determine whether the exon/intron architecture influences the probability of alternative splicing in the Drosophila and human genomes. Drosophila exons flanked by long introns display an up to 90-fold-higher probability of being alternatively spliced compared with exons flanked by two short introns, demonstrating that the exon/intron architecture in Drosophila is a major determinant in governing the frequency of alternative splicing. Exon skipping is also more likely to occur when exons are flanked by long introns in the human genome. Interestingly, experimental and computational analyses show that the length of the upstream intron is more influential in inducing alternative splicing than is the length of the downstream intron. We conclude that the size and location of the flanking introns control the mechanism of splice-site recognition and influence the frequency and the type of alternative splicing that a pre-mRNA transcript undergoes.
Affiliation(s)
- Kristi L Fox-Walsh
- Department of Microbiology and Molecular Genetics, University of California, Irvine, CA 92697-4025, USA
31
Ray WC. MAVL/StickWRLD for protein: visualizing protein sequence families to detect non-consensus features. Nucleic Acids Res 2005; 33:W315-9. [PMID: 15980480 PMCID: PMC1160135 DOI: 10.1093/nar/gki374]
Abstract
A fundamental problem with applying Consensus, Weight-Matrix or hidden Markov models as search tools for biosequences is that there is no way to know, from the model, if the modeled sequences display any dependencies between positional identities. In some instances, these dependencies are crucial in correctly accepting or rejecting other sequences as members of the family. MAVL (multiple alignment variation linker) and StickWRLD provide a web-based method to visually survey the model-training sequences to discover and characterize possible dependencies. MAVL/StickWRLD, initially introduced for nucleic acid sequences, makes it easy to distinguish typical DNA or RNA structural dependencies in input families, identify mixed populations of distinct subfamilies, or discover novel dependencies that result from binding interactions or other selective pressures [W. Ray (2004) Nucleic Acids Res., 32, W59-W63]. Since the announcement of MAVL/StickWRLD for nucleic acids, one of the most requested new features has been the extension of this visualization method to support protein alignments. We are pleased to report that this extension has been successful, that the basic visualization has been augmented in several ways to enhance protein viewing, and that the results with protein alignments are even more dramatic than with NA alignments. MAVL/StickWRLD can be accessed at http://www.microbial-pathogenesis.org/stickwrld/.
Affiliation(s)
- William C Ray
- Children's Research Institute and The Department of Pediatrics, The Ohio State University, 700 Children's Drive, Columbus, OH 43205, USA.
32
Abstract
The effective integration of data and knowledge from many disparate sources will be crucial to future drug discovery. Data integration is a key element of conducting scientific investigations with modern platform technologies, managing increasingly complex discovery portfolios and processes, and fully realizing economies of scale in large enterprises. However, viewing data integration as simply an 'IT problem' underestimates the novel and serious scientific and management challenges it embodies - challenges that could require significant methodological and even cultural changes in our approach to data.
Affiliation(s)
- David B Searls
- Bioinformatics Division, Genetics Research, GlaxoSmithKline Pharmaceuticals, 709 Swedeland Road, P.O. Box 1539, King of Prussia, Pennsylvania 19406, USA.