1. Ren FD, Liu YZ, Ding KW, Chang LL, Cao DL, Liu S. Finite temperature string by K-means clustering sampling with order parameters as collective variables for molecular crystals: application to polymorphic transformation between β-CL-20 and ε-CL-20. Phys Chem Chem Phys 2024; 26:3500-3515. [PMID: 38206084 DOI: 10.1039/d3cp05389j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/12/2024]
Abstract
Polymorphic transformation of molecular crystals is a fundamental phase transition process, and it is important practically in the chemical, material, biopharmaceutical, and energy storage industries. However, understanding of the transformation mechanism at the molecular level is poor due to the extreme simulating challenges in enhanced sampling and formulating order parameters (OPs) as the collective variables that can distinguish polymorphs with quite similar and complicated structures so as to describe the reaction coordinate. In this work, two kinds of OPs for CL-20 were constructed by the bond distances, bond orientations and relative orientations. A K-means clustering algorithm based on the Euclidean distance and sample weight was used to smooth the initial finite temperature string (FTS), and the minimum free energy path connecting β-CL-20 and ε-CL-20 was sketched by the string method in collective variables, and the free energy profile along the path and the nucleation kinetics were obtained by Markovian milestoning with Voronoi tessellations. In comparison with the average-based sampling, the K-means clustering algorithm provided an improved convergence rate of FTS. The simulation of transformation was independent of OP types but was affected greatly by finite-size effects. A surface-mediated local nucleation mechanism was confirmed and the configuration located at the shoulder of potential of mean force, rather than overall maximum, was confirmed to be the critical nucleus formed by the cooperative effect of the intermolecular interactions. This work provides an effective way to explore the polymorphic transformation of caged molecular crystals at the molecular level.
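The clustering step named in this abstract, K-means with Euclidean distance and per-sample weights, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `weighted_kmeans`, the random initialization, and the fixed iteration count are assumptions made for illustration, and how the clusters feed into the string smoothing is not covered here.

```python
import math
import random


def weighted_kmeans(points, weights, k, n_iter=50, seed=0):
    """Lloyd-style K-means with Euclidean distance and per-sample weights.

    points  : list of equal-length coordinate tuples (e.g. order-parameter values)
    weights : one non-negative weight per point
    Returns (centers, labels).
    """
    rng = random.Random(seed)
    # Assumed initialization: pick k distinct samples as starting centers.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each sample goes to its nearest center.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center becomes the weighted mean of its members.
        for j in range(k):
            members = [
                (p, w) for p, w, lab in zip(points, weights, labels) if lab == j
            ]
            total = sum(w for _, w in members)
            if total > 0:  # leave an empty cluster's center where it is
                dim = len(centers[j])
                centers[j] = [
                    sum(w * p[d] for p, w in members) / total for d in range(dim)
                ]
    return centers, labels
```

With weights all equal to one this reduces to ordinary K-means; a larger weight pulls the cluster center toward that sample.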
Affiliation(s)
- Fu-de Ren
  - School of Chemical Engineering and Technology, North University of China, Taiyuan 030051, China
- Ying-Zhe Liu
  - Xi'an Modern Chemistry Research Institute, Xi'an 710065, China
- Ke-Wei Ding
  - Xi'an Modern Chemistry Research Institute, Xi'an 710065, China
- Ling-Ling Chang
  - School of Chemical Engineering and Technology, North University of China, Taiyuan 030051, China
- Duan-Lin Cao
  - School of Chemical Engineering and Technology, North University of China, Taiyuan 030051, China
- Shubin Liu
  - Research Computing Center, University of North Carolina, Chapel Hill, North Carolina 27599-3420, USA
  - Department of Chemistry, University of North Carolina, Chapel Hill, North Carolina 27599-3290, USA
2. Thalén F, Köhne CG, Bleidorn C. Patchwork: Alignment-Based Retrieval and Concatenation of Phylogenetic Markers from Genomic Data. Genome Biol Evol 2023; 15:evad227. [PMID: 38085033 PMCID: PMC10735302 DOI: 10.1093/gbe/evad227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/06/2023] [Indexed: 12/23/2023] Open
Abstract
Low-coverage whole-genome sequencing (also known as "genome skimming") is becoming an increasingly affordable approach to large-scale phylogenetic analyses. While already routinely used to recover organellar genomes, genome skimming is rather rarely utilized for recovering single-copy nuclear markers. One reason might be that only few tools exist to work with this data type within a phylogenomic context, especially to deal with fragmented genome assemblies. We here present a new software tool called Patchwork for mining phylogenetic markers from highly fragmented short-read assemblies as well as directly from sequence reads. Patchwork is an alignment-based tool that utilizes the sequence aligner DIAMOND and is written in the programming language Julia. Homologous regions are obtained via a sequence similarity search, followed by a "hit stitching" phase, in which adjacent or overlapping regions are merged into a single unit. The novel sliding window algorithm trims away any noncoding regions from the resulting sequence. We demonstrate the utility of Patchwork by recovering near-universal single-copy orthologs within a benchmarking study, and we additionally assess the performance of Patchwork in comparison with other programs. We find that Patchwork allows for accurate retrieval of (putatively) single-copy genes from genome skimming data sets at different sequencing depths with high computational speed, outperforming existing software targeting similar tasks. Patchwork is released under the GNU General Public License version 3. Installation instructions, additional documentation, and the source code itself are all available via GitHub at https://github.com/fethalen/Patchwork.
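The "hit stitching" idea described in this abstract, merging adjacent or overlapping regions into a single unit, can be sketched as interval merging. This is a simplified illustration in Python rather than Patchwork's Julia code: the function name `stitch_hits`, the inclusive `(start, end)` coordinates, and the `max_gap` parameter are assumptions, and the real tool works on DIAMOND alignments rather than bare intervals.

```python
def stitch_hits(hits, max_gap=0):
    """Merge hit intervals that overlap or lie next to each other.

    hits    : iterable of (start, end) pairs with inclusive coordinates
    max_gap : hypothetical tolerance; hits separated by at most this many
              positions are still treated as adjacent
    Returns a sorted list of merged (start, end) pairs.
    """
    merged = []
    for start, end in sorted(hits):
        # With inclusive coordinates, a hit starting at prev_end + 1 is adjacent.
        if merged and start <= merged[-1][1] + max_gap + 1:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged
```

For example, the hits `(1, 5)`, `(6, 9)` and `(10, 20)` are pairwise adjacent and collapse into `(1, 20)`, while `(30, 40)` and `(35, 50)` overlap and collapse into `(30, 50)`.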
Affiliation(s)
- Felix Thalén
  - Department for Animal Evolution and Biodiversity, Georg-August-Universität Göttingen, Göttingen 37073, Germany
  - Cardio-CARE AG, Medizincampus Davos, Davos Wolfgang 7265, Switzerland
- Clara G Köhne
  - Department for Animal Evolution and Biodiversity, Georg-August-Universität Göttingen, Göttingen 37073, Germany
- Christoph Bleidorn
  - Department for Animal Evolution and Biodiversity, Georg-August-Universität Göttingen, Göttingen 37073, Germany
3. Abdelmoteleb M, Zhang C, Furey B, Kozubal M, Griffiths H, Champeaud M, Goodman RE. Evaluating potential risks of food allergy of novel food sources based on comparison of proteins predicted from genomes and compared to www.AllergenOnline.org. Food Chem Toxicol 2021; 147:111888. [DOI: 10.1016/j.fct.2020.111888] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 09/16/2020] [Revised: 11/23/2020] [Accepted: 11/25/2020] [Indexed: 12/15/2022]
4. Talyan S, Andrade-Navarro MA, Muro EM. Identification of transcribed protein coding sequence remnants within lincRNAs. Nucleic Acids Res 2019; 46:8720-8729. [PMID: 29986053 PMCID: PMC6158594 DOI: 10.1093/nar/gky608] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 04/18/2018] [Accepted: 06/26/2018] [Indexed: 12/21/2022] Open
Abstract
Long intergenic non-coding RNAs (lincRNAs) are non-coding transcripts >200 nucleotides long that do not overlap protein-coding sequences. Importantly, such elements are known to be tissue-specifically expressed and to play a widespread role in gene regulation across thousands of genomic loci. However, very little is known of the mechanisms for the evolutionary biogenesis of these RNA elements, especially given their poor conservation across species. It has been proposed that lincRNAs might arise from pseudogenes. To test this systematically, we developed a novel method that searches for remnants of protein-coding sequences within lincRNA transcripts; the hypothesis is that we can trace back their biogenesis from protein-coding genes or posterior transposon/retrotransposon insertions. Applying this method, we found 203 human lincRNA genes with regions significantly similar to protein-coding sequences. Our method provides a visualization tool to trace the evolutionary biogenesis of lincRNAs with respect to protein-coding genes by sequence divergence. Subsequently, we show the expression correlation between lincRNAs and their identified parental protein-coding genes using public RNA-seq repositories, hinting at novel gene regulatory relationships. In summary, we developed a novel computational methodology to study non-coding gene sequences, which can be applied to identify the evolutionary biogenesis and function of lincRNAs.
Affiliation(s)
- Sweta Talyan
  - Faculty of Biology, Johannes Gutenberg University of Mainz, 55128 Mainz, Germany
  - Institute of Molecular Biology, 55128 Mainz, Germany
- Miguel A Andrade-Navarro
  - Faculty of Biology, Johannes Gutenberg University of Mainz, 55128 Mainz, Germany
  - Institute of Molecular Biology, 55128 Mainz, Germany
- Enrique M Muro
  - Faculty of Biology, Johannes Gutenberg University of Mainz, 55128 Mainz, Germany
  - Institute of Molecular Biology, 55128 Mainz, Germany
5. Das JK, Choudhury PP, Chaturvedi N, Tayyab M, Hassan SS. Ranking and clustering of Drosophila olfactory receptors using mathematical morphology. Genomics 2019; 111:549-559. [DOI: 10.1016/j.ygeno.2018.03.010] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 09/26/2017] [Revised: 02/12/2018] [Accepted: 03/07/2018] [Indexed: 11/26/2022]
6. Mirabal P, Abreu J, Seco D. Assessing the best edit in perturbation-based iterative refinement algorithms to compute the median string. Pattern Recognit Lett 2019. [DOI: 10.1016/j.patrec.2019.02.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
7. Jin Y, Goodman RE, Tetteh AO, Lu M, Tripathi L. Bioinformatics analysis to assess potential risks of allergenicity and toxicity of HRAP and PFLP proteins in genetically modified bananas resistant to Xanthomonas wilt disease. Food Chem Toxicol 2017; 109:81-89. [PMID: 28830835 DOI: 10.1016/j.fct.2017.08.024] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 06/05/2017] [Revised: 08/16/2017] [Accepted: 08/19/2017] [Indexed: 11/17/2022]
Abstract
Banana Xanthomonas wilt (BXW) disease threatens banana production and food security throughout East Africa. Natural resistance is lacking among common cultivars. Genetically modified (GM) bananas resistant to BXW disease were developed by inserting the hypersensitive response-assisting protein (Hrap) or/and the plant ferredoxin-like protein (Pflp) gene(s) from sweet pepper (Capsicum annuum). Several of these GM banana events showed 100% resistance to BXW disease under field conditions in Uganda. The current study evaluated the potential allergenicity and toxicity of the expressed proteins HRAP and PFLP based on evaluation of published information on the history of safe use of the natural source of the proteins as well as established bioinformatics sequence comparison methods to known allergens (www.AllergenOnline.org and NCBI Protein) and toxins (NCBI Protein). The results did not identify potential risks of allergy and toxicity to either HRAP or PFLP proteins expressed in the GM bananas that might suggest potential health risks to humans. We recognize that additional tests including stability of these proteins in pepsin assay, nutrient analysis and possibly an acute rodent toxicity assay may be required by national regulatory authorities.
Affiliation(s)
- Yuan Jin
  - University of Nebraska-Lincoln, Food Allergy Research and Resource Program, 1901 North 21st Street, P.O. Box 886207, Lincoln, NE 68588-6207, USA
- Richard E Goodman
  - University of Nebraska-Lincoln, Food Allergy Research and Resource Program, 1901 North 21st Street, P.O. Box 886207, Lincoln, NE 68588-6207, USA
- Afua O Tetteh
  - University of Nebraska-Lincoln, Food Allergy Research and Resource Program, 1901 North 21st Street, P.O. Box 886207, Lincoln, NE 68588-6207, USA
- Mei Lu
  - University of Nebraska-Lincoln, Food Allergy Research and Resource Program, 1901 North 21st Street, P.O. Box 886207, Lincoln, NE 68588-6207, USA
- Leena Tripathi
  - International Institute of Tropical Agriculture, P.O. Box 30709, Nairobi, Kenya
8.
Abstract
There are millions of sequences deposited in genomic databases, and it is an important task to categorize them according to their structural and functional roles. Sequence comparison is a prerequisite for proper categorization of both DNA and protein sequences, and helps in assigning a putative or hypothetical structure and function to a given sequence. There are various methods available for comparing sequences, alignment being first and foremost for sequences with a small number of base pairs as well as for large-scale genome comparison. Various tools are available for performing pairwise large sequence comparison. The best known tools either perform global alignment or generate local alignments between the two sequences. In this chapter we first provide basic information regarding sequence comparison. This is followed by the description of the PAM and BLOSUM matrices that form the basis of sequence comparison. We also give a practical overview of currently available methods such as BLAST and FASTA, followed by a description and overview of tools available for genome comparison including LAGAN, MUMmer, BLASTZ, and AVID.
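Global alignment, as contrasted with local alignment in this abstract, can be illustrated with a short dynamic-programming sketch in the Needleman-Wunsch style. The scoring values (`match=1`, `mismatch=-1`, `gap=-2`) and the linear gap penalty are assumptions chosen for readability; for proteins the match/mismatch term would normally be looked up in a substitution matrix such as PAM or BLOSUM.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score of the best end-to-end (global) alignment of sequences a and b.

    Uses a linear gap penalty and keeps only two rows of the DP table,
    so memory is proportional to len(b).
    """
    # Row 0: aligning an empty prefix of a against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(
                max(
                    prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                    prev[j] + gap,      # gap in b
                    curr[j - 1] + gap,  # gap in a
                )
            )
        prev = curr
    return prev[-1]
```

A local alignment in the Smith-Waterman style differs mainly in clamping each cell at zero and taking the maximum over the whole table instead of the final cell.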
9. Korostelev YD, Zharov IA, Mironov AA, Rakhmaininova AB, Gelfand MS. Identification of Position-Specific Correlations between DNA-Binding Domains and Their Binding Sites. Application to the MerR Family of Transcription Factors. PLoS One 2016; 11:e0162681. [PMID: 27690309 PMCID: PMC5045206 DOI: 10.1371/journal.pone.0162681] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 11/11/2015] [Accepted: 08/26/2016] [Indexed: 11/25/2022] Open
Abstract
The large and increasing volume of genomic data analyzed by comparative methods provides information about transcription factors and their binding sites that, in turn, enables statistical analysis of correlations between factors and sites, uncovering mechanisms and evolution of specific protein-DNA recognition. Here we present an online tool, Prot-DNA-Korr, designed to identify and analyze crucial protein-DNA pairs of positions in a family of transcription factors. Correlations are identified by analysis of mutual information between columns of protein and DNA alignments. The algorithm reduces the effects of common phylogenetic history and of abundance of closely related proteins and binding sites. We apply it to five closely related subfamilies of the MerR family of bacterial transcription factors that regulate heavy metal resistance systems. We validate the approach using known 3D structures of MerR-family proteins in complexes with their cognate DNA binding sites and demonstrate that a significant fraction of correlated positions indeed form specific side-chain-to-base contacts. The joint distribution of amino acids and nucleotides hence may be used to predict changes of specificity for point mutations in transcription factors.
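The core quantity mentioned in this abstract, mutual information between a protein alignment column and a DNA alignment column, can be computed as in the sketch below. It covers only the plain plug-in estimate; the corrections for shared phylogenetic history and for over-represented close relatives that the abstract describes are not included, and the function name and input layout are assumptions.

```python
import math
from collections import Counter


def mutual_information(col_x, col_y):
    """Plug-in mutual information (in bits) between two alignment columns.

    col_x, col_y : equal-length sequences of symbols, where position i in
                   both columns comes from the same factor/site pair
                   (e.g. an amino acid and the nucleotide it is paired with).
    """
    if len(col_x) != len(col_y):
        raise ValueError("columns must have the same number of rows")
    n = len(col_x)
    count_x = Counter(col_x)
    count_y = Counter(col_y)
    count_xy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written with raw counts.
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi
```

Perfectly co-varying columns such as `"RRKK"` and `"GGAA"` give 1 bit, while columns that vary independently, such as `"RRKK"` and `"GAGA"`, give 0.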
Affiliation(s)
- Yuriy D. Korostelev
  - A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, 19-1 Bolshoy Karetny pereulok, Moscow, Russia, 127994
  - Department of Bioengineering and Bioinformatics, Moscow State University, 1-73 Vorobievy Gory, Moscow, Russia, 119991
- Ilya A. Zharov
  - A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, 19-1 Bolshoy Karetny pereulok, Moscow, Russia, 127994
- Andrey A. Mironov
  - A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, 19-1 Bolshoy Karetny pereulok, Moscow, Russia, 127994
  - Department of Bioengineering and Bioinformatics, Moscow State University, 1-73 Vorobievy Gory, Moscow, Russia, 119991
- Alexandra B. Rakhmaininova
  - A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, 19-1 Bolshoy Karetny pereulok, Moscow, Russia, 127994
- Mikhail S. Gelfand
  - A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, 19-1 Bolshoy Karetny pereulok, Moscow, Russia, 127994
  - Department of Bioengineering and Bioinformatics, Moscow State University, 1-73 Vorobievy Gory, Moscow, Russia, 119991
  - * E-mail:
10. Reaching optimized parameter set: protein secondary structure prediction using neural network. Neural Comput Appl 2016. [DOI: 10.1007/s00521-015-2150-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 11/26/2022]
11. Siruguri V, Bharatraj DK, Vankudavath RN, Rao Mendu VV, Gupta V, Goodman RE. Evaluation of Bar, Barnase, and Barstar recombinant proteins expressed in genetically engineered Brassica juncea (Indian mustard) for potential risks of food allergy using bioinformatics and literature searches. Food Chem Toxicol 2015; 83:93-102. [DOI: 10.1016/j.fct.2015.06.003] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 02/09/2015] [Revised: 06/02/2015] [Accepted: 06/03/2015] [Indexed: 11/26/2022]
12. Combinations of long peptide sequence blocks can be used to describe toxin diversification in venomous animals. Toxicon 2015; 95:84-92. [DOI: 10.1016/j.toxicon.2015.01.005] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 11/24/2014] [Revised: 01/07/2015] [Accepted: 01/13/2015] [Indexed: 11/19/2022]
13. Fast and sensitive protein alignment using DIAMOND. Nat Methods 2014; 12:59-60. [PMID: 25402007 DOI: 10.1038/nmeth.3176] [Citation(s) in RCA: 6523] [Impact Index Per Article: 652.3] [Received: 04/29/2014] [Accepted: 10/20/2014] [Indexed: 01/28/2023]
Abstract
The alignment of sequencing reads against a protein reference database is a major computational bottleneck in metagenomics and data-intensive evolutionary projects. Although recent tools offer improved performance over the gold standard BLASTX, they exhibit only a modest speedup or low sensitivity. We introduce DIAMOND, an open-source algorithm based on double indexing that is 20,000 times faster than BLASTX on short reads and has a similar degree of sensitivity.
14. Wong AKC, Lee ESA. Aligning and Clustering Patterns to Reveal the Protein Functionality of Sequences. IEEE/ACM Trans Comput Biol Bioinform 2014; 11:548-560. [PMID: 26356022 DOI: 10.1109/tcbb.2014.2306840] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 06/05/2023]
Abstract
Discovering sequence patterns with variations unveils significant functions of a protein family. Existing combinatorial methods of discovering patterns with variations are computationally expensive, and probabilistic methods require more elaborate probabilistic representation of the amino acid associations. To overcome these shortcomings, this paper presents a new computationally efficient method for representing patterns with variations in a compact representation called Aligned Pattern Cluster (AP Cluster). To tackle the runtime, our method discovers a shortened list of non-redundant statistically significant sequence associations based on our previous work. To address the representation of protein functional regions, our pattern alignment and clustering step, presented in this paper, captures the conservations and variations of the aligned patterns. We further refine our solution to allow more coverage of sequences by extending the AP Clusters containing only statistically significant patterns to Weak and Conserved AP Clusters. When applied to the cytochrome c, the ubiquitin, and the triosephosphate isomerase protein families, our algorithm identifies the binding segments as well as the binding residues. When compared to other methods, ours discovers all binding sites in the AP Clusters with superior entropy and coverage. The identification of patterns with variations helps biologists to avoid time-consuming simulations and experiments. (Software available upon request).
15. Shahbaaz M, Hassan MI, Ahmad F. Functional annotation of conserved hypothetical proteins from Haemophilus influenzae Rd KW20. PLoS One 2013; 8:e84263. [PMID: 24391926 PMCID: PMC3877243 DOI: 10.1371/journal.pone.0084263] [Citation(s) in RCA: 67] [Impact Index Per Article: 6.1] [Received: 10/03/2013] [Accepted: 11/21/2013] [Indexed: 11/18/2022] Open
Abstract
Haemophilus influenzae is a Gram-negative bacterium of the family Pasteurellaceae that causes bacteremia, pneumonia and acute bacterial meningitis in infants. The emergence of multidrug-resistant H. influenzae strains among clinical isolates demands the development of better or new drugs against this pathogen. Our study combines a number of bioinformatics tools to predict the functions of previously unassigned proteins in the genome of H. influenzae. Extensive analysis of this genome found 1,657 functional proteins, of which 429 have unknown function and are termed hypothetical proteins (HPs). The amino acid sequences of all 429 HPs were extensively annotated, and we successfully assigned functions to 296 HPs with high confidence. We also characterized the functions of 124 HPs precisely, but with less confidence. We believe that the sequence of a protein can be used as a framework to explain known functional properties. Here we have combined the latest versions of protein family databases, protein motifs, intrinsic features from the amino acid sequence, pathway and genome context methods to assign a precise function to hypothetical proteins for which no experimental information is available. We found that these HPs belong to various classes of proteins such as enzymes, transporters, carriers, receptors, signal transducers, binding proteins, virulence and other proteins. The outcome of this work will be helpful for a better understanding of the mechanism of pathogenesis and in finding novel therapeutic targets for H. influenzae.
Affiliation(s)
- Mohd Shahbaaz
  - Department of Computer Science, Jamia Millia Islamia, Jamia Nagar, New Delhi, India
- Md Imtaiyaz Hassan
  - Center for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, Jamia Nagar, New Delhi, India
- Faizan Ahmad
  - Center for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, Jamia Nagar, New Delhi, India
16.
Abstract
Factor (F)XIII is a protransglutaminase that, in addition to maintaining hemostasis, has multiple plasmatic and intracellular functions. Its plasmatic form (pFXIII) is a tetramer of two potentially active A (FXIII-A) and two inhibitory/carrier B (FXIII-B) subunits, whereas its cellular form (cFXIII) is a dimer of FXIII-A. FXIII-A belongs to the family of transglutaminases (TGs), which show modest similarity in the primary structure, but a high degree of conservatism in their domain and sub-domain secondary structure. FXIII-A consists of an activation peptide, a β-sandwich, a catalytic and two β-barrel domains. FXIII-B is a glycoprotein consisting of 10 repetitive sushi domains each held together by two internal disulfide bonds. The structural elements of FXIII-A involved in the interaction with FXIII-B have not been elucidated; in FXIII-B the first sushi domain seems important for complex formation. In the circulation pFXIII is bound to the fibrinogen γ'-chain through its B subunit. In the process of pFXIII activation first thrombin cleaves off the activation peptide from FXIII-A, then in the presence of Ca²⁺ FXIII-B dissociates and FXIII-A becomes transformed into an active transglutaminase (FXIIIa). The activation is highly accelerated by the presence of fibrin(ogen). cFXIII does not require proteolysis for intracellular activation. The three-dimensional structure of FXIIIa has not been resolved. Based on analogies with transglutaminase-2, a three-dimensional structure of FXIIIa was developed by molecular modeling, which shows good agreement with the drastic structural changes demonstrated by biochemical studies. The structural requirements for enzyme-substrate interaction and for transglutaminase activity are also reviewed.
Affiliation(s)
- I Komáromi
  - Clinical Research Center Thrombosis, Haemostasis and Vascular Biology Research Group of the Hungarian Academy of Sciences, University of Debrecen, Medical and Health Science Center, Debrecen, Hungary
17. Brylinski M, Skolnick J. Comparison of structure-based and threading-based approaches to protein functional annotation. Proteins 2010; 78:118-34. [PMID: 19731377 PMCID: PMC2804779 DOI: 10.1002/prot.22566] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Indexed: 12/11/2022]
Abstract
To exploit the vast amount of sequence information provided by the Genomic revolution, the biological function of these sequences must be identified. As a practical matter, this is often accomplished by functional inference. Purely sequence-based approaches, particularly in the "twilight zone" of low sequence similarity levels, are complicated by many factors. For proteins, structure-based techniques aim to overcome these problems; however, most require high-quality crystal structures and suffer from complex and equivocal relations between protein fold and function. In this study, in extensive benchmarking, we consider a number of aspects of structure-based functional annotation: binding pocket detection, molecular function assignment and ligand-based virtual screening. We demonstrate that protein threading driven by a strong sequence profile component greatly improves the quality of purely structure-based functional annotation in the "twilight zone." By detecting evolutionarily related proteins, it considerably reduces the high false positive rate of function inference derived on the basis of global structure similarity alone. Combined evolution/structure-based function assignment emerges as a powerful technique that can make a significant contribution to comprehensive proteome annotation.
Affiliation(s)
- Michal Brylinski
  - Center for the Study of Systems Biology School of Biology, Georgia Institute of Technology, 250 14th Street NW, Atlanta, GA 30318
- Jeffrey Skolnick
  - Center for the Study of Systems Biology School of Biology, Georgia Institute of Technology, 250 14th Street NW, Atlanta, GA 30318
18. Edgar RC. Optimizing substitution matrix choice and gap parameters for sequence alignment. BMC Bioinformatics 2009; 10:396. [PMID: 19954534 PMCID: PMC2791778 DOI: 10.1186/1471-2105-10-396] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Received: 08/10/2009] [Accepted: 12/02/2009] [Indexed: 12/04/2022] Open
Abstract
Background: While substitution matrices can readily be computed from reference alignments, it is challenging to compute optimal or approximately optimal gap penalties. It is also not well understood which substitution matrices are the most effective when alignment accuracy is the goal rather than homolog recognition. Here a new parameter optimization procedure, POP, is described and applied to the problems of optimizing gap penalties and selecting substitution matrices for pair-wise global protein alignments. Results: POP is compared to a recent method due to Kim and Kececioglu and found to achieve from 0.2% to 1.3% higher accuracies on pair-wise benchmarks extracted from BALIBASE. The VTML matrix series is shown to be the most accurate on several global pair-wise alignment benchmarks, with VTML200 giving best or close to the best performance in all tests. BLOSUM matrices are found to be slightly inferior, even with the marginal improvements in the bug-fixed RBLOSUM series. The PAM series is significantly worse, giving accuracies typically 2% less than VTML. Integer rounding is found to cause slight degradations in accuracy. No evidence is found that selecting a matrix based on sequence divergence improves accuracy, suggesting that the use of this heuristic in CLUSTALW may be ineffective. Using VTML200 is found to improve the accuracy of CLUSTALW by 8% on BALIBASE and 5% on PREFAB. Conclusion: The hypothesis that more accurate alignments of distantly related sequences may be achieved using low-identity matrices is shown to be false for commonly used matrix types. Source code and test data are freely available from the author's web site at http://www.drive5.com/pop.
19. Reumers J, Maurer-Stroh S, Schymkowitz J, Rousseau F. Protein sequences encode safeguards against aggregation. Hum Mutat 2009; 30:431-7. [PMID: 19156839 DOI: 10.1002/humu.20905] [Citation(s) in RCA: 68] [Impact Index Per Article: 4.5] [Indexed: 11/06/2022]
Abstract
Functional requirements shaped proteins into globular structures. Under these structural constraints, which require both regular secondary structure and a hydrophobic core, protein aggregation is an unavoidable corollary to protein structure. However, as aggregation results in reduced fitness, natural selection will tend to eliminate strongly aggregating sequences. The analysis of distribution and variation of aggregation patterns in the human proteome using the TANGO algorithm confirms the findings of a previous study on several proteomes: the flanks of aggregation-prone regions are enriched with charged residues and proline, the so-called gatekeeper-residues. Moreover, in this study, we observed a widespread redundancy in gatekeeper usage. Interestingly, aggregating regions from key proteins such as p53 or huntingtin are among the most extensive "gatekept" sequences. As a consequence, mutations that remove gatekeepers could therefore result in a strong increase in disease-susceptibility. In a set of disease-associated mutations from the UniProt database, we find a strong enrichment of mutations that disrupt gatekeeper motifs. Closer inspection of a number of case studies indicates clearly that removing gatekeepers may play a determining role in widely varying disorders, such as van der Woude syndrome (VWS), X-linked Fabry disease (FD), and limb-girdle muscular dystrophy.
Affiliation(s)
- Joke Reumers
  - Switch Laboratory, VIB, Vrije Universiteit Brussel, Brussels, Belgium
20. Mizuno Y, Kurochkin IV, Herberth M, Okazaki Y, Schönbach C. Predicted mouse peroxisome-targeted proteins and their actual subcellular locations. BMC Bioinformatics 2008; 9 Suppl 12:S16. [PMID: 19091015 PMCID: PMC2638156 DOI: 10.1186/1471-2105-9-s12-s16] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The import of most intraperoxisomal proteins is mediated by peroxisome targeting signals at their C-termini (PTS1) or N-terminal regions (PTS2). Both signals have been integrated in subcellular location prediction programs. However, their present performance, particularly of PTS2-targeting, did not seem fitting for large-scale screening of sequences. RESULTS: We modified an earlier reported PTS1 screening method to identify PTS2-containing mouse candidates using a combination of computational and manual annotation. For rapid confirmation of five new PTS2- and two previously identified PTS1-containing candidates we developed the new cell line CHO-perRed, which stably expresses the peroxisomal marker dsRed-PTS1. Using CHO-perRed we confirmed the peroxisomal localization of PTS1-targeted candidate Zadh2. Preliminary characterization of Zadh2 expression suggested non-PPARalpha mediated activation. Notably, none of the PTS2 candidates located to peroxisomes. CONCLUSION: In a few cases the PTS may oscillate from "silent" to "functional" depending on its surface accessibility, indicating the potential for context-dependent conditional subcellular sorting. Overall, PTS2-targeting predictions are unlikely to improve without generation and integration of new experimental data from location proteomics, protein structures and quantitative Pex7 PTS2 peptide binding assays.
Affiliation(s)
- Yumi Mizuno
- Division of Functional Genomics and Systems Medicine, Research Center for Genomic Medicine, Saitama Medical University, Hidaka, Saitama 350-1241, Japan.
21
Punta M, Ofran Y. The rough guide to in silico function prediction, or how to use sequence and structure information to predict protein function. PLoS Comput Biol 2008; 4:e1000160. [PMID: 18974821 PMCID: PMC2518264 DOI: 10.1371/journal.pcbi.1000160] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.1] [Indexed: 12/22/2022]
Affiliation(s)
- Marco Punta
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, United States of America
- Columbia University Center for Computational Biology and Bioinformatics (C2B2), New York, New York, United States of America
- Northeast Structural Genomics Consortium (NESG), Columbia University, New York, New York, United States of America
- Yanay Ofran
- The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan, Israel
22
Doolittle RF, Jiang Y, Nand J. Genomic evidence for a simpler clotting scheme in jawless vertebrates. J Mol Evol 2008; 66:185-96. [PMID: 18283387 DOI: 10.1007/s00239-008-9074-8] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.1] [Received: 10/23/2007] [Revised: 12/30/2007] [Accepted: 01/25/2008] [Indexed: 11/24/2022]
Abstract
Mammalian blood clotting involves numerous components, most of which are the result of gene duplications that occurred early in vertebrate evolution and after the divergence of protochordates. As such, the genomes of the jawless fish (hagfish and lamprey) offer the best possibility for finding systems that might have a reduced set of the many clotting factors observed in higher vertebrates. The most straightforward way of inventorying these factors may be through whole genome sequencing. In this regard, the NCBI Trace database ( http://www.ncbi.nlm.nih.gov/Traces/trace.cgi ) for the lamprey (Petromyzon marinus) contains more than 18 million raw DNA sequences determined by whole-genome shotgun methodology. The data are estimated to be about sixfold redundant, indicating that coverage is sufficiently complete to permit judgments about the presence or absence of particular genes. A search for 20 proteins whose sequences were determined prior to the trace database study found all 20. A subsequent search for specified coagulation factors revealed a lamprey system with a smaller number of components than is found in other vertebrates in that factors V and VIII seem to be represented by a single gene, and factor IX, which is ordinarily a cofactor of factor VIII, is not present. Fortuitously, after the completion of the survey of the Trace database, a draft assembly based on the same database was posted. The draft assembly allowed many of the identified Trace fragments to be linked into longer sequences that fully support the conclusion that lampreys have a simpler clotting scheme compared with other vertebrates. The data are also consistent with the hypothesis that a whole-genome duplication or other large scale block duplication occurred after the divergence of jawless fish from other vertebrates and allowed the simultaneous appearance of a second set of two functionally paired proteins in the vertebrate clotting scheme.
Affiliation(s)
- Russell F Doolittle
- Department of Chemistry & Biochemistry, University of California, San Diego, La Jolla, CA 92093-0314, USA.
23
Chen K, Huang X. Structural analysis of SNARE motifs from sea perch, Lateolabrax japonicus by computerized approaches. Comput Biol Chem 2007; 31:378-83. [PMID: 17890158 DOI: 10.1016/j.compbiolchem.2007.08.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/24/2006] [Accepted: 08/10/2007] [Indexed: 10/22/2022]
Abstract
Three cDNA sequences encoding four SNARE (N-ethylmaleimide-sensitive fusion protein attachment protein receptor) motifs were cloned from sea perch, and the deduced peptide sequences were analyzed for structural prediction using 14 different web servers and software packages. The "ionic layer" structure, the three-dimensional extension and the conformational characteristics of the SNARE 7S core complex predicted by bioinformatics approaches were each compared with those from mammalian X-ray crystallographic investigations. The results suggested that the formation and stabilization of the fish SNARE core complex might be driven by hydrophobic association, hydrogen bonding among the R groups of core amino acids and electrostatic attraction at the molecular level. This indicated that SNARE protein interactions in fish may share the same molecular mechanism as those in mammals, supporting the universality and solidity of the SNARE core complex theory. This work is also an attempt to obtain, using only computerized approaches, protein 3D structural information that appears similar to that obtained through X-ray crystallography.
Affiliation(s)
- Kui Chen
- Institute of Oceanology, Chinese Academy of Sciences, Qingdao 266071, China
24
Goodman RE, Taylor SL, Yamamura J, Kobayashi T, Kawakami H, Kruger CL, Thompson GP. Assessment of the potential allergenicity of a Milk Basic Protein fraction. Food Chem Toxicol 2007; 45:1787-94. [PMID: 17482742 DOI: 10.1016/j.fct.2007.03.014] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 01/06/2006] [Revised: 03/19/2007] [Accepted: 03/19/2007] [Indexed: 11/27/2022]
Abstract
BACKGROUND A specific basic fraction of bovine milk, termed Milk Basic Protein (MBP), has the potential to provide nutritionally important benefits if used as a food ingredient. Although derived from milk, MBP is intended for use as an ingredient in other foods. Cows' milk is a well studied, commonly allergenic food. Although the proteins in MBP are not identified as milk allergens, food products containing MBP will be labelled as containing milk as a caution to milk allergic consumers under food labelling guidelines in the US and the European Union as MBP has not been demonstrated to be free of milk allergens. However, as part of an overall safety evaluation of MBP, the developers sought to evaluate the potential allergenicity of the primary protein components for characteristics of allergenic food proteins and to assess whether intake of these proteins at intended use levels could present a significant new allergenic risk for consumers. OBJECTIVE To evaluate the potential allergenicity of the five identified proteins in MBP. While extensive studies have not demonstrated allergenicity of lactoferrin, the four other proteins are less studied. The four were tested here by sequence identity comparison to known allergens, and for stability of these proteins in acidic pepsin as a characteristic common to many food allergens. METHODS Sequences of the proteins were compared to those listed in AllergenOnline.com, by methods recommended for the evaluation of proteins introduced in crops through genetic engineering. Pepsin stability was assessed by incubating the various proteins in simulated gastric fluid at pH 1.2 with porcine pepsin for up to 60 min at 37 degrees C, with samples withdrawn and analyzed at specific times. RESULTS No significant sequence similarities were identified for the MBP proteins compared to known allergens. All but one of the protein components of MBP were digested relatively quickly by pepsin. 
The more stable protein will be of low abundance as consumed in contrast to most pepsin-stable food allergens. CONCLUSIONS Based on molecular characteristics and expected exposure, the protein components in MBP are unlikely to present any increased risk of allergy for milk allergic subjects or of cross-reactivity for other allergic subjects. However, since the proteins are derived from milk, products containing MBP will need to be labelled as containing milk proteins to warn milk allergic subjects of the potential risk of allergic reactions.
Affiliation(s)
- Richard E Goodman
- Food Allergy Research and Resource Program, University of Nebraska, Lincoln, NE, USA.
25
Monderer-Rothkoff G, Amster-Choder O. Genetic dissection of the divergent activities of the multifunctional membrane sensor BglF. J Bacteriol 2007; 189:8601-15. [PMID: 17905978 PMCID: PMC2168942 DOI: 10.1128/jb.01220-07] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/20/2022]
Abstract
BglF catalyzes beta-glucoside phosphotransfer across the cytoplasmic membrane in Escherichia coli. In addition, BglF acts as a sugar sensor that controls expression of beta-glucoside utilization genes by reversibly phosphorylating the transcriptional antiterminator BglG. Thus, BglF can exist in two opposed states: a nonstimulated state that inactivates BglG by phosphorylation and a sugar-stimulated state that activates BglG by dephosphorylation and phosphorylates the incoming sugar. Sugar phosphorylation and BglG (de)phosphorylation are both catalyzed by the same residue, Cys24. To investigate the coordination and the structural requirements of the opposing activities of BglF, we conducted a genetic screen that led to the isolation of mutations that shift the balance toward BglG phosphorylation. We show that some of the mutants that are impaired in dephosphorylation of BglG retained the ability to catalyze the concurrent activity of sugar phosphotransfer. These mutations map to two regions in the BglF membrane domain that, based on their predicted topology, were suggested to be implicated in activity. Using in vivo cross-linking, we show that a glycine in the membrane domain, whose substitution impaired the ability of BglF to dephosphorylate BglG, is spatially close to the active-site cysteine located in a hydrophilic domain. This residue is part of a newly identified motif conserved among beta-glucoside permeases associated with RNA-binding transcriptional antiterminators. The phenotype of the BglF mutants could be suppressed by BglG mutants that were isolated by a second genetic screen. In summary, we identified distinct sites in BglF that are involved in regulating phosphate flow via the common active-site residue in response to environmental cues.
Affiliation(s)
- Galya Monderer-Rothkoff
- Department of Molecular Biology, The Hebrew University Medical School, P.O. Box 12272, Jerusalem 91120, Israel
26
Sulakhe D, Rodriguez A, D'Souza M, Wilde M, Nefedova V, Foster I, Maltsev N. GNARE: automated system for high-throughput genome analysis with grid computational backend. J Clin Monit Comput 2006; 19:361-9. [PMID: 16328950 DOI: 10.1007/s10877-005-3463-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 06/30/2005] [Accepted: 06/30/2005] [Indexed: 10/25/2022]
Abstract
Recent progress in genomics and experimental biology has brought exponential growth of the biological information available for computational analysis in public genomics databases. However, applying the potentially enormous scientific value of this information to the understanding of biological systems requires computing and data storage technology of an unprecedented scale. The Grid, with its aggregated and distributed computational and storage infrastructure, offers an ideal platform for high-throughput bioinformatics analysis. To leverage this, we have developed the Genome Analysis Research Environment (GNARE), a scalable computational system for the high-throughput analysis of genomes, which provides an integrated database and computational backend for data-driven bioinformatics applications. GNARE efficiently automates the major steps of genome analysis, including acquisition of data from multiple genomic databases; data analysis by a diverse set of bioinformatics tools; and storage of results and annotations. High-throughput computations in GNARE are performed using distributed heterogeneous Grid computing resources such as Grid2003, TeraGrid, and the DOE Science Grid. Multi-step genome analysis workflows involving massive data processing, the use of application-specific tools and algorithms and updating of an integrated database to provide interactive web access to results are all expressed and controlled by a "virtual data" model which transparently maps computational workflows to distributed Grid resources. This paper describes how Grid technologies such as Globus, Condor, and the GriPhyN Virtual Data System were applied in the development of GNARE. It focuses on our approach to Grid resource allocation and to the use of GNARE as a computational framework for the development of bioinformatics applications.
Affiliation(s)
- Dinanath Sulakhe
- Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA
27
Araúzo-Bravo MJ, Ahmad S, Sarai A. Dimensionality of amino acid space and solvent accessibility prediction with neural networks. Comput Biol Chem 2006; 30:160-8. [PMID: 16545617 DOI: 10.1016/j.compbiolchem.2005.12.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 05/27/2005] [Revised: 12/16/2005] [Accepted: 12/16/2005] [Indexed: 11/18/2022]
Abstract
Solvent accessibility prediction from amino acid sequences has been pursued by several researchers. Such a prediction typically starts by transforming the amino acid category (or type) information into numerical representations. All twenty amino acids can be completely and uniquely represented by 20-dimensional vectors. Here, we investigate if the amino acid space defined in this way really requires twenty dimensions. We tried to develop corresponding representations in fewer dimensions. A method for searching optimal codification schema in an arbitrary space using neural networks was developed. The method is used to obtain optimal encoding of amino acids at various levels of dimensionality, and applied to optimize the amino acid codifications for the prediction of the solvent accessibility values of the proteins using feed-forward neural networks. The traditional 20-dimensional codification seems to be redundant in solving the solvent accessibility prediction problem, since a 1-dimensional codification is able to achieve almost the same degree of accuracy as the 20-dimensional codification. Optimal coding in much fewer dimensions could be used to make the predictions of accessible surface area with almost the same degree of accuracy as that obtained by a fully unique 20-dimensional coding. The 1-dimensional amino acid codification for solvent accessibility prediction obtained by a purely mathematical way based on neural networks is highly correlated with a physical property of the amino acids, namely their average solvent accessibility. The method developed to find the optimal codification is general, although the codification thus produced is dependent on the type of estimated property.
Affiliation(s)
- Marcos J Araúzo-Bravo
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka-ken 820-8502, Japan.
28
Swartz TH, Ikewada S, Ishikawa O, Ito M, Krulwich TA. The Mrp system: a giant among monovalent cation/proton antiporters? Extremophiles 2005; 9:345-54. [PMID: 15980940 DOI: 10.1007/s00792-005-0451-6] [Citation(s) in RCA: 123] [Impact Index Per Article: 6.5] [Received: 02/22/2005] [Accepted: 04/08/2005] [Indexed: 10/25/2022]
Abstract
Mrp systems are a novel and broadly distributed type of monovalent cation/proton antiporter of bacteria and archaea. Monovalent cation/proton antiporters are membrane transport proteins that catalyze efflux of cytoplasmic sodium, potassium or lithium ions in exchange for external hydrogen ions (protons). Other known monovalent cation antiporters are single gene products, whereas Mrp systems have been proposed to function as hetero-oligomers. A mrp operon typically has six or seven genes encoding hydrophobic proteins all of which are required for optimal Mrp-dependent sodium-resistance. There is little sequence similarity of Mrp proteins to other antiporters but three of these proteins have significant sequence similarity to membrane embedded subunits of ion-translocating electron transport complexes. Mrp antiporters have essential roles in the physiology of alkaliphilic and neutralophilic Bacillus species, nitrogen-fixing Sinorhizobium meliloti and in the pathogen Staphylococcus aureus, although these bacteria contain multiple monovalent cation/proton antiporters. The wide distribution of Mrp systems leads to the anticipation of important roles in an even wider variety of pathogens, extremophiles and environmentally important organisms. Here, the distribution, established physiological roles and catalytic activities of Mrp systems are reviewed, hypotheses regarding their complexity are discussed and major open questions about their function are highlighted.
Affiliation(s)
- Talia H Swartz
- Department of Pharmacology & Biological Chemistry, Mount Sinai School of Medicine, New York, NY 10029, USA
29
Spence P, Bard J, Jones P, Betty M. The identification of G-protein coupled receptors in sequence databases. Expert Opin Ther Pat 2005. [DOI: 10.1517/13543776.8.3.235] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/05/2022]
30
Qian B, Ortiz AR, Baker D. Improvement of comparative model accuracy by free-energy optimization along principal components of natural structural variation. Proc Natl Acad Sci U S A 2004; 101:15346-51. [PMID: 15492216 PMCID: PMC524448 DOI: 10.1073/pnas.0404703101] [Citation(s) in RCA: 56] [Impact Index Per Article: 2.8] [Indexed: 11/18/2022]
Abstract
Accurate high-resolution refinement of protein structure models is a formidable challenge because of the delicate balance of forces in the native state, the difficulty in sampling the very large number of alternative tightly packed conformations, and the inaccuracies in current force fields. Indeed, energy-based refinement of comparative models generally leads to degradation rather than improvement in model quality, and, hence, most current comparative modeling procedures omit physically based refinement. However, despite their inaccuracies, current force fields do contain information that is orthogonal to the evolutionary information on which comparative models are based, and, hence, refinement might be able to improve comparative models if the space that is sampled is restricted sufficiently so that false attractors are avoided. Here, we use the principal components of the variation of backbone structures within a homologous family to define a small number of evolutionarily favored sampling directions and show that model quality can be improved by energy-based optimization along these directions.
Affiliation(s)
- Bin Qian
- Howard Hughes Medical Institute and Department of Biochemistry, University of Washington, J-567 Health Sciences, Box 357350, Seattle, WA 98105, USA
31
Man O, Gilad Y, Lancet D. Prediction of the odorant binding site of olfactory receptor proteins by human-mouse comparisons. Protein Sci 2004; 13:240-54. [PMID: 14691239 PMCID: PMC2286516 DOI: 10.1110/ps.03296404] [Citation(s) in RCA: 117] [Impact Index Per Article: 5.9] [Indexed: 10/26/2022]
Abstract
Olfactory receptors (ORs) are a large family of proteins involved in the recognition and discrimination of numerous odorants. These receptors belong to the G-protein coupled receptor (GPCR) hyperfamily, for which little structural data are available. In this study we predict the binding site residues of OR proteins by analyzing a set of 1441 OR protein sequences from mouse and human. The central insight utilized is that functional contact residues would be conserved among pairs of orthologous receptors, but considerably less conserved among paralogous pairs. Using judiciously selected subsets of 218 ortholog pairs and 518 paralog pairs, we have identified 22 sequence positions that are both highly conserved among the putative orthologs and variable among paralogs. These residues are disposed on transmembrane helices 2 to 7, and on the second extracellular loop of the receptor. Strikingly, although the prediction makes no assumption about the location of the binding site, these amino acid positions are clustered around a pocket in a structural homology model of ORs, mostly facing the inner lumen. We propose that the identified positions constitute the odorant binding site. This conclusion is supported by the observation that all but one of the predicted binding site residues correspond to ligand-contact positions in other rhodopsin-like GPCRs.
Affiliation(s)
- Orna Man
- Department of Molecular Genetics and the Crown Human Genome Center, The Weizmann Institute of Science, Rehovot 76100, Israel
32
Roberts MD, Martin NL, Kropinski AM. The genome and proteome of coliphage T1. Virology 2004; 318:245-66. [PMID: 14972552 DOI: 10.1016/j.virol.2003.09.020] [Citation(s) in RCA: 71] [Impact Index Per Article: 3.6] [Received: 07/24/2003] [Revised: 09/18/2003] [Accepted: 09/22/2003] [Indexed: 11/19/2022]
Abstract
The genome of enterobacterial phage T1 has been sequenced, revealing that its 50.7-kb terminally redundant, circularly permuted sequence contains 48,836 bp of nonredundant nucleotides. Seventy-seven open reading frames (ORFs) were identified, with a high percentage of small genes located at the termini of the genomes displaying no homology to existing phage or prophage proteins. Of the genes showing homologs (47%), we identified those involved in host DNA degradation (three endonucleases) and T1 replication (DNA helicase, primase, and single-stranded DNA-binding proteins) and recombination (RecE and Erf homologs). While the tail genes showed homology to those from temperate coliphage N15, the capsid biosynthetic genes were unique. Phage proteins were resolved by 2D gel electrophoresis, and mass spectrometry was used to identify several of the spots including the major head, portal, and tail proteins, thus verifying the annotation.
Affiliation(s)
- Mary D Roberts
- Biology Department, Radford University, Radford, VA 24142, USA
33
Abstract
We developed CHOP, a method for dissecting proteins into domain-like fragments. The basic idea was to cut proteins beginning from very reliable experimental information (PDB), proceeding to expert annotations of domain-like regions (Pfam-A), and completing through cuts based on termini of known proteins. In this way, CHOP dissected more than two thirds of all proteins from 62 proteomes. Analysis of our structural domain-like fragments revealed four surprising results. First, >70% of all dissected proteins contained more than one fragment. Second, most domains spanned approximately 100 residues on average. This average was similar for eukaryotic and prokaryotic proteins, and it is also valid, although previously not described, for all proteins in the PDB. Third, single-domain proteins were significantly longer than most domains in multidomain proteins. Fourth, three fourths of all domains appeared shorter than 210 residues. We believe that our CHOP fragments constituted an important resource for functional and structural genomics. Nevertheless, our main motivation to develop CHOP was that the single-linkage clustering method failed to adequately group full-length proteins. In contrast, CLUP, the simple clustering scheme introduced here, largely succeeded in grouping the CHOP fragments from 62 proteomes such that all members of one cluster shared a basic structural core. CLUP found >63,000 multi- and >118,000 single-member clusters. Although most fragments were restricted to a particular cluster, approximately 24% of the fragments were duplicated in at least two clusters. Our thresholds for grouping two fragments into the same cluster were rather conservative. Nevertheless, our results suggested that structural genomics initiatives have to target >30,000 fragments to at least cover the multimember clusters in 62 proteomes.
Affiliation(s)
- Jinfeng Liu
- CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, USA
34
Baxter SM, Rosenblum JS, Knutson S, Nelson MR, Montimurro JS, Di Gennaro JA, Speir JA, Burbaum JJ, Fetrow JS. Synergistic Computational and Experimental Proteomics Approaches for More Accurate Detection of Active Serine Hydrolases in Yeast. Mol Cell Proteomics 2004; 3:209-25. [PMID: 14645503 DOI: 10.1074/mcp.m300082-mcp200] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.3] [Indexed: 11/06/2022]
Abstract
An analysis of the structurally and catalytically diverse serine hydrolase protein family in the Saccharomyces cerevisiae proteome was undertaken using two independent but complementary, large-scale approaches. The first approach is based on computational analysis of serine hydrolase active site structures; the second utilizes the chemical reactivity of the serine hydrolase active site in complex mixtures. These proteomics approaches share the ability to fractionate the complex proteome into functional subsets. Each method identified a significant number of sequences, but 15 proteins were identified by both methods. Eight of these were unannotated in the Saccharomyces Genome Database at the time of this study and are thus novel serine hydrolase identifications. Three of the previously uncharacterized proteins are members of a eukaryotic serine hydrolase family, designated as Fsh (family of serine hydrolase), identified here for the first time. OVCA2, a potential human tumor suppressor, and DYR-SCHPO, a dihydrofolate reductase from Schizosaccharomyces pombe, are members of this family. Comparing the combined results to results of other proteomic methods showed that only four of the 15 proteins were identified in a recent large-scale, "shotgun" proteomic analysis and eight were identified using a related, but similar, approach (neither identifies function). Only 10 of the 15 were annotated using alternate motif-based computational tools. The results demonstrate the precision derived from combining complementary, function-based approaches to extract biological information from complex proteomes. The chemical proteomics technology indicates that a functional protein is being expressed in the cell, while the computational proteomics technology adds details about the specific type of function and residue that is likely being labeled. 
The combination of synergistic methods facilitates analysis, enriches true positive results, and increases confidence in novel identifications. This work also highlights the risks inherent in annotation transfer and the use of scoring functions for determination of correct annotations.
Affiliation(s)
- Susan M Baxter
- GeneFormatics, Inc., 5830 Oberlin Drive, Suite 200, San Diego, CA 92121, USA
35
Alberti-Segui C, Morales AJ, Xing H, Kessler MM, Willins DA, Weinstock KG, Cottarel G, Fechtel K, Rogers B. Identification of potential cell-surface proteins in Candida albicans and investigation of the role of a putative cell-surface glycosidase in adhesion and virulence. Yeast 2004; 21:285-302. [PMID: 15042589 DOI: 10.1002/yea.1061] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.3] [Indexed: 11/09/2022]
Abstract
Cell-surface proteins are attractive targets for the development of novel antifungals as they are more accessible to drugs than are intracellular targets. By using a computational biology approach, we identified 180 potential cell-surface proteins in Candida albicans, including the known cell-surface adhesin Als1 and other cell-surface antigens, such as Pra1 and Csa1. Six proteins (named Csf1-6 for cell-surface factors) were selected for further biological characterization. First, we verified that the selected CSF genes are expressed in the yeast and/or hyphal form and then we investigated the effect of the loss of each CSF gene on cell-wall integrity, filamentation, adhesion to mammalian cells and virulence. As a result, we identified Csf4, a putative glycosidase with an apparent orthologue in Saccharomyces cerevisiae (Utr2), as an important factor for cell-wall integrity and maintenance. Interestingly, deletion of CSF4 also resulted in a defect in filamentation, a reduction in adherence to mammalian cells in an in vitro adhesion assay, and a prolongation of survival in an immunocompetent mouse model of disseminated candidiasis. A delay in colonization of key organs (e.g. kidney) was also observed, which is consistent with a reduction in virulence of the csf4-deletion strain. These data indicate a key role for extracellular glycosidases in fungal pathogenesis and represent a new site for therapeutic intervention to cure and prevent fungal disease.
36
Boll M, Foltz M, Rubio-Aliaga I, Daniel H. A cluster of proton/amino acid transporter genes in the human and mouse genomes. Genomics 2003; 82:47-56. [PMID: 12809675 DOI: 10.1016/s0888-7543(03)00099-5] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.2] [Indexed: 10/27/2022]
Abstract
We recently cloned and functionally characterized two novel proton/amino acid transporters (PAT1 and PAT2) from mouse. Here we report the isolation of the corresponding cDNAs of the human orthologues and one additional mouse and human PAT-like transporter cDNA, designated PAT3. The PAT proteins comprise 470 to 483 amino acids. The mouse PAT3 mRNA is expressed in testis of adult mice. In the human and mouse genomes the genes of the PAT transporters (designated SLC36A1-3 and Slc36a1-3, respectively) are clustered on human chromosome 5q33.1 and in the syntenic region of mouse chromosome 11B1.3. PAT-like transporter genes are present as well in the genomes of other eukaryotic organisms such as Drosophila melanogaster and Caenorhabditis elegans. For the PAT3 subtype transporter, we could not yet identify its function. The human PAT1 and PAT2 transporters when functionally expressed in Xenopus laevis oocytes show characteristics similar to those of their mouse counterparts.
Affiliation(s)
- Michael Boll
- Molecular Nutrition Unit, Institute of Nutritional Sciences, Technical University of Munich, Hochfeldweg 2, D-85350 Freising-Weihenstephan, Germany.
|
37
|
Chan JKL, Sun L, Yang XJ, Zhu G, Wu Z. Functional characterization of an amino-terminal region of HDAC4 that possesses MEF2 binding and transcriptional repressive activity. J Biol Chem 2003; 278:23515-21. [PMID: 12709441 DOI: 10.1074/jbc.m301922200] [Citation(s) in RCA: 59] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022] Open
Abstract
Like the full-length histone deacetylase (HDAC) 4, its amino terminus (amino acids 1-208) without the carboxyl deacetylase domain is also known to effectively bind and repress myocyte enhancer factor 2 (MEF2). Within this repressive amino terminus, we further show that a stretch of 90 amino acids (119-208) displays MEF2 binding and repressive activity. The same region is also found to associate specifically with HDAC1 which is responsible for the repressive effect. The amino terminus of HDAC4 can associate with the DNA-bound MEF2 in vitro, suggesting that it does not repress MEF2 simply by disrupting the ability of MEF2 to bind DNA. In vivo, MEF2 induces nuclear translocation of both the full-length HDAC4 and HDAC4-(1-208), whereas the nuclear HDAC4 as well as HDAC4-(1-208) in turn specifically sequesters MEF2 to distinct nuclear bodies. In addition, we show that MyoD and HDAC4 functionally antagonize each other to regulate MEF2 activity. Combined with data from others, our data suggest that the full-length HDAC4 can repress MEF2 through multiple independent repressive domains.
Affiliation(s)
- Jonathan K L Chan
- Department of Biochemistry, Hong Kong University of Science & Technology, Hong Kong, China
|
38
|
Muggleton SH, Bryant CH, Srinivasan A, Whittaker A, Topp S, Rawlings C. Are grammatical representations useful for learning from biological sequence data?--a case study. J Comput Biol 2002; 8:493-521. [PMID: 11694180 DOI: 10.1089/106652701753216512] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
This paper investigates whether Chomsky-like grammar representations are useful for learning cost-effective, comprehensible predictors of members of biological sequence families. The Inductive Logic Programming (ILP) Bayesian approach to learning from positive examples is used to generate a grammar for recognising a class of proteins known as human neuropeptide precursors (NPPs). Collectively, five of the co-authors of this paper have extensive expertise on NPPs and general bioinformatics methods. Their motivation for generating an NPP grammar was that none of the existing bioinformatics methods could provide sufficient cost-savings during the search for new NPPs. Prior to this project, experienced specialists at SmithKline Beecham had tried for many months to hand-code such a grammar but without success. Our best predictor makes the search for novel NPPs more than 100 times more efficient than randomly selecting proteins for synthesis and testing them for biological activity. As far as these authors are aware, this is both the first biological grammar learnt using ILP and the first real-world scientific application of the ILP Bayesian approach to learning from positive examples. A group of features is derived from this grammar. Other groups of features of NPPs are derived using other learning strategies. Amalgams of these groups are formed. A recognition model is generated for each amalgam using C4.5 and C4.5rules and its performance is measured using both predictive accuracy and a new cost function, Relative Advantage (RA). The highest RA was achieved by a model which includes grammar-derived features. This RA is significantly higher than the best RA achieved without the use of the grammar-derived features.
Predictive accuracy is not a good measure of performance for this domain because it does not discriminate well between NPP recognition models: despite covering varying numbers of (the rare) positives, all the models are awarded a similar (high) score by predictive accuracy because they all exclude most of the abundant negatives.
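The point about predictive accuracy can be illustrated with a small sketch. The counts below are invented for illustration and are not taken from the paper, and recall is used here only as a simple stand-in for a positive-sensitive measure; it is not the paper's Relative Advantage function, which the abstract does not define.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def recall(tp, fn):
    """Fraction of the true positives that the model recovers."""
    return tp / (tp + fn)


# Hypothetical screen: 1000 proteins, only 10 of which are true positives.
# Both models reject the same 985 negatives and raise 5 false alarms,
# but they differ widely in how many of the rare positives they find.
model_a = dict(tp=2, fp=5, tn=985, fn=8)
model_b = dict(tp=8, fp=5, tn=985, fn=2)

acc_a = accuracy(**model_a)
acc_b = accuracy(**model_b)
rec_a = recall(model_a["tp"], model_a["fn"])
rec_b = recall(model_b["tp"], model_b["fn"])

print(f"accuracy: A={acc_a:.3f}  B={acc_b:.3f}")
print(f"recall:   A={rec_a:.3f}  B={rec_b:.3f}")
```

Under these assumed counts the two accuracies differ by less than one percentage point, because the abundant negatives dominate the score, while recall separates the models clearly.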
Affiliation(s)
- S H Muggleton
- Department of Computer Science, University of York, York YO10 5DD, United Kingdom
|
39
|
Fogolari F, Tessari S, Molinari H. Singular value decomposition analysis of protein sequence alignment score data. Proteins 2002; 46:161-70. [PMID: 11807944 DOI: 10.1002/prot.10032] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
One of the standard tools for the analysis of data arranged in matrix form is singular value decomposition (SVD). Few applications to genomic data have been reported to date mainly for the analysis of gene expression microarray data. We review SVD properties, examine mathematical terms and assumptions implicit in the SVD formalism, and show that SVD can be applied to the analysis of matrices representing pairwise alignment scores between large sets of protein sequences. In particular, we illustrate SVD capabilities for data dimension reduction and for clustering protein sequences. A comparison is performed between SVD-generated clusters of proteins and annotation reported in the SWISS-PROT Database for a set of protein sequences forming the calycin superfamily, entailing all entries corresponding to the lipocalin, cytosolic fatty acid-binding protein, and avidin-streptavidin Prosite patterns.
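A minimal sketch of the general idea, assuming a small symmetric matrix of pairwise alignment scores. The score values, the function name `svd_embed`, and the sign-based grouping rule are illustrative assumptions, not the authors' procedure or data.

```python
import numpy as np


def svd_embed(scores, k=2):
    """Project each sequence (one row of the score matrix) onto the
    top-k singular directions, scaled by the singular values."""
    scores = np.asarray(scores, dtype=float)
    u, s, _vt = np.linalg.svd(scores, full_matrices=False)
    return u[:, :k] * s[:k]


# Made-up pairwise scores for five sequences: the first three score
# highly against each other, as do the last two.
toy_scores = np.array([
    [100, 90, 85, 10, 12],
    [90, 100, 88, 11, 9],
    [85, 88, 100, 8, 10],
    [10, 11, 8, 100, 92],
    [12, 9, 10, 92, 100],
])

coords = svd_embed(toy_scores, k=2)

# Crude two-way grouping: split on the sign of the second coordinate.
# The label values are arbitrary because singular vectors are only
# defined up to sign.
groups = (coords[:, 1] > 0).astype(int)
print(coords.round(2))
print(groups)
```

Reducing each sequence to two coordinates is the dimension-reduction step; for real data a proper clustering method applied to `coords` would replace the sign split used here.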
Affiliation(s)
- F Fogolari
- Dipartimento Scientifico Tecnologico, Facoltà di Scienze, Università di Verona, Verona, Italy.
|
40
|
Lacy DB, Mourez M, Fouassier A, Collier RJ. Mapping the anthrax protective antigen binding site on the lethal and edema factors. J Biol Chem 2002; 277:3006-10. [PMID: 11714723 DOI: 10.1074/jbc.m109997200] [Citation(s) in RCA: 87] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022] Open
Abstract
Entry of anthrax edema factor (EF) and lethal factor (LF) into the cytosol of eukaryotic cells depends on their ability to translocate across the endosomal membrane in the presence of anthrax protective antigen (PA). Here we report attributes of the N-terminal domains of EF and LF (EF(N) and LF(N), respectively) that are critical for their initial interaction with PA. We found that deletion of the first 36 residues of LF(N) had no effect on its binding to PA or its ability to be translocated. To map the binding site for PA, we used the three-dimensional structure of LF and sequence similarity between EF and LF to select positions for mutagenesis. We identified seven sites in LF(N) (Asp-182, Asp-187, Leu-188, Tyr-223, His-229, Leu-235, and Tyr-236) where mutation to Ala produced significant binding defects, with H229A and Y236A almost completely eliminating binding. Homologous mutants of EF(N) displayed nearly identical defects. Cytotoxicity assays confirmed that the LF(N) mutations impact intoxication. The seven mutation-sensitive amino acids are clustered on the surface of LF and form a small convoluted patch with both hydrophobic and hydrophilic character. We propose that this patch constitutes the recognition site for PA.
Affiliation(s)
- D Borden Lacy
- Department of Microbiology and Molecular Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA
|
41
|
Altmann CR, Bell E, Sczyrba A, Pun J, Bekiranov S, Gaasterland T, Brivanlou AH. Microarray-based analysis of early development in Xenopus laevis. Dev Biol 2001; 236:64-75. [PMID: 11456444 DOI: 10.1006/dbio.2001.0298] [Citation(s) in RCA: 58] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/20/2023]
Abstract
To examine transcriptional regulation globally during early vertebrate embryonic development, we have prepared Xenopus laevis cDNA microarrays. These prototype embryonic arrays contain 864 sequenced gastrula cDNAs. To analyze and store array data, a microarray analysis pipeline was developed and integrated with sequence analysis and annotation tools. In three independent experimental settings, we demonstrate the power of these global approaches and provide optimized protocols for their application to molecular embryology. In the first set, by comparing maternal versus zygotic transcription, we document groups of genes that are temporally regulated. This analytical approach resulted in the discovery of novel temporally regulated genes. In the second, we examine changes in gene expression spatially during development by comparing dorsal and ventral mesoderm dissected from early gastrula embryos. We have discovered novel genes with spatial enrichment from these experiments. Finally, we use the prototype microarray to examine transcriptional responses from embryonic explants treated with activin. We selected genes (two of which are novel) regulated by activin for further characterization. All results obtained by the arrays were independently tested by RT-PCR or by in situ hybridization to provide a direct assessment of the accuracy and reproducibility of these approaches in the context of molecular embryology.
Affiliation(s)
- C R Altmann
- Laboratory of Molecular Vertebrate Embryology, The Rockefeller University, 1230 York Avenue, New York, New York 10021, USA
|
42
|
Abstract
Eph receptor tyrosine kinases and their membrane-associated ligands, the ephrins, are essential regulators of axon guidance, cell migration, segmentation, and angiogenesis. There are two classes of vertebrate ephrin ligands which have distinct binding specificities for their cognate receptors. Multimerization of the ligands is required for receptor activation, and ephrin ligands themselves signal intracellularly upon binding Eph receptors. We have determined the structure of the extracellular domain of mouse ephrin-B2. The ephrin ectodomain is an eight-stranded beta barrel with topological similarity to plant nodulins and phytocyanins. Based on the structure, we have identified potential surface determinants of Eph/ephrin binding specificity and a ligand dimerization region. The high sequence similarity among ephrin ectodomains indicates that all ephrins may be modeled upon the ephrin-B2 structure presented here.
Affiliation(s)
- J Toth
- Boston Biomedical Research Institute, Watertown, Massachusetts 02472, USA
|
43
|
Chasman D, Adams RM. Predicting the functional consequences of non-synonymous single nucleotide polymorphisms: structure-based assessment of amino acid variation. J Mol Biol 2001; 307:683-706. [PMID: 11254390 DOI: 10.1006/jmbi.2001.4510] [Citation(s) in RCA: 298] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/04/2023]
Abstract
We have developed a formalism and a computational method for analyzing the potential functional consequences of non-synonymous single nucleotide polymorphisms. Our approach uses a structural model and phylogenetic information to derive a selection of structure and sequence-based features serving as indicators of an amino acid polymorphism's effect on function. The feature values can be integrated into a probabilistic assessment of whether an amino acid polymorphism will affect the function or stability of a target protein. The method has been validated with data sets of unbiased mutations in the lac repressor and lysozyme. Applying our methodology to recent surveys of genetic variation in the coding regions of clinically important genes, we estimate that approximately 26-32% of the natural non-synonymous single nucleotide polymorphisms have effects on function. This estimate suggests that a typical person will have about 6240-12,800 heterozygous loci that encode proteins with functional variation due to natural amino acid polymorphism.
Affiliation(s)
- D Chasman
- Variagenics, 60 Hampshire Street, Cambridge, MA 02144, USA.
|
44
|
|
45
|
Ober D, Hartmann T. Homospermidine synthase, the first pathway-specific enzyme of pyrrolizidine alkaloid biosynthesis, evolved from deoxyhypusine synthase. Proc Natl Acad Sci U S A 1999; 96:14777-82. [PMID: 10611289 PMCID: PMC24724 DOI: 10.1073/pnas.96.26.14777] [Citation(s) in RCA: 113] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
Pyrrolizidine alkaloids are preformed plant defense compounds with sporadic phylogenetic distribution. They are thought to have evolved in response to the selective pressure of herbivory. The first pathway-specific intermediate of these alkaloids is the rare polyamine homospermidine, which is synthesized by homospermidine synthase (HSS). The HSS gene from Senecio vernalis was cloned and shown to be derived from the deoxyhypusine synthase (DHS) gene, which is highly conserved among all eukaryotes and archaebacteria. DHS catalyzes the first step in the activation of translation initiation factor 5A (eIF5A), which is essential for eukaryotic cell proliferation and which acts as a cofactor of the HIV-1 Rev regulatory protein. Sequence comparison provides direct evidence for the evolutionary recruitment of an essential gene of primary metabolism (DHS) for the origin of the committing step (HSS) in the biosynthesis of pyrrolizidine alkaloids.
Affiliation(s)
- D Ober
- Institut für Pharmazeutische Biologie der Technischen Universität Braunschweig, Mendelssohnstrasse 1, D-38106 Braunschweig, Germany
|
46
|
Ober D, Hartmann T. Deoxyhypusine synthase from tobacco. cDNA isolation, characterization, and bacterial expression of an enzyme with extended substrate specificity. J Biol Chem 1999; 274:32040-7. [PMID: 10542236 DOI: 10.1074/jbc.274.45.32040] [Citation(s) in RCA: 67] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022] Open
Abstract
Deoxyhypusine synthase catalyzes the formation of a deoxyhypusine residue in the translation eukaryotic initiation factor 5A (eIF5A) precursor protein by transferring an aminobutyl moiety from spermidine onto a conserved lysine residue within the eIF5A polypeptide chain. This reaction commences the activation of the initiation factor in fungi and vertebrates. A mechanistically identical reaction is known in the biosynthetic pathway leading to pyrrolizidine alkaloids in plants. Deoxyhypusine synthase from tobacco was cloned and expressed in active form in Escherichia coli. It catalyzes the formation of a deoxyhypusine residue in the tobacco eIF5A substrate as shown by gas chromatography coupled with a mass spectrometer. The enzyme also accepts free putrescine as the aminobutyl acceptor, instead of lysine bound in the eIF5A polypeptide chain, yielding homospermidine. Conversely, it accepts homospermidine instead of spermidine as the aminobutyl donor, whereby the reactions with putrescine and homospermidine proceed at the same rate as those involving the authentic substrates. The conversion of deoxyhypusine synthase-catalyzed eIF5A deoxyhypusinylation pinpoints a function for spermidine in plant metabolism. Furthermore, and quite unexpectedly, the substrate spectrum of deoxyhypusine synthase hints at a biochemical basis behind the sparse and skew occurrence of both homospermidine and its pyrrolizidine derivatives across distantly related plant taxa.
Affiliation(s)
- D Ober
- Institut für Pharmazeutische Biologie der Technischen Universität Braunschweig, Mendelssohnstrasse 1, D-38106 Braunschweig, Germany
|
47
|
Rigoutsos I, Floratos A, Ouzounis C, Gao Y, Parida L. Dictionary building via unsupervised hierarchical motif discovery in the sequence space of natural proteins. Proteins 1999; 37:264-77. [PMID: 10584071 DOI: 10.1002/(sici)1097-0134(19991101)37:2<264::aid-prot11>3.0.co;2-c] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Abstract
Using Teiresias, a pattern discovery method that identifies all motifs present in any given set of protein sequences without requiring alignment or explicit enumeration of the solution space, we have explored the GenPept sequence database and built a dictionary of all sequence patterns with two or more instances. The entries of this dictionary, henceforth named seqlets, cover 98.12% of all amino acid positions in the input database and in essence provide a comprehensive finite set of descriptors for protein sequence space. As such, seqlets can be effectively used to describe almost every naturally occurring protein. In fact, seqlets can be thought of as building blocks of protein molecules that are a necessary (but not sufficient) condition for function or family equivalence memberships. Thus, seqlets can either define conserved family signatures or cut across molecular families and previously undetected sequence signals deriving from functional convergence. Moreover, we show that seqlets also can capture structurally conserved motifs. The availability of a dictionary of seqlets that has been derived in such an unsupervised, hierarchical manner is generating new opportunities for addressing problems that range from reliable classification and the correlation of sequence fragments with functional categories to faster and sensitive engines for homology searches, evolutionary studies, and protein structure prediction.
Affiliation(s)
- I Rigoutsos
- Computational Biology Center, Thomas J. Watson Research Center, Yorktown Heights, New York 10598, USA.
|
48
|
Abstract
Researchers who study human pathogens are often interested in unique and essential aspects of the biology of the pathogen. Recent progress has been made in understanding such a target in kinetoplastid parasites. The paraflagellar rod is a unique cytoskeletal structure that plays a key role in the life-cycle of these fascinating organisms. This review discusses the protein components and structure of the paraflagellar rod and its function in cell motility.
Affiliation(s)
- J A Maga
- Dept of Biochemistry, Purdue University, West Lafayette, IN 47907, USA
|
49
|
Abstract
The clostridial neurotoxins (CNTs), comprised of tetanus neurotoxin (TeNT) and the seven serotypes of botulinum neurotoxin (BoNT A-G), specifically bind to neuronal cells and disrupt neurotransmitter release by cleaving proteins involved in synaptic vesicle membrane fusion. In this study, multiple CNT sequences were analyzed within the context of the 1277-residue BoNT/A crystal structure to gain insight into the events of binding, pore formation, translocation, and catalysis that are required for toxicity. A comparison of the TeNT-binding domain structure to that of BoNT/A reveals striking differences in their surface properties. Further, the solvent accessibility of a key tryptophan in the C terminus of the BoNT/A-binding domain refines the location of the ganglioside-binding site. Data collected from a single frozen crystal of BoNT/A are included in this study, revealing slight differences in the binding domain orientation as well as density for a previously unobserved translocation domain loop. This loop and the conservation of charged residues with structural proximity to putative pore-forming sequences lend insight into the CNT mechanism of pore formation and translocation. The sequence analysis of the catalytic domain revealed an area near the active site likely to account for specificity differences between the CNTs. It also revealed a tertiary structure, highly conserved in primary sequence, which seems critical to catalysis but is 30 Å from the active-site zinc ion. This observation, along with an analysis of the 54-residue "belt" from the translocation domain, is discussed with respect to the mechanism of catalysis.
Affiliation(s)
- D B Lacy
- Department of Chemistry, University of California, Berkeley 94720, USA
|
50
|
Huang GM, Ng WL, Farkas J, He L, Liang HA, Gordon D, Yu J, Hood L. Prostate cancer expression profiling by cDNA sequencing analysis. Genomics 1999; 59:178-86. [PMID: 10409429 DOI: 10.1006/geno.1999.5822] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
Abstract
Prostate cancer is a frequently diagnosed solid tumor that originates mostly from prostate epithelium. One of the key issues in prostate cancer research is to develop molecular markers that can effectively detect and distinguish the progression and malignancy of prostate tumors. Automated, single-pass cDNA sequencing was utilized to rapidly identify expressed genes in a number of cDNA libraries constructed from various normal and tumor prostatic tissues. These included cell lines as well as short-term epithelial culture. A total of 6604 expressed sequence tags (ESTs) were generated and searched against on-line nucleotide and protein databases. A relational database-centric software system was constructed to process, store, and analyze EST data rapidly. cDNA contigs were also obtained by assembly of multiple EST sequences. Protein structural signatures were annotated using motif analysis tools including BLOCKS and an in-house-designed neural network. Cross-library comparisons revealed their unique gene expression profiles. Several differentially expressed cDNA clones were identified, and their expression patterns were confirmed by RNA dot blot and RT-PCR analyses.
Affiliation(s)
- G M Huang
- Department of Molecular Biotechnology, University of Washington, Seattle, Washington 98195, USA.
|