251
|
Ku SY, Hu YJ. Protein structure search and local structure characterization. BMC Bioinformatics 2008; 9:349. [PMID: 18721472 PMCID: PMC2529324 DOI: 10.1186/1471-2105-9-349] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 02/09/2008] [Accepted: 08/22/2008] [Indexed: 11/10/2022] Open
Abstract
Background Structural similarities among proteins can provide valuable insight into their functional mechanisms and relationships. As the number of available three-dimensional (3D) protein structures increases, a greater variety of studies can be conducted with increasing efficiency, among which is the design of protein structural alphabets. Structural alphabets allow us to characterize local structures of proteins and describe the global folding structure of a protein using a one-dimensional (1D) sequence. Thus, 1D sequences can be used to identify structural similarities among proteins using standard sequence alignment tools such as BLAST or FASTA. Results We used self-organizing maps in combination with a minimum spanning tree algorithm to determine the optimum size of a structural alphabet and applied the k-means algorithm to group protein fragments into clusters. The centroids of these clusters defined the structural alphabet. We also developed a flexible matrix training system to build a substitution matrix (TRISUM-169) for our alphabet. Based on FASTA and using TRISUM-169 as the substitution matrix, we developed the SA-FAST alignment tool. We compared the performance of SA-FAST with that of various search tools in database-scale search tasks and found that SA-FAST was highly competitive in all tests conducted. Further, we evaluated the performance of our structural alphabet in recognizing specific structural domains of EGF and EGF-like proteins. Our method successfully recovered more EGF sub-domains using our structural alphabet than when using other structural alphabets. SA-FAST can be found at . Conclusion The goal of this project was two-fold. First, we wanted to introduce a modular design pipeline to those who have been working with structural alphabets. Second, we wanted to open the door to researchers who have done substantial work in biological sequences but have yet to enter the field of protein structure research. Our experiments showed that by transforming the structural representations from 3D to 1D, several 1D-based tools can be applied to structural analysis, including similarity searches and structural motif finding.
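The 3D-to-1D idea described in this abstract can be illustrated with a minimal sketch: each structural fragment is assigned the letter of its nearest cluster centroid, and the letters are concatenated into a string that ordinary sequence tools could consume. The two-dimensional descriptors, the centroid coordinates, and the letters below are hypothetical placeholders chosen for illustration; they are not the authors' alphabet, fragment descriptors, or TRISUM-169 matrix.

```python
import math

# Hypothetical centroids (letter -> descriptor). A real structural alphabet
# would use higher-dimensional descriptors of backbone fragments.
CENTROIDS = {
    "A": (0.0, 0.0),
    "B": (1.0, 0.0),
    "C": (0.0, 1.0),
}

def encode_fragment(descriptor, centroids=CENTROIDS):
    """Return the letter whose centroid is closest (Euclidean) to the descriptor."""
    return min(centroids, key=lambda letter: math.dist(descriptor, centroids[letter]))

def encode_structure(descriptors, centroids=CENTROIDS):
    """Map an ordered list of fragment descriptors to a 1D string of letters."""
    return "".join(encode_fragment(d, centroids) for d in descriptors)

# Four made-up fragment descriptors along a chain.
fragments = [(0.1, 0.0), (0.9, 0.2), (0.1, 0.8), (0.8, -0.1)]
print(encode_structure(fragments))
```

Once structures are reduced to strings in this way, comparing two proteins becomes a string alignment problem, which is what makes FASTA-style tools applicable.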
Affiliation(s)
- Shih-Yen Ku
- Department of Computer Science, National Chiao Tung University, 1001 University Rd. Hsinchu, Taiwan.
|
252
|
Furney SJ, Calvo B, Larrañaga P, Lozano JA, Lopez-Bigas N. Prioritization of candidate cancer genes--an aid to oncogenomic studies. Nucleic Acids Res 2008; 36:e115. [PMID: 18710882 PMCID: PMC2566894 DOI: 10.1093/nar/gkn482] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Indexed: 11/29/2022] Open
Abstract
The development of techniques for oncogenomic analyses such as array comparative genomic hybridization, messenger RNA expression arrays and mutational screens has come to the fore in modern cancer research. Studies utilizing these techniques are able to highlight panels of genes that are altered in cancer. However, these candidate cancer genes must then be scrutinized to reveal whether they contribute to oncogenesis or are coincidental and non-causative. We present a computational method for the prioritization of candidate (i) proto-oncogenes and (ii) tumour suppressor genes from oncogenomic experiments. We constructed computational classifiers using different combinations of sequence and functional data including sequence conservation, protein domains and interactions, and regulatory data. We found that these classifiers are able to distinguish between known cancer genes and other human genes. Furthermore, the classifiers also discriminate candidate cancer genes from a recent mutational screen from other human genes. We provide a web-based facility through which cancer biologists may access our results and we propose computational cancer gene classification as a useful method of prioritizing candidate cancer genes identified in oncogenomic studies.
Affiliation(s)
- Simon J Furney
- Research Unit on Biomedical Informatics, Experimental and Health Science Department, Universitat Pompeu Fabra, Barcelona 08080, Spain
|
253
|
Abstract
Genes expressed in testes are critical to male reproductive success, affecting spermatogenesis, sperm competition, and sperm-egg interaction. Comparing the evolution of testis proteins at different taxonomic levels can reveal which genes and functional classes are targets of natural and sexual selection and whether the same genes are targets among taxa. Here we examine the evolution of testis-expressed proteins at different levels of divergence among three rodents, mouse (Mus musculus), rat (Rattus norvegicus), and deer mouse (Peromyscus maniculatus), to identify rapidly evolving genes. Comparison of expressed sequence tags (ESTs) from testes suggests that proteins with testis-specific expression evolve more rapidly on average than proteins with maximal expression in other tissues. Genes with the highest rates of evolution have a variety of functional roles including signal transduction, DNA binding, and egg-sperm interaction. Most of these rapidly evolving genes have not been identified previously as targets of selection in comparisons among more divergent mammals. To determine if these genes are evolving rapidly among closely related species, we sequenced 11 of these genes in six Peromyscus species and found evidence for positive selection in five of them. Together, these results demonstrate rapid evolution of functionally diverse testis-expressed proteins in rodents, including the identification of amino acids under lineage-specific selection in Peromyscus. Evidence for positive selection among closely related species suggests that changes in these proteins may have consequences for reproductive isolation.
|
254
|
Hadjebi O, Casas-Terradellas E, Garcia-Gonzalo FR, Rosa JL. The RCC1 superfamily: From genes, to function, to disease. BIOCHIMICA ET BIOPHYSICA ACTA-MOLECULAR CELL RESEARCH 2008; 1783:1467-79. [DOI: 10.1016/j.bbamcr.2008.03.015] [Citation(s) in RCA: 83] [Impact Index Per Article: 5.2] [Received: 12/03/2007] [Revised: 03/19/2008] [Accepted: 03/20/2008] [Indexed: 02/07/2023]
|
255
|
Szczepanowski R, Bekel T, Goesmann A, Krause L, Krömeke H, Kaiser O, Eichler W, Pühler A, Schlüter A. Insight into the plasmid metagenome of wastewater treatment plant bacteria showing reduced susceptibility to antimicrobial drugs analysed by the 454-pyrosequencing technology. J Biotechnol 2008; 136:54-64. [DOI: 10.1016/j.jbiotec.2008.03.020] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.2] [Received: 01/15/2008] [Revised: 03/20/2008] [Accepted: 03/31/2008] [Indexed: 11/28/2022]
|
256
|
Fröhlich H, Fellmann M, Sültmann H, Poustka A, Beissbarth T. Predicting pathway membership via domain signatures. Bioinformatics 2008; 24:2137-42. [PMID: 18676972 PMCID: PMC2553439 DOI: 10.1093/bioinformatics/btn403] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 11/13/2022]
Abstract
MOTIVATION Functional characterization of genes is of great importance for the understanding of complex cellular processes. Valuable information for this purpose can be obtained from pathway databases, like KEGG. However, only a small fraction of genes is annotated with pathway information up to now. In contrast, information on contained protein domains can be obtained for a significantly higher number of genes, e.g. from the InterPro database. RESULTS We present a classification model, which for a specific gene of interest can predict the mapping to a KEGG pathway, based on its domain signature. The classifier makes explicit use of the hierarchical organization of pathways in the KEGG database. Furthermore, we take into account that a specific gene can be mapped to different pathways at the same time. The classification method produces a scoring of all possible mapping positions of the gene in the KEGG hierarchy. Evaluations of our model, which is a combination of a SVM and ranking perceptron approach, show a high prediction performance. Moreover, for signaling pathways we reveal that it is even possible to forecast accurately the membership to individual pathway components. AVAILABILITY The R package gene2pathway is a supplement to this article.
Affiliation(s)
- Holger Fröhlich
- German Cancer Research Center (DKFZ), Im Neuenheimer Feld 580, 69120 Heidelberg, Germany.
|
257
|
Meta-basic estimates the size of druggable human genome. J Mol Model 2008; 15:695-9. [PMID: 18663489 DOI: 10.1007/s00894-008-0353-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 04/10/2008] [Accepted: 07/10/2008] [Indexed: 10/21/2022]
Abstract
We present here an estimate of the upper limit of the number of molecular targets in the human genome that represent an opportunity for further therapeutic treatment. We select approximately 6300 human proteins that are similar to sequences of known protein targets collected from the DrugBank database. Our bioinformatics study estimates the size of the 'druggable' human genome to be around 20% of the human proteome, i.e. the number of possible protein targets for small-molecule drug design in medicinal chemistry. We do not take into account any toxicity prediction, the three-dimensional characteristics of the active site in the predicted 'druggable' protein families, or detailed chemical analysis of known inhibitors/drugs. Instead we rely on the remote homology detection method Meta-BASIC, which is based on sequence and structural similarity. The prepared dataset of all predicted protein targets from the human genome presents a unique opportunity for developing and benchmarking various in silico chemo/bio-informatics methods in the context of virtual high-throughput screening.
|
258
|
Crowhurst RN, Gleave AP, MacRae EA, Ampomah-Dwamena C, Atkinson RG, Beuning LL, Bulley SM, Chagne D, Marsh KB, Matich AJ, Montefiori M, Newcomb RD, Schaffer RJ, Usadel B, Allan AC, Boldingh HL, Bowen JH, Davy MW, Eckloff R, Ferguson AR, Fraser LG, Gera E, Hellens RP, Janssen BJ, Klages K, Lo KR, MacDiarmid RM, Nain B, McNeilage MA, Rassam M, Richardson AC, Rikkerink EH, Ross GS, Schröder R, Snowden KC, Souleyre EJF, Templeton MD, Walton EF, Wang D, Wang MY, Wang YY, Wood M, Wu R, Yauk YK, Laing WA. Analysis of expressed sequence tags from Actinidia: applications of a cross species EST database for gene discovery in the areas of flavor, health, color and ripening. BMC Genomics 2008; 9:351. [PMID: 18655731 PMCID: PMC2515324 DOI: 10.1186/1471-2164-9-351] [Citation(s) in RCA: 118] [Impact Index Per Article: 7.4] [Received: 03/25/2008] [Accepted: 07/27/2008] [Indexed: 11/13/2022] Open
Abstract
Background Kiwifruit (Actinidia spp.) are a relatively new, but economically important crop grown in many different parts of the world. Commercial success is driven by the development of new cultivars with novel consumer traits including flavor, appearance, healthful components and convenience. To increase our understanding of the genetic diversity and gene-based control of these key traits in Actinidia, we have produced a collection of 132,577 expressed sequence tags (ESTs). Results The ESTs were derived mainly from four Actinidia species (A. chinensis, A. deliciosa, A. arguta and A. eriantha) and fell into 41,858 non-redundant clusters (18,070 tentative consensus sequences and 23,788 EST singletons). Analysis of flavor and fragrance-related gene families (acyltransferases and carboxylesterases) and pathways (terpenoid biosynthesis) is presented in comparison with a chemical analysis of the compounds present in Actinidia including esters, acids, alcohols and terpenes. ESTs are identified for most genes in color pathways controlling chlorophyll degradation and carotenoid biosynthesis. In the health area, data are presented on the ESTs involved in ascorbic acid and quinic acid biosynthesis showing not only that genes for many of the steps in these pathways are represented in the database, but that genes encoding some critical steps are absent. In the convenience area, genes related to different stages of fruit softening are identified. Conclusion This large EST resource will allow researchers to undertake the tremendous challenge of understanding the molecular basis of genetic diversity in the Actinidia genus as well as provide an EST resource for comparative fruit genomics. The various bioinformatics analyses we have undertaken demonstrate the extent of coverage of ESTs for genes encoding different biochemical pathways in Actinidia.
Affiliation(s)
- Ross N Crowhurst
- The Horticultural and Food Research Institute of New Zealand, PB 92169, Auckland, New Zealand.
|
259
|
Vaughan A, Chiu SY, Ramasamy G, Li L, Gardner MJ, Tarun AS, Kappe SHI, Peng X. Assessment and improvement of the Plasmodium yoelii yoelii genome annotation through comparative analysis. Bioinformatics 2008; 24:i383-9. [PMID: 18586738 PMCID: PMC2718618 DOI: 10.1093/bioinformatics/btn140] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/25/2022]
Abstract
Motivation: The sequencing of the Plasmodium yoelii genome, a model rodent malaria parasite, has greatly facilitated research for the development of new drug and vaccine candidates against malaria. Unfortunately, only preliminary gene models were annotated on the partially sequenced genome, mostly by in silico gene prediction, and there has been no major improvement of the annotation since 2002. Results: Here we report on a systematic assessment of the accuracy of the genome annotation based on a detailed analysis of a comprehensive set of cDNA sequences and proteomics data. We found that the coverage of the current annotation tends to be biased toward genes expressed in the blood stages of the parasite life cycle. Based on our proteomic analysis, we estimate that about 15% of the liver stage proteome data we have generated is absent from the current annotation. Through comparative analysis we identified and manually curated a further 510 P. yoelii genes which have clear orthologs in the P. falciparum genome, but were not present or incorrectly annotated in the current annotation. This study suggests that improvements of the current P. yoelii genome annotation should focus on genes expressed in stages other than blood stages. Comparative analysis will be critically helpful for this re-annotation. The addition of newly annotated genes will facilitate the use of P. yoelii as a model system for studying human malaria. Contact: xinxia.peng@sbri.org; stefan.kappe@sbri.org Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ashley Vaughan
- Seattle Biomedical Research Institute, Seattle, WA 98109, USA
|
260
|
Loewenstein Y, Portugaly E, Fromer M, Linial M. Efficient algorithms for accurate hierarchical clustering of huge datasets: tackling the entire protein space. Bioinformatics 2008; 24:i41-9. [PMID: 18586742 PMCID: PMC2718652 DOI: 10.1093/bioinformatics/btn174] [Citation(s) in RCA: 89] [Impact Index Per Article: 5.6] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION UPGMA (average linking) is probably the most popular algorithm for hierarchical data clustering, especially in computational biology. However, UPGMA requires the entire dissimilarity matrix in memory. Due to this prohibitive requirement, UPGMA is not scalable to very large datasets. APPLICATION We present a novel class of memory-constrained UPGMA (MC-UPGMA) algorithms. Given any practical memory size constraint, this framework guarantees the correct clustering solution without explicitly requiring all dissimilarities in memory. The algorithms are general and are applicable to any dataset. We present a data-dependent characterization of hardness and clustering efficiency. The presented concepts are applicable to any agglomerative clustering formulation. RESULTS We apply our algorithm to the entire collection of protein sequences, to automatically build a comprehensive evolutionary-driven hierarchy of proteins from sequence alone. The newly created tree captures protein families better than state-of-the-art large-scale methods such as CluSTr, ProtoNet4 or single-linkage clustering. We demonstrate that leveraging the entire mass embodied in all sequence similarities allows us to significantly improve on current protein family clusterings, which are unable to directly tackle the sheer mass of this data. Furthermore, we argue that non-metric constraints are an inherent complexity of the sequence space and should not be overlooked. The robustness of UPGMA allows significant improvement, especially for multidomain proteins, and for large or divergent families. AVAILABILITY A comprehensive tree built from all UniProt sequence similarities, together with navigation and classification tools will be made available as part of the ProtoNet service. A C++ implementation of the algorithm is available on request.
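For readers unfamiliar with the baseline this abstract starts from, the following is a minimal sketch of textbook UPGMA, written to make the memory requirement visible: every pairwise dissimilarity is held in a dictionary for the whole run. It is not the memory-constrained MC-UPGMA algorithm of the paper, and the labels and distances in the example are made up.

```python
def upgma(labels, dist):
    """Naive UPGMA on a full symmetric dissimilarity matrix (list of lists).

    Returns the hierarchy as nested tuples. The whole matrix is kept in
    memory, which is the limitation described in the abstract.
    """
    # Active clusters: id -> (subtree, number of leaves).
    clusters = {i: (label, 1) for i, label in enumerate(labels)}
    # Distances between active clusters, keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j]
         for i in range(len(labels)) for j in range(i + 1, len(labels))}
    next_id = len(labels)
    while len(clusters) > 1:
        # Merge the closest pair of active clusters.
        a, b = min(d, key=d.get)
        (tree_a, size_a), (tree_b, size_b) = clusters.pop(a), clusters.pop(b)
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted average of the two old distances.
        new_d = {}
        for c in clusters:
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            new_d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        # Drop every distance that involves one of the merged clusters.
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        d.update(new_d)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    (tree, _), = clusters.values()
    return tree

# Made-up dissimilarities for four items.
example = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 8],
    [10, 10, 8, 0],
]
print(upgma(["A", "B", "C", "D"], example))
```

For n items the dictionary starts with n(n-1)/2 entries, so storage grows quadratically with the number of sequences; that growth is what makes the naive form impractical at the scale of the full protein space.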
Affiliation(s)
- Yaniv Loewenstein
- School of Computer Science and Engineering, Institute of Life Sciences, The Hebrew University of Jerusalem, Israel
|
261
|
Aryee MJA, Quackenbush J. An optimized predictive strategy for interactome mapping. J Proteome Res 2008; 7:4089-94. [PMID: 18642945 DOI: 10.1021/pr700858e] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/30/2022]
Abstract
We present an optimized experimental strategy that can accelerate progress toward identifying the majority of pairwise protein interactions. Our method involves applying a predictive algorithm, based on the existing data, to identify protein pairs likely to interact and prioritizing these for screening. The approach is iterative as additional data allows one to refine predictions directing the next stage of experimentation.
Affiliation(s)
- Martin J A Aryee
- Department of Biostatistics, Harvard School of Public Health, 655 Huntington Avenue, Boston, Massachusetts 02115, USA
|
262
|
Xia D, Sanderson SJ, Jones AR, Prieto JH, Yates JR, Bromley E, Tomley FM, Lal K, Sinden RE, Brunk BP, Roos DS, Wastling JM. The proteome of Toxoplasma gondii: integration with the genome provides novel insights into gene expression and annotation. Genome Biol 2008; 9:R116. [PMID: 18644147 PMCID: PMC2530874 DOI: 10.1186/gb-2008-9-7-r116] [Citation(s) in RCA: 89] [Impact Index Per Article: 5.6] [Received: 04/08/2008] [Revised: 06/17/2008] [Accepted: 07/21/2008] [Indexed: 11/10/2022] Open
Abstract
A proteomics analysis identifies one third of the predicted Toxoplasma gondii proteins and integrates proteomics and genomics data to refine genome annotation. Background Although the genomes of many of the most important human and animal pathogens have now been sequenced, our understanding of the actual proteins expressed by these genomes and how well they predict protein sequence and expression is still deficient. We have used three complementary approaches (two-dimensional electrophoresis, gel-liquid chromatography linked tandem mass spectrometry and MudPIT) to analyze the proteome of Toxoplasma gondii, a parasite of medical and veterinary significance, and have developed a public repository for these data within ToxoDB, making for the first time proteomics data an integral part of this key genome resource. Results The draft genome for Toxoplasma predicts around 8,000 genes with varying degrees of confidence. Our data demonstrate how proteomics can inform these predictions and help discover new genes. We have identified nearly one-third (2,252) of all the predicted proteins, with 2,477 intron-spanning peptides providing supporting evidence for correct splice site annotation. Functional predictions for each protein and key pathways were determined from the proteome. Importantly, we show evidence for many proteins that match alternative gene models, or previously unpredicted genes. For example, approximately 15% of peptides matched more convincingly to alternative gene models. We also compared our data with existing transcriptional data in which we highlight apparent discrepancies between gene transcription and protein expression. Conclusion Our data demonstrate the importance of protein data in expression profiling experiments and highlight the necessity of integrating proteomic with genomic data so that iterative refinements of both annotation and expression models are possible.
Affiliation(s)
- Dong Xia
- Department of Pre-clinical Veterinary Science, Faculty of Veterinary Science, University of Liverpool, Liverpool L69 7ZJ, UK.
|
263
|
Royer L, Reimann M, Andreopoulos B, Schroeder M. Unraveling protein networks with power graph analysis. PLoS Comput Biol 2008; 4:e1000108. [PMID: 18617988 PMCID: PMC2424176 DOI: 10.1371/journal.pcbi.1000108] [Citation(s) in RCA: 96] [Impact Index Per Article: 6.0] [Received: 10/19/2007] [Accepted: 05/29/2008] [Indexed: 11/28/2022] Open
Abstract
Networks play a crucial role in computational biology, yet their analysis and representation is still an open problem. Power Graph Analysis is a lossless transformation of biological networks into a compact, less redundant representation, exploiting the abundance of cliques and bicliques as elementary topological motifs. We demonstrate with five examples the advantages of Power Graph Analysis. Investigating protein-protein interaction networks, we show how the catalytic subunits of the casein kinase II complex are distinguishable from the regulatory subunits, how interaction profiles and sequence phylogeny of SH3 domains correlate, and how false positive interactions among high-throughput interactions are spotted. Additionally, we demonstrate the generality of Power Graph Analysis by applying it to two other types of networks. We show how power graphs induce a clustering of both transcription factors and target genes in bipartite transcription networks, and how the erosion of a phosphatase domain in type 22 non-receptor tyrosine phosphatases is detected. We apply Power Graph Analysis to high-throughput protein interaction networks and show that up to 85% (56% on average) of the information is redundant. Experimental networks are more compressible than rewired ones of same degree distribution, indicating that experimental networks are rich in cliques and bicliques. Power Graphs are a novel representation of networks, which reduces network complexity by explicitly representing re-occurring network motifs. Power Graphs compress up to 85% of the edges in protein interaction networks and are applicable to all types of networks such as protein interactions, regulatory networks, or homology networks. Networks play a crucial role in biology and are often used as a way to represent experimental results. Yet, their analysis and representation is still an open problem. Recent experimental and computational progress yields networks of increased size and complexity. There are, for example, small- and large-scale interaction networks, regulatory networks, genetic networks, protein-ligand interaction networks, and homology networks analyzed and published regularly. A common way to access the information in a network is through direct visualization, but this fails as it often just results in “fur balls” from which little insight can be gathered. On the other hand, clustering techniques manage to avoid the problems caused by the large number of nodes and even larger number of edges by coarse-graining the networks and thus abstracting details. But these also fail, since, in fact, much of the biology lies in the details. This work presents a novel methodology for analyzing and representing networks. Power Graphs are a lossless representation of networks, which reduces network complexity by explicitly representing re-occurring network motifs. Moreover, power graphs can be clearly visualized: they compress up to 90% of the edges in biological networks and are applicable to all types of networks such as protein interaction, regulatory networks, or homology networks.
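The edge-reduction figures quoted in this abstract follow from a simple counting argument: a biclique between node sets of sizes m and n contains m × n ordinary edges, which a power graph can represent as a single edge between two node sets. The sketch below only illustrates that counting on a made-up toy network. It assumes the two node sets are disjoint, and it does not attempt the authors' algorithm for finding power nodes.

```python
from itertools import product

def is_biclique(edges, left, right):
    """True if every node in `left` is linked to every node in `right`."""
    return all(frozenset((u, v)) in edges for u, v in product(left, right))

def edge_reduction(edges, left, right):
    """Fraction of edges saved by replacing one biclique with one power edge.

    Assumes `left` and `right` are disjoint node sets.
    """
    if not is_biclique(edges, left, right):
        raise ValueError("node sets do not form a biclique")
    # The len(left) * len(right) biclique edges collapse into a single edge.
    compressed = len(edges) - len(left) * len(right) + 1
    return 1 - compressed / len(edges)

# Toy undirected network: a 2 x 3 biclique plus one extra edge (7 edges).
edges = {frozenset(pair) for pair in [
    ("a", "x"), ("a", "y"), ("a", "z"),
    ("b", "x"), ("b", "y"), ("b", "z"),
    ("b", "c"),
]}
print(edge_reduction(edges, {"a", "b"}, {"x", "y", "z"}))
```

Here seven edges become two (one power edge plus the leftover edge), a reduction of 5/7. The representation stays lossless because the original edges can be regenerated from the two node sets.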
Affiliation(s)
- Loïc Royer
- Biotechnology Center, Technische Universität Dresden, Dresden, Germany
|
264
|
Stockinger H, Attwood T, Chohan SN, Côté R, Cudré-Mauroux P, Falquet L, Fernandes P, Finn RD, Hupponen T, Korpelainen E, Labarga A, Laugraud A, Lima T, Pafilis E, Pagni M, Pettifer S, Phan I, Rahman N. Experience using web services for biological sequence analysis. Brief Bioinform 2008; 9:493-505. [PMID: 18621748 DOI: 10.1093/bib/bbn029] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Indexed: 01/17/2023] Open
Abstract
Programmatic access to data and tools through the web using so-called web services has an important role to play in bioinformatics. In this article, we discuss the most popular approaches based on SOAP/WS-I and REST and describe our experiences, as a cross section of the community, with providing and using web services in the context of biological sequence analysis. We briefly review the main technological approaches as well as best-practice hints that are useful for both users and developers. Finally, syntactic and semantic data integration issues with multiple web services are discussed.
Affiliation(s)
- Heinz Stockinger
- Swiss Institute of Bioinformatics, Vital-IT Group, Lausanne, Switzerland.
|
265
|
Wasmuth J, Schmid R, Hedley A, Blaxter M. On the extent and origins of genic novelty in the phylum Nematoda. PLoS Negl Trop Dis 2008; 2:e258. [PMID: 18596977 PMCID: PMC2432500 DOI: 10.1371/journal.pntd.0000258] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.3] [Received: 02/13/2008] [Accepted: 06/09/2008] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND The phylum Nematoda is biologically diverse, including parasites of plants and animals as well as free-living taxa. Underpinning this diversity will be commensurate diversity in expressed genes, including gene sets associated specifically with evolution of parasitism. METHODS AND FINDINGS Here we have analyzed the extensive expressed sequence tag data (available for 37 nematode species, most of which are parasites) and define over 120,000 distinct putative genes from which we have derived robust protein translations. Combined with the complete proteomes of Caenorhabditis elegans and Caenorhabditis briggsae, these proteins have been grouped into 65,000 protein families that in turn contain 40,000 distinct protein domains. We have mapped the occurrence of domains and families across the Nematoda and compared the nematode data to that available for other phyla. Gene loss is common, and in particular we identify nearly 5,000 genes that may have been lost from the lineage leading to the model nematode C. elegans. We find a preponderance of novelty, including 56,000 nematode-restricted protein families and 26,000 nematode-restricted domains. Mapping of the latest time-of-origin of these new families and domains across the nematode phylogeny revealed ongoing evolution of novelty. A number of genes from parasitic species had signatures of horizontal transfer from their host organisms, and parasitic species had a greater proportion of novel, secreted proteins than did free-living ones. CONCLUSIONS These classes of genes may underpin parasitic phenotypes, and thus may be targets for development of effective control measures.
Affiliation(s)
- James Wasmuth
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, United Kingdom
- Program for Molecular Structure and Function, Hospital for Sick Children, Toronto, Ontario, Canada
- Ralf Schmid
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, United Kingdom
- Department of Biochemistry, University of Leicester, Leicester, United Kingdom
- Ann Hedley
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, United Kingdom
- Mark Blaxter
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, United Kingdom
|
266
|
Abstract
MOTIVATION Computational assignment of protein function may be the single most vital application of bioinformatics in the post-genome era. These assignments are made based on various protein features, where one is the presence of identifiable domains. The relationship between protein domain content and function is important to investigate, to understand how domain combinations encode complex functions. RESULTS Two different models are presented on how protein domain combinations yield specific functions: one rule-based and one probabilistic. We demonstrate how these are useful for Gene Ontology annotation transfer. The first is an intuitive generalization of the Pfam2GO mapping, and detects cases of strict functional implications of sets of domains. The second uses a probabilistic model to represent the relationship between domain content and annotation terms, and was found to be better suited for incomplete training sets. We implemented these models as predictors of Gene Ontology functional annotation terms. Both predictors were more accurate than conventional best BLAST-hit annotation transfer and more sensitive than a single-domain model on a large-scale dataset. We present a number of cases where combinations of Pfam-A protein domains predict functional terms that do not follow from the individual domains. AVAILABILITY Scripts and documentation are available for download at http://sonnhammer.sbc.su.se/multipfam2go_source_docs.tar
Affiliation(s)
- Kristoffer Forslund
- Stockholm Bioinformatics Centre, Stockholm University, 10691 Stockholm, Sweden.
|
267
|
He XP, Xu XW, Zhao SH, Fan B, Yu M, Zhu MJ, Li CC, Peng ZZ, Liu B. Investigation of Lpin1 as a candidate gene for fat deposition in pigs. Mol Biol Rep 2008; 36:1175-80. [PMID: 18581256 DOI: 10.1007/s11033-008-9294-4] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Received: 03/28/2008] [Accepted: 06/11/2008] [Indexed: 10/21/2022]
Abstract
Lpin1 deficiency prevents normal adipose tissue development and remarkably reduces adipose tissue mass, while overexpression of the Lpin1 gene in either skeletal muscle or adipose tissue promotes adiposity in mice. However, little is known about the porcine Lpin1 gene. In the present study, a 5,559-bp cDNA sequence of the porcine Lpin1 gene was obtained by RT-PCR and 3'RACE. The sequence consisted of a 111-bp 5'UTR, a 2,685-bp open reading frame encoding a protein of 894 amino acids and a 2,763-bp 3'UTR. Semi-quantitative RT-PCR analysis revealed that Lpin1 was expressed at a high level in the liver, spleen, skeletal muscle and fat, and at a low level in the heart, lung and kidney. The porcine Lpin1 gene was assigned to 3q21-27 using the somatic cell hybrid panel (SCHP) and the radiation hybrid (IMpRH) panel. One C93T single nucleotide polymorphism (SNP) was identified and genotyped using the TaqI PCR-RFLP method. Association analysis between the genotypes and fat deposition traits suggested that different genotypes of the Lpin1 gene were associated with the percentage of leaf fat and intramuscular fat.
Affiliation(s)
- X P He
- Laboratory of Molecular Biology and Animal Breeding, Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan, People's Republic of China
|
268
|
Liu F, Chen P, Cui SJ, Wang ZQ, Han ZG. SjTPdb: integrated transcriptome and proteome database and analysis platform for Schistosoma japonicum. BMC Genomics 2008; 9:304. [PMID: 18578888 PMCID: PMC2447853 DOI: 10.1186/1471-2164-9-304] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Received: 10/17/2007] [Accepted: 06/26/2008] [Indexed: 11/22/2022] Open
Abstract
Background Schistosoma japonicum is one of the three major blood fluke species, the etiological agents of schistosomiasis which remains a serious public health problem with an estimated 200 million people infected in 76 countries. In recent years, enormous amounts of both transcriptomic and proteomic data of schistosomes have become available, providing information on gene expression profiles for developmental stages and tissues of S. japonicum. Here, we establish a public searchable database, termed SjTPdb, with integrated transcriptomic and proteomic data of S. japonicum, to enable more efficient access and utility of these data and to facilitate the study of schistosome biology, physiology and evolution. Description All the available ESTs, EST clusters, and the proteomic dataset of S. japonicum are deposited in SjTPdb. The core of the database is the 8,420 S. japonicum proteins translated from the EST clusters, which are well annotated for sequence similarity, structural features, functional ontology, genomic variations and expression patterns across developmental stages and tissues including the tegument and eggshell of this flatworm. The data can be queried by simple text search, BLAST search, search based on developmental stage of the life cycle, and an integrated search for more specific information. A PHP-based web interface allows users to browse and query SjTPdb, and moreover to switch to external databases by the following embedded links. Conclusion SjTPdb is the first schistosome database with detailed annotations for schistosome proteins. It is also the first integrated database of both transcriptome and proteome of S. japonicum, providing a comprehensive data resource and research platform to facilitate functional genomics of schistosome. SjTPdb is available from URL: .
Affiliation(s)
- Feng Liu
- Shanghai-MOST Key Laboratory of Health and Disease Genomics, Chinese National Human Genome Center at Shanghai, Shanghai, PR China.
|
269
|
Zheng G, Qian Z, Yang Q, Wei C, Xie L, Zhu Y, Li Y. The combination approach of SVM and ECOC for powerful identification and classification of transcription factor. BMC Bioinformatics 2008; 9:282. [PMID: 18554421 PMCID: PMC2440765 DOI: 10.1186/1471-2105-9-282] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Received: 01/09/2008] [Accepted: 06/16/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Transcription factors (TFs) are core functional proteins which play important roles in gene expression control, and they are key factors for gene regulation network construction. Traditionally, they were identified and classified through experimental approaches. In order to save time and reduce costs, many computational methods have been developed to identify TFs from new proteins and to classify the resulting TFs. Though these methods have facilitated screening of TFs to some extent, low accuracy is still a common problem. With the fast-growing number of new proteins, more precise algorithms for identifying TFs from new proteins and classifying the resulting TFs are in high demand. RESULTS The support vector machine (SVM) algorithm was used to construct an automatic detector for TF identification, where protein domains and functional sites were employed as feature vectors. The error-correcting output coding (ECOC) algorithm, which originated in the information and communication engineering fields, was combined with the SVM methodology for TF classification. The overall success rates of identification and classification reached 88.22% and 97.83%, respectively. Finally, a web site was constructed to let users access our tools (see Availability and requirements section for URL). CONCLUSION The SVM method was a valid and stable means of TF identification with protein domains and functional sites as feature vectors. The ECOC algorithm is a powerful method for multi-class classification problems. When combined with the SVM method, it can remarkably increase the accuracy of TF classification using protein domains and functional sites as feature vectors. In addition, our work implies that the ECOC algorithm may succeed in a broad range of applications in biological data mining.
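To make the ECOC step more concrete, the following is a minimal sketch of how error-correcting output codes turn a multi-class problem into several binary ones and decode the result. It is an illustration under stated assumptions, not the authors' tool: the four class names and the 7-bit exhaustive codebook are hypothetical, and the binary classifiers (SVMs in the paper) are replaced by a given vector of bit predictions.

```python
# Hypothetical 4-class codebook (exhaustive code, 7 bits per class).
# Every pair of codewords differs in 4 positions, so one wrong bit can be corrected.
CODEBOOK = {
    "class_1": (1, 1, 1, 1, 1, 1, 1),
    "class_2": (0, 0, 0, 0, 1, 1, 1),
    "class_3": (0, 0, 1, 1, 0, 0, 1),
    "class_4": (0, 1, 0, 1, 0, 1, 0),
}


def binary_labels(class_labels, bit):
    """Relabel multi-class training labels for the binary classifier of one bit."""
    return [CODEBOOK[label][bit] for label in class_labels]


def hamming(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(predicted_bits):
    """Return the class whose codeword is closest to the predicted bit vector."""
    return min(CODEBOOK, key=lambda label: hamming(CODEBOOK[label], predicted_bits))


# One binary classifier (bit 5) is assumed to have made a mistake on a class_3 sample.
noisy_prediction = (0, 0, 1, 1, 0, 1, 1)
decoded = decode(noisy_prediction)
labels_for_bit_2 = binary_labels(["class_1", "class_2", "class_3", "class_4"], bit=2)
```

Because the codewords are far apart, a single misfiring binary classifier still decodes to the intended class, which is the error-correcting property the method relies on.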
Affiliation(s)
- Guangyong Zheng
- Department of Computing and Information Technology, Fudan University, 220 Handan Road, Shanghai 200433, PR China.
|
270
|
Analysis of the genome sequence of Lactobacillus gasseri ATCC 33323 reveals the molecular basis of an autochthonous intestinal organism. Appl Environ Microbiol 2008; 74:4610-25. [PMID: 18539810 DOI: 10.1128/aem.00054-08] [Citation(s) in RCA: 122] [Impact Index Per Article: 7.6] [Indexed: 12/15/2022] Open
Abstract
This study presents the complete genome sequence of Lactobacillus gasseri ATCC 33323, a neotype strain of human origin and a native species found commonly in the gastrointestinal tracts of neonates and adults. The plasmid-free genome was 1,894,360 bp in size and predicted to encode 1,810 genes. The GC content was 35.3%, similar to the GC content of its closest relatives, L. johnsonii NCC 533 (34%) and L. acidophilus NCFM (34%). Two identical copies of the prophage LgaI (40,086 bp), of the Sfi11-like Siphoviridae phage family, were integrated in tandem in the chromosome. A number of unique features were identified in the genome of L. gasseri that were likely acquired by horizontal gene transfer and may contribute to the survival of this bacterium in its ecological niche. L. gasseri encodes two restriction and modification systems, which may limit bacteriophage infection. L. gasseri also encodes an operon for production of heteropolysaccharides of high complexity. A unique alternative sigma factor was present similar to that of B. caccae ATCC 43185, a bacterial species isolated from human feces. In addition, L. gasseri encoded the highest number of putative mucus-binding proteins (14) among lactobacilli sequenced to date. Selected phenotypic characteristics that were compared between ATCC 33323 and other human L. gasseri strains included carbohydrate fermentation patterns, growth and survival in bile, oxalate degradation, and adhesion to intestinal epithelial cells, in vitro. The results from this study indicated high intraspecies variability from a genome encoding traits important for survival and retention in the gastrointestinal tract.
|
271
|
Anaerobic degradation of p-ethylphenol by "Aromatoleum aromaticum" strain EbN1: pathway, regulation, and involved proteins. J Bacteriol 2008; 190:5699-709. [PMID: 18539747 DOI: 10.1128/jb.00409-08] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.1] [Indexed: 11/20/2022] Open
Abstract
The denitrifying "Aromatoleum aromaticum" strain EbN1 was demonstrated to utilize p-ethylphenol under anoxic conditions and was suggested to employ a degradation pathway which is reminiscent of known anaerobic ethylbenzene degradation in the same bacterium: initial hydroxylation of p-ethylphenol to 1-(4-hydroxyphenyl)-ethanol followed by dehydrogenation to p-hydroxyacetophenone. Possibly, subsequent carboxylation and thiolytic cleavage yield p-hydroxybenzoyl-coenzyme A (CoA), which is channeled into the central benzoyl-CoA pathway. Substrate-specific formation of three of the four proposed intermediates was confirmed by gas chromatographic-mass spectrometric analysis and also by applying deuterated p-ethylphenol. Proteins suggested to be involved in this degradation pathway are encoded in a single large operon-like structure (approximately 15 kb). Among them are a p-cresol methylhydroxylase-like protein (PchCF), two predicted alcohol dehydrogenases (ChnA and EbA309), a biotin-dependent carboxylase (XccABC), and a thiolase (TioL). Proteomic analysis (two-dimensional difference gel electrophoresis) revealed their specific and coordinated upregulation in cells adapted to anaerobic growth with p-ethylphenol and p-hydroxyacetophenone (e.g., PchF up to 29-fold). Coregulated proteins of currently unknown function (e.g., EbA329) are possibly involved in p-ethylphenol- and p-hydroxyacetophenone-specific solvent stress responses and related to other aromatic solvent-induced proteins of strain EbN1.
|
272
|
Protein Structural Change upon Ligand Binding Correlates with Enzymatic Reaction Mechanism. J Mol Biol 2008; 379:397-401. [DOI: 10.1016/j.jmb.2008.04.019] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Received: 01/24/2008] [Revised: 03/21/2008] [Accepted: 04/08/2008] [Indexed: 11/20/2022]
|
273
|
Bourbon HM. Comparative genomics supports a deep evolutionary origin for the large, four-module transcriptional mediator complex. Nucleic Acids Res 2008; 36:3993-4008. [PMID: 18515835 PMCID: PMC2475620 DOI: 10.1093/nar/gkn349] [Citation(s) in RCA: 254] [Impact Index Per Article: 15.9] [Indexed: 11/14/2022] Open
Abstract
The multisubunit Mediator (MED) complex bridges DNA-bound transcriptional regulators to the RNA polymerase II (PolII) initiation machinery. In yeast, the 25 MED subunits are distributed within three core subcomplexes and a separable kinase module composed of Med12, Med13 and the Cdk8-CycC pair thought to control the reversible interaction between MED and PolII by phosphorylating repeated heptapeptides within the Rpb1 carboxyl-terminal domain (CTD). Here, MED conservation has been investigated across the eukaryotic kingdom. Saccharomyces cerevisiae Med2, Med3/Pgd1 and Med5/Nut1 subunits are apparent homologs of metazoan Med29/Intersex, Med27/Crsp34 and Med24/Trap100, respectively, and these and other 30 identified human MED subunits have detectable counterparts in the amoeba Dictyostelium discoideum, indicating that none is specific to metazoans. Indeed, animal/fungal subunits are also conserved in plants, green and red algae, entamoebids, oomycetes, diatoms, apicomplexans, ciliates and the 'deep-branching' protists Trichomonas vaginalis and Giardia lamblia. Surprisingly, although lacking CTD heptads, T. vaginalis displays 44 MED subunit homologs, including several CycC, Med12 and Med13 paralogs. Such observations have allowed the identification of a conserved 17-subunit framework around which peripheral subunits may be assembled, and support a very ancient eukaryotic origin for a large, four-module MED. The implications of this comprehensive work for MED structure-function relationships are discussed.
Affiliation(s)
- Henri-Marc Bourbon
- Centre de Biologie du Développement, UMR5547 CNRS/Toulouse III, IFR109, Université Paul Sabatier, 31062 Toulouse, France.
|
274
|
Al-Shahrour F, Carbonell J, Minguez P, Goetz S, Conesa A, Tárraga J, Medina I, Alloza E, Montaner D, Dopazo J. Babelomics: advanced functional profiling of transcriptomics, proteomics and genomics experiments. Nucleic Acids Res 2008; 36:W341-6. [PMID: 18515841 PMCID: PMC2447758 DOI: 10.1093/nar/gkn318] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.2] [Indexed: 01/14/2023] Open
Abstract
We present a new version of Babelomics, a complete suite of web tools for the functional profiling of genome-scale experiments, with new and improved methods as well as more types of functional definitions. Babelomics includes different flavours of conventional functional enrichment methods as well as more advanced gene set analysis methods, which make it a unique tool among the similar resources available. In addition to the well-known functional definitions (GO, KEGG), Babelomics includes new ones such as Biocarta pathways or text mining-derived functional terms. Regulatory modules implemented include transcriptional control (Transfac, CisRed) and other levels of regulation such as miRNA-mediated interference. Moreover, Babelomics allows for sub-selection of terms in order to test more focused hypotheses. Gene annotation correspondence tables can also be imported, which allows testing with user-defined functional modules. Finally, a tool for the ‘de novo’ functional annotation of sequences has been included in the system. This allows as-yet unannotated organisms to be used in the program. Babelomics has been extensively re-engineered and now includes the use of web services and Web 2.0 technology features, a new user interface with persistent sessions and a new extended database of gene identifiers. Babelomics is available at http://www.babelomics.org
Affiliation(s)
- Fátima Al-Shahrour
- Department of Bioinformatics, Centro de Investigación Príncipe Felipe (CIPF), Autopista del Saler 16, E46013 Valencia, Spain
|
275
|
Zvi A, Ariel N, Fulkerson J, Sadoff JC, Shafferman A. Whole genome identification of Mycobacterium tuberculosis vaccine candidates by comprehensive data mining and bioinformatic analyses. BMC Med Genomics 2008; 1:18. [PMID: 18505592 PMCID: PMC2442614 DOI: 10.1186/1755-8794-1-18] [Citation(s) in RCA: 93] [Impact Index Per Article: 5.8] [Received: 12/10/2007] [Accepted: 05/28/2008] [Indexed: 12/19/2022] Open
Abstract
Background Mycobacterium tuberculosis, the causative agent of tuberculosis (TB), infects ~8 million annually culminating in ~2 million deaths. Moreover, about one third of the population is latently infected, 10% of which develop disease during lifetime. Current approved prophylactic TB vaccines (BCG and derivatives thereof) are of variable efficiency in adult protection against pulmonary TB (0%–80%), and directed essentially against early phase infection. Methods A genome-scale dataset was constructed by analyzing published data of: (1) global gene expression studies under conditions which simulate intra-macrophage stress, dormancy, persistence and/or reactivation; (2) cellular and humoral immunity, and vaccine potential. This information was compiled along with revised annotation/bioinformatic characterization of selected gene products and in silico mapping of T-cell epitopes. Protocols for scoring, ranking and prioritization of the antigens were developed and applied. Results Cross-matching of literature and in silico-derived data, in conjunction with the prioritization scheme and biological rationale, allowed for selection of 189 putative vaccine candidates from the entire genome. Within the 189 set, the relative distribution of antigens in 3 functional categories differs significantly from their distribution in the whole genome, with reduction in the Conserved hypothetical category (due to improved annotation) and enrichment in Lipid and in Virulence categories. Other prominent representatives in the 189 set are the PE/PPE proteins; iron sequestration, nitroreductases and proteases, all within the Intermediary metabolism and respiration category; ESX secretion systems, resuscitation promoting factors and lipoproteins, all within the Cell wall category. 
Application of a ranking scheme based on qualitative and quantitative scores, resulted in a list of 45 best-scoring antigens, of which: 74% belong to the dormancy/reactivation/resuscitation classes; 30% belong to the Cell wall category; 13% are classical vaccine candidates; 9% are categorized Conserved hypotheticals, all potentially very potent T-cell antigens. Conclusion The comprehensive literature and in silico-based analyses allowed for the selection of a repertoire of 189 vaccine candidates, out of the whole-genome 3989 ORF products. This repertoire, which was ranked to generate a list of 45 top-hits antigens, is a platform for selection of genes covering all stages of M. tuberculosis infection, to be incorporated in rBCG or subunit-based vaccines.
Affiliation(s)
- Anat Zvi
- Israel Institute for Biological Research, Ness Ziona 74100, Israel.
|
276
|
Tranchevent LC, Barriot R, Yu S, Van Vooren S, Van Loo P, Coessens B, De Moor B, Aerts S, Moreau Y. ENDEAVOUR update: a web resource for gene prioritization in multiple species. Nucleic Acids Res 2008; 36:W377-84. [PMID: 18508807 PMCID: PMC2447805 DOI: 10.1093/nar/gkn325] [Citation(s) in RCA: 179] [Impact Index Per Article: 11.2] [Indexed: 12/11/2022] Open
Abstract
Endeavour (http://www.esat.kuleuven.be/endeavourweb; this web site is free and open to all users and there is no login requirement) is a web resource for the prioritization of candidate genes. Using a training set of genes known to be involved in a biological process of interest, our approach consists of (i) inferring several models (based on various genomic data sources), (ii) applying each model to the candidate genes to rank those candidates against the profile of the known genes and (iii) merging the several rankings into a global ranking of the candidate genes. In the present article, we describe the latest developments of Endeavour. First, we provide a web-based user interface, besides our Java client, to make Endeavour more universally accessible. Second, we support multiple species: in addition to Homo sapiens, we now provide gene prioritization for three major model organisms: Mus musculus, Rattus norvegicus and Caenorhabditis elegans. Third, Endeavour makes use of additional data sources and is now including numerous databases: ontologies and annotations, protein–protein interactions, cis-regulatory information, gene expression data sets, sequence information and text-mining data. We tested the novel version of Endeavour on 32 recent disease gene associations from the literature. Additionally, we describe a number of recent independent studies that made use of Endeavour to prioritize candidate genes for obesity and Type II diabetes, cleft lip and cleft palate, and pulmonary fibrosis.
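Step (iii) of the approach, merging several per-source rankings into one global ranking, can be sketched as follows. This is a deliberately simplified stand-in: it averages each gene's rank ratio (position divided by list length) across rankings, which is an assumption made for illustration and not necessarily the statistic Endeavour uses. The gene names and the function name are hypothetical.

```python
def merge_rankings(rankings):
    """Merge several best-first rankings into one global ranking.

    rankings: list of lists of gene identifiers, each ordered best first.
    Genes are ordered by their mean rank ratio (position / list length);
    ties are broken alphabetically so the result is deterministic.
    """
    ratios = {}
    for ranking in rankings:
        n = len(ranking)
        for position, gene in enumerate(ranking, start=1):
            ratios.setdefault(gene, []).append(position / n)
    mean_ratio = {gene: sum(values) / len(values) for gene, values in ratios.items()}
    return sorted(mean_ratio, key=lambda gene: (mean_ratio[gene], gene))


# Three hypothetical data sources ranking the same three candidate genes.
per_source = [
    ["geneA", "geneB", "geneC"],
    ["geneB", "geneA", "geneC"],
    ["geneA", "geneC", "geneB"],
]
global_ranking = merge_rankings(per_source)
```

Using rank ratios rather than raw positions keeps sources with different numbers of ranked candidates on a comparable scale.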
|
277
|
Sato S, Nakamura Y, Kaneko T, Asamizu E, Kato T, Nakao M, Sasamoto S, Watanabe A, Ono A, Kawashima K, Fujishiro T, Katoh M, Kohara M, Kishida Y, Minami C, Nakayama S, Nakazaki N, Shimizu Y, Shinpo S, Takahashi C, Wada T, Yamada M, Ohmido N, Hayashi M, Fukui K, Baba T, Nakamichi T, Mori H, Tabata S. Genome structure of the legume, Lotus japonicus. DNA Res 2008; 15:227-39. [PMID: 18511435 PMCID: PMC2575887 DOI: 10.1093/dnares/dsn008] [Citation(s) in RCA: 430] [Impact Index Per Article: 26.9] [Indexed: 11/14/2022] Open
Abstract
The legume Lotus japonicus has been widely used as a model system to investigate the genetic background of legume-specific phenomena such as symbiotic nitrogen fixation. Here, we report structural features of the L. japonicus genome. The 315.1-Mb sequences determined in this and previous studies correspond to 67% of the genome (472 Mb), and are likely to cover 91.3% of the gene space. Linkage mapping anchored 130-Mb sequences onto the six linkage groups. A total of 10 951 complete and 19 848 partial structures of protein-encoding genes were assigned to the genome. Comparative analysis of these genes revealed the expansion of several functional domains and gene families that are characteristic of L. japonicus. Synteny analysis detected traces of whole-genome duplication and the presence of synteny blocks with other plant genomes to various degrees. This study provides the first opportunity to look into the complex and unique genetic system of legumes.
Affiliation(s)
- Shusei Sato
- Kazusa DNA Research Institute, 2-6-7 Kazusa-kamatari, Kisarazu, Chiba, Japan
|
278
|
The metagenome of a biogas-producing microbial community of a production-scale biogas plant fermenter analysed by the 454-pyrosequencing technology. J Biotechnol 2008; 136:77-90. [PMID: 18597880 DOI: 10.1016/j.jbiotec.2008.05.008] [Citation(s) in RCA: 261] [Impact Index Per Article: 16.3] [Received: 03/12/2008] [Revised: 04/16/2008] [Accepted: 05/08/2008] [Indexed: 11/21/2022]
Abstract
Composition and gene content of a biogas-producing microbial community from a production-scale biogas plant fed with renewable primary products was analysed by means of a metagenomic approach applying the ultrafast 454-pyrosequencing technology. Sequencing of isolated total community DNA on a Genome Sequencer FLX System resulted in 616,072 reads with an average read length of 230 bases accounting for 141,664,289 bases sequence information. Assignment of obtained single reads to COG (Clusters of Orthologous Groups of proteins) categories revealed a genetic profile characteristic for an anaerobic microbial consortium conducting fermentative metabolic pathways. Assembly of single reads resulted in the formation of 8752 contigs larger than 500 bases in size. Contigs longer than 10kb mainly encode house-keeping proteins, e.g. DNA polymerase, recombinase, DNA ligase, sigma factor RpoD and genes involved in sugar and amino acid metabolism. A significant portion of contigs was allocated to the genome sequence of the archaeal methanogen Methanoculleus marisnigri JR1. Mapping of single reads to the M. marisnigri JR1 genome revealed that approximately 64% of the reference genome including methanogenesis gene regions are deeply covered. These results suggest that species related to those of the genus Methanoculleus play a dominant role in methanogenesis in the analysed fermentation sample. Moreover, assignment of numerous contig sequences to clostridial genomes including gene regions for cellulolytic functions indicates that clostridia are important for hydrolysis of cellulosic plant biomass in the biogas fermenter under study. Metagenome sequence data from a biogas-producing microbial community residing in a fermenter of a biogas plant provide the basis for a rational approach to improve the biotechnological process of biogas production.
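As a quick consistency check on the sequencing figures quoted in this abstract, the average read length can be recomputed from the read count and the total base count. The variable names are illustrative; the two input numbers are taken directly from the abstract.

```python
# Figures reported in the abstract.
read_count = 616_072          # single reads from the Genome Sequencer FLX run
total_bases = 141_664_289     # bases of sequence information

# Mean read length implied by the two figures above.
mean_read_length = total_bases / read_count
rounded_length = round(mean_read_length)
```

The implied mean agrees with the reported average read length of 230 bases once rounded to the nearest base.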
|
279
|
Mulder NJ, Apweiler R. The InterPro database and tools for protein domain analysis. Curr Protoc Bioinformatics 2008; Chapter 2:Unit 2.7. [PMID: 18428686 DOI: 10.1002/0471250953.bi0207s21] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.8] [Indexed: 11/10/2022]
Abstract
InterPro provides a one-stop shop for protein-sequence classification, freeing the user from having to visit multiple databases separately and rationalize the different results in varying formats. This unit describes how to submit a sequence to InterProScan via a Web server. It also provides instructions for installing and running InterProScan locally. In addition, details on browsing InterPro families and domains of interest using the InterPro Web and sequence retrieval system (SRS) are provided to show users how to get the most from the resource.
Affiliation(s)
- Nicola J Mulder
- The EMBL Outstation, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
|
280
|
Egan S, Lanigan M, Shiell B, Beddome G, Stewart D, Vaughan J, Michalski WP. The recovery of Mycobacterium avium subspecies paratuberculosis from the intestine of infected ruminants for proteomic evaluation. J Microbiol Methods 2008; 75:29-39. [PMID: 18547663 DOI: 10.1016/j.mimet.2008.04.008] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 12/04/2007] [Revised: 04/24/2008] [Accepted: 04/28/2008] [Indexed: 10/22/2022]
Abstract
Johne's disease is a slowly developing intestinal disease, primarily of ruminants, caused by Mycobacterium avium subspecies paratuberculosis. The disease contributes to significant economic losses worldwide in agricultural industry. Analysis of bacterial proteomes isolated directly from infected animals can provide important information about the repertoire of proteins present during infection and disease progression. In this study, M. avium subspecies paratuberculosis has been extracted from Johne's disease-infected cattle and goat intestinal tissue sections in a manner compatible with direct 2-DE proteomic analysis for comparison with in vitro-cultured bacteria. M. avium subspecies paratuberculosis was harvested from the submucosa and mucosa of intestinal sections and enriched from macerated tissue by hypotonic lysis, sonication and centrifugation through a viscosity gradient. Subsequent comparison of the proteomes of the in vivo- and in vitro-derived bacteria identified a number of proteins that were differentially expressed. Among them, a number of hypothetical proteins of unknown function and a hypothetical fatty acyl dehydrogenase (FadE3_2) and 3-hydroxyacyl-CoA dehydrogenase, possibly important for in vivo metabolism, utilising the pathway for the beta-oxidation of fatty acids.
Affiliation(s)
- Sharon Egan
- Protein Biochemistry and Proteomics Group, Australian Animal Health Laboratory, CSIRO Livestock Industries, Geelong VIC 3220, Australia
|
281
|
In silico and in vivo evaluation of bacteriophage phiEF24C, a candidate for treatment of Enterococcus faecalis infections. Appl Environ Microbiol 2008; 74:4149-63. [PMID: 18456848 DOI: 10.1128/aem.02371-07] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.8] [Indexed: 01/21/2023] Open
Abstract
Along with the increasing threat of nosocomial infections by vancomycin-resistant Enterococcus faecalis, bacteriophage (phage) therapy has been expected as an alternative therapy against infectious disease. Although genome information and proof of applicability are prerequisites for a modern therapeutic phage, E. faecalis phage has not been analyzed in terms of these aspects. Previously, we reported a novel virulent phage, phiEF24C, and its biology indicated its therapeutic potential against E. faecalis infection. In this study, the phiEF24C genome was analyzed and the in vivo therapeutic applicability of phiEF24C was also briefly assessed. Its complete genome (142,072 bp) was predicted to have 221 open reading frames (ORFs) and five tRNA genes. In our functional analysis of the ORFs by use of a public database, no proteins undesirable in phage therapy, such as pathogenic and integration-related proteins, were predicted. The noncompetitive directions of replication and transcription and the host-adapted translation of the phage were deduced bioinformatically. Its genomic features indicated that phiEF24C is a member of the SPO1-like phage genus and especially that it has a close relationship to the Listeria phage P100, which is authorized for prophylactic use. Thus, these bioinformatics analyses rationalized the therapeutic eligibility of phiEF24C. Moreover, the in vivo therapeutic potential of phiEF24C, which was effective at a low concentration and was not affected by host sensitivity to the phage, was proven by use of sepsis BALB/c mouse models. Furthermore, no change in mouse lethality was observed under either single or repeated phage exposures. Although further study is required, phiEF24C can be a promising therapeutic phage against E. faecalis infections.
|
282
|
Bansal S, Miao X, Adams MWW, Prestegard JH, Valafar H. Rapid classification of protein structure models using unassigned backbone RDCs and probability density profile analysis (PDPA). J Magn Reson 2008; 192:60-8. [PMID: 18321742 PMCID: PMC2699457 DOI: 10.1016/j.jmr.2008.01.014] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 11/14/2007] [Revised: 01/19/2008] [Accepted: 01/29/2008] [Indexed: 05/22/2023]
Abstract
A method of identifying the best structural model for a protein of unknown structure from a list of structural candidates using unassigned 15N1H residual dipolar coupling (RDC) data and probability density profile analysis (PDPA) is described. Ten candidate structures have been obtained for the structural genomics target protein PF2048.1 using ROBETTA. 15N1H residual dipolar couplings have been measured from NMR spectra of the protein in two alignment media and these data have been analyzed using PDPA to rank the models in terms of their ability to represent the actual structure. A number of advantages in using this method to characterize a protein structure become apparent. RDCs can easily and rapidly be acquired, and without the need for assignment, the cost and duration of data acquisition is greatly reduced. The approach is quite robust with respect to imprecise and missing data. In the case of PF2048.1, a 79 residue protein, only 58 and 55 of the total RDC data were observed. The method can accelerate structure determination at higher resolution using traditional NMR spectroscopy by providing a starting point for the addition of NOEs and other NMR structural data.
Affiliation(s)
- Sonal Bansal
- Complex Carbohydrate Research Center, University of Georgia, Athens, GA 30602
- Xijiang Miao
- Computer Science and Engineering, University of South Carolina, Columbia SC 29308, USA
- James H. Prestegard
- Complex Carbohydrate Research Center, University of Georgia, Athens, GA 30602
- Homayoun Valafar
- Computer Science and Engineering, University of South Carolina, Columbia SC 29308, USA
283
Götz S, García-Gómez JM, Terol J, Williams TD, Nagaraj SH, Nueda MJ, Robles M, Talón M, Dopazo J, Conesa A. High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic Acids Res 2008; 36:3420-35. [PMID: 18445632 PMCID: PMC2425479 DOI: 10.1093/nar/gkn176] [Citation(s) in RCA: 2890] [Impact Index Per Article: 180.6] [Indexed: 12/14/2022] Open
Abstract
Functional genomics technologies have been widely adopted in the biological research of both model and non-model species. An efficient functional annotation of DNA or protein sequences is a major requirement for the successful application of these approaches as functional information on gene products is often the key to the interpretation of experimental results. Therefore, there is an increasing need for bioinformatics resources which are able to cope with large amount of sequence data, produce valuable annotation results and are easily accessible to laboratories where functional genomics projects are being undertaken. We present the Blast2GO suite as an integrated and biologist-oriented solution for the high-throughput and automatic functional annotation of DNA or protein sequences based on the Gene Ontology vocabulary. The most outstanding Blast2GO features are: (i) the combination of various annotation strategies and tools controlling type and intensity of annotation, (ii) the numerous graphical features such as the interactive GO-graph visualization for gene-set function profiling or descriptive charts, (iii) the general sequence management features and (iv) high-throughput capabilities. We used the Blast2GO framework to carry out a detailed analysis of annotation behaviour through homology transfer and its impact in functional genomics research. Our aim is to offer biologists useful information to take into account when addressing the task of functionally characterizing their sequence data.
Affiliation(s)
- Stefan Götz
- Bioinformatics Department, Centro de Investigación Principe Felipe, Valencia, Spain
284
Perrodou E, Chica C, Poch O, Gibson TJ, Thompson JD. A new protein linear motif benchmark for multiple sequence alignment software. BMC Bioinformatics 2008; 9:213. [PMID: 18439277 PMCID: PMC2374782 DOI: 10.1186/1471-2105-9-213] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Received: 11/29/2007] [Accepted: 04/25/2008] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Linear motifs (LMs) are abundant short regulatory sites used for modulating the functions of many eukaryotic proteins. They play important roles in post-translational modification, cell compartment targeting, docking sites for regulatory complex assembly and protein processing and cleavage. Methods for LM detection are now being developed that are strongly dependent on scores for motif conservation in homologous proteins. However, most LMs are found in natively disordered polypeptide segments that evolve rapidly, unhindered by structural constraints on the sequence. These regions of modular proteins are difficult to align using classical multiple sequence alignment programs that are specifically optimised to align the globular domains. As a consequence, poor motif alignment quality is hindering efforts to detect new LMs. RESULTS We have developed a new benchmark, as part of the BAliBASE suite, designed to assess the ability of standard multiple alignment methods to detect and align LMs. The reference alignments are organised into different test sets representing real alignment problems and contain examples of experimentally verified functional motifs, extracted from the Eukaryotic Linear Motif (ELM) database. The benchmark has been used to evaluate and compare a number of multiple alignment programs. With distantly related proteins, the worst alignment program correctly aligns 48% of LMs compared to 73% for the best program. However, the performance of all the programs is adversely affected by the introduction of other sequences containing false positive motifs. The ranking of the alignment programs based on LM alignment quality is similar to that observed when considering full-length protein alignments, however little correlation was observed between LM and overall alignment quality for individual alignment test cases. 
CONCLUSION We have shown that none of the programs currently available is capable of reliably aligning LMs in distantly related sequences and we have highlighted a number of specific problems. The results of the tests suggest possible ways to improve program accuracy for difficult, divergent sequences.
Affiliation(s)
- Emmanuel Perrodou
- Institut de Génétique et de Biologie Moléculaire et Cellulaire, Department of Structural Biology and Genomics, F-67400 Illkirch, France.
285
Topalis P, Lawson D, Collins FH, Louis C. How can ontologies help vector biology? Trends Parasitol 2008; 24:249-52. [PMID: 18440275 DOI: 10.1016/j.pt.2008.03.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 11/26/2007] [Revised: 01/21/2008] [Accepted: 03/10/2008] [Indexed: 10/22/2022]
Abstract
The reach of genomics has now extended to vector biology, with three mosquito genomes already sequenced and more arthropod vector genomes in the pipeline. The availability of these genomes has paved the way for high-throughput investigations on genome-wide gene expression and proteomics in vector biology. Such investigations would not have been possible without parallel progress in bioinformatics. It is now necessary to construct specific ontologies that will enable vector biologists to achieve computer-comprehensible annotation of genes and genomes, but also of various experimental, clinical and surveillance data. This will inevitably lead to the enhanced usage of such controlled vocabularies, and to an effort to develop novel ontologies, particularly in the context of disease control.
Affiliation(s)
- Pantelis Topalis
- Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology Hellas, PO Box 1385, Vassilika Vouton, Heraklion, Crete, Greece
286
Braconi Quintaje S, Orchard S. The annotation of both human and mouse kinomes in UniProtKB/Swiss-Prot: one small step in manual annotation, one giant leap for full comprehension of genomes. Mol Cell Proteomics 2008; 7:1409-19. [PMID: 18436524 DOI: 10.1074/mcp.r700001-mcp200] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.4] [Indexed: 01/09/2023] Open
Abstract
Biomolecule phosphorylation by protein kinases is a fundamental cell signaling process in all living cells. Following the comprehensive cataloguing of the protein kinase complement of the human genome (Manning, G., Whyte, D. B., Martinez, R., Hunter, T., and Sudarsanam, S. (2002) The protein kinase complement of the human genome. Science 298, 1912-1934), this review will detail the state-of-the-art human and mouse kinase proteomes as provided in the UniProtKB/Swiss-Prot protein knowledgebase. The sequences of the 480 classical and up to 24 atypical protein kinases now believed to exist in the human genome and 484 classical and up to 24 atypical kinases within the mouse genome have been reviewed and, where necessary, revised. Extensive annotation has been added to each entry. In an era when a wealth of new databases is emerging on the Internet, UniProtKB/Swiss-Prot makes available to the scientific community the most up-to-date and in-depth annotation of these proteins with access to additional external resources linked from within each entry. Incorrect sequence annotations resulting from errors and artifacts have been eliminated. Each entry will be constantly reviewed and updated as new information becomes available with the orthologous enzymes in related species being annotated in a parallel effort and complete kinomes being completed as sequences become available. This ensures that the mammalian kinomes available from UniProtKB/Swiss-Prot are of a consistently high standard with each separate entry acting both as a valuable information resource and a central portal to a wealth of further detail via extensive cross-referencing.
Affiliation(s)
- Silvia Braconi Quintaje
- Swiss-Prot group, Swiss Institute of Bioinformatics, CMU, 1 Michel Servet, 1211 Geneva 4, Switzerland
287
Gasparini F, Franchi N, Spolaore B, Ballarin L. Novel rhamnose-binding lectins from the colonial ascidian Botryllus schlosseri. DEVELOPMENTAL AND COMPARATIVE IMMUNOLOGY 2008; 32:1177-1191. [PMID: 18471875 DOI: 10.1016/j.dci.2008.03.006] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Received: 12/03/2007] [Revised: 03/13/2008] [Accepted: 03/18/2008] [Indexed: 05/26/2023]
Abstract
In a full-length cDNA library from the compound ascidian Botryllus schlosseri, we identified, by BLAST search against UniProt database, five transcripts, each with complete coding sequence, homologous to known rhamnose-binding lectins (RBLs). Comparisons of the predicted amino acid sequences suggest that they represent different isoforms of a novel RBL, called BsRBL-1-5. Four of these isolectins were found in Botryllus homogenate after purification by affinity chromatography on acid-treated Sepharose, analysis by reverse-phase HPLC and mass spectrometry. Analysis of both molecular masses and tryptic digests of BsRBLs indicated that the N-terminal sequence of the purified proteins starts from residue 22 of the putative amino acid sequence, and residues 1-21 represent a signal peptide. Analysis by mass spectrometry of V8-protease digests confirmed the presence and alignments of the eight cysteines involved in the disulphide bridges that characterise RBLs. Functional studies proved the enhancing effect on phagocytosis of the affinity-purified material. Results are discussed in terms of phylogenetic relationships of BsRBLs with orthologous molecules from protostomes and deuterostomes.
Affiliation(s)
- Fabio Gasparini
- Dipartimento di Biologia and CRIBI, University of Padova, Padova, Italy
288
Datema E, Mueller LA, Buels R, Giovannoni JJ, Visser RGF, Stiekema WJ, van Ham RCHJ. Comparative BAC end sequence analysis of tomato and potato reveals overrepresentation of specific gene families in potato. BMC PLANT BIOLOGY 2008; 8:34. [PMID: 18405374 PMCID: PMC2324086 DOI: 10.1186/1471-2229-8-34] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 10/05/2007] [Accepted: 04/11/2008] [Indexed: 05/18/2023]
Abstract
BACKGROUND Tomato (Solanum lycopersicon) and potato (S. tuberosum) are two economically important crop species, the genomes of which are currently being sequenced. This study presents a first genome-wide analysis of these two species, based on two large collections of BAC end sequences representing approximately 19% of the tomato genome and 10% of the potato genome. RESULTS The tomato genome has a higher repeat content than the potato genome, primarily due to a higher number of retrotransposon insertions in the tomato genome. On the other hand, simple sequence repeats are more abundant in potato than in tomato. The two genomes also differ in the frequency distribution of SSR motifs. Based on EST and protein alignments, potato appears to contain up to 6,400 more putative coding regions than tomato. Major gene families such as cytochrome P450 mono-oxygenases and serine-threonine protein kinases are significantly overrepresented in potato, compared to tomato. Moreover, the P450 superfamily appears to have expanded spectacularly in both species compared to Arabidopsis thaliana, suggesting an expanded network of secondary metabolic pathways in the Solanaceae. Both tomato and potato appear to have a low level of microsynteny with A. thaliana. A higher degree of synteny was observed with Populus trichocarpa, specifically in the region between 15.2 and 19.4 Mb on P. trichocarpa chromosome 10. CONCLUSION The findings in this paper present a first glimpse into the evolution of Solanaceous genomes, both within the family and relative to other plant species. When the complete genome sequences of these species become available, whole-genome comparisons and protein- or repeat-family specific studies may shed more light on the observations made here.
Affiliation(s)
- Erwin Datema
- Applied Bioinformatics, Plant Research International, PO Box 16, 6700 AA, Wageningen, The Netherlands
- Laboratory of Bioinformatics, Wageningen University, Transitorium, Dreijenlaan 3, 6703 HA Wageningen, The Netherlands
- Lukas A Mueller
- Department of Plant Breeding and Genetics, Cornell University, Ithaca, New York 14853, USA
- Robert Buels
- Department of Plant Breeding and Genetics, Cornell University, Ithaca, New York 14853, USA
- James J Giovannoni
- United States Department of Agriculture and Boyce Thompson Institute for Plant Research, Cornell University, Ithaca, New York 14853, USA
- Richard GF Visser
- Laboratory of Plant Breeding, Wageningen University, P.O. Box 386, 6700 AJ Wageningen, The Netherlands
- Willem J Stiekema
- Laboratory of Bioinformatics, Wageningen University, Transitorium, Dreijenlaan 3, 6703 HA Wageningen, The Netherlands
- Centre for BioSystems Genomics (CBSG), PO Box 98, 6700 AB Wageningen, The Netherlands
- Roeland CHJ van Ham
- Applied Bioinformatics, Plant Research International, PO Box 16, 6700 AA, Wageningen, The Netherlands
- Laboratory of Bioinformatics, Wageningen University, Transitorium, Dreijenlaan 3, 6703 HA Wageningen, The Netherlands
289
Patient S, Wieser D, Kleen M, Kretschmann E, Jesus Martin M, Apweiler R. UniProtJAPI: a remote API for accessing UniProt data. Bioinformatics 2008; 24:1321-2. [PMID: 18390879 DOI: 10.1093/bioinformatics/btn122] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.1] [Indexed: 01/12/2023] Open
Abstract
Programmatic access to the UniProt Knowledgebase (UniProtKB) is essential for many bioinformatics applications dealing with protein data. We have created a Java library named UniProtJAPI, which facilitates the integration of UniProt data into Java-based software applications. The library supports queries and similarity searches that return UniProtKB entries in the form of Java objects. These objects contain functional annotations or sequence information associated with a UniProt entry. Here, we briefly describe the UniProtJAPI and demonstrate its usage.
Affiliation(s)
- Samuel Patient
- The European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, UK.
290
In vitro selection and characterization of ceftobiprole-resistant methicillin-resistant Staphylococcus aureus. Antimicrob Agents Chemother 2008; 52:2089-96. [PMID: 18378703 DOI: 10.1128/aac.01403-07] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.1] [Indexed: 11/20/2022] Open
Abstract
Methicillin-resistant Staphylococcus aureus (MRSA) is resistant to beta-lactam antibiotics because it expresses penicillin-binding protein 2a (PBP2a), a low-affinity penicillin-binding protein. An investigational broad-spectrum cephalosporin, ceftobiprole (BPR), binds PBP2a with high affinity and is active against MRSA. We hypothesized that BPR resistance could be mediated by mutations in mecA, the gene encoding PBP2a. We selected BPR-resistant mutants by passage in high-volume broth cultures containing subinhibitory concentrations of BPR. We used strain COLnex (which lacks chromosomal mecA) transformed with pAW8 (a plasmid vector only), pYK20 (a plasmid carrying wild-type mecA), or pYK21 (a plasmid carrying a mutant mecA gene corresponding to five PBP2a mutations). All strains became resistant to BPR by day 9 of passaging, but MICs continued to increase until day 21. MICs increased 256-fold (from 1 to 256 µg/ml) for pAW8, 32-fold (from 4 to 128 µg/ml) for pYK20, and 8-fold (from 16 to 128 µg/ml) for pYK21. Strains carrying wild-type or mutant mecA developed six (pYK20 transformants) or four (pYK21 transformants) new mutations in mecA. The transformation of COLnex with a mecA mutant plasmid conferred BPR resistance, and the loss of mecA converted resistant strains into susceptible ones. Modeling studies predicted that several of the mecA mutations altered BPR binding; other mutations may have mediated resistance by influencing interactions with other proteins. Multiple mecA mutations were associated with BPR resistance in MRSA. BPR resistance also developed in the strain lacking mecA, suggesting a role for chromosomal genes.
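The fold increases quoted in this abstract follow directly from the starting and final MICs. The short Python sketch below recomputes them; the dictionary layout and the `fold_change` helper are illustrative assumptions and are not taken from the paper.

```python
import math

# Starting and final MICs (in µg/ml) as quoted in the abstract,
# keyed by the plasmid carried by each COLnex transformant.
mics = {"pAW8": (1, 256), "pYK20": (4, 128), "pYK21": (16, 128)}


def fold_change(start, end):
    """Return the ratio of the final MIC to the starting MIC."""
    return end / start


for plasmid, (start, end) in mics.items():
    fc = fold_change(start, end)
    # log2 of the fold change gives the number of two-fold dilution steps.
    print(f"{plasmid}: {fc:g}-fold ({math.log2(fc):g} doublings)")
```

Expressing the change as doublings is a common convention because MICs are usually measured on two-fold dilution series.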
291
Mobarec JC, Filizola M. Advances in the Development and Application of Computational Methodologies for Structural Modeling of G-Protein Coupled Receptors. Expert Opin Drug Discov 2008; 3:343-355. [PMID: 19672320 DOI: 10.1517/17460441.3.3.343] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 11/05/2022]
Abstract
BACKGROUND: Despite the large amount of experimental data accumulated in the past decade on G-protein coupled receptor (GPCR) structure and function, understanding of the molecular mechanisms underlying GPCR signaling is still far from being complete, thus impairing the design of effective and selective pharmaceuticals. OBJECTIVE: Understanding of GPCR function has been challenged even further by more recent experimental evidence that several of these receptors are organized in the cell membrane as homo- or hetero-oligomers, and that they may exhibit unique pharmacological properties. Given the complexity of these new signaling systems, researchers' efforts are turning increasingly to molecular modeling, bioinformatics and computational simulations for mechanistic insights into GPCR functional plasticity. METHODS: We review here current advances in the development and application of computational approaches to improve prediction of GPCR structure and dynamics, thus enhancing current understanding of GPCR signaling. RESULTS/CONCLUSIONS: Models resulting from use of these computational approaches, further supported by experiments, are expected to help elucidate the complex allosterism that propagates through GPCR complexes, ultimately aiming at successful structure-based rational drug design.
Affiliation(s)
- Juan Carlos Mobarec
- Department of Structural and Chemical Biology, Mount Sinai School of Medicine, Icahn Medical Institute Building, 1425 Madison Avenue, Box 1677, New York, NY 10029-6574, Tel: 212-241-8634
292
Brayer KJ, Segal DJ. Keep your fingers off my DNA: protein-protein interactions mediated by C2H2 zinc finger domains. Cell Biochem Biophys 2008; 50:111-31. [PMID: 18253864 DOI: 10.1007/s12013-008-9008-5] [Citation(s) in RCA: 220] [Impact Index Per Article: 13.8] [Received: 09/21/2007] [Accepted: 12/28/2007] [Indexed: 11/28/2022]
Abstract
Cys2-His2 (C2H2) zinc finger domains (ZFs) were originally identified as DNA-binding domains, and uncharacterized domains are typically assumed to function in DNA binding. However, a growing body of evidence suggests an important and widespread role for these domains in protein binding. There are even examples of zinc fingers that support both DNA and protein interactions, which can be found in well-known DNA-binding proteins such as Sp1, Zif268, and Yin Yang 1 (YY1). C2H2 protein-protein interactions (PPIs) are proving to be more abundant than previously appreciated, more plastic than their DNA-binding counterparts, and more variable and complex in their interaction surfaces. Here we review the current knowledge of over 100 C2H2 zinc finger-mediated PPIs, focusing on what is known about the binding surface, contributions of individual fingers to the interaction, and function. An accurate understanding of zinc finger biology will likely require greater insights into the potential protein interaction capabilities of C2H2 ZFs.
Affiliation(s)
- Kathryn J Brayer
- Department of Pharmacology and Toxicology, College of Pharmacy, University of Arizona, Tucson, AZ 85721, USA
293
Bose A, Metcalf WW. Distinct regulators control the expression of methanol methyltransferase isozymes in Methanosarcina acetivorans C2A. Mol Microbiol 2008; 67:649-61. [DOI: 10.1111/j.1365-2958.2007.06075.x] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.1] [Indexed: 11/29/2022]
294
Windelinckx A, Vlietinck R, Aerssens J, Beunen G, Thomis MAI. Selection of genes and single nucleotide polymorphisms for fine mapping starting from a broad linkage region. Twin Res Hum Genet 2008; 10:871-85. [PMID: 18179400 DOI: 10.1375/twin.10.6.871] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/05/2022]
Abstract
Fine mapping of linkage peaks is one of the great challenges facing researchers who try to identify genes and genetic variants responsible for the variation in a certain trait or complex disease. Once the trait is linked to a certain chromosomal region, most studies use a candidate gene approach followed by a selection of polymorphisms within these genes, either based on their possibility to be functional, or based on the linkage disequilibrium between adjacent markers. For both candidate gene selection and SNP selection, several approaches have been described, and different software tools are available. However, mastering all these information sources and choosing between the different approaches can be difficult and time-consuming. Therefore, this article lists several of these in silico procedures, and the authors describe an empirical two-step fine mapping approach, in which candidate genes are prioritized using a bioinformatics approach (ENDEAVOUR), and the top genes are chosen for further SNP selection with a linkage disequilibrium based method (Tagger). The authors present the different actions that were applied within this approach on two previously identified linkage regions for muscle strength. This resulted in the selection of 331 polymorphisms located in 112 different candidate genes out of an initial set of 23,300 SNPs.
Affiliation(s)
- An Windelinckx
- Research Center for Exercise and Health, Department of Biomedical Kinesiology, Faculty of Kinesiology and Rehabilitation Sciences, Katholieke Universiteit Leuven, Leuven, Belgium
295
Kemmer D, Podowski RM, Yusuf D, Brumm J, Cheung W, Wahlestedt C, Lenhard B, Wasserman WW. Gene characterization index: assessing the depth of gene annotation. PLoS One 2008; 3:e1440. [PMID: 18213364 PMCID: PMC2194620 DOI: 10.1371/journal.pone.0001440] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 11/09/2007] [Accepted: 12/16/2007] [Indexed: 11/19/2022] Open
Abstract
Background We introduce the Gene Characterization Index, a bioinformatics method for scoring the extent to which a protein-encoding gene is functionally described. Inherently a reflection of human perception, the Gene Characterization Index is applied for assessing the characterization status of individual genes, thus serving the advancement of both genome annotation and applied genomics research by rapid and unbiased identification of groups of uncharacterized genes for diverse applications such as directed functional studies and delineation of novel drug targets. Methodology/Principal Findings The scoring procedure is based on a global survey of researchers, who assigned characterization scores from 1 (poor) to 10 (extensive) for a sample of genes based on major online resources. By evaluating the survey as training data, we developed a bioinformatics procedure to assign gene characterization scores to all genes in the human genome. We analyzed snapshots of functional genome annotation over a period of 6 years to assess temporal changes reflected by the increase of the average Gene Characterization Index. Applying the Gene Characterization Index to genes within pharmaceutically relevant classes, we confirmed known drug targets as high-scoring genes and revealed potentially interesting novel targets with low characterization indexes. Removing known drug targets and genes linked to sequence-related patent filings from the entirety of indexed genes, we identified sets of low-scoring genes particularly suited for further experimental investigation. Conclusions/Significance The Gene Characterization Index is intended to serve as a tool to the scientific community and granting agencies for focusing resources and efforts on unexplored areas of the genome. The Gene Characterization Index is available from http://cisreg.ca/gci/.
Affiliation(s)
- Danielle Kemmer
- Center for Genomics and Bioinformatics, Karolinska Institute, Stockholm, Sweden
- Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, Canada
- Raf M. Podowski
- Center for Genomics and Bioinformatics, Karolinska Institute, Stockholm, Sweden
- Dimas Yusuf
- Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Department of Medical Genetics, University of British Columbia, Vancouver, Canada
- Jochen Brumm
- Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Department of Medical Genetics, University of British Columbia, Vancouver, Canada
- Department of Statistics, University of British Columbia, Vancouver, Canada
- Warren Cheung
- Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Department of Medical Genetics, University of British Columbia, Vancouver, Canada
- Claes Wahlestedt
- Center for Genomics and Bioinformatics, Karolinska Institute, Stockholm, Sweden
- Molecular and Integrative Neurosciences Department, The Scripps Research Institute, Jupiter, Florida, United States of America
- Boris Lenhard
- Center for Genomics and Bioinformatics, Karolinska Institute, Stockholm, Sweden
- Computational Biology Unit, Bergen Center for Computational Science, Sars International Centre for Marine Molecular Biology, Unifob AS, University of Bergen, Bergen, Norway
- Wyeth W. Wasserman
- Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Department of Medical Genetics, University of British Columbia, Vancouver, Canada
- * To whom correspondence should be addressed.
296
Hehr U, Bauer P, Winner B, Schule R, Olmez A, Koehler W, Uyanik G, Engel A, Lenz D, Seibel A, Hehr A, Ploetz S, Gamez J, Rolfs A, Weis J, Ringer TM, Bonin M, Schuierer G, Marienhagen J, Bogdahn U, Weber BHF, Topaloglu H, Schols L, Riess O, Winkler J. Long-term course and mutational spectrum of spatacsin-linked spastic paraplegia. Ann Neurol 2008; 62:656-65. [PMID: 18067136 DOI: 10.1002/ana.21310] [Citation(s) in RCA: 87] [Impact Index Per Article: 5.4] [Indexed: 11/05/2022]
Abstract
OBJECTIVE Hereditary spastic paraplegias (HSPs) comprise a heterogeneous group of neurodegenerative disorders resulting in progressive spasticity of the lower limbs. One form of autosomal recessive hereditary spastic paraplegia (ARHSP) with thin corpus callosum (TCC) was linked to chromosomal region 15q13-21 (SPG11) and associated with mutations in the spatacsin gene. We assessed the long-term course and the mutational spectrum of spatacsin-associated ARHSP with TCC. METHODS Neurological examination, cerebral magnetic resonance imaging (MRI), 18fluorodeoxyglucose positron emission tomography (PET), nerve biopsy, linkage and mutation analysis are presented. RESULTS Spastic paraplegia in patients with spatacsin mutations (n = 20) developed during the second decade of life. The Spastic Paraplegia Rating Scale (SPRS) showed severely compromised walking between the second and third decades of life (mean SPRS score, >30). Impaired cognitive function was associated with severe atrophy of the frontoparietal cortex, TCC, and bilateral periventricular white matter lesions. Progressive cortical and thalamic hypometabolism in the 18fluorodeoxyglucose PET was observed. Sural nerve biopsy showed a loss of unmyelinated nerve fibers and accumulation of intraaxonal pleomorphic membranous material. Mutational analysis of spatacsin demonstrated six novel and one previously reported frameshift mutation and two novel nonsense mutations. Furthermore, we report the first two splice mutations to be associated with SPG11. INTERPRETATION We demonstrate that not only frameshift and nonsense mutations but also splice mutations result in SPG11. Mutations are distributed throughout the spatacsin gene and emerge as major cause for ARHSP with TCC associated with severe motor and cognitive impairment. The clinical phenotype and the ultrastructural analysis suggest a disturbed axonal transport of long projecting neurons.
Affiliation(s)
- Ute Hehr
- Department of Human Genetics, University of Regensburg, Regensburg, Germany
297
Wu S, Liang MP, Altman RB. The SeqFEATURE library of 3D functional site models: comparison to existing methods and applications to protein function annotation. Genome Biol 2008; 9:R8. [PMID: 18197987 PMCID: PMC2395245 DOI: 10.1186/gb-2008-9-1-r8] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Received: 08/01/2007] [Revised: 11/21/2007] [Accepted: 01/16/2008] [Indexed: 11/10/2022] Open
Abstract
Structural genomics efforts have led to increasing numbers of novel, uncharacterized protein structures with low sequence identity to known proteins, resulting in a growing need for structure-based function recognition tools. Our method, SeqFEATURE, robustly models protein functions described by sequence motifs using a structural representation. We built a library of models that shows good performance compared to other methods. In particular, SeqFEATURE demonstrates significant improvement over other methods when sequence and structural similarity are low.
Affiliation(s)
- Shirley Wu
- Program in Biomedical Informatics, Stanford University, Stanford, CA, 94305 USA
|
298
|
Kaneko T, Nakajima N, Okamoto S, Suzuki I, Tanabe Y, Tamaoki M, Nakamura Y, Kasai F, Watanabe A, Kawashima K, Kishida Y, Ono A, Shimizu Y, Takahashi C, Minami C, Fujishiro T, Kohara M, Katoh M, Nakazaki N, Nakayama S, Yamada M, Tabata S, Watanabe MM. Complete genomic structure of the bloom-forming toxic cyanobacterium Microcystis aeruginosa NIES-843. DNA Res 2008; 14:247-56. [PMID: 18192279 PMCID: PMC2779907 DOI: 10.1093/dnares/dsm026] [Citation(s) in RCA: 172] [Impact Index Per Article: 10.8] [Indexed: 11/20/2022]
Abstract
The nucleotide sequence of the complete genome of a cyanobacterium, Microcystis aeruginosa NIES-843, was determined. The genome of M. aeruginosa is a single, circular chromosome of 5 842 795 base pairs (bp) in length, with an average GC content of 42.3%. The chromosome comprises 6312 putative protein-encoding genes, two sets of rRNA genes, 42 tRNA genes representing 41 tRNA species, and genes for tmRNA, the B subunit of RNase P, SRP RNA, and 6Sa RNA. Forty-five percent of the putative protein-encoding sequences showed sequence similarity to genes of known function, 32% were similar to hypothetical genes, and the remaining 23% had no apparent similarity to reported genes. A total of 688 kb of the genome, equivalent to 11.8% of the entire genome, was composed of both insertion sequences and miniature inverted-repeat transposable elements. This is indicative of a plasticity of the M. aeruginosa genome, through a mechanism that involves homologous recombination mediated by repetitive DNA elements. In addition to known gene clusters related to the synthesis of microcystin and cyanopeptolin, novel gene clusters that may be involved in the synthesis and modification of toxic small polypeptides were identified. Compared with other cyanobacteria, a relatively small number of genes for two-component systems and a large number of genes for restriction-modification systems were notable characteristics of the M. aeruginosa genome.
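As a quick consistency check on the figures quoted in this abstract, the arithmetic can be reproduced in a few lines. This is only an illustrative sketch: the variable names are made up here, and "688 kb" is taken as exactly 688,000 bp, which is an assumption because the source gives a rounded value.

```python
# Figures quoted in the abstract; 688 kb is treated as 688,000 bp (assumption,
# since the abstract only gives the rounded value).
GENOME_BP = 5_842_795      # length of the single circular chromosome
REPEAT_BP = 688_000        # insertion sequences plus MITEs
PROTEIN_GENES = 6312       # putative protein-encoding genes

# Share of the genome occupied by repetitive elements, in percent.
repeat_pct = 100.0 * REPEAT_BP / GENOME_BP

# Average number of putative protein-encoding genes per kilobase.
genes_per_kb = PROTEIN_GENES / (GENOME_BP / 1000)

# The three similarity classes reported for the protein-encoding sequences.
similarity_shares = {"known function": 45, "hypothetical": 32, "no similarity": 23}

print(f"repeat fraction: {repeat_pct:.1f}%")
print(f"gene density: {genes_per_kb:.2f} genes per kb")
print(f"similarity classes sum to {sum(similarity_shares.values())}%")
```

Under that rounding assumption, the repeat fraction agrees with the 11.8% stated in the abstract, and the three similarity classes account for all of the putative genes.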
Affiliation(s)
- Takakazu Kaneko
- Kazusa DNA Research Institute, 2-6-7 Kazusa-kamatari, Kisarazu, Chiba 292-0818, Japan.
|
299
|
Glöckner G, Golderer G, Werner-Felmayer G, Meyer S, Marwan W. A first glimpse at the transcriptome of Physarum polycephalum. BMC Genomics 2008; 9:6. [PMID: 18179708 PMCID: PMC2258281 DOI: 10.1186/1471-2164-9-6] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.2] [Received: 03/29/2007] [Accepted: 01/07/2008] [Indexed: 01/03/2023]
Abstract
Background Physarum polycephalum, an acellular plasmodial species, belongs to the Amoebozoa, a major branch in eukaryote evolution. Its complex life cycle and rich cell biology are reflected in more than 2500 publications on various aspects of its biochemistry, developmental biology, cytoskeleton, and cell motility. It can now be genetically manipulated, opening up the possibility of targeted functional analysis in this organism. Methods Here we describe a large fraction of the transcribed genes by sequencing a cDNA library from the plasmodial stage of the developmental cycle. Results In addition to the genes for basic metabolism, we found an unexpectedly large number of genes involved in sophisticated signaling networks and identified potential receptors for environmental signals such as light. In accordance with the various developmental options of the plasmodial cell, we found that many P. polycephalum genes are alternatively spliced. Using 30 donor and 30 acceptor sites, we determined the splicing signatures of this species. Comparisons to various other organisms, including Dictyostelium, the closest relative, revealed that roughly half of the transcribed genes have no detectable counterpart, thus potentially defining species-specific adaptations. On the other hand, we found highly conserved proteins that are maintained in the metazoan lineage but absent in D. discoideum or plants. These genes possibly arose in the last common ancestor of Amoebozoa and Metazoa but were lost in D. discoideum. Conclusion This work provides an analysis of up to half of the protein-coding genes of Physarum polycephalum. The definition of splice motifs, together with the description of alternatively spliced genes, will provide a valuable resource for the ongoing genome project.
Affiliation(s)
- Gernot Glöckner
- Leibniz Institute for Age Research-Fritz Lipmann Institute, Beutenbergstr. 11, D-07745 Jena, Germany.
|
300
|
Hahne F, Mehrle A, Arlt D, Poustka A, Wiemann S, Beissbarth T. Extending pathways based on gene lists using InterPro domain signatures. BMC Bioinformatics 2008; 9:3. [PMID: 18177498 PMCID: PMC2245903 DOI: 10.1186/1471-2105-9-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 07/16/2007] [Accepted: 01/04/2008] [Indexed: 12/28/2022]
Abstract
Background High-throughput technologies like functional screens and gene expression analysis produce extended lists of candidate genes. Gene-Set Enrichment Analysis is a commonly used and well-established technique to test for the statistically significant over-representation of particular pathways. A shortcoming of this method, however, is that most genes investigated in the experiments have very sparse functional or pathway annotation and therefore cannot be the target of such an analysis. The approach presented here aims to assign lists of genes with limited annotation to previously described functional gene collections or pathways. This works by comparing InterPro domain signatures of the candidate gene lists with domain signatures of gene sets derived from known classifications, e.g. KEGG pathways. Results In order to validate our approach, we designed a simulation study. Based on all pathways available in the KEGG database, we created test gene lists by randomly selecting pathway genes, removing these genes from the known pathways, and adding variable amounts of noise in the form of genes not annotated to the pathway. We show that we can recover pathway memberships based on the simulated gene lists with high accuracy. We further demonstrate the applicability of our approach on a biological example. Conclusion Results based on simulation and data analysis show that domain-based pathway enrichment analysis is a very sensitive method to test for enrichment of pathways in sparsely annotated lists of genes. An R-based software package, domainsignatures, to routinely perform this analysis on the results of high-throughput screening, is available via Bioconductor.
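The simulation design described in this abstract (select pathway genes at random, remove them from the pathway, add noise genes, then compare domain signatures) can be illustrated with a toy sketch. Everything below is hypothetical: the gene and domain identifiers are invented, the signature is defined here simply as the fraction of genes carrying each domain, and cosine similarity is a stand-in measure. It is not the scoring used by the domainsignatures package, whose details the abstract does not give.

```python
import math
import random

# Hypothetical toy annotation: gene -> set of domain identifiers (all invented).
gene_domains = {
    "g1": {"D1", "D2"}, "g2": {"D1"}, "g3": {"D1", "D3"}, "g4": {"D2"}, "g5": {"D1", "D2"},
    "g6": {"D4"}, "g7": {"D4", "D5"}, "g8": {"D5"}, "g9": {"D4"},
    "g10": {"D6"}, "g11": {"D7"},
}
pathways = {"P": {"g1", "g2", "g3", "g4", "g5"}, "Q": {"g6", "g7", "g8", "g9"}}

def simulate_gene_list(pathway, background, n_pick, n_noise, rng):
    """Pick pathway genes, remove them from the pathway, and add noise genes."""
    picked = rng.sample(sorted(pathway), n_pick)
    reduced_pathway = set(pathway) - set(picked)
    noise_pool = sorted(set(background) - set(pathway))  # genes outside the pathway
    noise = rng.sample(noise_pool, n_noise)
    return picked + noise, reduced_pathway

def domain_signature(genes, annotation):
    """Toy signature: fraction of genes in the list that carry each domain."""
    counts = {}
    for gene in genes:
        for domain in annotation.get(gene, ()):
            counts[domain] = counts.get(domain, 0) + 1
    return {d: c / len(genes) for d, c in counts.items()} if genes else {}

def signature_similarity(sig_a, sig_b):
    """Cosine similarity between two signatures (an assumed stand-in measure)."""
    dot = sum(v * sig_b.get(d, 0.0) for d, v in sig_a.items())
    norm_a = math.sqrt(sum(v * v for v in sig_a.values()))
    norm_b = math.sqrt(sum(v * v for v in sig_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

rng = random.Random(0)  # fixed seed so the sketch is repeatable
test_list, reduced_p = simulate_gene_list(pathways["P"], gene_domains, 2, 1, rng)

# Score the simulated list against the reduced source pathway and another pathway.
references = {"P": reduced_p, "Q": pathways["Q"]}
test_sig = domain_signature(test_list, gene_domains)
scores = {
    name: signature_similarity(test_sig, domain_signature(sorted(genes), gene_domains))
    for name, genes in references.items()
}
print(test_list, scores)
```

A real evaluation along these lines would repeat the simulation many times per pathway and per noise level, and would need a significance estimate for each score, for example by resampling random gene lists of the same size.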
Affiliation(s)
- Florian Hahne
- German Cancer Research Center, Molecular Genome Analysis, Im Neuenheimer Feld 580, 69120 Heidelberg, Germany.
|