101
Winsor GL, Van Rossum T, Lo R, Khaira B, Whiteside MD, Hancock REW, Brinkman FSL. Pseudomonas Genome Database: facilitating user-friendly, comprehensive comparisons of microbial genomes. Nucleic Acids Res 2008; 37:D483-8. [PMID: 18978025 PMCID: PMC2686508 DOI: 10.1093/nar/gkn861] [Citation(s) in RCA: 193] [Impact Index Per Article: 12.1] [Indexed: 11/13/2022] Open
Abstract
Pseudomonas aeruginosa is a well-studied opportunistic pathogen that is particularly known for its intrinsic antimicrobial resistance, diverse metabolic capacity, and its ability to cause life threatening infections in cystic fibrosis patients. The Pseudomonas Genome Database (http://www.pseudomonas.com) was originally developed as a resource for peer-reviewed, continually updated annotation for the Pseudomonas aeruginosa PAO1 reference strain genome. In order to facilitate cross-strain and cross-species genome comparisons with other Pseudomonas species of importance, we have now expanded the database capabilities to include all Pseudomonas species, and have developed or incorporated methods to facilitate high quality comparative genomics. The database contains robust assessment of orthologs, a novel ortholog clustering method, and incorporates five views of the data at the sequence and annotation levels (Gbrowse, Mauve and custom views) to facilitate genome comparisons. A choice of simple and more flexible user-friendly Boolean search features allows researchers to search and compare annotations or sequences within or between genomes. Other features include more accurate protein subcellular localization predictions and a user-friendly, Boolean searchable log file of updates for the reference strain PAO1. This database aims to continue to provide a high quality, annotated genome resource for the research community and is available under an open source license.
Affiliation(s)
- Geoffrey L Winsor
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
102
Czerwoniec A, Dunin-Horkawicz S, Purta E, Kaminska KH, Kasprzak JM, Bujnicki JM, Grosjean H, Rother K. MODOMICS: a database of RNA modification pathways. 2008 update. Nucleic Acids Res 2008; 37:D118-21. [PMID: 18854352 PMCID: PMC2686465 DOI: 10.1093/nar/gkn710] [Citation(s) in RCA: 175] [Impact Index Per Article: 10.9] [Indexed: 11/15/2022] Open
Abstract
MODOMICS, a database devoted to the systems biology of RNA modification, has been subjected to substantial improvements. It provides comprehensive information on the chemical structure of modified nucleosides, pathways of their biosynthesis, sequences of RNAs containing these modifications and RNA-modifying enzymes. MODOMICS also provides cross-references to other databases and to literature. In addition to the previously available manually curated tRNA sequences from a few model organisms, we have now included additional tRNAs and rRNAs, and all RNAs with 3D structures in the Nucleic Acid Database, in which modified nucleosides are present. In total, 3460 modified bases in RNA sequences of different organisms have been annotated. New RNA-modifying enzymes have been also added. The current collection of enzymes includes mainly proteins for the model organisms Escherichia coli and Saccharomyces cerevisiae, and is currently being expanded to include proteins from other organisms, in particular Archaea and Homo sapiens. For enzymes with known structures, links are provided to the corresponding Protein Data Bank entries, while for many others homology models have been created. Many new options for database searching and querying have been included. MODOMICS can be accessed at http://genesilico.pl/modomics.
Affiliation(s)
- Anna Czerwoniec, Stanislaw Dunin-Horkawicz, Elzbieta Purta, Katarzyna H. Kaminska, Joanna M. Kasprzak, Janusz M. Bujnicki, Henri Grosjean, Kristian Rother
- Bioinformatics Laboratory, Institute of Molecular Biology and Biotechnology, Adam Mickiewicz University, Umultowska 89, PL-61-614 Poznan, Poland
- Max Planck Institute for Developmental Biology, Department 1, Protein Evolution, Spemannstr. 35, 72076 Tuebingen, Germany
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, Ks. Trojdena 4, PL-02-190 Warsaw, Poland
- Institute of Biochemistry and Biophysics PAS, Pawinskiego 5a, 02-106 Warsaw
- IGM, Univ Paris-Sud, UMR 8621, Orsay, F 91405, France
- *To whom correspondence should be addressed. Tel: +48-22 597 0752; Fax: +48 22 597 0715;
103
Abstract
Reconciliation extracts information from the topological incongruence between gene and species trees to infer duplications and losses in the history of a gene family. The inferred duplication-loss histories provide valuable information for a broad range of biological applications, including ortholog identification, estimating gene duplication times, and rooting and correcting gene trees. While reconciliation for binary trees is a tractable and well studied problem, there are no algorithms for reconciliation with non-binary species trees. Yet a striking proportion of species trees are non-binary. For example, 64% of branch points in the NCBI taxonomy have three or more children. When applied to non-binary species trees, current algorithms overestimate the number of duplications because they cannot distinguish between duplication and incomplete lineage sorting. We present the first algorithms for reconciling binary gene trees with non-binary species trees under a duplication-loss parsimony model. Our algorithms utilize an efficient mapping from gene to species trees to infer the minimum number of duplications in O(|V(G)| · (k(S) + h(S))) time, where |V(G)| is the number of nodes in the gene tree, h(S) is the height of the species tree and k(S) is the size of its largest polytomy. We present a dynamic programming algorithm which also minimizes the total number of losses. Although this algorithm is exponential in the size of the largest polytomy, it performs well in practice for polytomies with outdegree of 12 or less. We also present a heuristic which estimates the minimal number of losses in polynomial time. In empirical tests, this algorithm finds an optimal loss history 99% of the time. Our algorithms have been implemented in NOTUNG, a robust, production quality, tree-fitting program, which provides a graphical user interface for exploratory analysis and also supports automated, high-throughput analysis of large data sets.
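The mapping from gene tree to species tree mentioned above can be illustrated with the classic LCA (lowest common ancestor) rule for binary trees. The sketch below is not the NOTUNG algorithm for non-binary species trees; it is the standard binary-tree baseline, and the nested-tuple tree encoding and the function names (`index_species_tree`, `lca`, `count_duplications`) are assumptions made for illustration.

```python
def index_species_tree(tree):
    """Return (parent, depth) dicts for every node of a nested-tuple species tree.

    Leaves are species names (str); internal nodes are tuples of children.
    """
    parent, depth = {tree: None}, {tree: 0}
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, tuple):
            for child in node:
                parent[child] = node
                depth[child] = depth[node] + 1
                stack.append(child)
    return parent, depth


def lca(a, b, parent, depth):
    """Walk the deeper node upward until both nodes meet."""
    while a != b:
        if depth[a] >= depth[b]:
            a = parent[a]
        else:
            b = parent[b]
    return a


def count_duplications(gene_tree, species_tree, gene_to_species):
    """Count duplications in a binary gene tree with the LCA-mapping rule.

    A gene-tree node is called a duplication when it maps to the same
    species-tree node as at least one of its children.
    """
    parent, depth = index_species_tree(species_tree)
    duplications = 0

    def visit(node):
        nonlocal duplications
        if isinstance(node, str):          # gene-tree leaf
            return gene_to_species[node]
        left, right = (visit(child) for child in node)
        here = lca(left, right, parent, depth)
        if here == left or here == right:
            duplications += 1
        return here

    visit(gene_tree)
    return duplications


# Hypothetical toy data: three species, four genes.
species = (("A", "B"), "C")
genes = {"a1": "A", "b1": "B", "a2": "A", "c1": "C"}
print(count_duplications((("a1", "b1"), ("a2", "c1")), species, genes))
```

As the abstract notes, applying this rule unchanged to a species tree with polytomies tends to overestimate duplications, which is the gap the authors' algorithms address.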
Affiliation(s)
- Benjamin Vernot
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, Pennsylvania
- Maureen Stolzer
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, Pennsylvania
- Aiton Goldman
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, Pennsylvania
- Dannie Durand
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, Pennsylvania
- Department of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania
104
Müller H, Mancuso F. Identification and analysis of co-occurrence networks with NetCutter. PLoS One 2008; 3:e3178. [PMID: 18781200 PMCID: PMC2526157 DOI: 10.1371/journal.pone.0003178] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Received: 03/07/2008] [Accepted: 08/07/2008] [Indexed: 01/25/2023] Open
Abstract
Background: Co-occurrence analysis is a technique often applied in text mining, comparative genomics, and promoter analysis. The methodologies and statistical models used to evaluate the significance of association between co-occurring entities are quite diverse, however.
Methodology/Principal Findings: We present a general framework for co-occurrence analysis based on a bipartite graph representation of the data, a novel co-occurrence statistic, and software performing co-occurrence analysis as well as generation and analysis of co-occurrence networks. We show that the overall stringency of co-occurrence analysis depends critically on the choice of the null-model used to evaluate the significance of co-occurrence and find that random sampling from a complete permutation set of the bipartite graph permits co-occurrence analysis with optimal stringency. We show that the Poisson-binomial distribution is the most natural co-occurrence probability distribution when vertex degrees of the bipartite graph are variable, which is usually the case. Calculation of Poisson-binomial P-values is difficult, however. Therefore, we propose a fast bi-binomial approximation for calculation of P-values and show that this statistic is superior to other measures of association such as the Jaccard coefficient and the uncertainty coefficient. Furthermore, co-occurrence analysis of more than two entities can be performed using the same statistical model, which leads to increased signal-to-noise ratios, robustness towards noise, and the identification of implicit relationships between co-occurring entities. Using NetCutter, we identify a novel protein biosynthesis related set of genes that are frequently coordinately deregulated in human cancer related gene expression studies. NetCutter is available at http://bio.ifom-ieo-campus.it/NetCutter/.
Conclusion: Our approach can be applied to any set of categorical data where co-occurrence analysis might reveal functional relationships such as clinical parameters associated with cancer subtypes or SNPs associated with disease phenotypes. The stringency of our approach is expected to offer an advantage in a variety of applications.
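To make the two statistics named in this abstract concrete, here is a minimal sketch of an exact Poisson-binomial upper-tail P-value (computed by dynamic programming) next to the Jaccard coefficient. It is not NetCutter's bi-binomial approximation, and the abstract does not say how per-trial probabilities are derived from vertex degrees, so `probs` is a hypothetical input; all function names are assumptions.

```python
def poisson_binomial_pmf(probs):
    """Exact PMF of the number of successes among independent Bernoulli
    trials with (possibly different) success probabilities `probs`."""
    pmf = [1.0]                      # zero trials: P(0 successes) = 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # this trial fails
            nxt[k + 1] += mass * p       # this trial succeeds
        pmf = nxt
    return pmf


def upper_tail(probs, observed):
    """P(K >= observed): probability of seeing at least `observed`
    co-occurrences under the null model."""
    return sum(poisson_binomial_pmf(probs)[observed:])


def jaccard(a, b):
    """Jaccard coefficient of two collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical per-list co-occurrence probabilities under the null model.
print(upper_tail([0.1, 0.3, 0.05, 0.2], observed=2))
print(jaccard({"geneA", "geneB", "geneC"}, {"geneB", "geneC", "geneD"}))
```

The dynamic program costs O(n²) for n trials, which illustrates why a fast approximation is attractive for large bipartite graphs.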
Affiliation(s)
- Heiko Müller
- Department of Experimental Oncology, European Institute of Oncology, Milan, Italy.
105
Chen H, Kihara D. Estimating quality of template-based protein models by alignment stability. Proteins 2008; 71:1255-74. [PMID: 18041762 DOI: 10.1002/prot.21819] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 12/26/2022]
Abstract
The error in protein tertiary structure prediction is unavoidable, but it is not explicitly shown in most of the current prediction algorithms. Estimated error of a predicted structure is crucial information for experimental biologists to use the prediction model for design and interpretation of experiments. Here, we propose a method to estimate errors in predicted structures based on the stability of the optimal target-template alignment when compared with a set of suboptimal alignments. The stability of the optimal alignment is quantified by an index named the SuboPtimal Alignment Diversity (SPAD). We implemented SPAD in a profile-based threading algorithm and investigated how well SPAD can indicate errors in threading models using a large benchmark dataset of 5232 alignments. SPAD shows a very good correlation not only to alignment shift errors but also structure-level errors, the root mean square deviation (RMSD) of predicted structure models to the native structures (i.e. global errors), and local errors at each residue position. We have further compared SPAD with seven other quality measures, six from sequence alignment-based measures and one atomic statistical potential, discrete optimized protein energy (DOPE), in terms of the correlation coefficient to the global and local structure-level errors. In terms of the correlation to the RMSD of structure models, when a target and a template are in the same SCOP family, the sequence identity showed a best correlation to the RMSD; in the superfamily level, SPAD was the best; and in the fold level, DOPE was best. However, in a head-to-head comparison, SPAD wins over the other measures. Next, SPAD is compared with three other measures of local errors. In this comparison, SPAD was best in all of the family, the superfamily and the fold levels. Using the discovered correlation, we have also predicted the global and local error of our predicted structures of CASP7 targets by the SPAD. 
Finally, we proposed a sausage representation of predicted tertiary structures that intuitively indicates the predicted structure and the estimated error range of the structure simultaneously.
Affiliation(s)
- Hao Chen
- Department of Biological Sciences, College of Science, Purdue University, West Lafayette, Indiana 47907, USA
106
Khan AM, Miotto O, Nascimento EJM, Srinivasan KN, Heiny AT, Zhang GL, Marques ET, Tan TW, Brusic V, Salmon J, August JT. Conservation and variability of dengue virus proteins: implications for vaccine design. PLoS Negl Trop Dis 2008; 2:e272. [PMID: 18698358 PMCID: PMC2491585 DOI: 10.1371/journal.pntd.0000272] [Citation(s) in RCA: 72] [Impact Index Per Article: 4.5] [Received: 02/05/2008] [Accepted: 07/10/2008] [Indexed: 12/27/2022] Open
Abstract
Background: Genetic variation and rapid evolution are hallmarks of RNA viruses, the result of high mutation rates in RNA replication and selection of mutants that enhance viral adaptation, including the escape from host immune responses. Variability is uneven across the genome because mutations resulting in a deleterious effect on viral fitness are restricted. RNA viruses are thus marked by protein sites permissive to multiple mutations and sites critical to viral structure-function that are evolutionarily robust and highly conserved. Identification and characterization of the historical dynamics of the conserved sites have relevance to multiple applications, including potential targets for diagnosis, and prophylactic and therapeutic purposes.
Methodology/Principal Findings: We describe a large-scale identification and analysis of evolutionarily highly conserved amino acid sequences of the entire dengue virus (DENV) proteome, with a focus on sequences of 9 amino acids or more, and thus immune-relevant as potential T-cell determinants. DENV protein sequence data were collected from the NCBI Entrez protein database in 2005 (9,512 sequences) and again in 2007 (12,404 sequences). Forty-four (44) sequences (pan-DENV sequences), mainly those of nonstructural proteins and representing ∼15% of the DENV polyprotein length, were identical in 80% or more of all recorded DENV sequences. Of these 44 sequences, 34 (∼77%) were present in ≥95% of sequences of each DENV type, and 27 (∼61%) were conserved in other Flaviviruses. The frequencies of variants of the pan-DENV sequences were low (0 to ∼5%), as compared to variant frequencies of ∼60 to ∼85% in the non pan-DENV sequence regions. We further showed that the majority of the conserved sequences were immunologically relevant: 34 contained numerous predicted human leukocyte antigen (HLA) supertype-restricted peptide sequences, and 26 contained T-cell determinants identified by studies with HLA-transgenic mice and/or reported to be immunogenic in humans.
Conclusions/Significance: Forty-four (44) pan-DENV sequences of at least 9 amino acids were highly conserved and identical in 80% or more of all recorded DENV sequences, and the majority were found to be immune-relevant by their correspondence to known or putative HLA-restricted T-cell determinants. The conservation of these sequences through the entire recorded DENV genetic history supports their possible value for diagnosis, prophylactic and/or therapeutic applications. The combination of bioinformatics and experimental approaches applied herein provides a framework for large-scale and systematic analysis of conserved and variable sequences of other pathogens, in particular, for rapidly mutating viruses, such as influenza A virus and HIV.
Dengue viruses (DENVs) circulate in nature as a population of 4 distinct types, each with multiple genotypes and variants, and represent an increasing global public health issue with no prophylactic and therapeutic formulations currently available. Viral genomes contain sites that are evolutionarily stable and therefore highly conserved, presumably because changes in these sites have deleterious effects on viral fitness and survival. The identification and characterization of the historical dynamics of these sites in DENV have relevance to several applications such as diagnosis and drug and vaccine development. In this study, we have identified sequence fragments that were conserved across the majority of available DENV sequences, analyzed their historical dynamics, and evaluated their relevance as candidate vaccine targets, using various bioinformatics-based methods and immune assay in human leukocyte antigen (HLA) transgenic mice. This approach provides a framework for large-scale and systematic analysis of other human pathogens.
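The conservation criterion described above (a peptide of 9 amino acids that is identical in at least 80% of recorded sequences) can be sketched as a sliding-window scan over a multiple sequence alignment. This is only an illustration under stated assumptions, not the authors' pipeline: the function name `conserved_windows`, the gap character `-`, and the toy sequences are hypothetical, and the sketch assumes the input is already aligned to equal length.

```python
from collections import Counter


def conserved_windows(aligned, length=9, min_fraction=0.8):
    """Return (start, peptide, fraction) for every alignment window whose
    most common gap-free peptide occurs in >= min_fraction of sequences."""
    if not aligned:
        return []
    width = len(aligned[0])
    if any(len(seq) != width for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")
    hits = []
    for start in range(width - length + 1):
        counts = Counter(seq[start:start + length] for seq in aligned)
        peptide, n = counts.most_common(1)[0]
        fraction = n / len(aligned)
        if "-" not in peptide and fraction >= min_fraction:
            hits.append((start, peptide, fraction))
    return hits


# Hypothetical toy alignment: four identical sequences and one variant.
alignment = ["ACDEFGHIKL"] * 4 + ["ACDEFGHIKM"]
for start, peptide, fraction in conserved_windows(alignment):
    print(start, peptide, fraction)
```

Overlapping hits could then be merged into longer conserved stretches, which would correspond to the "9 amino acids or more" wording of the abstract.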
Affiliation(s)
- Asif M. Khan
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Olivo Miotto
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Institute of Systems Science, National University of Singapore, Singapore
- Eduardo J. M. Nascimento
- Department of Medicine, Division of Infectious Diseases, The Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
- K. N. Srinivasan
- Department of Pharmacology and Molecular Sciences, The Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
- Product Evaluation and Registration Division, Centre for Drug Administration, Health Sciences Authority, Singapore
- A. T. Heiny
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Guang Lan Zhang
- Cancer Vaccine Center, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
- E. T. Marques
- Department of Medicine, Division of Infectious Diseases, The Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
- Department of Pharmacology and Molecular Sciences, The Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
- Tin Wee Tan
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Vladimir Brusic
- Cancer Vaccine Center, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
- Jerome Salmon
- Department of Pharmacology and Molecular Sciences, The Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
- J. Thomas August
- Department of Pharmacology and Molecular Sciences, The Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
- * E-mail:
107
Bourgogne A, Garsin DA, Qin X, Singh KV, Sillanpaa J, Yerrapragada S, Ding Y, Dugan-Rocha S, Buhay C, Shen H, Chen G, Williams G, Muzny D, Maadani A, Fox KA, Gioia J, Chen L, Shang Y, Arias CA, Nallapareddy SR, Zhao M, Prakash VP, Chowdhury S, Jiang H, Gibbs RA, Murray BE, Highlander SK, Weinstock GM. Large scale variation in Enterococcus faecalis illustrated by the genome analysis of strain OG1RF. Genome Biol 2008; 9:R110. [PMID: 18611278 PMCID: PMC2530867 DOI: 10.1186/gb-2008-9-7-r110] [Citation(s) in RCA: 217] [Impact Index Per Article: 13.6] [Received: 02/14/2008] [Revised: 05/08/2008] [Accepted: 07/08/2008] [Indexed: 11/18/2022] Open
Abstract
A comparison of two strains of the hospital pathogen Enterococcus faecalis suggests that mediators of virulence differ between strains and that virulence does not depend on mobile gene elements.
Background: Enterococcus faecalis has emerged as a major hospital pathogen. To explore its diversity, we sequenced E. faecalis strain OG1RF, which is commonly used for molecular manipulation and virulence studies.
Results: The 2,739,625 base pair chromosome of OG1RF was found to contain approximately 232 kilobases unique to this strain compared to V583, the only publicly available sequenced strain. Almost no mobile genetic elements were found in OG1RF. The 64 areas of divergence were classified into three categories. First, OG1RF carries 39 unique regions, including 2 CRISPR loci and a new WxL locus. Second, we found nine replacements where a sequence specific to V583 was substituted by a sequence specific to OG1RF. For example, the iol operon of OG1RF replaces a possible prophage and the vanB transposon in V583. Finally, we found 16 regions that were present in V583 but missing from OG1RF, including the proposed pathogenicity island, several probable prophages, and the cpsCDEFGHIJK capsular polysaccharide operon. OG1RF was more rapidly but less frequently lethal than V583 in the mouse peritonitis model and considerably outcompeted V583 in a murine model of urinary tract infections.
Conclusion: E. faecalis OG1RF carries a number of unique loci compared to V583, but the almost complete lack of mobile genetic elements demonstrates that this is not a defining feature of the species. Additionally, OG1RF's effects in experimental models suggest that mediators of virulence may be diverse between different E. faecalis strains and that virulence is not dependent on the presence of mobile genetic elements.
Affiliation(s)
- Agathe Bourgogne
- Division of Infectious Diseases, Department of Medicine, University of Texas Medical School, Houston, Texas 77030, USA.
108
Budowle B, Aranda XG, Lagace RE, Hennessy LK, Planz JV, Rodriguez M, Eisenberg AJ. Null allele sequence structure at the DYS448 locus and implications for profile interpretation. Int J Legal Med 2008; 122:421-7. [DOI: 10.1007/s00414-008-0258-y] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.3] [Received: 01/09/2008] [Accepted: 05/27/2008] [Indexed: 11/24/2022]
109
Jelinsky SA, Choe SE, Crabtree JS, Cotreau MM, Wilson E, Saraf K, Dorner AJ, Brown EL, Peano BJ, Zhang X, Winneker RC, Harris HA. Molecular analysis of the vaginal response to estrogens in the ovariectomized rat and postmenopausal woman. BMC Med Genomics 2008; 1:27. [PMID: 18578861 PMCID: PMC2453134 DOI: 10.1186/1755-8794-1-27] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 03/25/2008] [Accepted: 06/25/2008] [Indexed: 11/21/2022] Open
Abstract
Background: Vaginal atrophy (VA) is the thinning of the vaginal epithelial lining, typically the result of lowered estrogen levels during menopause. Some of the consequences of VA include increased susceptibility to bacterial infection, pain during sexual intercourse, and vaginal burning or itching. Although estrogen treatment is highly effective, alternative therapies are also desired for women who are not candidates for post-menopausal hormone therapy (HT). The ovariectomized (OVX) rat is widely accepted as an appropriate animal model for many estrogen-dependent responses in humans; however, since reproductive biology can vary significantly between mammalian systems, this study examined how well the OVX rat recapitulates human biology.
Methods: We analyzed 19 vaginal biopsies from human subjects pre and post 3-month 17β-estradiol treated by expression profiling. Data were compared to transcriptional profiling generated from vaginal samples obtained from ovariectomized rats treated with 17β-estradiol for 6 hrs, 3 days or 5 days. The level of differential expression between pre- vs. post- estrogen treatment was calculated for each of the human and OVX rat datasets. Probe sets corresponding to orthologous rat and human genes were mapped to each other using NCBI Homologene.
Results: A positive correlation was observed between the rat and human responses to estrogen. Genes belonging to several biological pathways and GO categories were similarly differentially expressed in rat and human. A large number of the coordinately regulated biological processes are already known to be involved in human VA, such as inflammation, epithelial development, and EGF pathway activation.
Conclusion: At the transcriptional level, there is evidence of significant overlap of the effects of estrogen treatment between the OVX rat and human VA samples.
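The cross-species comparison described in the Methods and Results (map orthologous genes, then compare differential expression) might look roughly like the sketch below. The abstract does not state which correlation statistic was used, so Pearson correlation is an assumption here, as are the function names, the dictionary layout, and the example gene symbols and log fold-change values.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def cross_species_correlation(rat_lfc, human_lfc, rat_to_human):
    """Correlate log fold changes over genes that have an ortholog mapping
    and a measurement in both species."""
    pairs = [
        (rat_lfc[rat_gene], human_lfc[human_gene])
        for rat_gene, human_gene in rat_to_human.items()
        if rat_gene in rat_lfc and human_gene in human_lfc
    ]
    if len(pairs) < 2:
        raise ValueError("need at least two orthologous gene pairs")
    xs, ys = zip(*pairs)
    return pearson(xs, ys)


# Hypothetical log2 fold changes (post- vs. pre-treatment) and ortholog map.
rat = {"Esr1": 1.2, "Krt4": 2.5, "Egfr": 0.4, "Unmapped1": -3.0}
human = {"ESR1": 0.9, "KRT4": 2.1, "EGFR": 0.6}
orthologs = {"Esr1": "ESR1", "Krt4": "KRT4", "Egfr": "EGFR"}
print(cross_species_correlation(rat, human, orthologs))
```

Genes without an ortholog entry are simply skipped, so the result depends on the coverage of the mapping table.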
110
Moon S, Cho S, Kim H. Organization and evolution of mitochondrial gene clusters in human. Genomics 2008; 92:85-93. [PMID: 18559289 DOI: 10.1016/j.ygeno.2008.01.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 10/25/2007] [Revised: 01/07/2008] [Accepted: 01/08/2008] [Indexed: 11/29/2022]
Abstract
Currently, the spatial patterns of mitochondrial genes and how the genomic localization of (pseudo)genes originated from mitochondrial DNA remain largely unexplained. The aim of this study was to elucidate the organization of mitochondrial (pseudo)genes given their evolutionary origin. We used a keyword finding method and a bootstrapping method to estimate parameter values that represent the distribution pattern of mitochondrial genes in the nuclear genome. Almost half of mitochondrial genes showing physical clusters were located in the pericentromeric and subtelomeric regions of the chromosome. Most interestingly, the size of these clusters ranged from 0.085 to 3.2 Mb (average+/-SD 1.3+/-0.73 Mb), which coincides with the size of the evolutionary pocket, or the average size of evolutionary breakpoint regions. Our findings imply that the localization of mitochondrial genes in the human genome is determined independent of adaptation.
Affiliation(s)
- Sunjin Moon
- Laboratory of Bioinformatics and Population Genetics, Department of Agricultural Biotechnology, Seoul National University, Seoul 151-742, Korea
111
Wang S, Hu RM, Hsiao HCW, Hecht DA, Ng KL, Chen RM, Sheu PCY, Tsai JJP. Using SCDL for integrating tools and data for complex biomedical applications. International Journal of Semantic Computing 2008. [DOI: 10.1142/s1793351x08000476] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/18/2022]
Abstract
Current bioinformatics tools or databases are very heterogeneous in terms of data formats, database schema, and terminologies. Additionally, most biomedical databases and analysis tools are scattered across different web sites making interoperability across such different services more difficult. It is desired that these diverse databases and analysis tools be normalized, integrated and encompassed with a semantic interface such that users of biological data and tools could communicate with the system in natural language and a workflow could be automatically generated and distributed into appropriate tools. In this paper, the BioSemantic System is presented to bridge complex biological/biomedical research problems and computational solutions via semantic computing. Due to the diversity of problems in various research fields, the semantic capability description language (SCDL) plays an important role as a common language and generic form for problem formalization. Several queries as well as their corresponding SCDL descriptions are provided as examples. For complex applications, multiple SCDL queries may be connected via control structures. For these cases, we present an algorithm to map a user request to one or more existing services if they exist.
Affiliation(s)
- Shu Wang
- Department of Electrical Engineering & Computer Science, University of California at Irvine, USA
- Rouh-Mei Hu
- Department of Biotechnology, Asia University, Taiwan
- David A. Hecht
- Department of Bioinformatics, Asia University, Taiwan
- Department of Chemistry, Southwestern College, USA
- Ka-Lok Ng
- Department of Bioinformatics, Asia University, Taiwan
- Rong-Ming Chen
- Department of Computer Science & Information Engineering, National University of Tainan, Taiwan
- Phillip C.-Y. Sheu
- Department of Electrical Engineering & Computer Science, University of California at Irvine, USA
- Department of Bioinformatics, Asia University, Taiwan
- Jeffrey J. P. Tsai
- Department of Bioinformatics, Asia University, Taiwan
- Department of Computer Science, University of Illinois at Chicago, USA
|
112
|
Cheng D, Knox C, Young N, Stothard P, Damaraju S, Wishart DS. PolySearch: a web-based text mining system for extracting relationships between human diseases, genes, mutations, drugs and metabolites. Nucleic Acids Res 2008; 36:W399-405. [PMID: 18487273 PMCID: PMC2447794 DOI: 10.1093/nar/gkn296] [Citation(s) in RCA: 155] [Impact Index Per Article: 9.7] [Indexed: 11/29/2022] Open
Abstract
A particular challenge in biomedical text mining is to find ways of handling ‘comprehensive’ or ‘associative’ queries such as ‘Find all genes associated with breast cancer’. Given that many queries in genomics, proteomics or metabolomics involve these kinds of comprehensive searches, we believe that a web-based tool that could support these searches would be quite useful. In response to this need, we have developed the PolySearch web server. PolySearch supports >50 different classes of queries against nearly a dozen different types of text, scientific abstract or bioinformatic databases. The typical query supported by PolySearch is ‘Given X, find all Y's’ where X or Y can be diseases, tissues, cell compartments, gene/protein names, SNPs, mutations, drugs and metabolites. PolySearch also exploits a variety of techniques in text mining and information retrieval to identify, highlight and rank informative abstracts, paragraphs or sentences. PolySearch's performance has been assessed in tasks such as gene synonym identification, protein–protein interaction identification and disease gene identification using a variety of manually assembled ‘gold standard’ text corpora. Its f-measure on these tasks is 88, 81 and 79%, respectively. These values are between 5 and 50% better than other published tools. The server is freely available at http://wishart.biology.ualberta.ca/polysearch
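As a reading aid for the f-measure figures quoted above: the f-measure is conventionally the weighted harmonic mean of precision and recall. The abstract does not state which variant was used, so the balanced form (beta = 1) below is an assumption, and the counts in the example are hypothetical rather than taken from the paper.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """Return the F-beta score from raw counts.

    tp, fp, fn are true positives, false positives and false negatives.
    beta = 1.0 gives the balanced F1 score (an assumption here; the
    abstract does not say which variant PolySearch reports).
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for one retrieval task (not from the paper).
score = f_measure(tp=80, fp=10, fn=20)
print(f"F1 = {score:.3f}")  # F1 = 0.842
```

With beta = 1 the expression reduces to 2·tp / (2·tp + fp + fn), which is a convenient cross-check.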
Affiliation(s)
- Dean Cheng
- Department of Computing Science, University of Alberta, Canada
|
113
|
Blenkiron C, Goldstein LD, Thorne NP, Spiteri I, Chin SF, Dunning MJ, Barbosa-Morais NL, Teschendorff AE, Green AR, Ellis IO, Tavaré S, Caldas C, Miska EA. MicroRNA expression profiling of human breast cancer identifies new markers of tumor subtype. Genome Biol 2008; 8:R214. [PMID: 17922911 PMCID: PMC2246288 DOI: 10.1186/gb-2007-8-10-r214] [Citation(s) in RCA: 720] [Impact Index Per Article: 45.0] [Received: 06/05/2007] [Revised: 08/22/2007] [Accepted: 10/08/2007] [Indexed: 12/19/2022] Open
Abstract
Integrated analysis of miRNA expression and genomic changes in human breast tumors allows the classification of tumor subtypes.
Background: MicroRNAs (miRNAs), a class of short non-coding RNAs found in many plants and animals, often act post-transcriptionally to inhibit gene expression.
Results: Here we report the analysis of miRNA expression in 93 primary human breast tumors, using a bead-based flow cytometric miRNA expression profiling method. Of 309 human miRNAs assayed, we identify 133 miRNAs expressed in human breast and breast tumors. We used mRNA expression profiling to classify the breast tumors as luminal A, luminal B, basal-like, HER2+ and normal-like. A number of miRNAs are differentially expressed between these molecular tumor subtypes and individual miRNAs are associated with clinicopathological factors. Furthermore, we find that miRNAs could classify basal versus luminal tumor subtypes in an independent data set. In some cases, changes in miRNA expression correlate with genomic loss or gain; in others, changes in miRNA expression are likely due to changes in primary transcription and/or miRNA biogenesis. Finally, the expression of DICER1 and AGO2 is correlated with tumor subtype and may explain some of the changes in miRNA expression observed.
Conclusion: This study represents the first integrated analysis of miRNA expression, mRNA expression and genomic changes in human breast cancer and may serve as a basis for functional studies of the role of miRNAs in the etiology of breast cancer. Furthermore, we demonstrate that bead-based flow cytometric miRNA expression profiling might be a suitable platform to classify breast cancer into prognostic molecular subtypes.
Affiliation(s)
- Cherie Blenkiron
- Cancer Research UK, Cambridge Research Institute, Li Ka-Shing Centre, Robinson Way, Cambridge CB2 0RE, UK.
|
114
|
Navratil V, Penel S, Delmotte S, Mouchiroud D, Gautier C, Aouacheria A. DigiPINS: A database for vertebrate exonic single nucleotide polymorphisms and its application to cancer association studies. Biochimie 2008; 90:563-9. [DOI: 10.1016/j.biochi.2007.09.017] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 05/28/2007] [Accepted: 09/21/2007] [Indexed: 11/28/2022]
|
115
|
Taswell C. DOORS to the Semantic Web and Grid With a PORTAL for Biomedical Computing. IEEE Trans Inf Technol Biomed 2008; 12:191-204. [DOI: 10.1109/titb.2007.905861] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.6] [Indexed: 11/07/2022]
|
116
|
Merging microarray data from separate breast cancer studies provides a robust prognostic test. BMC Bioinformatics 2008; 9:125. [PMID: 18304324 PMCID: PMC2409450 DOI: 10.1186/1471-2105-9-125] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.0] [Received: 09/12/2007] [Accepted: 02/27/2008] [Indexed: 11/15/2022] Open
Abstract
Background: There is an urgent need for new prognostic markers of breast cancer metastases to ensure that newly diagnosed patients receive appropriate therapy. Recent studies have demonstrated the potential value of gene expression signatures in assessing the risk of developing distant metastases. However, due to the small sample sizes of individual studies, the overlap among signatures is almost zero and their predictive power is often limited. Integrating microarray data from multiple studies in order to increase sample size is therefore a promising approach to the development of more robust prognostic tests.
Results: In this study, by using a highly stable data aggregation procedure based on expression comparisons, we have integrated three independent microarray gene expression data sets for breast cancer and identified a structured prognostic signature consisting of 112 genes organized into 80 pair-wise expression comparisons. A classical likelihood ratio test based on these comparisons, essentially weighted voting, achieves 88.6% sensitivity and 54.6% specificity in an independent external test set of 154 samples. The test is highly informative in assessing the risk of developing distant metastases within five years (hazard ratio 9.3 with 95% CI 2.9–29.9).
Conclusion: Rank-based features provide a stable way to integrate patient data from separate microarray studies due to invariance to data normalization, and such features can be combined into a useful predictor of distant metastases in breast cancer within a statistical modeling framework which begins to capture gene-gene interactions. Upon further confirmation on large-scale independent data, such prognostic signatures and tests could provide a powerful tool to guide adjuvant systemic treatment that could greatly reduce the cost of breast cancer treatment, both in terms of toxic side effects and health care expenditures.
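The idea of weighted voting over pair-wise expression comparisons, and why such rank-based features are insensitive to normalization, can be sketched as follows. This is a minimal illustration only: the gene names, pairs, weights, threshold and vote direction are all hypothetical, and the sketch is not the likelihood ratio test fitted in the paper.

```python
import math


def pair_vote_score(expr, pairs):
    """Weighted vote over pairwise expression comparisons.

    expr  : dict mapping gene name -> expression value for one sample
    pairs : list of (gene_a, gene_b, weight); a pair votes +weight for
            'poor prognosis' when expr[gene_a] < expr[gene_b], and
            -weight otherwise (the vote direction is an assumption).
    """
    score = 0.0
    for gene_a, gene_b, weight in pairs:
        score += weight if expr[gene_a] < expr[gene_b] else -weight
    return score


def classify(expr, pairs, threshold=0.0):
    """Call 'poor' when the weighted vote exceeds the threshold."""
    return "poor" if pair_vote_score(expr, pairs) > threshold else "good"


# Hypothetical signature of three gene pairs and one hypothetical sample.
pairs = [("G1", "G2", 1.0), ("G3", "G4", 0.5), ("G5", "G6", 2.0)]
sample = {"G1": 2.1, "G2": 5.0, "G3": 7.2, "G4": 3.3, "G5": 1.0, "G6": 1.5}

print(pair_vote_score(sample, pairs))  # 2.5
print(classify(sample, pairs))         # poor

# Any monotone increasing transform (here log2) preserves every
# within-sample ordering, so the score does not change.
logged = {gene: math.log2(value) for gene, value in sample.items()}
print(pair_vote_score(logged, pairs))  # 2.5
```

Because each feature depends only on which of two genes is higher within the same sample, data sets normalized in different ways can be pooled without first putting them on a common scale.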
|
117
|
Pontius JU, Mullikin JC, Smith DR, Lindblad-Toh K, Gnerre S, Clamp M, Chang J, Stephens R, Neelam B, Volfovsky N, Schäffer AA, Agarwala R, Narfström K, Murphy WJ, Giger U, Roca AL, Antunes A, Menotti-Raymond M, Yuhki N, Pecon-Slattery J, Johnson WE, Bourque G, Tesler G, O'Brien SJ. Initial sequence and comparative analysis of the cat genome. Genome Res 2008; 17:1675-89. [PMID: 17975172 DOI: 10.1101/gr.6380007] [Citation(s) in RCA: 251] [Impact Index Per Article: 15.7] [Indexed: 01/10/2023]
Abstract
The genome sequence (1.9-fold coverage) of an inbred Abyssinian domestic cat was assembled, mapped, and annotated with a comparative approach that involved cross-reference to annotated genome assemblies of six mammals (human, chimpanzee, mouse, rat, dog, and cow). The results resolved chromosomal positions for 663,480 contigs, 20,285 putative feline gene orthologs, and 133,499 conserved sequence blocks (CSBs). Additional annotated features include repetitive elements, endogenous retroviral sequences, nuclear mitochondrial (numt) sequences, micro-RNAs, and evolutionary breakpoints that suggest historic balancing of translocation and inversion incidences in distinct mammalian lineages. Large numbers of single nucleotide polymorphisms (SNPs), deletion insertion polymorphisms (DIPs), and short tandem repeats (STRs), suitable for linkage or association studies were characterized in the context of long stretches of chromosome homozygosity. In spite of the light coverage capturing approximately 65% of euchromatin sequence from the cat genome, these comparative insights shed new light on the tempo and mode of gene/genome evolution in mammals, promise several research applications for the cat, and also illustrate that a comparative approach using more deeply covered mammals provides an informative, preliminary annotation of a light (1.9-fold) coverage mammal genome sequence.
Affiliation(s)
- Joan U Pontius
- Laboratory of Genomic Diversity, SAIC-Frederick, Inc., NCI-Frederick, Frederick, Maryland 21702, USA.
|
118
|
Kosinski J, Kubareva E, Bujnicki JM. A model of restriction endonuclease MvaI in complex with DNA: a template for interpretation of experimental data and a guide for specificity engineering. Proteins 2007; 68:324-36. [PMID: 17407166 DOI: 10.1002/prot.21460] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 01/07/2023]
Abstract
R.MvaI is a Type II restriction enzyme (REase), which specifically recognizes the pentanucleotide DNA sequence 5'-CCWGG-3' (W indicates A or T). It belongs to a family of enzymes, which recognize related sequences, including 5'-CCSGG-3' (S indicates G or C) in the case of R.BcnI, or 5'-CCNGG-3' (where N indicates any nucleoside) in the case of R.ScrFI. REases from this family hydrolyze the phosphodiester bond in the DNA between the 2nd and 3rd base in both strands, thereby generating a double strand break with 5'-protruding single nucleotides. So far, no crystal structures of REases with similar cleavage patterns have been solved. Characterization of sequence-structure-function relationships in this family would facilitate understanding of evolution of sequence specificity among REases and could aid in engineering of enzymes with new specificities. However, sequences of R.MvaI or its homologs show no significant similarity to any proteins with known structures, thus precluding straightforward comparative modeling. We used a fold recognition approach to identify a remote relationship between R.MvaI and the structure of DNA repair enzyme MutH, which belongs to the PD-(D/E)XK superfamily together with many other REases. We constructed a homology model of R.MvaI and used it to predict functionally important amino acid residues and the mode of interaction with the DNA. In particular, we predict that only one active site of R.MvaI interacts with the DNA target at a time, and the cleavage of both strands (5'-CCAGG-3' and 5'-CCTGG-3') is achieved by two independent catalytic events. The model is in good agreement with the available experimental data and will serve as a template for further analyses of R.MvaI, R.BcnI, R.ScrFI and other related enzymes.
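The degenerate recognition sequences named above (CCWGG, CCSGG, CCNGG) can be located in a DNA string by expanding the IUPAC ambiguity codes into a regular expression. The sketch below is only an illustration of that notation; the example sequence is hypothetical, and only the three ambiguity codes used in the abstract are expanded.

```python
import re

# IUPAC codes needed for the three sites mentioned in the abstract.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]",    # A or T
    "S": "[CG]",    # G or C
    "N": "[ACGT]",  # any base
}


def site_regex(site):
    """Compile a recognition site into a regex.

    The pattern is wrapped in a lookahead so overlapping sites are found.
    """
    body = "".join(IUPAC[base] for base in site.upper())
    return re.compile("(?=(" + body + "))")


def find_sites(seq, site):
    """Return 0-based start positions of every match of site in seq."""
    return [m.start() for m in site_regex(site).finditer(seq.upper())]


def cut_positions(seq, site, offset=2):
    """Index of the base just 3' of the cut, taking the cut to fall
    between the 2nd and 3rd base of the site (CC^NGG)."""
    return [start + offset for start in find_sites(seq, site)]


seq = "TTCCAGGAACCCGGTTCCTGGA"  # hypothetical test sequence

print(find_sites(seq, "CCWGG"))     # [2, 16]
print(find_sites(seq, "CCSGG"))     # [9]
print(find_sites(seq, "CCNGG"))     # [2, 9, 16]
print(cut_positions(seq, "CCWGG"))  # [4, 18]
```

The CCNGG hits are the union of the CCWGG and CCSGG hits, which mirrors how the specificities of the three enzymes relate to one another.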
Affiliation(s)
- Jan Kosinski
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, Trojdena 4, 02-109 Warsaw, Poland.
|
119
|
Kim E, Goren A, Ast G. Insights into the connection between cancer and alternative splicing. Trends Genet 2007; 24:7-10. [PMID: 18054115 DOI: 10.1016/j.tig.2007.10.001] [Citation(s) in RCA: 131] [Impact Index Per Article: 7.7] [Received: 09/10/2007] [Revised: 10/21/2007] [Accepted: 10/22/2007] [Indexed: 01/14/2023]
Abstract
Computational and experimental evidence has revealed that cancerous cells express transcript variants that are abnormally spliced, suggesting that mRNAs are more frequently alternatively spliced in cancerous tissues than in normal ones. However, we show that cancerous tissues exhibit lower levels of alternative splicing than do normal tissues. Moreover, we found that the distribution of types of alternative splicing differs between cancerous and normal tissues. We further show evidence suggesting that the lower levels of alternative splicing in cancerous tissues might be a result of disruption of splicing regulatory proteins.
Affiliation(s)
- Eddo Kim
- Department of Human Genetics and Molecular Medicine, Sackler Faculty of Medicine, Tel-Aviv University, Ramat Aviv 69978, Israel
|
120
|
Frenz CM. Deafness mutation mining using regular expression based pattern matching. BMC Med Inform Decis Mak 2007; 7:32. [PMID: 17961241 PMCID: PMC2180167 DOI: 10.1186/1472-6947-7-32] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 06/14/2007] [Accepted: 10/25/2007] [Indexed: 11/16/2022] Open
Abstract
Background: While keyword-based queries of databases such as PubMed are frequently of great utility, the ability to use regular expressions in place of a keyword can often improve the results output by such databases. Regular expressions can allow for the identification of element types that cannot be readily specified by a single keyword and can allow for different words with similar character sequences to be distinguished.
Results: A Perl-based utility was developed to allow the use of regular expressions in PubMed searches, thereby improving the accuracy of the searches.
Conclusion: This utility was then used to create a comprehensive listing of all DFN deafness mutations discussed in PubMed records containing the keywords "human ear".
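To illustrate why a regular expression can be more selective than a plain keyword, the sketch below pulls DFN-style locus names out of free text. The utility described above was written in Perl; Python is used here only for illustration. The pattern and the example sentences are hypothetical and are not the expression or the records used in the paper.

```python
import re

# Hypothetical pattern: "DFN", an optional class letter, a locus number,
# and an optional trailing letter, bounded by word breaks.
DFN_PATTERN = re.compile(r"\bDFN[ABMXY]?\d+[A-Z]?\b")


def extract_dfn_terms(text):
    """Return the sorted, de-duplicated DFN-style terms found in text."""
    return sorted(set(DFN_PATTERN.findall(text)))


# Hypothetical abstract snippets standing in for retrieved records.
abstracts = [
    "Mutations at the DFNB1 locus are a common cause of hearing loss.",
    "We mapped DFNA5 and DFNB1 in two families; DFN3 was excluded.",
    "The word DEFINE and the bare prefix DFN should not be reported.",
]

for text in abstracts:
    print(extract_dfn_terms(text))
# ['DFNB1']
# ['DFN3', 'DFNA5', 'DFNB1']
# []
```

A keyword search for "DFN" alone would either miss the numbered loci or return every record mentioning the prefix, whereas the pattern returns only tokens shaped like locus names.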
Affiliation(s)
- Christopher M Frenz
- Department of Computer Engineering Technology, New York City College of Technology (CUNY), 300 Jay St, Brooklyn, NY 11201, USA.
|
121
|
Rava P, Hussain MM. Acquisition of triacylglycerol transfer activity by microsomal triglyceride transfer protein during evolution. Biochemistry 2007; 46:12263-74. [PMID: 17924655 DOI: 10.1021/bi700762z] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.4] [Indexed: 11/29/2022]
Abstract
Microsomal triglyceride transfer protein (MTP) is essential for the assembly of neutral-lipid-rich apolipoprotein B (apoB) lipoproteins. Previously we reported that the Drosophila MTP transfers phospholipids but does not transfer triglycerides. In contrast, human MTP transfers both lipids. To explore the acquisition of triglyceride transfer activity by MTP, we evaluated amino acid sequences, protein structures, and the biochemical and cellular properties of various MTP orthologues obtained from species that diverged during evolution. All MTP orthologues shared similar secondary and tertiary structures, associated with protein disulfide isomerase, localized to the endoplasmic reticulum, and supported apoB secretion. While vertebrate MTPs transferred triglyceride, invertebrate MTPs lacked this activity. Thus, triglyceride transfer activity was acquired during the transition from invertebrates to vertebrates. Within vertebrates, fish, amphibians, and birds displayed 27%, 40%, and 100% triglyceride transfer activity compared to mammals. We conclude that MTP triglyceride transfer activity first appeared in fish and speculate that the acquisition of triglyceride transfer activity by MTP provided for a significant advantage in the evolution of larger and more complex organisms.
Affiliation(s)
- Paul Rava
- Molecular and Cellular Biology Program, School of Graduate Studies, SUNY Downstate Medical Center, Brooklyn, New York 11203, USA
|
122
|
Rahman FA, Ainscough JFX, Copeland N, Coverley D. Cancer-associated missplicing of exon 4 influences the subnuclear distribution of the DNA replication factor CIZ1. Hum Mutat 2007; 28:993-1004. [PMID: 17508423 DOI: 10.1002/humu.20550] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.5] [Indexed: 11/07/2022]
Abstract
Cip1-interacting zinc finger protein 1 (CIZ1, also known as CDKN1A-interacting zinc finger protein 1) stimulates initiation of mammalian DNA replication and is normally tethered to the nuclear matrix within DNA replication foci. Here, we show that an alternatively spliced human CIZ1 variant, lacking exon 4 (Delta E4), is misexpressed as a consequence of intronic mutation in Ewing tumor (ET) cell lines. In all ET lines tested, exon 4 is skipped and an upstream mononucleotide repeat element is expanded to contain up to 28 thymidines, compared to 16 in controls. In exon-trap experiments, a 24T variant produced three-fold more exon skipping than a 16T variant, demonstrating a direct effect on splicing. In functional assays, Delta E4 protein retains replication activity, but fails to form subnuclear foci. Furthermore, coexpression of mouse Delta E4 with Ciz1 prevents Ciz1 from localizing appropriately, having a dominant negative effect on foci formation. The data show that conditional exclusion of exon 4 influences the spatial distribution of the Ciz1 protein within the nucleus, and raise the possibility that CIZ1 alternative splicing could influence organized patterns of DNA replication.
|
123
|
Gioia J, Yerrapragada S, Qin X, Jiang H, Igboeli OC, Muzny D, Dugan-Rocha S, Ding Y, Hawes A, Liu W, Perez L, Kovar C, Dinh H, Lee S, Nazareth L, Blyth P, Holder M, Buhay C, Tirumalai MR, Liu Y, Dasgupta I, Bokhetache L, Fujita M, Karouia F, Eswara Moorthy P, Siefert J, Uzman A, Buzumbo P, Verma A, Zwiya H, McWilliams BD, Olowu A, Clinkenbeard KD, Newcombe D, Golebiewski L, Petrosino JF, Nicholson WL, Fox GE, Venkateswaran K, Highlander SK, Weinstock GM. Paradoxical DNA repair and peroxide resistance gene conservation in Bacillus pumilus SAFR-032. PLoS One 2007; 2:e928. [PMID: 17895969 PMCID: PMC1976550 DOI: 10.1371/journal.pone.0000928] [Citation(s) in RCA: 92] [Impact Index Per Article: 5.4] [Received: 05/18/2007] [Accepted: 08/31/2007] [Indexed: 11/25/2022] Open
Abstract
Background: Bacillus spores are notoriously resistant to unfavorable conditions such as UV radiation, γ-radiation, H2O2, desiccation, chemical disinfection, or starvation. Bacillus pumilus SAFR-032 survives standard decontamination procedures of the Jet Propulsion Lab spacecraft assembly facility, and both spores and vegetative cells of this strain exhibit elevated resistance to UV radiation and H2O2 compared to other Bacillus species.
Principal Findings: The genome of B. pumilus SAFR-032 was sequenced and annotated. Lists of genes relevant to DNA repair and the oxidative stress response were generated and compared to B. subtilis and B. licheniformis. Differences in conservation of genes, gene order, and protein sequences are highlighted because they potentially explain the extreme resistance phenotype of B. pumilus. The B. pumilus genome includes genes not found in B. subtilis or B. licheniformis and conserved genes with sequence divergence, but paradoxically lacks several genes that function in UV or H2O2 resistance in other Bacillus species.
Significance: This study identifies several candidate genes for further research into UV and H2O2 resistance. These findings will help explain the resistance of B. pumilus and are applicable to understanding sterilization survival strategies of microbes.
Affiliation(s)
- Jason Gioia
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, United States of America
- Shailaja Yerrapragada
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, United States of America
- Xiang Qin
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, United States of America
- Huaiyang Jiang
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, United States of America
- Okezie C. Igboeli
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, United States of America
- Donna Muzny
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, United States of America
- Shannon Dugan-Rocha
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, United States of America
- Yan Ding
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, United States of America
- Alicia Hawes
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, United States of America
- Wen Liu
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, United States of America
- Lesette Perez
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, United States of America
- Christie Kovar
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, United States of America
- Huyen Dinh
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, United States of America
- Sandra Lee
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, United States of America
- Lynne Nazareth
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, United States of America
- Peter Blyth
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, United States of America
- Michael Holder
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, United States of America
- Christian Buhay
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, United States of America
- Madhan R. Tirumalai
- Department of Biology and Biochemistry, University of Houston, Houston, Texas, United States of America
- Yamei Liu
- Department of Biology and Biochemistry, University of Houston, Houston, Texas, United States of America
- Indrani Dasgupta
- Department of Biology and Biochemistry, University of Houston, Houston, Texas, United States of America
- Lina Bokhetache
- Department of Biology and Biochemistry, University of Houston, Houston, Texas, United States of America
- Masaya Fujita
- Department of Biology and Biochemistry, University of Houston, Houston, Texas, United States of America
- Fathi Karouia
- Department of Biology and Biochemistry, University of Houston, Houston, Texas, United States of America
- Prahathees Eswara Moorthy
- Department of Biology and Biochemistry, University of Houston, Houston, Texas, United States of America
- Johnathan Siefert
- Department of Biology and Biochemistry, University of Houston, Houston, Texas, United States of America
- Akif Uzman
- Department of Natural Sciences, University of Houston‐Downtown, Houston, Texas, United States of America
- Prince Buzumbo
- Department of Natural Sciences, University of Houston‐Downtown, Houston, Texas, United States of America
- Avani Verma
- Department of Natural Sciences, University of Houston‐Downtown, Houston, Texas, United States of America
- Hiba Zwiya
- Department of Natural Sciences, University of Houston‐Downtown, Houston, Texas, United States of America
- Brian D. McWilliams
- Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, Texas, United States of America
- Adeola Olowu
- University of St. Thomas, Houston, Texas, United States of America
- Kenneth D. Clinkenbeard
- Department of Veterinary Pathobiology, Center for Veterinary Health Sciences, Oklahoma State University, Stillwater, Oklahoma, United States of America
- David Newcombe
- University of Idaho Coeur d'Alene, Coeur d'Alene, Idaho, United States of America
- Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California, United States of America
- Lisa Golebiewski
- Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, Texas, United States of America
- Joseph F. Petrosino
- Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, Texas, United States of America
- Wayne L. Nicholson
- Department of Microbiology and Cell Science, University of Florida Space Life Sciences Laboratory, Kennedy Space Center, Florida, United States of America
- George E. Fox
- Department of Biology and Biochemistry, University of Houston, Houston, Texas, United States of America
- Kasthuri Venkateswaran
- Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California, United States of America
- Sarah K. Highlander
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, United States of America
- Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, Texas, United States of America
- George M. Weinstock
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, United States of America
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, Texas, United States of America
- * To whom correspondence should be addressed. E-mail:
|
124
|
Zhang Z, Chen D, Fenstermacher DA. Integrated analysis of independent gene expression microarray datasets improves the predictability of breast cancer outcome. BMC Genomics 2007; 8:331. [PMID: 17883867 PMCID: PMC2064937 DOI: 10.1186/1471-2164-8-331] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Received: 12/08/2006] [Accepted: 09/20/2007] [Indexed: 11/10/2022] Open
Abstract
Background: Gene expression profiles based on microarray data have been suggested by many studies as potential molecular prognostic indexes of breast cancer. However, due to the confounding effect of clinical background, independent studies often obtained inconsistent results. The current study investigated the possibility of improving the quality and generality of expression profiles by integrated analysis of multiple datasets. Profiles of recurrence outcome were derived from two independent datasets and validated by a third dataset.
Results: The clinical background of patients significantly influenced the content and performance of expression profiles when the training samples were unbalanced. The integrated profiling of two independent datasets led to higher classification accuracy (71.11% vs. 70.59%) and larger ROC curve area (0.789 vs. 0.767) of the testing samples. Cell cycle, especially M phase mitosis, was significantly overrepresented by the 60-gene profile obtained from integrated analysis (p < 0.0001). This profile significantly differentiated poor and good prognosis in a third patient cohort (p = 0.003). Simulation procedures demonstrated that the change of profile specificity had more instant influence on the performance of expression profiles than the change of profile sensitivity.
Conclusion: The current study confirmed that the gene expression profile generated by integrated analysis of multiple datasets achieved better prediction of breast cancer recurrence. However, the content and performance of profiles were confounded by the clinical background of training patients. In future studies, prognostic profiles applicable to the general population should be derived from more diversified and balanced patient cohorts on a larger scale.
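The two performance figures compared above, classification accuracy and area under the ROC curve, can be computed from per-sample scores as sketched below. The scores, labels and threshold are hypothetical, and the pairwise (Mann-Whitney) formulation of the ROC area is one standard way to compute it, not necessarily the procedure the authors used.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the pairwise (Mann-Whitney) form.

    Equals the probability that a randomly chosen positive sample scores
    higher than a randomly chosen negative one; ties count one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(scores, labels, threshold):
    """Fraction of samples classified correctly at a fixed threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


# Hypothetical recurrence scores (1 = recurrence) for six test samples.
scores = [0.9, 0.8, 0.7, 0.4, 0.35, 0.1]
labels = [1, 1, 0, 1, 0, 0]

print(f"AUC = {roc_auc(scores, labels):.3f}")             # AUC = 0.889
print(f"accuracy = {accuracy(scores, labels, 0.5):.3f}")  # accuracy = 0.667
```

Accuracy depends on the chosen threshold, while the ROC area summarizes ranking quality across all thresholds, which is why the two measures are usually reported together.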
Affiliation(s)
- Zhe Zhang
- Department of Biomedical Engineering, University of North Carolina, Chapel Hill NC 27506, USA
- Dechang Chen
- Department of Preventive Medicine and Biometrics, Uniformed Services University of the Health Sciences, Bethesda, MD 20814, USA
- David A Fenstermacher
- Research Informatics, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612, USA
|
125
|
Li H, Guan L, Liu T, Guo Y, Zheng WM, Wong GKS, Wang J. A cross-species alignment tool (CAT). BMC Bioinformatics 2007; 8:349. [PMID: 17880681 PMCID: PMC2082505 DOI: 10.1186/1471-2105-8-349] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 04/19/2007] [Accepted: 09/19/2007] [Indexed: 01/06/2023] Open
Abstract
Background: The two main kinds of automatic gene annotation frameworks are ab initio and alignment-based, the latter splitting into two sub-groups. The first group is used for intra-species alignments, among which are successful ones with high specificity and speed. The other group contains more sensitive methods which are usually applied in aligning inter-species sequences.
Results: Here we present a new algorithm called CAT (for Cross-species Alignment Tool). It is designed to align mRNA sequences to mammalian-sized genomes. CAT is implemented using C scripts and is freely available on the web at .
Conclusions: Examined from different angles, CAT outperforms other extant alignment tools. Tested against all available mouse-human and zebrafish-human orthologs, we demonstrate that CAT combines the specificity and speed of the best intra-species algorithms, like BLAT and sim4, with the sensitivity of the best inter-species tools, like GeneWise.
Affiliation(s)
- Heng Li
- Beijing Institute of Genomics of Chinese Academy of Sciences, Beijing Genomics Institute, Beijing 101300, China
- James D. Watson Institute of Genome Sciences of Zhejiang University, Hangzhou 310008, China
- Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100080, China
- Graduate University of the Chinese Academy of Sciences, Yuquan Road 19A, Beijing 100039, China
- Liang Guan
- Beijing Institute of Genomics of Chinese Academy of Sciences, Beijing Genomics Institute, Beijing 101300, China
- James D. Watson Institute of Genome Sciences of Zhejiang University, Hangzhou 310008, China
- Graduate University of the Chinese Academy of Sciences, Yuquan Road 19A, Beijing 100039, China
- Institute of Computing Technology, Chinese Academy of Science, Beijing 100080, China
- Tao Liu
- Beijing Institute of Genomics of Chinese Academy of Sciences, Beijing Genomics Institute, Beijing 101300, China
- James D. Watson Institute of Genome Sciences of Zhejiang University, Hangzhou 310008, China
- Yiran Guo
- Beijing Institute of Genomics of Chinese Academy of Sciences, Beijing Genomics Institute, Beijing 101300, China
- Graduate University of the Chinese Academy of Sciences, Yuquan Road 19A, Beijing 100039, China
- Wei-Mou Zheng
- Beijing Institute of Genomics of Chinese Academy of Sciences, Beijing Genomics Institute, Beijing 101300, China
- James D. Watson Institute of Genome Sciences of Zhejiang University, Hangzhou 310008, China
- Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100080, China
- Gane Ka-Shu Wong
- Beijing Institute of Genomics of Chinese Academy of Sciences, Beijing Genomics Institute, Beijing 101300, China
- James D. Watson Institute of Genome Sciences of Zhejiang University, Hangzhou 310008, China
- UW Genome Center, Department of Medicine, University of Washington, Seattle, WA 98195, USA
- Jun Wang
- Beijing Institute of Genomics of Chinese Academy of Sciences, Beijing Genomics Institute, Beijing 101300, China
- James D. Watson Institute of Genome Sciences of Zhejiang University, Hangzhou 310008, China
- The Institute of Human Genetics, University of Aarhus, DK-8000 Aarhus C, Denmark
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230, Odense M, Denmark
| |
Collapse
|
126
|
Vider-Shalit T, Fishbain V, Raffaeli S, Louzoun Y. Phase-dependent immune evasion of herpesviruses. J Virol 2007; 81:9536-45. [PMID: 17609281 PMCID: PMC1951411 DOI: 10.1128/jvi.02636-06] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.0] [Received: 11/29/2006] [Accepted: 06/22/2007] [Indexed: 12/14/2022] Open
Abstract
Viruses employ various modes to evade immune detection. Two possible evasion modes are a reduction of the number of epitopes presented and the mimicry of host epitopes. The immune evasion efforts are not uniform among viral proteins. The number of epitopes in a given viral protein and the similarity of the epitopes to host peptides can be used as a measure of the viral attempts to hide this protein. Using bioinformatics tools, we here present a genomic analysis of the attempts of four human herpesviruses (herpes simplex virus type 1-human herpesvirus 1, Epstein-Barr virus-human herpesvirus 4, human cytomegalovirus-human herpesvirus 5, and Kaposi's sarcoma-associated herpesvirus-human herpesvirus 8) and one murine herpesvirus (murine herpesvirus 68) to escape from immune detection. We determined the full repertoire of CD8 T-lymphocyte epitopes presented by each viral protein and show that herpesvirus proteins present many fewer epitopes than expected. Furthermore, the epitopes that are presented are more similar to host epitopes than are random viral epitopes, minimizing the immune response. We defined a score for the size of the immune repertoire (the SIR score) based on the number of epitopes in a protein. The numbers of epitopes in proteins expressed in the latent and early phases of infection were significantly smaller than those in proteins expressed in the lytic phase in all tested viruses. The latent and immediate-early epitopes were also more similar to host epitopes than were lytic epitopes. A clear trend emerged from the analysis. In general, herpesviruses demonstrated an effort to evade immune detection. However, within a given herpesvirus, proteins expressed in phases critical to the fate of infection (e.g., early lytic and latent) evaded immune detection more than all others. The application of the SIR score to specific proteins allows us to quantify the importance of immune evasion and to detect optimal targets for immunotherapy and vaccine development.
|
127
|
Ingsriswang S, Pacharawongsakda E. sMOL Explorer: an open source, web-enabled database and exploration tool for Small MOLecules datasets. Bioinformatics 2007; 23:2498-500. [PMID: 17660205 DOI: 10.1093/bioinformatics/btm363] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022] Open
Abstract
UNLABELLED sMOL Explorer is a 2D ligand-based computational tool that provides three major functionalities: data management, information retrieval and extraction and statistical analysis and data mining through Web interface. With sMOL Explorer, users can create personal databases by adding each small molecule via a drawing interface or uploading the data files from internal and external projects into the sMOL database. Then, the database can be browsed and queried with textual and structural similarity search. The molecule can also be submitted to search against external public databases including PubChem, KEGG, DrugBank and eMolecules. Moreover, users can easily access a variety of data mining tools from Weka and R packages to perform analysis including (1) finding the frequent substructure, (2) clustering the molecular fingerprints, (3) identifying and removing irrelevant attributes from the data and (4) building the classification model of biological activity. AVAILABILITY sMOL Explorer is an Open Source project and is freely available to all interested users at http://www.biotec.or.th/ISL/SMOL/.
Affiliation(s)
- Supawadee Ingsriswang
- Information Systems Laboratory, BIOTEC Central Research Unit, National Center for Genetic Engineering and Biotechnology (BIOTEC), Klongluang, Pathumthani, 12120, Thailand.
|
128
|
Abstract
BACKGROUND Using computational database searches, we have demonstrated previously that no gene sequences could be found for at least 36% of enzyme activities that have been assigned an Enzyme Commission number. Here we present a follow-up literature-based survey involving a statistically significant sample of such "orphan" activities. The survey was intended to determine whether sequences for these enzyme activities are truly unknown, or whether these sequences are absent from the public sequence databases but can be found in the literature. RESULTS We demonstrate that for ~80% of sampled orphans, the absence of sequence data is bona fide. Our analyses further substantiate the notion that many of these enzyme activities play biologically important roles. CONCLUSION This survey points toward significant scientific cost of having such a large fraction of characterized enzyme activities disconnected from sequence data. It also suggests that a larger effort, beginning with a comprehensive survey of all putative orphan activities, would resolve nearly 300 artifactual orphans and reconnect a wealth of enzyme research with modern genomics. For these reasons, we propose that a systematic effort to identify the cognate genes of orphan enzymes be undertaken.
|
129
|
Davies L, Anderson IP, Turner PC, Shirras AD, Rees HH, Rigden DJ. An unsuspected ecdysteroid/steroid phosphatase activity in the key T-cell regulator, Sts-1: surprising relationship to insect ecdysteroid phosphate phosphatase. Proteins 2007; 67:720-31. [PMID: 17348005 DOI: 10.1002/prot.21357] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Indexed: 11/08/2022]
Abstract
The insect enzyme ecdysteroid phosphate phosphatase (EPP) mobilizes active ecdysteroids from an inactive phosphorylated pool. Previously assigned to a novel class, it is shown here that it resides in the large histidine phosphatase superfamily related to cofactor-dependent phosphoglycerate mutase, a superfamily housing notably diverse catalytic activities. Molecular modeling reveals a plausible substrate-binding mode for EPP. Analysis of genomic and transcript data for a number of insect species shows that EPP may exist in both the single domain form previously characterized and in a longer, multidomain form. This latter form bears a quite unexpected relationship in sequence and domain architecture to vertebrate proteins, including Sts-1, characterized as a key regulator of T-cell activity. Long form Drosophila melanogaster EPP, human Sts-1, and a related protein from Caenorhabditis elegans have all been cloned, assayed, and shown to catalyse the hydrolysis of ecdysteroid and steroid phosphates. The surprising relationship described and explored here between EPP and Sts-1 has implications for our understanding of the function(s) of both.
MESH Headings
- Adaptor Proteins, Signal Transducing/chemistry
- Adaptor Proteins, Signal Transducing/genetics
- Adaptor Proteins, Signal Transducing/metabolism
- Amino Acid Sequence
- Animals
- Binding Sites
- Carrier Proteins/chemistry
- Carrier Proteins/genetics
- Carrier Proteins/metabolism
- Cell Line
- Chromatography, High Pressure Liquid
- Cloning, Molecular
- Computational Biology
- Databases, Protein
- Evolution, Molecular
- Humans
- Hydrophobic and Hydrophilic Interactions
- Insect Proteins/chemistry
- Insect Proteins/genetics
- Insect Proteins/metabolism
- Models, Molecular
- Molecular Sequence Data
- Open Reading Frames/genetics
- Phosphoric Monoester Hydrolases/chemistry
- Phosphoric Monoester Hydrolases/genetics
- Phosphoric Monoester Hydrolases/metabolism
- Phylogeny
- Protein Structure, Secondary
- Protein Structure, Tertiary
- Protein Tyrosine Phosphatases
- Sequence Homology, Amino Acid
- Transfection
Affiliation(s)
- Lyndsay Davies
- School of Biological Sciences, University of Liverpool, Biosciences Building, Liverpool L69 7ZB, United Kingdom
|
130
|
Cahan P, Rovegno F, Mooney D, Newman JC, Laurent GS, McCaffrey TA. Meta-analysis of microarray results: challenges, opportunities, and recommendations for standardization. Gene 2007; 401:12-8. [PMID: 17651921 PMCID: PMC2111172 DOI: 10.1016/j.gene.2007.06.016] [Citation(s) in RCA: 100] [Impact Index Per Article: 5.9] [Received: 03/08/2007] [Revised: 06/06/2007] [Accepted: 06/12/2007] [Indexed: 12/31/2022]
Abstract
Microarray profiling of gene expression is a powerful tool for discovery, but the ability to manage and compare the resulting data can be problematic. Biological, experimental, and technical variations between studies of the same phenotype/phenomena create substantial differences in results. The application of conventional meta-analysis to raw microarray data is complicated by differences in the type of microarray used, gene nomenclatures, species, and analytical methods. An alternative approach to combining multiple microarray studies is to compare the published gene lists which result from the investigators' analyses of the raw data, as implemented in Lists of Lists Annotated (LOLA: www.lola.gwu.edu) and L2L (depts.washington.edu/l2l/). The present review considers both the potential value and the limitations of databasing and enabling the comparison of results from different microarray studies. Further, a major impediment to cross-study comparisons is the absence of a standard for reporting microarray study results. We propose a reporting standard: standard microarray results template (SMART), which will facilitate the integration of microarray studies.
Affiliation(s)
- Patrick Cahan
- Department of Internal Medicine, Washington University, St. Louis, MO 63110, USA
- Felicia Rovegno
- The George Washington University Medical Center, Department of Biochemistry and Molecular Biology & The Catherine Birch McCormick Genomics Center
- Denise Mooney
- The George Washington University Medical Center, Department of Biochemistry and Molecular Biology & The Catherine Birch McCormick Genomics Center
- John C. Newman
- Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
- Georges St. Laurent
- The George Washington University Medical Center, Department of Biochemistry and Molecular Biology & The Catherine Birch McCormick Genomics Center
- Timothy A. McCaffrey
- The George Washington University Medical Center, Department of Biochemistry and Molecular Biology & The Catherine Birch McCormick Genomics Center
- * Address for correspondence: Tim McCaffrey, Ph.D., The George Washington University Medical Center, Department of Biochemistry and Molecular Biology, 2300 I Street NW. Ross Hall 541, Washington, D.C. 20037, (202) 994-8919, (202) 994-8924 FAX
|
131
|
Jain M, Khurana P, Tyagi AK, Khurana JP. Genome-wide analysis of intronless genes in rice and Arabidopsis. Funct Integr Genomics 2007; 8:69-78. [PMID: 17578610 DOI: 10.1007/s10142-007-0052-9] [Citation(s) in RCA: 77] [Impact Index Per Article: 4.5] [Received: 02/03/2007] [Revised: 04/07/2007] [Accepted: 05/06/2007] [Indexed: 10/23/2022]
Abstract
Intronless genes, a characteristic feature of prokaryotes, constitute a significant portion of the eukaryotic genomes. Our analysis revealed the presence of 11,109 (19.9%) and 5,846 (21.7%) intronless genes in rice and Arabidopsis genomes, respectively, belonging to different cellular role and gene ontology categories. The distribution and conservation of rice and Arabidopsis intronless genes among different taxonomic groups have been analyzed. A total of 301 and 296 intronless genes from rice and Arabidopsis, respectively, are conserved among organisms representing the three major domains of life, i.e., archaea, bacteria, and eukaryotes. These evolutionarily conserved proteins are predicted to be involved in housekeeping cellular functions. Interestingly, among the 68% of rice and 77% of Arabidopsis intronless genes present only in eukaryotic genomes, approximately 51% and 57% genes have orthologs only in plants, and thus may represent the plant-specific genes. Furthermore, 831 and 144 intronless genes of rice and Arabidopsis, respectively, referred to as ORFans, do not exhibit homology to any of the genes in the database and may perform species-specific functions. These data can serve as a resource for further comparative, evolutionary, and functional analysis of intronless genes in plants and other organisms.
Affiliation(s)
- Mukesh Jain
- Interdisciplinary Centre for Plant Genomics and Department of Plant Molecular Biology, University of Delhi South Campus, Benito Juarez Road, New Delhi 110 021, India
|
132
|
Zhou L, Florea L. Designing sensitive and specific spaced seeds for cross-species mRNA-to-genome alignment. J Comput Biol 2007; 14:113-30. [PMID: 17456011 DOI: 10.1089/cmb.2006.0130] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/13/2022] Open
Abstract
As the demand for accurately aligning gene sequences to the genome of a related species grows with the sequencing of new genomes, spaced seeds emerge as a promising vehicle for increasing alignment sensitivity. We extend the existing {0, 1} match-mismatch models for sensitivity evaluation to take into account the compositional structure of coding sequences and ultimately produce seeds better suited to this particular application. Designing seeds for alignment programs, however, needs to balance sensitivity and specificity. We assess the effects of seed variations on both sensitivity and specificity in an extended model that incorporates transitions and differentiates among the three codon positions, and show that spaced seeds with transitions offer a better sensitivity-specificity tradeoff. Furthermore, we propose a theoretical formulation for rigorously assessing seed specificity, starting from Bernoulli and Markov models of the mRNA and genomic sequences. Within this framework, we perform the first comprehensive analysis of seeds to serve as a blueprint for selecting sensitive and specific seeds for practical applications. Our analyses show that specificity is relatively constant for seeds of a given weight, while sensitivity varies widely, with the highest values attained by seeds allowing a small (2-6) number of transitions. A strategy for designing seeds, therefore, is to first select the weight of the seed by identifying the desired sensitivity-specificity tradeoff, then choose the most sensitive seed(s) within that weight group. We illustrate our methods with the alignment of chicken coding sequences against the human genome assembly version HG17.
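The idea of a spaced seed that tolerates transitions can be illustrated with a minimal Python sketch. It is not the authors' implementation: the three-symbol seed alphabet (`1` = must match, `0` = ignored, `T` = match or transition), the function names, and the choice to count only `1` positions toward the weight are all assumptions made for illustration.

```python
# Base classes used to recognise transitions (A<->G and C<->T).
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(a: str, b: str) -> bool:
    """Return True if a and b differ but are both purines or both pyrimidines."""
    if a == b:
        return False
    return {a, b} <= PURINES or {a, b} <= PYRIMIDINES


def seed_weight(seed: str) -> int:
    """Assumed weight definition: number of positions that must match exactly."""
    return seed.count("1")


def seed_hits(seed: str, query: str, target: str) -> list[int]:
    """Offsets at which the ungapped query/target window satisfies the seed.

    Hypothetical seed alphabet: '1' requires identical bases, '0' is a
    don't-care position, 'T' accepts identical bases or a transition.
    """
    span = len(seed)
    hits = []
    for offset in range(min(len(query), len(target)) - span + 1):
        satisfied = True
        for j, symbol in enumerate(seed):
            q, t = query[offset + j], target[offset + j]
            if symbol == "1" and q != t:
                satisfied = False
            elif symbol == "T" and q != t and not is_transition(q, t):
                satisfied = False
            if not satisfied:
                break
        if satisfied:
            hits.append(offset)
    return hits
```

For example, `seed_hits("1T01", "ACGTAC", "ATGTAC")` reports a hit at offset 0 because the C/T difference at the second position is a transition, which a plain `1101` seed would reject.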
Affiliation(s)
- Leming Zhou
- Department of Computer Science, George Washington University, Washington, DC 20052, USA
|
133
|
Wishart DS. In Silico Drug Exploration and Discovery Using DrugBank. Curr Protoc Bioinformatics 2007; Chapter 14:Unit 14.4. [DOI: 10.1002/0471250953.bi1404s18] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/05/2022]
Affiliation(s)
- David S. Wishart
- University of Alberta and the National Institute of Nanotechnology (NINT) National Research Council Edmonton Alberta Canada
|
134
|
Gowri VS, Tina KG, Krishnadev O, Srinivasan N. Strategies for the effective identification of remotely related sequences in multiple PSSM search approach. Proteins 2007; 67:789-94. [PMID: 17380509 DOI: 10.1002/prot.21356] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 11/09/2022]
Abstract
Searches using position specific scoring matrices (PSSMs) have been commonly used in remote homology detection procedures such as PSI-BLAST and RPS-BLAST. A PSSM is typically generated using one of the sequences of a family as the reference sequence. In the case of PSI-BLAST searches the reference sequence is the same as the query. Recently we have shown that searches against the database of multiple family-profiles, with each one of the members of the family used as a reference sequence, are more effective than searches against the classical database of single family-profiles. Despite a relatively better overall performance when compared with common sequence-profile matching procedures, searches against the multiple family-profiles database result in a few false positives and false negatives. Here we show that profile length and divergence of sequences used in the construction of a PSSM have a major influence on the performance of the multiple-profile-based search approach. We also identify that a simple parameter, defined as the number of PSSMs corresponding to a family that are hit for a query divided by the total number of PSSMs in the family, can effectively distinguish the true positives from the false positives in the multiple profiles search approach.
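The ratio described at the end of the abstract (PSSMs of a family hit by a query, divided by all PSSMs in that family) can be written down directly. The Python sketch below is only an illustration under assumptions: the function name and the PSSM identifiers are hypothetical, and the abstract does not state what cutoff on this ratio separates true from false positives.

```python
def family_hit_fraction(hit_pssms, family_pssms):
    """Fraction of a family's PSSMs that were hit by one query.

    hit_pssms: identifiers of PSSMs reported as hits for the query.
    family_pssms: identifiers of every PSSM built for the family.
    Hits that do not belong to the family are ignored.
    """
    family = set(family_pssms)
    if not family:
        raise ValueError("family_pssms must contain at least one PSSM")
    return len(set(hit_pssms) & family) / len(family)
```

A query that hits two of a family's four profiles would score 0.5; a family matched through only one of many profiles scores low, which is the kind of case the parameter is meant to flag.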
Affiliation(s)
- V S Gowri
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012, India
|
135
|
Wilkerson MD, Schlueter SD, Brendel V. yrGATE: a web-based gene-structure annotation tool for the identification and dissemination of eukaryotic genes. Genome Biol 2006; 7:R58. [PMID: 16859520 PMCID: PMC1779557 DOI: 10.1186/gb-2006-7-7-r58] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Received: 04/24/2006] [Revised: 06/08/2006] [Accepted: 07/05/2006] [Indexed: 11/10/2022] Open
Abstract
Your Gene structure Annotation Tool for Eukaryotes (yrGATE) provides an Annotation Tool and Community Utilities for worldwide web-based community genome and gene annotation. Annotators can evaluate gene structure evidence derived from multiple sources to create gene structure annotations. Administrators regulate the acceptance of annotations into published gene sets. yrGATE is designed to facilitate rapid and accurate annotation of emerging genomes as well as to confirm, refine, or correct currently published annotations. yrGATE is highly portable and supports different standard input and output formats. The yrGATE software and usage cases are available at http://www.plantgdb.org/prj/yrGATE.
Affiliation(s)
- Matthew D Wilkerson
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50011-3260, USA
- Shannon D Schlueter
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50011-3260, USA
- Volker Brendel
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50011-3260, USA
- Department of Statistics, Iowa State University, Ames, IA 50011-3260, USA
|
136
|
PCR-based landmark unique gene (PLUG) markers effectively assign homoeologous wheat genes to A, B and D genomes. BMC Genomics 2007; 8:135. [PMID: 17535443 PMCID: PMC1904201 DOI: 10.1186/1471-2164-8-135] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.8] [Received: 01/27/2007] [Accepted: 05/30/2007] [Indexed: 01/06/2023] Open
Abstract
BACKGROUND EST-PCR markers normally represent specific products from target genes, and are therefore effective tools for genetic analysis. However, because wheat is an allohexaploid plant, PCR products derived from homoeologous genes are often simultaneously amplified. Such products may be easier to differentiate if they include intron sequences, which are more polymorphic than exon sequences. However, genomic sequence data for wheat are limited; therefore it is difficult to predict the location of introns. By using the similarities in gene structures between rice and wheat, we developed a system called PLUG (PCR-based Landmark Unique Gene) to design primers so that PCR products include intron sequences. We then investigated whether products amplified using such primers could serve as markers able to distinguish multiple products derived from homoeologous genes. RESULTS The PLUG system consists of the following steps: (1) Single-copy rice genes (Landmark Unique Gene loci; LUGs) exhibiting high degrees of homology to wheat UniGene sequences are extracted; (2) Alignment analysis is carried out using the LUGs and wheat UniGene sequences to predict exon-exon junctions, and LUGs which can be used to design wheat primers flanking introns (TaEST-LUGs) are extracted; and (3) Primers are designed in an interactive manner. From a total of 4,312 TaEST-LUGs, 24 loci were randomly selected and used to design primers. With all of these primer sets, we obtained specific, intron-containing products from the target genes. These markers were assigned to chromosomes using wheat nullisomic-tetrasomic lines. By PCR-RFLP analysis using agarose gel electrophoresis, 19 of the 24 markers were located on at least one chromosome. CONCLUSION In the development of wheat EST-PCR markers capable of efficiently sorting products derived from homoeologous genes, it is important to design primers able to amplify products that include intron sequences with insertion/deletion polymorphisms. Using the PLUG system, wheat EST sequences that can be used for marker development are selected based on comparative genomics with rice, and then primer sets flanking intron sequences are prepared in an interactive, semi-automatic manner. Hence, the PLUG system is an effective tool for large-scale marker development.
|
137
|
Mazumder R, Hu ZZ, Vinayaka CR, Sagripanti JL, Frost SDW, Kosakovsky Pond SL, Wu CH. Computational analysis and identification of amino acid sites in dengue E proteins relevant to development of diagnostics and vaccines. Virus Genes 2007; 35:175-86. [PMID: 17508277 DOI: 10.1007/s11262-007-0103-2] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.8] [Received: 01/25/2007] [Accepted: 04/11/2007] [Indexed: 10/23/2022]
Abstract
We have identified 72 completely conserved amino acid residues in the E protein of major groups of the Flavivirus genus by computational analyses. In the dengue species we have identified 12 highly conserved sequence regions, 186 negatively selected sites, and many dengue serotype-specific negatively selected sites. The flavivirus-conserved sites included residues involved in forming six disulfide bonds crucial for the structural integrity of the protein, the fusion motif involved in viral infectivity, and the interface residues of the oligomers. The structural analysis of the E protein showed 19 surface-exposed non-conserved residues, 128 dimer or trimer interface residues, and regions, which undergo major conformational change during trimerization. Eleven consensus T(h)-cell epitopes common to all four dengue serotypes were predicted. Most of these corresponded to dengue-conserved regions or negatively selected sites. Of special interest are six singular sites (N(37), Q(211), D(215), P(217), H(244), K(246)) in dengue E protein that are conserved, are part of the predicted consensus T(h)-cell epitopes and are exposed in the dimer or trimer. We propose these sites and corresponding epitopic regions as potential candidates for prioritization by experimental biologists for development of diagnostics and vaccines that may be difficult to circumvent by natural or man-made alteration of dengue virus.
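Finding "completely conserved" residues amounts to scanning the columns of a multiple sequence alignment for positions where every sequence carries the same amino acid. The Python sketch below shows that idea only; it assumes the sequences are already aligned to equal length with `-` as the gap character, and the toy sequences in the usage note are invented, not dengue E protein data.

```python
def conserved_columns(aligned_seqs, gap="-"):
    """Return 0-based alignment columns where all sequences share one residue.

    aligned_seqs: equal-length strings from a multiple sequence alignment.
    Columns that contain a gap in any sequence are not counted as conserved.
    """
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for index, column in enumerate(zip(*aligned_seqs)):
        if gap not in column and len(set(column)) == 1:
            conserved.append(index)
    return conserved
```

With the invented alignment `["MKTC", "MRTC", "MKSC"]`, only the first and last columns are identical in all three sequences, so the function returns `[0, 3]`.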
Affiliation(s)
- Raja Mazumder
- Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, DC 20007, USA
|
138
|
Kowalska A, Bozsaky E, Ramsauer T, Rieder D, Bindea G, Lörch T, Trajanoski Z, Ambros PF. A new platform linking chromosomal and sequence information. Chromosome Res 2007; 15:327-39. [PMID: 17406992 DOI: 10.1007/s10577-007-1129-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 11/27/2006] [Revised: 01/24/2007] [Accepted: 01/24/2007] [Indexed: 10/23/2022]
Abstract
We have tested whether a direct correlation of sequence information and staining properties of chromosomes is possible and whether this combined information can be used to precisely map any position on the chromosome. Despite huge differences of compaction between the naked DNA and the DNA packed in chromosomes we found a striking correlation when visualizing the GGCC density on both levels. Software was developed that allows one to superimpose chromosomal fluorescence intensity profiles generated by chromomycin A3 (CMA3) staining with GGCC density extracted from the Ensembl database. Thus, any position along the chromosome can be defined in megabase pairs (Mb) besides the cytoband information, enabling direct alignment of chromosomal information with the sequence data. The mapping tool was validated using 13 different BAC clones, resulting in a mean difference from Ensembl data of 2 Mb (ranging from 0.79 to 3.57 Mb). Our results indicate that the sequence density information and information gained with sequence-specific fluorochromes are superimposable. Thus, the visualized GGCC motif density along the chromosome (sequence bands) provides a unique platform for comparing different types of genomic information.
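A motif-density profile of the kind described can be approximated by counting GGCC occurrences in fixed-size windows along a sequence. The Python sketch below is an assumption-laden illustration, not the published software: the function name, the window size, and the rule of assigning each occurrence to the window containing its start are choices made here. Because GGCC is its own reverse complement, counting on one strand is sufficient.

```python
def motif_counts_per_window(sequence, motif="GGCC", window=1000):
    """Count motif occurrences in consecutive windows of a sequence.

    Each occurrence is assigned to the window containing its start position.
    Returns one count per window, in order along the sequence.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    seq = sequence.upper()
    # Ceiling division, with at least one window even for an empty sequence.
    n_windows = max(1, -(-len(seq) // window))
    counts = [0] * n_windows
    start = seq.find(motif)
    while start != -1:
        counts[start // window] += 1
        # Advance by one so overlapping occurrences of other motifs are found.
        start = seq.find(motif, start + 1)
    return counts
```

Dividing each count by the window length gives a density that could, in principle, be plotted against a fluorescence intensity profile; for chromosome-scale input the window would be on the order of megabases rather than the small default used here.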
Affiliation(s)
- Agata Kowalska
- CCRI, Children's Cancer Research Institute, St. Anna Kinderkrebsforschung, 1090, Vienna, Austria
|
139
|
Montgomery SB, Griffith OL, Schuetz JM, Brooks-Wilson A, Jones SJM. A survey of genomic properties for the detection of regulatory polymorphisms. PLoS Comput Biol 2007; 3:e106. [PMID: 17559298 PMCID: PMC1892352 DOI: 10.1371/journal.pcbi.0030106] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Received: 10/16/2006] [Accepted: 04/25/2007] [Indexed: 11/18/2022] Open
Abstract
Advances in the computational identification of functional noncoding polymorphisms will aid in cataloging novel determinants of health and identifying genetic variants that explain human evolution. To date, however, the development and evaluation of such techniques has been limited by the availability of known regulatory polymorphisms. We have attempted to address this by assembling, from the literature, a computationally tractable set of regulatory polymorphisms within the ORegAnno database (http://www.oreganno.org). We have further used 104 regulatory single-nucleotide polymorphisms from this set and 951 polymorphisms of unknown function, from 2-kb and 152-bp noncoding upstream regions of genes, to investigate the discriminatory potential of 23 properties related to gene regulation and population genetics. Among the most important properties detected in this region are distance to transcription start site, local repetitive content, sequence conservation, minor and derived allele frequencies, and presence of a CpG island. We further used the entire set of properties to evaluate their collective performance in detecting regulatory polymorphisms. Using a 10-fold cross-validation approach, we were able to achieve a sensitivity and specificity of 0.82 and 0.71, respectively, and we show that this performance is strongly influenced by the distance to the transcription start site.
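The reported sensitivity and specificity are standard confusion-matrix ratios evaluated under 10-fold cross-validation. The Python sketch below shows only those two generic pieces; it does not reproduce the study's classifier or its 23 properties, and the function names and the round-robin fold assignment are assumptions made for illustration.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    Truthy labels mark regulatory polymorphisms in this sketch.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth and pred:
            tp += 1
        elif truth and not pred:
            fn += 1
        elif not truth and pred:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    return tp / (tp + fn), tn / (tn + fp)


def kfold_indices(n_items, k=10):
    """Split item indices 0..n_items-1 into k folds by round-robin assignment."""
    return [list(range(fold, n_items, k)) for fold in range(k)]
```

For the 104 regulatory and 951 unknown-function polymorphisms mentioned in the abstract, `kfold_indices(1055)` would yield ten folds of 105 or 106 items; each fold is held out once while a model is fitted on the other nine, and the held-out predictions are scored with `sensitivity_specificity`.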
Affiliation(s)
- Stephen B Montgomery
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, Canada.
|
140
|
Welsch C, Albrecht M, Maydt J, Herrmann E, Welker MW, Sarrazin C, Scheidig A, Lengauer T, Zeuzem S. Structural and functional comparison of the non-structural protein 4B in flaviviridae. J Mol Graph Model 2007; 26:546-57. [PMID: 17507273 DOI: 10.1016/j.jmgm.2007.03.012] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Received: 10/06/2006] [Revised: 03/23/2007] [Accepted: 03/28/2007] [Indexed: 12/27/2022]
Abstract
Flaviviridae are evolutionarily related viruses, comprising the hepatitis C virus (HCV), with the non-structural protein 4B (NS4B) as one of the least characterized proteins. NS4B is located in the endoplasmic reticulum membrane and is assumed to be a multifunctional protein. However, detailed structure information is missing. The hydrophobic nature of NS4B is a major difficulty for many experimental techniques. We applied bioinformatics methods to analyse structural and functional properties of NS4B in different viruses. We distinguish a central non-globular membrane portion with four to five transmembrane regions from an N- and C-terminal part with non-transmembrane helical elements. We demonstrate high similarity in sequence and structure for the C-terminal part within the flaviviridae family. A palmitoylation site contained in the C-terminal part of HCV is equally conserved in GB virus B. Furthermore, we identify and characterize an N-terminal basic leucine zipper (bZIP) motif in HCV, which is suggestive of a functionally important interaction site. In addition, we model the interaction of the bZIP region with the recently identified interaction partner CREB-RP/ATF6beta, a human activating transcription factor involved in ER-stress. In conclusion, the versatile structure, together with functional sites and motifs, possibly enables NS4B to adopt a role as protein hub in the membranous web interaction network of virus and host proteins. Important structural and functional properties of NS4B are predicted with implications for ER-stress response, altered gene expression and replication efficacy.
Affiliation(s)
- Christoph Welsch
- Internal Medicine II, Saarland University Hospital, Kirrberger Strasse, 66421 Homburg/Saar, Germany.
|
141
|
Pritham EJ, Putliwala T, Feschotte C. Mavericks, a novel class of giant transposable elements widespread in eukaryotes and related to DNA viruses. Gene 2007; 390:3-17. [PMID: 17034960 DOI: 10.1016/j.gene.2006.08.008] [Citation(s) in RCA: 157] [Impact Index Per Article: 9.2] [Received: 07/14/2006] [Accepted: 08/02/2006] [Indexed: 11/23/2022]
Abstract
We previously identified a group of atypical mobile elements designated Mavericks from the nematodes Caenorhabditis elegans and C. briggsae and the zebrafish Danio rerio. Here we present the results of comprehensive database searches of the genome sequences available, which reveal that Mavericks are widespread in invertebrates and non-mammalian vertebrates but show a patchy distribution in non-animal species, being present in the fungi Glomus intraradices and Phakopsora pachyrhizi and in several single-celled eukaryotes such as the ciliate Tetrahymena thermophila, the stramenopile Phytophthora infestans and the trichomonad Trichomonas vaginalis, but not detectable in plants. This distribution, together with comparative and phylogenetic analyses of Maverick-encoded proteins, is suggestive of an ancient origin of these elements in eukaryotes followed by lineage-specific losses and/or recurrent episodes of horizontal transmission. In addition, we report that Maverick elements have amplified recently to high copy numbers in T. vaginalis where they now occupy as much as 30% of the genome. Sequence analysis confirms that most Mavericks encode a retroviral-like integrase, but lack other open reading frames typically found in retroelements. Nevertheless, the length and conservation of the target site duplication created upon Maverick insertion (5- or 6-bp) is consistent with a role of the integrase-like protein in the integration of a double-stranded DNA transposition intermediate. Mavericks also display long terminal-inverted repeats but do not contain ORFs similar to proteins encoded by DNA transposons. Instead, Mavericks encode a conserved set of 5 to 9 genes (in addition to the integrase) that are predicted to encode proteins with homology to replication and packaging proteins of some bacteriophages and diverse eukaryotic double-stranded DNA viruses, including a DNA polymerase B homolog and putative capsid proteins. 
Based on these and other structural similarities, we speculate that Mavericks represent an evolutionary missing link between seemingly disparate invasive DNA elements that include bacteriophages, adenoviruses and eukaryotic linear plasmids.
Affiliation(s)
- Ellen J Pritham
- The University of Texas at Arlington, The Department of Biology, Arlington, TX 76019, United States.
|
142
|
Gene function in early mouse embryonic stem cell differentiation. BMC Genomics 2007; 8:85. [PMID: 17394647 PMCID: PMC1851713 DOI: 10.1186/1471-2164-8-85] [Citation(s) in RCA: 113] [Impact Index Per Article: 6.6] [Received: 08/18/2006] [Accepted: 03/29/2007] [Indexed: 12/20/2022] Open
Abstract
Background Little is known about the genes that drive embryonic stem cell differentiation. However, such knowledge is necessary if we are to exploit the therapeutic potential of stem cells. To uncover the genetic determinants of mouse embryonic stem cell (mESC) differentiation, we have generated and analyzed 11-point time-series of DNA microarray data for three biologically equivalent but genetically distinct mESC lines (R1, J1, and V6.5) undergoing undirected differentiation into embryoid bodies (EBs) over a period of two weeks. Results We identified the initial 12 hour period as reflecting the early stages of mESC differentiation and studied probe sets showing consistent changes of gene expression in that period. Gene function analysis indicated significant up-regulation of genes related to regulation of transcription and mRNA splicing, and down-regulation of genes related to intracellular signaling. Phylogenetic analysis indicated that the genes showing the largest expression changes were more likely to have originated in metazoans. The probe sets with the most consistent gene changes in the three cell lines represented 24 down-regulated and 12 up-regulated genes, all with closely related human homologues. Whereas some of these genes are known to be involved in embryonic developmental processes (e.g. Klf4, Otx2, Smn1, Socs3, Tagln, Tdgf1), our analysis points to others (such as transcription factor Phf21a, extracellular matrix related Lama1 and Cyr61, or endoplasmic reticulum related Sc4mol and Scd2) that have not been previously related to mESC function. The majority of identified functions were related to transcriptional regulation, intracellular signaling, and cytoskeleton. Genes involved in other cellular functions important in ESC differentiation such as chromatin remodeling and transmembrane receptors were not observed in this set. 
Conclusion Our analysis profiles for the first time gene expression at a very early stage of mESC differentiation, and identifies a functional and phylogenetic signature for the genes involved. The data generated constitute a valuable resource for further studies. All DNA microarray data used in this study are available in the StemBase database of stem cell gene expression data [1] and in the NCBI's GEO database.
|
143
|
Clavel T, Lippman R, Gavini F, Doré J, Blaut M. Clostridium saccharogumia sp. nov. and Lactonifactor longoviformis gen. nov., sp. nov., two novel human faecal bacteria involved in the conversion of the dietary phytoestrogen secoisolariciresinol diglucoside. Syst Appl Microbiol 2007; 30:16-26. [PMID: 17196483 DOI: 10.1016/j.syapm.2006.02.003] [Citation(s) in RCA: 84] [Impact Index Per Article: 4.9] [Received: 01/24/2006] [Indexed: 10/24/2022]
Abstract
Two anaerobic bacteria involved in the conversion of the plant lignan secoisolariciresinol diglucoside were isolated from faeces of a healthy male adult. The first isolate, strain SDG-Mt85-3Db, was a mesophilic strictly anaerobic Gram-positive helically coiled rod. Based on 16S rRNA gene sequence analysis, its nearest relatives were Clostridium cocleatum (96.7% similarity) and Clostridium ramosum (96.6%). In contrast to these species, the isolate was devoid of alpha-galactosidase and -glucosidase and did not grow on maltose, melibiose, raffinose, rhamnose and trehalose. The hypothesis that strain SDG-Mt85-3Db represents a new bacterial species of the Clostridium cluster XVIII was confirmed by DNA-DNA hybridisation experiments. The G+C content of DNA of strain SDG-Mt85-3Db (30.7 ± 0.8 mol%) was comparable with that of Clostridium butyricum, the type species of the genus Clostridium. The name Clostridium saccharogumia is proposed for strain SDG-Mt85-3Db (=DSM 17460T=CCUG 51486T). The second isolate, strain ED-Mt61/PYG-s6, was a mesophilic strictly anaerobic Gram-positive regular rod. Based on 16S rRNA gene sequence analysis, its nearest relatives were Clostridium amygdalinum (93.3%), Clostridium saccharolyticum (93.1%) and Ruminococcus productus (93.0%). The isolate differed from these species in its ability to dehydrogenate enterodiol. It also possessed alpha-arabinosidase and -galactosidase and had a higher G+C content of DNA (48.0 mol%). According to these findings, it is proposed to create a novel genus, Lactonifactor, and a novel species, Lactonifactor longoviformis, to accommodate strain ED-Mt61/PYG-s6. The type strain is DSM 17459T (=CCUG 51487T).
Affiliation(s)
- Thomas Clavel
- Department of Gastrointestinal Microbiology, German Institute of Human Nutrition Potsdam-Rehbrücke, Arthur-Scheunert-Allee 155, 14558 Nuthetal, Germany
|
144
|
A comparative genomics approach to identifying the plasticity transcriptome. BMC Neurosci 2007; 8:20. [PMID: 17355637 PMCID: PMC1831778 DOI: 10.1186/1471-2202-8-20] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 10/12/2006] [Accepted: 03/13/2007] [Indexed: 02/04/2023] Open
Abstract
Background Neuronal activity regulates gene expression to control learning and memory, homeostasis of neuronal function, and pathological disease states such as epilepsy. A great deal of experimental evidence supports the involvement of two particular transcription factors in shaping the genomic response to neuronal activity and mediating plasticity: CREB and zif268 (egr-1, krox24, NGFI-A). The gene targets of these two transcription factors are of considerable interest, since they may help develop hypotheses about how neural activity is coupled to changes in neural function. Results We have developed a computational approach for identifying binding sites for these transcription factors within the promoter regions of annotated genes in the mouse, rat, and human genomes. By combining a robust search algorithm to identify discrete binding sites, a comparison of targets across species, and an analysis of binding site locations within promoter regions, we have defined a group of candidate genes that are strong CREB- or zif268 targets and are thus regulated by neural activity. Our analysis revealed that CREB and zif268 share a disproportionate number of targets in common and that these common targets are dominated by transcription factors. Conclusion These observations may enable a more detailed understanding of the regulatory networks that are induced by neural activity and contribute to the plasticity transcriptome. The target genes identified in this study will be a valuable resource for investigators who hope to define the functions of specific genes that underlie activity-dependent changes in neuronal properties.
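The abstract names three ingredients: a binding-site search, a cross-species comparison, and an analysis of site positions within promoters. The authors' actual algorithm is not given here, so the following is only a minimal sketch of the first two steps. It assumes a simple consensus match with a mismatch allowance, and uses the widely cited palindromic CRE consensus TGACGTCA as an example motif. All function names and the toy gene sets are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(pairs[base] for base in reversed(seq))


def find_sites(promoter, consensus, max_mismatches=0):
    """Scan both strands; return sorted (start, strand, window, mismatches)."""
    promoter = promoter.upper()
    consensus = consensus.upper()
    motifs = [("+", consensus)]
    rc = reverse_complement(consensus)
    if rc != consensus:  # a palindromic motif needs only one strand
        motifs.append(("-", rc))
    k = len(consensus)
    hits = []
    for strand, motif in motifs:
        for start in range(len(promoter) - k + 1):
            window = promoter[start:start + k]
            mismatches = sum(1 for a, b in zip(window, motif) if a != b)
            if mismatches <= max_mismatches:
                hits.append((start, strand, window, mismatches))
    return sorted(hits)


def conserved_targets(targets_by_species):
    """Keep only genes predicted as targets in every species."""
    return set.intersection(*(set(t) for t in targets_by_species.values()))


CRE = "TGACGTCA"  # assumed example motif; the abstract does not state one
print(find_sites("GGTGACGTCAGG", CRE))
# Toy gene sets for illustration only.
print(conserved_targets({"mouse": {"Fos", "Bdnf"}, "rat": {"Fos"},
                         "human": {"Fos", "Nr4a1"}}))
```

Requiring a hit in all three species trades sensitivity for specificity, which matches the abstract's emphasis on strong candidate targets.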
|
145
|
Wood V. How to get the most from fission yeast genome data: a report from the 2006 European Fission Yeast Meeting computing workshop. Yeast 2007; 23:905-12. [PMID: 17072881 DOI: 10.1002/yea.1419] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/08/2022] Open
Abstract
A fission yeast computing workshop 'How to get the most from the fission yeast genome data' was run as a satellite to the European Fission Yeast Meeting. The broad aims of the workshop were to provide fission yeast bench biologists with a set of tools and protocols to query the fission yeast genome data in specific ways, in order to extract biologically meaningful information of interest, which can be tailored to the needs of individual research projects. A description of the workshop content is provided and a selection of the tools presented are reviewed.
Affiliation(s)
- Valerie Wood
- Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, UK.
|
146
|
Holloway DT, Kon M, DeLisi C. Machine learning for regulatory analysis and transcription factor target prediction in yeast. Systems and Synthetic Biology 2007; 1:25-46. [PMID: 19003435 PMCID: PMC2533145 DOI: 10.1007/s11693-006-9003-3] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Indexed: 12/19/2022]
Abstract
High throughput technologies, including array-based chromatin immunoprecipitation, have rapidly increased our knowledge of transcriptional maps: the identity and location of regulatory binding sites within genomes. Still, the full identification of sites, even in lower eukaryotes, remains largely incomplete. In this paper we develop a supervised learning approach to site identification using support vector machines (SVMs) to combine 26 different data types. A comparison with the standard approach to site identification using position specific scoring matrices (PSSMs) for a set of 104 Saccharomyces cerevisiae regulators indicates that our SVM-based target classification is more sensitive (73 vs. 20%) when specificity and positive predictive value are the same. We have applied our SVM classifier for each transcriptional regulator to all promoters in the yeast genome to obtain thousands of new targets, which are currently being analyzed and refined to limit the risk of classifier over-fitting. For the purpose of illustration we discuss several results, including biochemical pathway predictions for Gcn4 and Rap1. For both transcription factors SVM predictions match well with the known biology of control mechanisms, and possible new roles for these factors are suggested, such as a function for Rap1 in regulating fermentative growth. We also examine the promoter melting temperature curves for the targets of YJR060W, and show that targets of this TF have potentially unique physical properties which distinguish them from other genes. The SVM output automatically provides the means to rank dataset features to identify important biological elements. We use this property to rank classifying k-mers, thereby reconstructing known binding sites for several TFs, and to rank expression experiments, determining the conditions under which Fhl1, the factor responsible for expression of ribosomal protein genes, is active. 
We can see that targets of Fhl1 are differentially expressed in the chosen conditions as compared to the expression of average and negative set genes. SVM-based classifiers provide a robust framework for analysis of regulatory networks. Processing of classifier outputs can provide high quality predictions and biological insight into functions of particular transcription factors. Future work on this method will focus on increasing the accuracy and quality of predictions using feature reduction and clustering strategies. Since predictions have been made on only 104 TFs in yeast, new classifiers will be built for the remaining 100 factors which have available binding data.
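For readers unfamiliar with the PSSM baseline that the abstract compares against, the sketch below shows one common way to build and apply a log-odds position specific scoring matrix. It is not the authors' pipeline. The pseudocount, the uniform background and the toy aligned sites are all assumptions made for illustration.

```python
import math


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds matrix (one dict per column) from aligned sites."""
    width = len(sites[0])
    pssm = []
    for pos in range(width):
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pssm.append({base: math.log2((counts[base] / total) / background)
                     for base in "ACGT"})
    return pssm


def score_window(pssm, window):
    """Sum the per-column log-odds for one window of matching width."""
    return sum(column[base] for column, base in zip(pssm, window))


def best_hit(pssm, seq):
    """Return (score, start) of the highest-scoring window in seq."""
    width = len(pssm)
    return max((score_window(pssm, seq[i:i + width]), i)
               for i in range(len(seq) - width + 1))


# Toy aligned sites, assumed for illustration only.
toy_sites = ["TGACTC", "TGACTC", "TGAGTC", "TGACTA"]
pssm = build_pssm(toy_sites)
print(best_hit(pssm, "AAATGACTCAAA"))
```

A PSSM scores sequence alone, whereas the SVM described in the abstract combines many data types, which is one plausible reason for the reported gain in sensitivity.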
Affiliation(s)
- Dustin T. Holloway
- Molecular Biology Cell Biology and Biochemistry, Boston University, Boston, MA 02215 USA
- Mark Kon
- Department of Mathematics and Statistics, Boston University, Boston, MA 02215 USA
- Bioinformatics and Systems Biology, Boston University, Boston, MA 02215 USA
- Charles DeLisi
- Bioinformatics and Systems Biology, Boston University, Boston, MA 02215 USA
|
147
|
Zhu S, Okuno Y, Tsujimoto G, Mamitsuka H. Application of a new probabilistic model for mining implicit associated cancer genes from OMIM and Medline. Cancer Inform 2007; 2:361-71. [PMID: 19458778 PMCID: PMC2675505] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022] Open
Abstract
An important issue in current medical science research is to find the genes that are strongly related to an inherited disease. A particular focus is placed on cancer-gene relations, since some types of cancers are inherited. As biomedical databases have grown rapidly in recent years, an informatics approach to predict such relations from currently available databases should be developed. Our objective is to find implicit associated cancer-genes from biomedical databases including the literature database. Co-occurrence of biological entities has been shown to be a popular and efficient technique in biomedical text mining. We have applied a new probabilistic model, called mixture aspect model (MAM) [48], to combine different types of co-occurrences of genes and cancer derived from Medline and OMIM (Online Mendelian Inheritance in Man). We trained the probability parameters of MAM using a learning method based on an EM (Expectation and Maximization) algorithm. We examined the performance of MAM by predicting associated cancer gene pairs. Through cross-validation, prediction accuracy was shown to be improved by adding gene-gene co-occurrences from Medline to cancer-gene co-occurrences in OMIM. Further experiments showed that MAM found new cancer-gene relations which are unknown in the literature. Supplementary information can be found at http://www.bic.kyotou.ac.jp/pathway/zhusf/CancerInformatics/Supplemental2006.html.
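The co-occurrence counts that feed a model such as MAM can be pictured with a short sketch. This is not the mixture aspect model itself and involves no EM training. It only counts how often a gene term and a cancer term appear in the same text. The function name, the single-token matching rule and the toy sentences are assumptions made for illustration.

```python
import re
from collections import Counter
from itertools import product


def cooccurrence_counts(documents, genes, cancers):
    """Count documents in which a gene term and a cancer term both occur."""
    counts = Counter()
    for text in documents:
        # Lowercased alphanumeric tokens; multi-word terms are not handled.
        tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
        found_genes = [g for g in genes if g.lower() in tokens]
        found_cancers = [c for c in cancers if c.lower() in tokens]
        for pair in product(found_genes, found_cancers):
            counts[pair] += 1
    return counts


# Toy sentences for illustration only, not drawn from Medline or OMIM.
docs = [
    "TP53 is mutated in glioma.",
    "BRCA1 and TP53 in melanoma",
    "TP53 again in glioma",
]
counts = cooccurrence_counts(docs, ["TP53", "BRCA1"], ["glioma", "melanoma"])
print(counts.most_common())
```

Raw counts like these reward frequently mentioned genes, so a probabilistic model that can combine several co-occurrence types is a natural next step.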
Affiliation(s)
- Shanfeng Zhu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University. Correspondence: Shanfeng Zhu, Kyoto University, Gokasho, Uji, 611-0011, Japan. Phone: +81-774-383038, Fax: +81-774-383037
- Yasushi Okuno
- Graduate School of Pharmaceutical Sciences, Kyoto University
- Gozoh Tsujimoto
- Graduate School of Pharmaceutical Sciences, Kyoto University
- Hiroshi Mamitsuka
- Bioinformatics Center, Institute for Chemical Research, Kyoto University; Graduate School of Pharmaceutical Sciences, Kyoto University
|
148
|
Abstract
Breast cancer is the second most common cause of cancer-related death in women in the US and the UK, accounting for 15-17% of all female cancer deaths. Current treatment strategies include hormone therapy, such as anti-estrogens (tamoxifen) and aromatase inhibitors (exemestane, anastrozole, letrozole), as well as cytotoxics, such as the taxanes (paclitaxel, docetaxel). With multiple therapy choices, a method to prospectively screen patients prior to therapy selection is now needed. Pharmacogenetics seeks to develop screening mechanisms to optimise drug therapy. DNA variations in metabolism, transport and drug target genes may contribute to chemotherapy efficacy and toxicities. The status of the identification of genetic markers for breast cancer therapy selection is highlighted in this review.
Affiliation(s)
- Sharon Marsh
- Washington University School of Medicine, Division of Oncology, St Louis, MO 63110, USA.
|
149
|
Grow M, Neff AW, Mescher AL, King MW. Global analysis of gene expression in Xenopus hindlimbs during stage-dependent complete and incomplete regeneration. Dev Dyn 2007; 235:2667-85. [PMID: 16871633 DOI: 10.1002/dvdy.20897] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.8] [Indexed: 12/21/2022] Open
Abstract
Xenopus laevis tadpoles are capable of limb regeneration after amputation, in a process that initially involves the formation of a blastema. However, Xenopus has full regenerative capacity only through premetamorphic stages. We have used the Affymetrix Xenopus laevis Genome Genechip microarray to perform a large-scale screen of gene expression in the regeneration-complete, stage 53 (st53), and regeneration-incomplete, stage 57 (st57), hindlimbs at 1 and 5 days postamputation. Through an exhaustive reannotation of the Genechip and a variety of comparative bioinformatic analyses, we have identified genes that are differentially expressed between the regeneration-complete and -incomplete stages, detected the transcriptional changes associated with the regenerating blastema, and compared these results with those of other regeneration researchers. We focus particular attention on striking transcriptional activity observed in genes associated with patterning, stress response, and inflammation. Overall, this work provides the most comprehensive views yet of a regenerating limb and different transcriptional compositions of regeneration-competent and deficient tissues.
Affiliation(s)
- Matthew Grow
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, Indiana, USA.
|
150
|
Murphy AM, MacHugh DE, Park SDE, Scraggs E, Haley CS, Lynn DJ, Boland MP, Doherty ML. Linkage mapping of the locus for inherited ovine arthrogryposis (IOA) to sheep chromosome 5. Mamm Genome 2007; 18:43-52. [PMID: 17242863 DOI: 10.1007/s00335-006-0016-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 01/31/2006] [Accepted: 09/21/2006] [Indexed: 11/30/2022]
Abstract
Arthrogryposis is a congenital malformation affecting the limbs of newborn animals and infants. Previous work has demonstrated that inherited ovine arthrogryposis (IOA) has an autosomal recessive mode of inheritance. Two affected homozygous recessive (art/art) Suffolk rams were used as founders for a backcross pedigree of half-sib families segregating the IOA trait. A genome scan was performed using 187 microsatellite genetic markers and all backcross animals were phenotyped at birth for the presence and severity of arthrogryposis. Pairwise LOD scores of 1.86, 1.35, and 1.32 were detected for three microsatellites, BM741, JAZ, and RM006, that are located on sheep Chr 5 (OAR5). Additional markers in the region were identified from the genetic linkage map of BTA7 and by in silico analyses of the draft bovine genome sequence, three of which were informative. Interval mapping of all autosomes produced an F value of 21.97 (p < 0.01) for a causative locus in the region of OAR5 previously flagged by pairwise linkage analysis. Inspection of the orthologous region of HSA5 highlighted a previously fine-mapped locus for human arthrogryposis multiplex congenita neurogenic type (AMCN). A survey of the HSA5 genome sequence identified plausible candidate genes for both IOA and human AMCN.
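The pairwise LOD scores quoted above compare the likelihood of linkage at a recombination fraction theta with the likelihood of free recombination (theta = 0.5). The sketch below shows that calculation for the simplest textbook case, where every meiosis is phase-known and fully informative. The study's half-sib pedigree analysis is more involved, so the function names and the example counts are purely illustrative assumptions.

```python
import math


def lod_score(recombinants, meioses, theta):
    """log10 likelihood ratio of linkage at theta versus theta = 0.5."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    nonrecombinants = meioses - recombinants
    log_linked = (recombinants * math.log10(theta)
                  + nonrecombinants * math.log10(1.0 - theta))
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


def max_lod(recombinants, meioses):
    """Grid-search theta in 0.01 steps; return (best LOD, best theta)."""
    grid = [i / 100 for i in range(1, 51)]
    return max((lod_score(recombinants, meioses, t), t) for t in grid)


# Hypothetical counts: 2 recombinants among 20 informative meioses.
print(max_lod(2, 20))
```

By the usual convention a LOD of 3 or more is taken as significant evidence of linkage, so scores in the range reported for the individual markers are suggestive rather than conclusive on their own.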
Affiliation(s)
- Angela M Murphy
- Animal Genomics Laboratory, School of Agriculture, Food Science and Veterinary Medicine, College of Life Sciences, University College Dublin, Belfield, Dublin 4, Ireland
|