1151
|
Budd A, Blandin S, Levashina EA, Gibson TJ. Bacterial alpha2-macroglobulins: colonization factors acquired by horizontal gene transfer from the metazoan genome? Genome Biol 2004; 5:R38. [PMID: 15186489 PMCID: PMC463071 DOI: 10.1186/gb-2004-5-6-r38] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.1] [Received: 02/20/2004] [Revised: 04/02/2004] [Accepted: 04/08/2004] [Indexed: 11/14/2022] Open
Abstract
Homologs of metazoan α2-macroglobulins have been found in bacteria. The distribution of these genes in diverse bacterial clades suggests they have been acquired by multiple horizontal transfers.
Background: Invasive bacteria are known to have captured and adapted eukaryotic host genes. They also readily acquire colonizing genes from other bacteria by horizontal gene transfer. Closely related species such as Helicobacter pylori and Helicobacter hepaticus, which exploit different host tissues, share almost none of their colonization genes. The protease inhibitor α2-macroglobulin provides a major metazoan defense against invasive bacteria, trapping attacking proteases required by parasites for successful invasion.
Results: Database searches with metazoan α2-macroglobulin sequences revealed homologous sequences in bacterial proteomes. The bacterial α2-macroglobulin phylogenetic distribution is patchy and violates the vertical descent model. Bacterial α2-macroglobulin genes are found in diverse clades, including purple bacteria (proteobacteria), fusobacteria, spirochetes, bacteroidetes, deinococcids, cyanobacteria, planctomycetes and thermotogae. Most bacterial species with bacterial α2-macroglobulin genes exploit higher eukaryotes (multicellular plants and animals) as hosts. Both pathogenically invasive and saprophytically colonizing species possess bacterial α2-macroglobulins, indicating that bacterial α2-macroglobulin is a colonization rather than a virulence factor.
Conclusions: Metazoan α2-macroglobulins inhibit proteases of pathogens. The bacterial homologs may function in reverse to block host antimicrobial defenses. α2-macroglobulin was probably acquired one or more times from metazoan hosts and has then spread widely through other colonizing bacterial species by more than 10 independent horizontal gene transfers. yfhM-like bacterial α2-macroglobulin genes are often found tightly linked with pbpC, encoding an atypical peptidoglycan transglycosylase, PBP1C, that does not function in vegetative peptidoglycan synthesis. We suggest that YfhM and PBP1C are coupled together as a periplasmic defense and repair system. Bacterial α2-macroglobulins might provide useful targets for enhancing vaccine efficacy in combating infections.
Affiliation(s)
- Aidan Budd
- European Molecular Biology Laboratory, 69012 Heidelberg, Germany
- Elena A Levashina
- UPR 9022 du CNRS, IBMC, rue René Descartes, F-67087 Strasbourg CEDEX, France
- Toby J Gibson
- European Molecular Biology Laboratory, 69012 Heidelberg, Germany
|
1152
|
Abstract
Background: Computational gene prediction continues to be an important problem, especially for genomes with little experimental data.
Results: I introduce the SNAP gene finder, which has been designed to be easily adaptable to a variety of genomes. In novel genomes without an appropriate gene finder, I demonstrate that employing a foreign gene finder can produce highly inaccurate results, and that the most compatible parameters may not come from the nearest phylogenetic neighbor. I find that foreign gene finders are more usefully employed to bootstrap parameter estimation and that the resulting parameters can be highly accurate.
Conclusion: Since gene prediction is sensitive to species-specific parameters, every genome needs a dedicated gene finder.
Affiliation(s)
- Ian Korf
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.
|
1153
|
Feig M, Karanicolas J, Brooks CL. MMTSB Tool Set: enhanced sampling and multiscale modeling methods for applications in structural biology. J Mol Graph Model 2004; 22:377-95. [PMID: 15099834 DOI: 10.1016/j.jmgm.2003.12.005] [Citation(s) in RCA: 738] [Impact Index Per Article: 35.1] [Indexed: 11/30/2022]
Abstract
We describe the Multiscale Modeling Tools for Structural Biology (MMTSB) Tool Set (https://mmtsb.scripps.edu/software/mmtsbToolSet.html), which is a novel set of utilities and programming libraries that provide new enhanced sampling and multiscale modeling techniques for the simulation of proteins and nucleic acids. The tool set interfaces with the existing molecular modeling packages CHARMM and Amber for classical all-atom simulations, and with MONSSTER for lattice-based low-resolution conformational sampling. In addition, it adds new functionality for the integration and translation between both levels of detail. The replica exchange method is implemented to allow enhanced sampling of both the all-atom and low-resolution models. The tool set aims at applications in structural biology that involve protein or nucleic acid structure prediction, refinement, and/or extended conformational sampling. With structure prediction applications in mind, the tool set also implements a facility that allows the control and application of modeling tasks on a large set of conformations in what we have termed ensemble computing. Ensemble computing encompasses loosely coupled, parallel computation on high-end parallel computers, clustered computational grids and desktop grid environments. This paper describes the design and implementation of the MMTSB Tool Set and illustrates its utility with three typical examples: scoring of a set of predicted protein conformations in order to identify the most native-like structures, ab initio folding of peptides in implicit solvent with the replica exchange method, and the prediction of a missing fragment in a larger protein structure.
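Illustrative note: the replica exchange method mentioned in the abstract periodically proposes swapping configurations between replicas simulated at different temperatures, and accepts each swap with a Metropolis probability. The sketch below shows only that textbook acceptance rule. It is not taken from the MMTSB Tool Set; the function names and the choice of kcal/mol energy units are assumptions made for illustration.

```python
import math
import random

# Boltzmann constant in kcal/(mol*K); the unit choice is an assumption.
BOLTZMANN_KCAL = 0.0019872041


def swap_probability(energy_i, energy_j, temp_i, temp_j, k_b=BOLTZMANN_KCAL):
    """Metropolis probability of exchanging replica i (at temp_i, with
    potential energy energy_i) and replica j (at temp_j, energy_j)."""
    beta_i = 1.0 / (k_b * temp_i)
    beta_j = 1.0 / (k_b * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # min(1, exp(delta)), written to avoid overflow for large positive delta
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_swap(energy_i, energy_j, temp_i, temp_j, rng=random):
    """Return True if the proposed exchange is accepted."""
    return rng.random() < swap_probability(energy_i, energy_j, temp_i, temp_j)


if __name__ == "__main__":
    # The colder replica holds the higher energy: the swap is always accepted.
    print(swap_probability(-100.0, -120.0, 300.0, 350.0))
    # The colder replica already holds the lower energy: the swap is unlikely.
    print(swap_probability(-120.0, -100.0, 300.0, 350.0))
```

A swap that moves the lower-energy configuration to the colder temperature is always accepted; the reverse move is accepted only occasionally, which is what lets configurations diffuse across the temperature ladder.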
Affiliation(s)
- Michael Feig
- Department of Molecular Biology, TPC6, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA 92037, USA
|
1154
|
Schageman JJ, Horton CJ, Niu S, Garner HR, Pertsemlidis A. ELXR: a resource for rapid exon-directed sequence analysis. Genome Biol 2004; 5:R36. [PMID: 15128450 PMCID: PMC416472 DOI: 10.1186/gb-2004-5-5-r36] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Received: 01/08/2004] [Revised: 02/13/2004] [Accepted: 04/14/2004] [Indexed: 11/20/2022] Open
Abstract
ELXR is a web-based tool for designing exon-specific PCR/sequencing primers. A database, ELXRdb, containing precomputed primer pairs has been developed. ELXR (Exon Locator and Extractor for Resequencing) streamlines the process of determining exon/intron boundaries and designing PCR and sequencing primers for high-throughput resequencing of exons. We have pre-computed ELXR primer sets for all exons identified from the human, mouse, and rat mRNA reference sequence (RefSeq) public databases curated by the National Center for Biotechnology Information. The resulting exon-flanking PCR primer pairs have been compiled into a system called ELXRdb, which may be searched by keyword, gene name or RefSeq accession number.
Affiliation(s)
- Jeoffrey J Schageman
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Harry Hines Boulevard, Dallas, TX 75390, USA.
|
1155
|
Abstract
MOTIVATION: Mathematically optimal alignments do not always properly align active site residues or well-recognized structural elements. Most near-optimal sequence alignment algorithms display alternative alignment paths, rather than the conventional residue-by-residue pairwise alignment. Typically, these methods do not provide mechanisms for effectively finding the most biologically meaningful alignment in the potentially large set of options.
RESULTS: We have developed Web-based software that displays near-optimal or alternative alignments of two protein or DNA sequences as a continuous moving picture. A WWW interface to a C++ program generates near-optimal alignments, which are sent to a Java Applet that displays them in a series of alignment frames. The Applet aligns residues so that consistently aligned regions remain at a fixed position on the display, while variable regions move. The display can be stopped to examine alignment details.
Affiliation(s)
- Michael E Smoot
- Department of Systems and Information Engineering, University of Virginia, Charlottesville, VA 22908, USA
|
1156
|
Klee EW, Carlson DF, Fahrenkrug SC, Ekker SC, Ellis LBM. Identifying secretomes in people, pufferfish and pigs. Nucleic Acids Res 2004; 32:1414-21. [PMID: 14990746 PMCID: PMC390277 DOI: 10.1093/nar/gkh286] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.7] [Indexed: 11/13/2022] Open
Abstract
The proteins processed by the secretory pathway (secretome) are critical players in the development of multi-cellular eukaryotic organisms but have yet to be comprehensively studied at the genomic level. In this study, we use the Target P algorithm to predict human (13-20% of proteins found in individual datasets) and Fugu (14%) secretomes based on analysis of their nearly complete proteomes. We combine internal processing with prediction software to automate secreted protein identification and overcome one of the major challenges associated with EST data: identification of the minority of clones that encode N-terminally-complete proteins. We discuss the use of these methods to predict secreted proteins in EST-based consensus sequence sets, and we validate these predictions using an assay for cell-free cotranslational translocation. Analysis of TIGR Porcine Gene Index 4.0 as a test dataset resulted in the identification of 352 N-terminally-complete, putative secreted proteins. In functional agreement with our predictions, 34 of 40 (85%) of these cDNAs were verified to be cotranslationally translocated in an in vitro translation system. The methods developed here are specifically designed to accept partial open reading frames and improve secreted protein predictions in eukaryotic transcriptomes, and are valuable for the analysis and annotation of eukaryotic EST databases.
Affiliation(s)
- Eric W Klee
- Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, MN 55455, USA
|
1157
|
Naderi A, Ahmed AA, Barbosa-Morais NL, Aparicio S, Brenton JD, Caldas C. Expression microarray reproducibility is improved by optimising purification steps in RNA amplification and labelling. BMC Genomics 2004; 5:9. [PMID: 15005798 PMCID: PMC343272 DOI: 10.1186/1471-2164-5-9] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.9] [Received: 10/25/2003] [Accepted: 01/30/2004] [Indexed: 11/22/2022] Open
Abstract
Background: Expression microarrays have evolved into a powerful tool with great potential for clinical application, and therefore reliability of data is essential. RNA amplification is used when the amount of starting material is scarce, as is frequently the case with clinical samples. Purification steps are critical in RNA amplification and labelling protocols, and there is a lack of sufficient data to validate and optimise the process.
Results: Here the purification steps involved in the protocol for indirect labelling of amplified RNA are evaluated, and the experimentally determined best method for each step (with respect to yield, purity, size distribution of the transcripts, and dye coupling) is used to generate targets tested in replicate hybridisations. DNase treatment of diluted total RNA samples followed by phenol extraction is the optimal way to remove genomic DNA contamination. Purification of double-stranded cDNA is best achieved by phenol extraction followed by isopropanol precipitation at room temperature. Extraction with guanidinium-phenol and lithium chloride precipitation are the optimal methods for purification of amplified RNA and labelled aRNA, respectively.
Conclusion: This protocol provides targets that generate highly reproducible microarray data with good representation of transcripts across the size spectrum and a coefficient of repeatability significantly better than that reported previously.
Affiliation(s)
- Ali Naderi
- Cancer Genomics Program, Department of Oncology, University of Cambridge, Hutchison/MRC Research Centre, Hills Road, Cambridge CB2 2XZ, United Kingdom
- Ahmed A Ahmed
- Cancer Genomics Program, Department of Oncology, University of Cambridge, Hutchison/MRC Research Centre, Hills Road, Cambridge CB2 2XZ, United Kingdom
- Nuno L Barbosa-Morais
- Cancer Genomics Program, Department of Oncology, University of Cambridge, Hutchison/MRC Research Centre, Hills Road, Cambridge CB2 2XZ, United Kingdom
- Institute of Molecular Medicine, Faculty of Medicine, University of Lisbon, 1649-028 Lisbon, Portugal
- Samuel Aparicio
- Cancer Genomics Program, Department of Oncology, University of Cambridge, Hutchison/MRC Research Centre, Hills Road, Cambridge CB2 2XZ, United Kingdom
- James D Brenton
- Cancer Genomics Program, Department of Oncology, University of Cambridge, Hutchison/MRC Research Centre, Hills Road, Cambridge CB2 2XZ, United Kingdom
- Carlos Caldas
- Cancer Genomics Program, Department of Oncology, University of Cambridge, Hutchison/MRC Research Centre, Hills Road, Cambridge CB2 2XZ, United Kingdom
|
1158
|
Gunsalus KC, Yueh WC, MacMenamin P, Piano F. RNAiDB and PhenoBlast: web tools for genome-wide phenotypic mapping projects. Nucleic Acids Res 2004; 32:D406-10. [PMID: 14681444 PMCID: PMC308844 DOI: 10.1093/nar/gkh110] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.4] [Indexed: 01/18/2023] Open
Abstract
RNA interference (RNAi) is being used in large-scale genomic studies as a rapid way to obtain in vivo functional information associated with specific genes. How best to archive and mine the complex data derived from these studies provides a series of challenges associated with both the methods used to elicit the RNAi response and the functional data gathered. RNAiDB (RNAi Database; http://www.rnai.org) has been created for the archival, distribution and analysis of phenotypic data from large-scale RNAi analyses in Caenorhabditis elegans. The database contains a compendium of publicly available data and provides information on experimental methods and phenotypic results, including raw data in the form of images and streaming time-lapse movies. Phenotypic summaries together with graphical displays of RNAi to gene mappings allow quick intuitive comparison of results from different RNAi assays and visualization of the gene product(s) potentially inhibited by each RNAi experiment based on multiple sequence analysis methods. RNAiDB can be searched using combinatorial queries and using the novel tool PhenoBlast, which ranks genes according to their overall phenotypic similarity. RNAiDB could serve as a model database for distributing and navigating in vivo functional information from large-scale systematic phenotypic analyses in different organisms.
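Illustrative note: the abstract says PhenoBlast ranks genes by overall phenotypic similarity but does not state the scoring scheme. The sketch below shows one simple way such a ranking could work, using Jaccard similarity between sets of phenotype terms. The metric, the gene names and the phenotype labels are all hypothetical and should not be read as PhenoBlast's actual method.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections of phenotype terms."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_phenotype(query, profiles):
    """Rank every other gene by similarity to the query gene's phenotypes.

    profiles maps a gene name to the set of phenotype terms observed for it.
    Ties are broken alphabetically so the ordering is deterministic.
    """
    query_terms = profiles[query]
    scored = [
        (gene, jaccard(query_terms, terms))
        for gene, terms in profiles.items()
        if gene != query
    ]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical phenotype profiles, for illustration only.
    example = {
        "gene-A": {"embryonic lethal", "spindle defect"},
        "gene-B": {"embryonic lethal", "spindle defect", "slow growth"},
        "gene-C": {"sterile"},
    }
    for gene, score in rank_by_phenotype("gene-A", example):
        print(gene, round(score, 3))
```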
Affiliation(s)
- Kristin C Gunsalus
- Center for Comparative Functional Genomics, Department of Biology, New York University, 1009 Silver Building, 100 Washington Square E., New York, NY 10003, USA.
|
1159
|
Fredman D, Munns G, Rios D, Sjöholm F, Siegfried M, Lenhard B, Lehväslaiho H, Brookes AJ. HGVbase: a curated resource describing human DNA variation and phenotype relationships. Nucleic Acids Res 2004; 32:D516-9. [PMID: 14681471 PMCID: PMC308845 DOI: 10.1093/nar/gkh111] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.5] [Indexed: 11/13/2022] Open
Abstract
The Human Genome Variation Database (HGVbase; http://hgvbase.cgb.ki.se) has provided a curated summary of human DNA variation for more than 5 years, thus facilitating research into DNA sequence variation and human phenotypes. The database has undergone many changes and improvements to accommodate increasing volumes and new types of data. The focus of HGVbase has recently shifted towards information on haplotypes and phenotypes, relationships between phenotypes and DNA variation, and collaborative efforts to provide a global resource for genome-phenome data. Open sharing and precise phenotype definitions are necessary to advance the current understanding of common diseases that are typified by complex aetiologies, small genetic effect sizes and multiple confounding factors that obscure positive study results. Association data will increasingly be collected as part of this new project thrust. This report describes the evolving features of HGVbase, and covers in detail the technological choices we have made to enable efficient storage and data mining of increasingly large and complex data sets.
Affiliation(s)
- D Fredman
- Center for Genomics and Bioinformatics, Karolinska Institute, Berzelius väg 35, S-171 77 Stockholm, Sweden
|
1160
|
Hermoso A, Aguilar D, Aviles FX, Querol E. TrSDB: a proteome database of transcription factors. Nucleic Acids Res 2004; 32:D171-3. [PMID: 14681387 PMCID: PMC308835 DOI: 10.1093/nar/gkh101] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.6] [Indexed: 11/12/2022] Open
Abstract
TrSDB (TranScout Database; http://ibb.uab.es/trsdb) is a proteome database of eukaryotic transcription factors based on motifs predicted by TranScout and on data sources such as InterPro and Gene Ontology Annotation. Nine eukaryotic proteomes are included in the current version. Each database entry carries extensive and diverse information, and analyses based on TranScout classification and similarity relationships are offered for research on transcription factors or gene expression.
Affiliation(s)
- Antoni Hermoso
- Institut de Biotecnologia i Biomedicina and Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
|
1161
|
Ben Zakour N, Gautier M, Andonov R, Lavenier D, Cochet MF, Veber P, Sorokin A, Le Loir Y. GenoFrag: software to design primers optimized for whole genome scanning by long-range PCR amplification. Nucleic Acids Res 2004; 32:17-24. [PMID: 14704339 PMCID: PMC373259 DOI: 10.1093/nar/gkg928] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.8] [Indexed: 01/12/2023] Open
Abstract
Genome sequence data can be used to analyze genome plasticity by whole genome PCR scanning. Small chromosomes can indeed be fully amplified by long-range PCR with a set of primers designed using a reference strain and applied to several other strains. Analysis of the resulting patterns can reveal the genome plasticity. To facilitate such analysis, we have developed GenoFrag, a software package for the design of primers optimized for whole genome scanning by long-range PCR. GenoFrag was developed for the analysis of Staphylococcus aureus genome plasticity by whole genome amplification in approximately 10 kb-long fragments. A set of primers was generated from the genome sequence of S. aureus N315, employed here as a reference strain. Two subsets of primers were successfully used to amplify two portions of the N315 chromosome. This experimental validation demonstrates that GenoFrag is a robust and reliable tool for primer design and that whole genome PCR scanning can be envisaged for the analysis of genome diversity in S. aureus, one of the major public health concerns worldwide.
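Illustrative note: whole genome PCR scanning requires covering the chromosome with amplicons of roughly 10 kb. The sketch below computes a simple overlapping tiling of a linear coordinate range. It is not GenoFrag's algorithm, which optimizes primer choice; the function name, the 500 bp overlap and the treatment of the chromosome as linear are assumptions made for illustration.

```python
def tile_genome(genome_length, fragment_length=10_000, overlap=500):
    """Return (start, end) half-open intervals that cover a linear genome.

    Consecutive fragments share `overlap` bases so that no position sits
    only at an amplicon boundary. The final fragment may be shorter.
    """
    if fragment_length <= 0:
        raise ValueError("fragment_length must be positive")
    if not 0 <= overlap < fragment_length:
        raise ValueError("overlap must be in [0, fragment_length)")

    step = fragment_length - overlap
    fragments = []
    start = 0
    while start < genome_length:
        end = min(start + fragment_length, genome_length)
        fragments.append((start, end))
        if end == genome_length:
            break
        start += step
    return fragments


if __name__ == "__main__":
    # A toy 25 kb sequence tiled with 10 kb fragments and 1 kb overlap.
    for start, end in tile_genome(25_000, 10_000, 1_000):
        print(start, end)
```

In practice each interval boundary would only define a search region, since a usable primer must also satisfy melting temperature and specificity constraints.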
Affiliation(s)
- Nouri Ben Zakour
- Laboratoire d'Hygiène Alimentaire, UMR STLO, Institut National de la Recherche Agronomique, Ecole Nationale Supérieure Agronomique, 65 rue de Saint Brieuc, CS84215, 35042 Rennes cedex, France
|
1162
|
Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 2004. [PMID: 15461798 DOI: 10.1186/gb-2004-5-10-r89] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 04/15/2023] Open
Abstract
The Bioconductor project is an initiative for the collaborative creation of extensible software for computational biology and bioinformatics. The goals of the project include: fostering collaborative development and widespread use of innovative software, reducing barriers to entry into interdisciplinary scientific research, and promoting the achievement of remote reproducibility of research results. We describe details of our aims and methods, identify current challenges, compare Bioconductor to other open bioinformatics projects, and provide working examples.
Affiliation(s)
- Robert C Gentleman
- Department of Biostatistical Science, Dana-Farber Cancer Institute, 44 Binney St, Boston, MA 02115, USA.
|
1163
|
Ross HA, Rodrigo AG. An Assessment of Matrix Representation with Compatibility in Supertree Construction. Computational Biology 2004. [DOI: 10.1007/978-1-4020-2330-9_3] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.3] [Indexed: 01/20/2023]
|
1164
|
Diener SE, Dunn-Coleman N, Foreman P, Houfek TD, Teunissen PJM, van Solingen P, Dankmeyer L, Mitchell TK, Ward M, Dean RA. Characterization of the protein processing and secretion pathways in a comprehensive set of expressed sequence tags from Trichoderma reesei. FEMS Microbiol Lett 2004; 230:275-82. [PMID: 14757250 DOI: 10.1016/s0378-1097(03)00916-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Indexed: 10/26/2022] Open
Abstract
Trichoderma reesei is a filamentous fungus widely used as an efficient protein producer and known to secrete large quantities of biomass-degrading enzymes. Much work has been done aimed at improving the secretion efficiency of this fungus. It is generally accepted that the major bottlenecks in secretion are protein folding and ornamentation steps in this pathway. In an attempt to identify genes involved in these steps, the 5' ends of 21888 cDNA clones were sequenced, from which a unique set of over 5000 were also 3' sequenced. Using annotation tools, Gene Ontology terms were assigned to 2732 of the sequences. Homologs to the majority of Aspergillus niger's Srg genes as well as a number of homologs to genes involved in protein folding and ornamentation pathways were identified.
Affiliation(s)
- S E Diener
- Fungal Genomics Laboratory, North Carolina State University, Suite 1200, 840 Main Campus Drive, Raleigh, NC 27606, USA
|
1165
|
Berezikov E, Guryev V, Plasterk RHA, Cuppen E. CONREAL: conserved regulatory elements anchored alignment algorithm for identification of transcription factor binding sites by phylogenetic footprinting. Genome Res 2003; 14:170-8. [PMID: 14672977 PMCID: PMC314294 DOI: 10.1101/gr.1642804] [Citation(s) in RCA: 72] [Impact Index Per Article: 3.3] [Indexed: 11/24/2022]
Abstract
Prediction of transcription-factor target sites in promoters remains difficult due to the short length and degeneracy of the target sequences. Although the use of orthologous sequences and phylogenetic footprinting approaches may help in the recognition of conserved and potentially functional sequences, correct alignment of the short transcription-factor binding sites can be problematic for established algorithms, especially when aligning more divergent species. Here, we report a novel phylogenetic footprinting approach, CONREAL, that uses biologically relevant information, that is, potential transcription-factor binding sites as represented by positional weight matrices, to establish anchors between orthologous sequences and to guide promoter sequence alignment. Comparison of the performance of CONREAL with the global alignment programs LAGAN and AVID using a reference data set, shows that CONREAL performs equally well for closely related species like rodents and human, and has a clear added value for aligning promoter elements of more divergent species like human and fish, as it identifies conserved transcription-factor binding sites that are not found by other methods. CONREAL is accessible via a Web interface at http://conreal.niob.knaw.nl/.
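Illustrative note: the abstract describes potential binding sites "as represented by positional weight matrices". The sketch below shows the generic technique of turning per-position base counts into a log-odds matrix and scanning a sequence for windows that score above a threshold. It is not CONREAL's implementation; the uniform background, the pseudocount, the threshold and the toy motif are assumptions made for illustration.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2 odds scores.

    counts is a list with one dict per motif position, mapping a base to
    the number of times it was observed there (missing bases count as 0).
    """
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        matrix.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background)
            for b in BASES
        })
    return matrix


def scan(sequence, matrix, threshold):
    """Return (position, score) for every window scoring >= threshold."""
    sequence = sequence.upper()
    width = len(matrix)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(matrix[k][base] for k, base in enumerate(window))
        if score >= threshold:
            hits.append((i, score))
    return hits


if __name__ == "__main__":
    # Toy 4-bp motif with consensus TATA, built from hypothetical counts.
    toy_counts = [{"T": 10}, {"A": 10}, {"T": 10}, {"A": 10}]
    pwm = log_odds_matrix(toy_counts)
    print(scan("GGTATAGG", pwm, threshold=5.0))
```

Hits found this way in two orthologous promoters could then serve as candidate anchor pairs, which is the role the abstract assigns to the matrices.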
Affiliation(s)
- Eugene Berezikov
- Hubrecht Laboratory, Netherlands Institute for Developmental Biology, 3584 CT, Utrecht, The Netherlands.
|
1166
|
Ardell DH, Lozupone CA, Landweber LF. Polymorphism, Recombination and Alternative Unscrambling in the DNA Polymerase α Gene of the Ciliate Stylonychia lemnae (Alveolata; class Spirotrichea). Genetics 2003; 165:1761-77. [PMID: 14704164 PMCID: PMC1462920 DOI: 10.1093/genetics/165.4.1761] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.6] [Indexed: 11/14/2022] Open
Abstract
DNA polymerase α is the most highly scrambled gene known in stichotrichous ciliates. In its hereditary micronuclear form, it is broken into >40 pieces on two loci at least 3 kb apart. Scrambled genes must be reassembled through developmental DNA rearrangements to yield functioning macronuclear genes, but the mechanism and accuracy of this process are unknown. We describe the first analysis of DNA polymorphism in the macronuclear version of any scrambled gene. Six functional haplotypes obtained from five Eurasian strains of Stylonychia lemnae were highly polymorphic compared to Drosophila genes. Another incompletely unscrambled haplotype was interrupted by frameshift and nonsense mutations but contained more silent mutations than expected by allelic inactivation. In our sample, nucleotide diversity and recombination signals were unexpectedly high within a region encompassing the boundary of the two micronuclear loci. From this and other evidence we infer that both members of a long repeat at the ends of the loci provide alternative substrates for unscrambling in this region. Incongruent genealogies and recombination patterns were also consistent with separation of the two loci by a large genetic distance. Our results suggest that ciliate developmental DNA rearrangements may be more probabilistic and error prone than previously appreciated and constitute a potential source of macronuclear variation. From this perspective we introduce the nonsense-suppression hypothesis for the evolution of ciliate altered genetic codes. We also introduce methods and software to calculate the likelihood of hemizygosity in ciliate haplotype samples and to correct for multiple comparisons in sliding-window analyses of Tajima's D.
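Illustrative note: the abstract refers to sliding-window analyses of Tajima's D. The sketch below computes the standard Tajima's D statistic (Tajima 1989) for an alignment and evaluates it in windows along the alignment. It does not reproduce the authors' software, and in particular it omits their correction for multiple comparisons; the function names and the toy alignment are assumptions made for illustration.

```python
import math
from itertools import combinations


def tajimas_d(sequences):
    """Tajima's D for equal-length aligned sequences.

    Returns None when the statistic is undefined: fewer than four
    sequences (the variance term is zero) or no segregating sites.
    """
    n = len(sequences)
    if n < 4:
        return None
    columns = list(zip(*sequences))
    seg_sites = sum(1 for col in columns if len(set(col)) > 1)
    if seg_sites == 0:
        return None

    # Mean number of pairwise differences (pi).
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs) / len(pairs)

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)


def sliding_tajimas_d(sequences, window, step):
    """Tajima's D in windows of `window` columns, advancing by `step`."""
    length = len(sequences[0])
    return [
        (start, tajimas_d([seq[start:start + window] for seq in sequences]))
        for start in range(0, length - window + 1, step)
    ]


if __name__ == "__main__":
    # Toy alignment in which every variant is a singleton.
    toy = ["AAAA", "AAAT", "AATA", "ATAA"]
    print(tajimas_d(toy))
    print(sliding_tajimas_d(toy, window=4, step=1))
```

Because many overlapping windows are tested, extreme values are expected by chance alone, which is why a correction such as the one the authors introduce is needed before interpreting any single window.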
Affiliation(s)
- David H Ardell
- Department of Molecular Evolution, Evolutionary Biology Center, Uppsala University, SE-752 36 Uppsala, Sweden.
|
1167
|
Rockman MV, Hahn MW, Soranzo N, Goldstein DB, Wray GA. Positive Selection on a Human-Specific Transcription Factor Binding Site Regulating IL4 Expression. Curr Biol 2003; 13:2118-23. [PMID: 14654003 DOI: 10.1016/j.cub.2003.11.025] [Citation(s) in RCA: 105] [Impact Index Per Article: 4.8] [Indexed: 10/26/2022]
Abstract
A single nucleotide polymorphism in the promoter of the multifunctional cytokine Interleukin 4 (IL4) affects the binding of NFAT, a key transcriptional activator of IL4 in T cells. This regulatory polymorphism influences the balance of cytokine signaling in the immune system, with important consequences, both positive and negative, for human health. We determined that the NFAT binding site is unique to humans; it arose by point mutation along the lineage separating humans from other great apes. We show that its frequency distribution among human subpopulations has been shaped by the balance of selective forces on IL4's diverse roles. New statistical approaches, based on parametric and nonparametric comparisons to neutral variants typed in the same individuals, indicate that differentiation among subpopulations at the IL4 promoter polymorphism is too great to be attributed to neutral drift. The allele frequencies of this binding site represent local adaptation to diverse pathogenic challenges; disease states associated with the common derived allele are side-effects of positive selection on other IL4 functions.
Affiliation(s)
- Matthew V Rockman
- Department of Biology, Duke University, Box 90338, Durham, NC 27708, USA.
|
1168
|
Stein LD, Bao Z, Blasiar D, Blumenthal T, Brent MR, Chen N, Chinwalla A, Clarke L, Clee C, Coghlan A, Coulson A, D'Eustachio P, Fitch DHA, Fulton LA, Fulton RE, Griffiths-Jones S, Harris TW, Hillier LW, Kamath R, Kuwabara PE, Mardis ER, Marra MA, Miner TL, Minx P, Mullikin JC, Plumb RW, Rogers J, Schein JE, Sohrmann M, Spieth J, Stajich JE, Wei C, Willey D, Wilson RK, Durbin R, Waterston RH. The genome sequence of Caenorhabditis briggsae: a platform for comparative genomics. PLoS Biol 2003; 1:E45. [PMID: 14624247 PMCID: PMC261899 DOI: 10.1371/journal.pbio.0000045] [Citation(s) in RCA: 666] [Impact Index Per Article: 30.3] [Received: 07/14/2003] [Accepted: 09/04/2003] [Indexed: 11/19/2022] Open
Abstract
The soil nematodes Caenorhabditis briggsae and Caenorhabditis elegans diverged from a common ancestor roughly 100 million years ago and yet are almost indistinguishable by eye. They have the same chromosome number and genome sizes, and they occupy the same ecological niche. To explore the basis for this striking conservation of structure and function, we have sequenced the C. briggsae genome to a high-quality draft stage and compared it to the finished C. elegans sequence. We predict approximately 19,500 protein-coding genes in the C. briggsae genome, roughly the same as in C. elegans. Of these, 12,200 have clear C. elegans orthologs, a further 6,500 have one or more clearly detectable C. elegans homologs, and approximately 800 C. briggsae genes have no detectable matches in C. elegans. Almost all of the noncoding RNAs (ncRNAs) known are shared between the two species. The two genomes exhibit extensive colinearity, and the rate of divergence appears to be higher in the chromosomal arms than in the centers. Operons, a distinctive feature of C. elegans, are highly conserved in C. briggsae, with the arrangement of genes being preserved in 96% of cases. The difference in size between the C. briggsae (estimated at approximately 104 Mbp) and C. elegans (100.3 Mbp) genomes is almost entirely due to repetitive sequence, which accounts for 22.4% of the C. briggsae genome in contrast to 16.5% of the C. elegans genome. Few, if any, repeat families are shared, suggesting that most were acquired after the two species diverged or are undergoing rapid evolution. Coclustering the C. elegans and C. briggsae proteins reveals 2,169 protein families of two or more members. Most of these are shared between the two species, but some appear to be expanding or contracting, and there seem to be as many as several hundred novel C. briggsae gene families. The C. briggsae draft sequence will greatly improve the annotation of the C. elegans genome. Based on similarity to C. briggsae, we found strong evidence for 1,300 new C. elegans genes. In addition, comparisons of the two genomes will help to understand the evolutionary forces that mold nematode genomes. With the Caenorhabditis briggsae genome now in hand, C. elegans biologists have a powerful new research tool to refine their knowledge of gene function in C. elegans and to study the path of genome evolution.
MESH Headings
- Animals
- Biological Evolution
- Caenorhabditis/genetics
- Caenorhabditis elegans/genetics
- Chromosome Mapping
- Chromosomes, Artificial, Bacterial
- Cluster Analysis
- Codon
- Conserved Sequence
- Evolution, Molecular
- Exons
- Gene Library
- Genome
- Genomics/methods
- Interspersed Repetitive Sequences
- Introns
- MicroRNAs/genetics
- Models, Genetic
- Models, Statistical
- Molecular Sequence Data
- Multigene Family
- Open Reading Frames
- Physical Chromosome Mapping
- Plasmids/metabolism
- Protein Structure, Tertiary
- Proteins/chemistry
- RNA/chemistry
- RNA, Ribosomal/genetics
- RNA, Spliced Leader
- RNA, Transfer/genetics
- Sequence Analysis, DNA
- Species Specificity
Affiliation(s)
- Lincoln D Stein
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA.
|
1169
|
Coser KR, Chesnes J, Hur J, Ray S, Isselbacher KJ, Shioda T. Global analysis of ligand sensitivity of estrogen inducible and suppressible genes in MCF7/BUS breast cancer cells by DNA microarray. Proc Natl Acad Sci U S A 2003; 100:13994-9. [PMID: 14610279 PMCID: PMC283534 DOI: 10.1073/pnas.2235866100] [Citation(s) in RCA: 120] [Impact Index Per Article: 5.5] [Indexed: 01/20/2023] Open
Abstract
To obtain comprehensive information on 17β-estradiol (E2) sensitivity of genes that are inducible or suppressible by this hormone, we designed a method that determines ligand sensitivities of large numbers of genes by using DNA microarray and a set of simple Perl computer scripts implementing the standard metric statistics. We used it to characterize effects of low (0-100 pM) concentrations of E2 on the transcriptome profile of MCF7/BUS human breast cancer cells, whose E2 dose-dependent growth curve saturated with 100 pM E2. Evaluation of changes in mRNA expression for all genes covered by the DNA microarray indicated that, at a very low concentration (10 pM), E2 suppressed approximately 3-5 times larger numbers of genes than it induced, whereas at higher concentrations (30-100 pM) it induced approximately 1.5-2 times more genes than it suppressed. Using clearly defined statistical criteria, E2-inducible genes were categorized into several classes based on their E2 sensitivities. This approach of hormone sensitivity analysis revealed that expression of two previously reported E2-inducible autocrine growth factors, transforming growth factor α and stromal cell-derived factor 1, was not affected by 100 pM and lower concentrations of E2 but strongly enhanced by 10 nM E2, which was far higher than the concentration that saturated the E2 dose-dependent growth curve of MCF7/BUS cells. These observations suggested that biological actions of E2 are derived from expression of multiple genes whose E2 sensitivities differ significantly and, hence, depend on the E2 concentration, especially when it is lower than the saturating level, emphasizing the importance of characterizing the ligand dose-dependent aspects of E2 actions.
Affiliation(s)
- Kathryn R Coser
- Department of Tumor Biology and DNA Microarray Core Facility, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
|
1170
|
Cannon SB, Kozik A, Chan B, Michelmore R, Young ND. DiagHunter and GenoPix2D: programs for genomic comparisons, large-scale homology discovery and visualization. Genome Biol 2003; 4:R68. [PMID: 14519203 PMCID: PMC328457 DOI: 10.1186/gb-2003-4-10-r68] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.7] [Received: 05/01/2003] [Revised: 06/19/2003] [Accepted: 08/08/2003] [Indexed: 11/10/2022] Open
Abstract
The DiagHunter and GenoPix2D applications work together to enable genomic comparisons and exploration at both genome-wide and single-gene scales. DiagHunter identifies homologous regions (synteny blocks) within or between genomes. DiagHunter works efficiently with diverse, large datasets to predict extended and interrupted synteny blocks and to generate graphical and text output quickly. GenoPix2D allows interactive display of synteny blocks and other genomic features, as well as querying by annotation and by sequence similarity.
Affiliation(s)
- Steven B Cannon
- Plant Biology Department, University of Minnesota, St Paul, MN 55108, USA.
|
1171
|
Cannon SB, Young ND. OrthoParaMap: distinguishing orthologs from paralogs by integrating comparative genome data and gene phylogenies. BMC Bioinformatics 2003; 4:35. [PMID: 12952558 PMCID: PMC200972 DOI: 10.1186/1471-2105-4-35] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.7] [Received: 06/27/2003] [Accepted: 09/02/2003] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND In eukaryotic genomes, most genes are members of gene families. When comparing genes from two species, therefore, most genes in one species will be homologous to multiple genes in the second. This often makes it difficult to distinguish orthologs (separated through speciation) from paralogs (separated by other types of gene duplication). Combining phylogenetic relationships and genomic position in both genomes helps to distinguish between these scenarios. This kind of comparison can also help to describe how gene families have evolved within a single genome that has undergone polyploidy or other large-scale duplications, as in the case of Arabidopsis thaliana - and probably most plant genomes. RESULTS We describe a suite of programs called OrthoParaMap (OPM) that makes genomic comparisons, identifies syntenic regions, determines whether sets of genes in a gene family are related through speciation or internal chromosomal duplications, maps this information onto phylogenetic trees, and infers internal nodes within the phylogenetic tree that may represent local - as opposed to speciation or segmental - duplication. We describe the application of the software using three examples: the melanoma-associated antigen (MAGE) gene family on the X chromosomes of mouse and human; the 20S proteasome subunit gene family in Arabidopsis, and the major latex protein gene family in Arabidopsis. CONCLUSION OPM combines comparative genomic positional information and phylogenetic reconstructions to identify which gene duplications are likely to have arisen through internal genomic duplications (such as polyploidy), through speciation, or through local duplications (such as unequal crossing-over). The software is freely available at http://www.tc.umn.edu/~cann0010/.
Affiliation(s)
- Steven B Cannon
- Plant Biology Department, University of Minnesota, St. Paul, MN 55108, USA
- Nevin D Young
- Plant Biology Department, University of Minnesota, St. Paul, MN 55108, USA
- Plant Pathology Department, University of Minnesota, St. Paul, MN 55108, USA
|
1172
|
Bowen NJ, Jordan IK, Epstein JA, Wood V, Levin HL. Retrotransposons and their recognition of pol II promoters: a comprehensive survey of the transposable elements from the complete genome sequence of Schizosaccharomyces pombe. Genome Res 2003; 13:1984-97. [PMID: 12952871 PMCID: PMC403668 DOI: 10.1101/gr.1191603] [Citation(s) in RCA: 120] [Impact Index Per Article: 5.5] [Indexed: 11/25/2022]
Abstract
The complete DNA sequence of the genome of Schizosaccharomyces pombe provides the opportunity to investigate the entire complement of transposable elements (TEs), their association with specific sequences, their chromosomal distribution, and their evolution. Using homology-based sequence identification, we found that the sequenced strain of S. pombe contained only one family of full-length transposons. This family, Tf2, consisted of 13 full-length copies of a long terminal repeat (LTR) retrotransposon. We found that LTR-LTR recombination of previously existing transposons had resulted in extensive populations of solo LTRs. These included 35 solo LTRs of Tf2, as well as 139 solo LTRs from other Tf families. Phylogenetic analysis of solo Tf LTRs reveals that Tf1 and Tf2 were the most recently active elements within the genome. The solo LTRs also served as footprints for previous insertion events by the Tf retrotransposons. Analysis of 186 genomic insertion events revealed a close association with RNA polymerase II promoters. These insertions clustered in the promoter-proximal regions of genes, upstream of protein coding regions by 100 to 400 nucleotides. The association of Tf insertions with pol II promoters was very similar to the preference previously observed for Tf1 integration. We found that the recently active Tf elements were absent from centromeres and pericentromeric regions of the genome containing tandem tRNA gene clusters. In addition, our analysis revealed that chromosome III has twice the density of insertion events compared to the other two chromosomes. Finally we describe a novel repetitive sequence, wtf, which was also preferentially located on chromosome III, and was often located near solo LTRs of Tf elements.
Affiliation(s)
- Nathan J Bowen
- Section on Eukaryotic Transposable Elements, Laboratory of Gene Regulation and Development, National Institute of Child Health and Human Development (NICHD), National Institutes of Health (NIH), Bethesda, Maryland 20892, USA
|
1173
|
Carlson CM, Dupuy AJ, Fritz S, Roberg-Perez KJ, Fletcher CF, Largaespada DA. Transposon Mutagenesis of the Mouse Germline. Genetics 2003; 165:243-56. [PMID: 14504232 PMCID: PMC1462753 DOI: 10.1093/genetics/165.1.243] [Citation(s) in RCA: 103] [Impact Index Per Article: 4.7] [Indexed: 11/13/2022] Open
Abstract
Sleeping Beauty is a synthetic “cut-and-paste” transposon of the Tc1/mariner class. The Sleeping Beauty transposase (SB) was constructed on the basis of a consensus sequence obtained from an alignment of 12 remnant elements cloned from the genomes of eight different fish species. Transposition of Sleeping Beauty elements has been observed in cultured cells, hepatocytes of adult mice, one-cell mouse embryos, and the germline of mice. SB has potential as a random germline insertional mutagen useful for in vivo gene trapping in mice. Previous work in our lab has demonstrated transposition in the male germline of mice and transmission of novel inserted transposons in offspring. To determine sequence preferences and mutagenicity of SB-mediated transposition, we cloned and analyzed 44 gene-trap transposon insertion sites from a panel of 30 mice. The distribution and sequence content flanking these cloned insertion sites was compared to 44 mock insertion sites randomly selected from the genome. We find that germline SB transposon insertion sites are AT-rich and the sequence ANNTANNT is favored compared to other TA dinucleotides. Local transposition occurs with insertions closely linked to the donor site roughly one-third of the time. We find that ∼27% of the transposon insertions are in transcription units. Finally, we characterize an embryonic lethal mutation caused by endogenous splicing disruption in mice carrying a particular intron-inserted gene-trap transposon.
Affiliation(s)
- Corey M Carlson
- The Arnold and Mabel Beckman Center for Transposon Research, Institute of Human Genetics, Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis 55455, USA
|
1174
|
Hoon S, Ratnapu KK, Chia JM, Kumarasamy B, Juguang X, Clamp M, Stabenau A, Potter S, Clarke L, Stupka E. Biopipe: a flexible framework for protocol-based bioinformatics analysis. Genome Res 2003; 13:1904-15. [PMID: 12869579 PMCID: PMC403782 DOI: 10.1101/gr.1363103] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.2] [Indexed: 11/25/2022]
Abstract
We identify several challenges facing bioinformatics analysis today. Firstly, to fulfill the promise of comparative studies, bioinformatics analysis will need to accommodate different sources of data residing in a federation of databases that, in turn, come in different formats and modes of accessibility. Secondly, the tsunami of data to be handled will require robust systems that enable bioinformatics analysis to be carried out in a parallel fashion. Thirdly, the ever-evolving state of bioinformatics presents new algorithms and paradigms in conducting analysis. This means that any bioinformatics framework must be flexible and generic enough to accommodate such changes. In addition, we identify the need for introducing an explicit protocol-based approach to bioinformatics analysis that will lend rigorousness to the analysis. This makes it easier for experimentation and replication of results by external parties. Biopipe is designed in an effort to meet these goals. It aims to allow researchers to focus on protocol design. At the same time, it is designed to work over a compute farm and thus provides high-throughput performance. A common exchange format that encapsulates the entire protocol in terms of the analysis modules, parameters, and data versions has been developed to provide a powerful way in which to distribute and reproduce results. This will enable researchers to discuss and interpret the data better as the once implicit assumptions are now explicitly defined within the Biopipe framework.
Affiliation(s)
- Shawn Hoon
- Institute of Molecular and Cell Biology, National University of Singapore, Singapore 117609
|
1175
|
Saunders NFW, Thomas T, Curmi PMG, Mattick JS, Kuczek E, Slade R, Davis J, Franzmann PD, Boone D, Rusterholtz K, Feldman R, Gates C, Bench S, Sowers K, Kadner K, Aerts A, Dehal P, Detter C, Glavina T, Lucas S, Richardson P, Larimer F, Hauser L, Land M, Cavicchioli R. Mechanisms of thermal adaptation revealed from the genomes of the Antarctic Archaea Methanogenium frigidum and Methanococcoides burtonii. Genome Res 2003; 13:1580-8. [PMID: 12805271 PMCID: PMC403754 DOI: 10.1101/gr.1180903] [Citation(s) in RCA: 161] [Impact Index Per Article: 7.3] [Indexed: 11/25/2022]
Abstract
We generated draft genome sequences for two cold-adapted Archaea, Methanogenium frigidum and Methanococcoides burtonii, to identify genotypic characteristics that distinguish them from Archaea with a higher optimal growth temperature (OGT). Comparative genomics revealed trends in amino acid and tRNA composition, and structural features of proteins. Proteins from the cold-adapted Archaea are characterized by a higher content of noncharged polar amino acids, particularly Gln and Thr, and a lower content of hydrophobic amino acids, particularly Leu. Sequence data from nine methanogen genomes (OGT 15°-98°C) were used to generate 1111 modeled protein structures. Analysis of the models from the cold-adapted Archaea showed a strong tendency in the solvent-accessible area for more Gln, Thr, and hydrophobic residues and fewer charged residues. A cold shock domain (CSD) protein (CspA homolog) was identified in M. frigidum, two hypothetical proteins with CSD-folds in M. burtonii, and a unique winged helix DNA-binding domain protein in M. burtonii. This suggests that these types of nucleic acid binding proteins have a critical role in cold-adapted Archaea. Structural analysis of tRNA sequences from the Archaea indicated that GC content is the major factor influencing tRNA stability in hyperthermophiles, but not in the psychrophiles, mesophiles or moderate thermophiles. Below an OGT of 60°C, the GC content in tRNA was largely unchanged, indicating that any requirement for flexibility of tRNA in psychrophiles is mediated by other means. This is the first time that comparisons have been performed with genome data from Archaea spanning the growth temperature extremes from psychrophiles to hyperthermophiles.
Affiliation(s)
- Neil F W Saunders
- School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, NSW 2052, Australia
|
1176
|
Nair R, Rost B. LOC3D: annotate sub-cellular localization for protein structures. Nucleic Acids Res 2003; 31:3337-40. [PMID: 12824321 PMCID: PMC168921 DOI: 10.1093/nar/gkg514] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.4] [Indexed: 11/14/2022] Open
Abstract
LOC3D (http://cubic.bioc.columbia.edu/db/LOC3d/) is both a weekly-updated database and a web server for predictions of sub-cellular localization for eukaryotic proteins of known three-dimensional (3D) structure. Localization is predicted using four different methods: (i) PredictNLS, prediction of nuclear proteins through nuclear localization signals; (ii) LOChom, inferring localization through sequence homology; (iii) LOCkey, inferring localization through automatic text analysis of SWISS-PROT keywords; and (iv) LOC3Dini, ab initio prediction through a system of neural networks and support vector machines. The final prediction is based on the method that predicts localization with the highest confidence. The LOC3D database currently contains predictions for >8700 eukaryotic protein chains taken from the Protein Data Bank (PDB). The web server can be used to predict sub-cellular localization for proteins for which only a predicted structure is available from threading servers. This makes the resource of particular interest to structural genomics initiatives.
Affiliation(s)
- Rajesh Nair
- CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, 650 West 168th Street BB217, New York, NY 10032, USA.
|
1177
|
Sandelin A, Höglund A, Lenhard B, Wasserman WW. Integrated analysis of yeast regulatory sequences for biologically linked clusters of genes. Funct Integr Genomics 2003; 3:125-34. [PMID: 12827523 DOI: 10.1007/s10142-003-0086-6] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.1] [Received: 12/17/2002] [Revised: 04/07/2003] [Accepted: 04/29/2003] [Indexed: 10/26/2022]
Abstract
Dramatic progress in deciphering the regulatory controls in Saccharomyces cerevisiae has been enabled by the fusion of high-throughput genomics technologies with advanced sequence analysis algorithms. Sets of genes likely to function together and with similar expression profiles have been identified in diverse studies. By fusing an advanced pattern recognition algorithm for identification of transcription factor binding sites with a new method for the quantitative comparison of binding properties of transcription factors, we provide an integrated means to move from expression data to biological insights. The Yeast Regulatory Sequence Analysis system, YRSA, combines standard functions with a novel pattern characterization procedure in an intuitive interface designed for use by a broad range of scientists. The features of the system include automated retrieval of user-defined promoter sequences, binding site discovery by pattern recognition, graphical displays of the observed pattern and positions of similar sequences in the specified genes, and comparison of the new pattern against a collection of binding patterns for characterized transcription factors. The comprehensive YRSA system was used to study the regulatory mechanisms of yeast regulons. Analysis of the regulatory controls of a battery of genes induced by DNA damaging agents supports a putative mediating role for the cell-cycle checkpoint regulatory element MCB. YRSA is available at http://yrsa.cgb.ki.se. [YRSA: ancient Scandinavian name meaning old she-bear (Latin Ursus arctos = brown bear/grizzly).]
Affiliation(s)
- Albin Sandelin
- Center for Genomics and Bioinformatics, Karolinska Institutet, Stockholm, Sweden
|
1178
|
Abstract
Miniature inverted repeat transposable elements (MITEs) are ubiquitous and numerous in higher eukaryotic genomes. Analysis of MITE families is laborious and time consuming, especially when multiple MITE families are involved in the study. Based on the structural characteristics of MITEs and genetic principles for transposable elements (TEs), we have developed a computational tool kit named MITE analysis kit (MAK) to automate the processes (http://perl.idmb.tamu.edu/mak.htm). In addition to its ability to routinely retrieve family member sequences and to report the positions of these elements relative to the closest neighboring genes, MAK is a powerful tool for revealing anchor elements that link MITE families to known transposable element families. Implementation of the MAK is described, as are genetic principles and algorithms used in its derivation. Test runs of the programs for several MITE families yielded anchor sequences that retain TIRs and coding regions reminiscent of transposases. These anchor sequences are consistent with previously reported putative autonomous elements for these MITE families. Furthermore, analysis of two MITE families with no known links to any transposon family revealed two novel transposon families, namely Math and Kid, belonging to the IS5/Harbinger/PIF superfamily.
Affiliation(s)
- Guojun Yang
- Institute of Developmental and Molecular Biology and Department of Biology, Texas A&M University, College Station, TX 77843-3155, USA
|
1179
|
Lenhard B, Wahlestedt C, Wasserman WW. GeneLynx mouse: integrated portal to the mouse genome. Genome Res 2003; 13:1501-4. [PMID: 12819149 PMCID: PMC403699 DOI: 10.1101/gr.951403] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.5] [Indexed: 11/24/2022]
Abstract
GeneLynx Mouse is a meta-database providing an extensive collection of hyperlinks to mouse gene-specific information in diverse databases available via the Internet. The GeneLynx project is based on the simple notion that given any gene-specific identifier (e.g., accession number, gene name, text, or sequence), scientists should be able to access a single location that provides a set of links to all the publicly available information pertinent to the specified gene. The recent climax in the mouse genome and RIKEN cDNA sequencing projects provided the data necessary for the development of a gene-centric mouse information portal based on the GeneLynx ideals. Clusters of RIKEN cDNA sequences were used to define the initial set of mouse genes. Like its human counterpart, GeneLynx Mouse is designed as an extensible relational database with an intuitive and user-friendly Web interface. Data is automatically extracted from diverse resources, using appropriate approaches to maximize the coverage. To promote cross-database interoperability, an indexing utility is provided to facilitate the establishment of hyperlinks in external databases. As a result of the integration of the human and mouse systems, GeneLynx now serves as a powerful comparative genomics data mining resource. GeneLynx Mouse can be freely accessed at http://mouse.genelynx.org.
Affiliation(s)
- Boris Lenhard
- Center for Genomics and Bioinformatics, Karolinska Institutet, 17177 Stockholm, Sweden.
|
1180
|
Navarro JD, Niranjan V, Peri S, Jonnalagadda CK, Pandey A. From biological databases to platforms for biomedical discovery. Trends Biotechnol 2003; 21:263-8. [PMID: 12788546 DOI: 10.1016/s0167-7799(03)00108-2] [Citation(s) in RCA: 20] [Impact Index Per Article: 0.9] [Indexed: 10/27/2022]
Abstract
The use of high-throughput DNA sequencing and proteomic methods has led to an unprecedented increase in the amount of genomic and proteomic data. Application of computing technologies and development of computational tools to analyze and present these data has not kept pace with the accumulation of information. Here, we discuss the use of different database systems to store biological information and mention some of the key emerging computing technologies that are likely to have a key role in the future of bioinformatics.
Affiliation(s)
- J Daniel Navarro
- McKusick-Nathans Institute of Genetic Medicine and Dept. of Biological Chemistry, Johns Hopkins University, Baltimore, MD 21287, USA
|
1181
|
Schageman JJ, Ferguson DA, Zang Q, Spencer JA, Huff JW, Graff JM, Lian Y, Garner HR, Pertsemlidis A. Reading the fine print of the human genome. IEEE Eng Med Biol Mag 2003; 22:105-8. [PMID: 12733468 DOI: 10.1109/memb.2003.1195706] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Affiliation(s)
- Jeoffrey J Schageman
- McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd., Dallas, TX 75390-8591, USA.
|
1182
|
Pan X, Liu H, Clarke J, Jones J, Bevan M, Stein L. ATIDB: Arabidopsis thaliana insertion database. Nucleic Acids Res 2003; 31:1245-51. [PMID: 12582244 PMCID: PMC150240 DOI: 10.1093/nar/gkg222] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.2] [Indexed: 11/13/2022] Open
Abstract
Insertional mutagenesis techniques, including transposon- and T-DNA-mediated mutagenesis, are key resources for systematic identification of gene function in the model plant species Arabidopsis thaliana. We have developed a database (http://atidb.cshl.org/) for archiving, searching and analyzing insertional mutagenesis lines. Flanking sequences from approximately 10 500 insertion lines (including transposon and T-DNA insertions) from several tagging programs in Arabidopsis were mapped to the genome sequence through our annotation system before being entered into the database. The database front end provides World Wide Web searching and analyzing interfaces for genome researchers and other biologists. Users can search the database to identify insertions in a particular gene or perform genome-wide analysis to study the distribution and preference of insertions. Tools integrated with the database include a graphical genome browser, a protein search function, a graphical representation of the insertion distribution and a Blast search function. The database is based on open source components and is available under an open source license.
Affiliation(s)
- Xiaokang Pan
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
|
1183
|
Thorisson GA, Stein LD. The SNP Consortium website: past, present and future. Nucleic Acids Res 2003; 31:124-7. [PMID: 12519964 PMCID: PMC165499 DOI: 10.1093/nar/gkg052] [Citation(s) in RCA: 90] [Impact Index Per Article: 4.1] [Received: 08/22/2002] [Accepted: 09/11/2002] [Indexed: 01/20/2023] Open
Abstract
The SNP Consortium website (http://snp.cshl.org) has undergone many changes since its initial conception three years ago. The database back end has been changed from the venerable ACeDB to the more scalable MySQL engine. Users can access the data via gene or single nucleotide polymorphism (SNP) keyword searches and browse or dump SNP data to textfiles. A graphical genome browsing interface shows SNPs mapped onto the genome assembly in the context of externally available gene predictions and other features. SNP allele frequency and genotype data are available via FTP-download and on individual SNP report web pages. SNP linkage maps are available for download and for browsing in a comparative map viewer. All software components of the data coordinating center (DCC) website (http://snp.cshl.org) are open source.
|
1184
|
Stein LD, Mungall C, Shu S, Caudy M, Mangone M, Day A, Nickerson E, Stajich JE, Harris TW, Arva A, Lewis S. The generic genome browser: a building block for a model organism system database. Genome Res 2002; 12:1599-610. [PMID: 12368253 PMCID: PMC187535 DOI: 10.1101/gr.403602] [Citation(s) in RCA: 852] [Impact Index Per Article: 37.0] [Received: 05/07/2002] [Accepted: 08/09/2002] [Indexed: 11/24/2022]
Abstract
The Generic Model Organism System Database Project (GMOD) seeks to develop reusable software components for model organism system databases. In this paper we describe the Generic Genome Browser (GBrowse), a Web-based application for displaying genomic annotations and other features. For the end user, features of the browser include the ability to scroll and zoom through arbitrary regions of a genome, to enter a region of the genome by searching for a landmark or performing a full text search of all features, and the ability to enable and disable tracks and change their relative order and appearance. The user can upload private annotations to view them in the context of the public ones, and publish those annotations to the community. For the data provider, features of the browser software include reliance on readily available open source components, simple installation, flexible configuration, and easy integration with other components of a model organism system Web site. GBrowse is freely available under an open source license. The software, its documentation, and support are available at http://www.gmod.org.
Affiliation(s)
- Lincoln D Stein
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11790, USA.
|
1185
|
Mungall CJ, Misra S, Berman BP, Carlson J, Frise E, Harris N, Marshall B, Shu S, Kaminker JS, Prochnik SE, Smith CD, Smith E, Tupy JL, Wiel C, Rubin GM, Lewis SE. An integrated computational pipeline and database to support whole-genome sequence annotation. Genome Biol 2002; 3:RESEARCH0081. [PMID: 12537570 PMCID: PMC151183 DOI: 10.1186/gb-2002-3-12-research0081] [Citation(s) in RCA: 38] [Impact Index Per Article: 1.7] [Received: 10/25/2002] [Accepted: 11/28/2002] [Indexed: 01/02/2023] Open
Abstract
We describe here our experience in annotating the Drosophila melanogaster genome sequence, in the course of which we developed several new open-source software tools and a database schema to support large-scale genome annotation. We have developed these into an integrated and reusable software system for whole-genome annotation. The key contributions to overall annotation quality are the marshalling of high-quality sequences for alignments and the design of a system with an adaptable and expandable flexible architecture.
Affiliation(s)
- C J Mungall
- Howard Hughes Medical Institute, University of California, Berkeley, CA 94720, USA.
|