201
Zafar N, Mazumder R, Seto D. CoreGenes: a computational tool for identifying and cataloging "core" genes in a set of small genomes. BMC Bioinformatics 2002; 3:12. [PMID: 11972896 PMCID: PMC111185 DOI: 10.1186/1471-2105-3-12] [Citation(s) in RCA: 111] [Impact Index Per Article: 4.8] [Received: 11/27/2001] [Accepted: 04/24/2002] [Indexed: 12/03/2022] Open
Abstract
BACKGROUND Improvements in DNA sequencing technology and methodology have led to the rapid expansion of databases comprising DNA sequence, gene and genome data. Lower operational costs and heightened interest resulting from initial intriguing novel discoveries from genomics are also contributing to the accumulation of these data sets. A major challenge is to analyze and to mine data from these databases, especially whole genomes. There is a need for computational tools that look globally at genomes for data mining. RESULTS CoreGenes is a global JAVA-based interactive data mining tool that identifies and catalogs a "core" set of genes from two to five small whole genomes simultaneously. CoreGenes performs hierarchical and iterative BLASTP analyses using one genome as a reference and another as a query. Subsequent query genomes are compared against each newly generated "consensus." These iterations lead to a matrix comprising related genes from this set of genomes, e.g., viruses, mitochondria and chloroplasts. Currently the software is limited to small genomes on the order of 330 kilobases or less. CONCLUSION A computational tool, CoreGenes, has been developed to analyze small whole genomes globally. BLAST score-related and putatively essential "core" gene data are displayed as a table with links to GenBank for further data on the genes of interest. This web resource is available at http://pumpkins.ib3.gmu.edu:8080/CoreGenes or http://www.bif.atcc.org/CoreGenes.
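The iterative reference/query comparison described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the CoreGenes implementation: the `is_hit` predicate is a hypothetical stand-in for a BLASTP comparison that passes some score threshold, and the gene names are invented toy data.

```python
from typing import Callable, Dict, List


def core_genes(
    reference: List[str],
    queries: List[List[str]],
    is_hit: Callable[[str, str], bool],
) -> Dict[str, List[str]]:
    """Keep reference genes that have a hit in every query genome.

    `is_hit` is a hypothetical placeholder for a thresholded BLASTP
    comparison; it is not part of CoreGenes itself.
    """
    # Start with every reference gene and no matches recorded yet.
    consensus: Dict[str, List[str]] = {gene: [] for gene in reference}
    for genome in queries:
        narrowed: Dict[str, List[str]] = {}
        for ref_gene, matches in consensus.items():
            hits = [gene for gene in genome if is_hit(ref_gene, gene)]
            if hits:
                # Record the first hit; a real tool would keep the best-scoring one.
                narrowed[ref_gene] = matches + [hits[0]]
        # The surviving genes form the consensus for the next query genome.
        consensus = narrowed
    return consensus


def same_family(a: str, b: str) -> bool:
    # Toy rule: genes "match" when they share the prefix before the underscore.
    return a.split("_")[0] == b.split("_")[0]


table = core_genes(
    ["pol_A", "cap_A", "lig_A"],
    [["pol_B", "cap_B"], ["pol_C", "hel_C"]],
    same_family,
)
print(table)
```

Each row of the resulting table corresponds to one reference gene that survived every iteration, together with its matches in the query genomes.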
Affiliation(s)
- Nikhat Zafar
- School of Computational Sciences, George Mason University, 10900 University Boulevard, MSN 4E3, Manassas, VA 20110 USA
- Raja Mazumder
- School of Computational Sciences, George Mason University, 10900 University Boulevard, MSN 4E3, Manassas, VA 20110 USA
- Donald Seto
- School of Computational Sciences, George Mason University, 10900 University Boulevard, MSN 4E3, Manassas, VA 20110 USA
- Center for Biomedical Genomics and Informatics, College of Arts and Sciences, George Mason University, 10900 University Boulevard, MSN 4E3, Manassas, VA 20110 USA
202
Enright AJ, Van Dongen S, Ouzounis CA. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 2002; 30:1575-84. [PMID: 11917018 PMCID: PMC101833 DOI: 10.1093/nar/30.7.1575] [Citation(s) in RCA: 2415] [Impact Index Per Article: 105.0] [Indexed: 01/18/2023] Open
Abstract
Detection of protein families in large databases is one of the principal research objectives in structural and functional genomics. Protein family classification can significantly contribute to the delineation of functional diversity of homologous proteins, the prediction of function based on domain architecture or the presence of sequence motifs as well as comparative genomics, providing valuable evolutionary insights. We present a novel approach called TRIBE-MCL for rapid and accurate clustering of protein sequences into families. The method relies on the Markov cluster (MCL) algorithm for the assignment of proteins into families based on precomputed sequence similarity information. This novel approach does not suffer from the problems that normally hinder other protein sequence clustering algorithms, such as the presence of multi-domain proteins, promiscuous domains and fragmented proteins. The method has been rigorously tested and validated on a number of very large databases, including SwissProt, InterPro, SCOP and the draft human genome. Our results indicate that the method is ideally suited to the rapid and accurate detection of protein families on a large scale. The method has been used to detect and categorise protein families within the draft human genome and the resulting families have been used to annotate a large proportion of human proteins.
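For readers unfamiliar with the Markov cluster (MCL) algorithm mentioned above, the following is a small pure-Python sketch of its two alternating steps, expansion (matrix squaring) and inflation (element-wise powering followed by column normalization). It is not the TRIBE-MCL code; the similarity weights, the inflation value of 2.0, and the way clusters are read off the converged matrix are simplifying assumptions.

```python
from typing import List

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def expand(m: Matrix) -> Matrix:
    """Expansion step: multiply the matrix by itself (two-step random walks)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def inflate(m: Matrix, power: float) -> Matrix:
    """Inflation step: raise entries to `power`, then renormalize the columns."""
    return normalize_columns([[x ** power for x in row] for row in m])


def mcl(similarity: Matrix, inflation: float = 2.0, max_iter: int = 50) -> List[List[int]]:
    n = len(similarity)
    # Add self-loops, a common convention before running MCL.
    m = normalize_columns(
        [[similarity[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    )
    for _ in range(max_iter):
        nxt = inflate(expand(m), inflation)
        delta = max(abs(nxt[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = nxt
        if delta < 1e-9:
            break
    # Each non-empty row of the converged matrix lists the members of one cluster.
    clusters = {frozenset(j for j, x in enumerate(row) if x > 1e-6) for row in m}
    return sorted(sorted(c) for c in clusters if c)


# Toy similarity graph: proteins 0-2 resemble each other, as do proteins 3-4.
similarity = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
clusters = mcl(similarity)
print(clusters)
```

In practice the input weights would come from precomputed sequence similarity scores, as the abstract states; the 0/1 weights here are only for illustration.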
Affiliation(s)
- A J Enright
- Computational Genomics Group, The European Bioinformatics Institute, EMBL Cambridge Outstation, Cambridge CB10 1SD, UK.
203
De Las Rivas J, Lozano JJ, Ortiz AR. Comparative analysis of chloroplast genomes: functional annotation, genome-based phylogeny, and deduced evolutionary patterns. Genome Res 2002; 12:567-83. [PMID: 11932241 PMCID: PMC187516 DOI: 10.1101/gr.209402] [Citation(s) in RCA: 80] [Impact Index Per Article: 3.5] [Indexed: 12/31/2022]
Abstract
All protein sequences from 19 complete chloroplast genomes (cpDNA) have been studied using a new computational method able to analyze functional correlations among series of protein sequences contained in complete proteomes. First, all open reading frames (ORFs) from the cpDNAs, comprising a total of 2266 protein sequences, were compared against the 3168 proteins from Synechocystis PCC6803 complete genome to find functionally related orthologous proteins. Additionally, all cpDNA genomes were pairwise compared to find orthologous groups not present in cyanobacteria. Annotations in the cluster of orthologous proteins database and CyanoBase were used as reference for the functional assignments. Following this protocol, new functional assignments were made for ORFs of unknown function and for ycfs (hypothetical chloroplast frames), which still lack a functional assignment. Using this information, a matrix of functional relationships was derived from profiles of the presence and/or absence of orthologous proteins; the matrix included 1837 proteins in 277 orthologous clusters. A factor analysis study of this matrix, followed by cluster analysis, allowed us to obtain accurate phylogenetic reconstructions and the detection of genes probably involved in speciation as phylogenetic correlates. Finally, by grouping common evolutionary patterns, we show that it is possible to determine functionally linked protein networks. This has allowed us to suggest putative associations for some unknown ORFs.
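The presence/absence profile matrix described above can be illustrated with a short sketch. The genome and cluster names are invented, and the Hamming distance used to compare profiles is a deliberately simple stand-in for the factor and cluster analysis reported in the abstract.

```python
from typing import Dict, List, Set


def presence_matrix(genomes: Dict[str, Set[str]]) -> Dict[str, List[int]]:
    """Map each genome to a 0/1 profile over all orthologous clusters seen."""
    clusters = sorted(set().union(*genomes.values()))
    return {
        name: [1 if cluster in members else 0 for cluster in clusters]
        for name, members in genomes.items()
    }


def profile_distance(a: List[int], b: List[int]) -> int:
    """Hamming distance: number of clusters present in one genome but not the other."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical toy data: which orthologous clusters each genome contains.
genomes = {
    "genomeA": {"c1", "c2", "c3"},
    "genomeB": {"c1", "c2"},
    "genomeC": {"c1", "c4"},
}
profiles = presence_matrix(genomes)
print(profiles)
print(profile_distance(profiles["genomeA"], profiles["genomeB"]))
```

Genomes with similar profiles end up close together, which is the basic signal a profile-based phylogeny relies on.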
Affiliation(s)
- Javier De Las Rivas
- Instituto de Recursos Naturales y Agrobiologia, Consejo Superior de Investigaciones Cientificas, 37071 Salamanca, Spain
204
Koonin EV, Aravind L. Origin and evolution of eukaryotic apoptosis: the bacterial connection. Cell Death Differ 2002; 9:394-404. [PMID: 11965492 DOI: 10.1038/sj.cdd.4400991] [Citation(s) in RCA: 291] [Impact Index Per Article: 12.7] [Received: 11/10/2001] [Accepted: 11/21/2001] [Indexed: 11/09/2022] Open
Abstract
The availability of numerous complete genome sequences of prokaryotes and several eukaryotic genome sequences provides for new insights into the origin of unique functional systems of the eukaryotes. Several key enzymes of the apoptotic machinery, including the paracaspase and metacaspase families of the caspase-like protease superfamily, apoptotic ATPases and NACHT family NTPases, and mitochondrial HtrA-like proteases, have diverse homologs in bacteria, but not in archaea. Phylogenetic analysis strongly suggests a mitochondrial origin for metacaspases and the HtrA-like proteases, whereas acquisition from Actinomycetes appears to be the most likely scenario for AP-ATPases. The homologs of apoptotic proteins are particularly abundant and diverse in bacteria that undergo complex development, such as Actinomycetes, Cyanobacteria and alpha-proteobacteria, the latter being progenitors of the mitochondria. In these bacteria, the apoptosis-related domains typically form multidomain proteins, which are known or inferred to participate in signal transduction and regulation of gene expression. Some of these bacterial multidomain proteins contain fusions between apoptosis-related domains, such as AP-ATPase fused with a metacaspase or a TIR domain. Thus, bacterial homologs of eukaryotic apoptotic machinery components might functionally and physically interact with each other as parts of signaling pathways that remain to be investigated. An emerging scenario of the origin of the eukaryotic apoptotic system involves acquisition of several central apoptotic effectors as a consequence of mitochondrial endosymbiosis and probably also as a result of subsequent, additional horizontal gene transfer events, which was followed by recruitment of newly emerging eukaryotic domains as adaptors.
Affiliation(s)
- E V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
205
Abstract
Pathway reconstruction builds on genome and biochemical data with the aim of reconstructing higher level interactions between identified enzymes in a specific genome, in particular the different enzyme pathways (species or individual/patient). Metabolite flow in a pathway is analyzed by different tools, such as elementary mode analysis. This reveals key enzymes and pharmacological targets in the enzyme network. An overview of bioinformatic tools and algorithms for these tasks, application examples and recent results from these techniques are presented. Target selection, drug development and optimization can all be sped up using these approaches.
206
Schmidt S, Bork P, Dandekar T. A versatile structural domain analysis server using profile weight matrices. JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES 2002; 42:405-7. [PMID: 11911710 DOI: 10.1021/ci010374r] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.4] [Indexed: 11/28/2022]
Abstract
The WEB tool "AnDom" assigns to a given protein sequence all experimentally determined structural domains contained within it, including multidomain and large proteins. The server uses profile specific matrices from custom generated multiple sequence alignments of all known SCOP domains (SCOP version 1.50). Prediction time is short allowing numerous applications for structural genomics including investigation of complex eucaryotic protein families. The WWW server is at http://www.bork.embl-heidelberg.de/AnDom, and profiles can be downloaded at ftp.bork.embl-heidelberg.de/pub/users/schmidt/AnDom.
207
Gustavsson P. Merging classical and modern genetic tools in the identification of disease genes. Ups J Med Sci 2002; 107:1-8. [PMID: 12296448 DOI: 10.3109/2000-1967-136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
Affiliation(s)
- Peter Gustavsson
- Department of Genetics and Pathology, Uppsala University, Sweden.
208
Novatchkova M, Eisenhaber F. Can molecular mechanisms of biological processes be extracted from expression profiles? Case study: endothelial contribution to tumor-induced angiogenesis. Bioessays 2001; 23:1159-75. [PMID: 11746235 DOI: 10.1002/bies.10013] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.5] [Indexed: 01/10/2023]
Abstract
Whereas the genome contains all potential developmental programs, expression profiles permit the determination of genes that are actively transcribed under defined physiological conditions. In this article, the idea of extracting biological mechanisms from expression data is tested. Molecular processes of the endothelial contribution to angiogenesis are derived from recently published expression profiles. The analysis reveals the sensitivity limits of experimental detection of transcriptional changes and how sequence-analytic techniques can help to identify the function of genes in question. We conclude that the transcripts (http://mendel.imp.univie.ac.at/SEQUENCES/TEMS/) found to be up-regulated in angiogenesis are involved in extracellular matrix remodeling, cellular migration, adhesion, cell-cell communication rather than in angiogenesis initiation or integrative control. Comparison with tissue-specific patterns of EST occurrence shows that, indeed, the presumptive tumor-specific endothelial markers are more generally expressed by cell types involved in migration and matrix remodeling processes. This exemplary study demonstrates how bioinformatics approaches can be helpful in deriving mechanistic information from diverse sources of experimental data.
Affiliation(s)
- M Novatchkova
- Research Institute of Molecular Pathology, Vienna, Rep. Austria
209
Dandekar T, Du F, Schirmer RH, Schmidt S. Medical target prediction from genome sequence: combining different sequence analysis algorithms with expert knowledge and input from artificial intelligence approaches. COMPUTERS & CHEMISTRY 2001; 26:15-21. [PMID: 11765847 DOI: 10.1016/s0097-8485(01)00095-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/27/2022]
Abstract
By exploiting the rapid increase in available sequence data, the definition of medically relevant protein targets has been improved by a combination of: (i) differential genome analysis (target list); and (ii) analysis of individual proteins (target analysis). Fast sequence comparisons, data mining, and genetic algorithms further promote these procedures. Mycobacterium tuberculosis proteins were chosen as applied examples.
Affiliation(s)
- T Dandekar
- European Molecular Biology Laboratory, PO Box 102209, Meyerhofstrasse 1, D-69012 Heidelberg, Germany.
210
Bobik TA, Rasche ME. Identification of the human methylmalonyl-CoA racemase gene based on the analysis of prokaryotic gene arrangements. Implications for decoding the human genome. J Biol Chem 2001; 276:37194-8. [PMID: 11481338 DOI: 10.1074/jbc.m107232200] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.5] [Indexed: 11/06/2022] Open
Abstract
In this report, we identify the human DL-methylmalonyl-CoA racemase gene by analyzing prokaryotic gene arrangements and extrapolating the information obtained to human genes by homology searches. Sequence similarity searches were used to identify two groups of homologues that were frequently arranged with prokaryotic methylmalonyl-CoA mutase genes, and that were of unknown function. Both gene groups had homologues in the human genome. Because methylmalonyl-CoA mutases are involved in the metabolism of propionyl-CoA, we inferred that conserved neighbors of methylmalonyl-CoA mutase genes and their human homologues were also involved in this process. Subsequent biochemical studies confirmed this inference by showing that the prokaryotic gene PH0272 and its human homologue both encode DL-methylmalonyl-CoA racemases. To our knowledge this is the first report in which the function of a eukaryotic gene was determined based on the analysis of prokaryotic gene arrangements. Importantly, such analyses are rapid and may be generally applicable for the identification of human genes that lack homologues of known function or that have been misidentified on the basis of sequence similarity searches.
Affiliation(s)
- T A Bobik
- Department of Microbiology and Cell Science, University of Florida, Gainesville, Florida 32611, USA.
211
Schmid KJ, Aquadro CF. The evolutionary analysis of "orphans" from the Drosophila genome identifies rapidly diverging and incorrectly annotated genes. Genetics 2001; 159:589-98. [PMID: 11606536 PMCID: PMC1461820 DOI: 10.1093/genetics/159.2.589] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.3] [Indexed: 11/14/2022] Open
Abstract
In genome projects of eukaryotic model organisms, a large number of novel genes of unknown function and evolutionary history ("orphans") are being identified. Since many orphans have no known homologs in distant species, it is unclear whether they are restricted to certain taxa or evolve rapidly, either because of a lack of constraints or positive Darwinian selection. Here we use three criteria for the selection of putatively rapidly evolving genes from a single sequence of Drosophila melanogaster. Thirteen candidate genes were chosen from the Adh region on the second chromosome and 1 from the tip of the X chromosome. We succeeded in obtaining sequence from 6 of these in the closely related species D. simulans and D. yakuba. Only 1 of the 6 genes showed a large number of amino acid replacements and in-frame insertions/deletions. A population survey of this gene suggests that its rapid evolution is due to the fixation of many neutral or nearly neutral mutations. Two other genes showed "normal" levels of divergence between species. Four genes had insertions/deletions that destroy the putative reading frame within exons, suggesting that these exons have been incorrectly annotated. The evolutionary analysis of orphan genes in closely related species is useful for the identification of both rapidly evolving and incorrectly annotated genes.
Affiliation(s)
- K J Schmid
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA.
212
Herniou EA, Olszewski JA, Cory JS, O'Reilly DR. The genome sequence and evolution of baculoviruses. ANNUAL REVIEW OF ENTOMOLOGY 2001; 48:211-234. [PMID: 12414741 DOI: 10.1146/annurev.ento.48.091801.112756] [Citation(s) in RCA: 333] [Impact Index Per Article: 13.9] [Indexed: 05/24/2023]
Abstract
Comparative analysis of the complete genome sequences of 13 baculoviruses revealed a core set of 30 genes, 20 of which have known functions. Phylogenetic analyses of these 30 genes yielded a tree with 4 major groups: the genus Granulovirus (GVs), the group I and II lepidopteran nucleopolyhedroviruses (NPVs), and the dipteran NPV, CuniNPV. These major divisions within the family Baculoviridae were also supported by phylogenies based on gene content and gene order. Gene content mapping has revealed the patterns of gene acquisitions and losses that have taken place during baculovirus evolution, and it has highlighted the fluid nature of baculovirus genomes. The identification of shared protein phylogenetic profiles provided evidence for two putative DNA repair systems and for viral proteins specific for infection of lymantrid hosts. Examination of gene order conservation revealed a core gene cluster of four genes, helicase, lef-5, ac96, and 38K(ac98), whose relative positions are conserved in all baculovirus genomes.
Affiliation(s)
- Elisabeth A Herniou
- Department of Biological Sciences, Imperial College of Science, Technology and Medicine, London SW7 2AZ, United Kingdom.
213
Abstract
The human genome sequence provides the framework for understanding the biology of human cell function. The next step is to intensify the investigation of protein function in the context of complex biological systems. Cellular functions are carried out by molecular complexes acting in concert rather than by single molecules or single reactions. Parallels have been drawn between scale-free nonbiologic networks and functionally interconnected metabolic pathways in the cell. Modeling of metabolic networks, in which functional modules or subnetworks represent individual related pathways, will lead to the prediction of protein function in the larger context of a complex system. Depending on the robustness of these metabolic networks, single-gene defects alone or in combination with other gene defects and the environment have the potential for invoking a spectrum of alterations in the integrity of a given network. The overall purpose of this review is to highlight the importance of simple heterozygosity for one pathogenic mutation or combinatorial heterozygosity for two or more mutations within or between individual genes in altering the stability of metabolic networks. Several forms of heterozygosity are considered, e.g., intra- and interallelic heterozygosity and double heterozygosity. The concepts of synergistic heterozygosity, loss of heterozygosity, and mitochondrial DNA heteroplasmy also are discussed in relation to the quantitative effects of coexisting mutations on the phenotypic expression of disease.
Affiliation(s)
- G D Vladutiu
- Department of Pediatrics, Division of Genetics, School of Medicine & Biomedical Sciences, University at Buffalo, 936 Delaware Avenue, Buffalo, New York 14209, USA.
214
Aloy P, Querol E, Aviles FX, Sternberg MJ. Automated structure-based prediction of functional sites in proteins: applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J Mol Biol 2001; 311:395-408. [PMID: 11478868 DOI: 10.1006/jmbi.2001.4870] [Citation(s) in RCA: 196] [Impact Index Per Article: 8.2] [Indexed: 11/22/2022]
Abstract
A major problem in genome annotation is whether it is valid to transfer the function from a characterised protein to a homologue of unknown activity. Here, we show that one can employ a strategy that uses a structure-based prediction of protein functional sites to assess the reliability of functional inheritance. We have automated and benchmarked a method based on the evolutionary trace approach. Using a multiple sequence alignment, we identified invariant polar residues, which were then mapped onto the protein structure. Spatial clusters of these invariant residues formed the predicted functional site. For 68 of 86 proteins examined, the method yielded information about the observed functional site. This algorithm for functional site prediction was then used to assess the validity of transferring the function between homologues. This procedure was tested on 18 pairs of homologous proteins with unrelated function and 70 pairs of proteins with related function, and was shown to be 94% accurate. This automated method could be linked to schemes for genome annotation. Finally, we examined the use of functional site prediction in protein-protein and protein-DNA docking. The use of predicted functional sites was shown to filter putative docked complexes with a discrimination similar to that obtained by manually including biological information about active sites or DNA-binding residues.
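The first step named in this abstract, finding invariant polar residues in a multiple sequence alignment, can be sketched as below. The set of residues treated as polar is an assumption made for this example (the abstract does not list it), the alignment is toy data, and the later steps of mapping onto the structure and spatial clustering are not shown.

```python
from typing import List

# Assumption: uncharged polar plus charged residues, in one-letter codes.
POLAR = set("STNQYCDEKRH")


def invariant_polar_columns(alignment: List[str]) -> List[int]:
    """Return 0-based alignment columns where every sequence has the same polar residue."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    positions = []
    for i in range(length):
        column = {seq[i] for seq in alignment}
        # Invariant means a single residue type; gaps ('-') are never polar.
        if len(column) == 1 and next(iter(column)) in POLAR:
            positions.append(i)
    return positions


# Toy alignment of three hypothetical homologues.
alignment = [
    "MKSHDG",
    "MRSHEG",
    "MKSHDA",
]
sites = invariant_polar_columns(alignment)
print(sites)
```

Column 0 is invariant but not polar, so it is skipped; only columns that are both conserved and polar are reported.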
Affiliation(s)
- P Aloy
- Institut de Biologia Fonamental and Departament de Bioquimica, Universitat Autonoma de Barcelona, Bellaterra, Barcelona, 08193, Spain
215
Yano N, Habib NA, Fadden KJ, Yamashita H, Mitry R, Jauregui H, Kane A, Endoh M, Rifai A. Profiling the adult human liver transcriptome: analysis by cDNA array hybridization. J Hepatol 2001; 35:178-86. [PMID: 11580139 DOI: 10.1016/s0168-8278(01)00104-0] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.0] [Indexed: 12/04/2022]
Abstract
BACKGROUND/AIMS A comprehensive profile of genes expressed at the mRNA level (transcriptome) in human liver tissue is important for elucidating the pathogenesis and treatment of hepatic diseases. The recent development of cDNA array hybridization allows the parallel monitoring of thousands of genes expressed in a single organ. METHODS High-density microarrays containing 4043 known and unique human cDNA gene targets were used to quantitatively analyze the expression of genes in human livers. Expressed gene transcripts were classified by function and listed with information of their chromosomal positions. Computational analysis was used to cluster genes according to similarity in pattern of gene expression. RESULTS A total of 2418 unique gene transcripts were detected in five liver specimens. Through relational database analysis, we determined 1212 genes that were commonly expressed in four of the five liver specimens. Furthermore, analysis of the total 2418 expressed genes by self-organizing maps and hierarchical clustering unexpectedly revealed a genomic acute phase response in two of the liver specimens. CONCLUSIONS These findings represent a comprehensive preliminary molecular index of genes transcribed in the adult human liver. The information may serve as a resource for speeding up the discovery of genes underlying human hepatic diseases.
Affiliation(s)
- N Yano
- Department of Pathology and Laboratory Medicine, Brown University School of Medicine, Providence, RI 02903, USA
216
Yanai I, Derti A, DeLisi C. Genes linked by fusion events are generally of the same functional category: a systematic analysis of 30 microbial genomes. Proc Natl Acad Sci U S A 2001; 98:7940-5. [PMID: 11438739 PMCID: PMC35447 DOI: 10.1073/pnas.141236298] [Citation(s) in RCA: 126] [Impact Index Per Article: 5.3] [Indexed: 11/18/2022] Open
Abstract
Recent work in computational genomics has shown that a functional association between two genes can be derived from the existence of a fusion of the two as one continuous sequence in another genome. For each of 30 completely sequenced microbial genomes, we established all such fusion links among its genes and determined the distribution of links within and among 15 broad functional categories. We found that 72% of all fusion links related genes of the same functional category. A comparison of the distribution of links to simulations on the basis of a random model further confirmed the significance of intracategory fusion links. Where a gene of annotated function is linked to an unclassified gene, the fusion link suggests that the two genes belong to the same functional category. The predictions based on fusion links are shown here for Methanobacterium thermoautotrophicum, and another 661 predictions are available at http://fusion.bu.edu.
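The idea of a fusion link, and the within-category fraction reported above, can be illustrated with a short sketch. All gene names, category labels, and the input format (a fused gene mapped to the separate genes it covers in the genome of interest) are assumptions made for this example; detecting the fusions themselves would require sequence comparison that is not shown here.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str]


def fusion_links(composites: Dict[str, List[str]]) -> Set[Link]:
    """Link every pair of genes that appear fused in some other genome's gene."""
    links: Set[Link] = set()
    for parts in composites.values():
        # Sorting makes (a, b) and (b, a) collapse to the same link.
        for a, b in combinations(sorted(set(parts)), 2):
            links.add((a, b))
    return links


def same_category_fraction(links: Set[Link], category: Dict[str, str]) -> float:
    """Fraction of links whose two genes share a functional category.

    Links involving an unclassified gene are left out of the calculation.
    """
    classified = [(a, b) for a, b in links if a in category and b in category]
    if not classified:
        return 0.0
    same = sum(category[a] == category[b] for a, b in classified)
    return same / len(classified)


# Hypothetical toy data: fused genes seen elsewhere and the genes they span.
composites = {
    "fusedX": ["g1", "g2"],
    "fusedY": ["g3", "g4"],
    "fusedZ": ["g2", "g1"],  # same pair as fusedX, counted once
}
category = {"g1": "metabolism", "g2": "metabolism", "g3": "transport", "g4": "translation"}

links = fusion_links(composites)
fraction = same_category_fraction(links, category)
print(sorted(links), fraction)
```

When one gene of a link has no annotation, the same reasoning runs in the other direction: the category of its annotated partner becomes the prediction.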
Affiliation(s)
- I Yanai
- Bioinformatics Graduate Program and Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
217
Kell DB, Darby RM, Draper J. Genomic computing. Explanatory analysis of plant expression profiling data using machine learning. PLANT PHYSIOLOGY 2001; 126:943-951. [PMID: 11457944 PMCID: PMC1540126 DOI: 10.1104/pp.126.3.943] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.2] [Indexed: 05/23/2023]
Affiliation(s)
- D B Kell
- of Biological Sciences, University of Wales, Aberystwyth SY23 3DD, United Kingdom
218
van Belkum A, Struelens M, de Visser A, Verbrugh H, Tibayrenc M. Role of genomic typing in taxonomy, evolutionary genetics, and microbial epidemiology. Clin Microbiol Rev 2001; 14:547-60. [PMID: 11432813 PMCID: PMC88989 DOI: 10.1128/cmr.14.3.547-560.2001] [Citation(s) in RCA: 126] [Impact Index Per Article: 5.3] [Indexed: 11/20/2022] Open
Abstract
Currently, genetic typing of microorganisms is widely used in several major fields of microbiological research. Taxonomy, research aimed at elucidation of evolutionary dynamics or phylogenetic relationships, population genetics of microorganisms, and microbial epidemiology all rely on genetic typing data for discrimination between genotypes. Apart from being an essential component of these fundamental sciences, microbial typing clearly affects several areas of applied microbiological research. The epidemiological investigation of outbreaks of infectious diseases and the measurement of genetic diversity in relation to relevant biological properties such as pathogenicity, drug resistance, and biodegradation capacities are obvious examples. The diversity among nucleic acid molecules provides the basic information for all fields described above. However, researchers in various disciplines tend to use different vocabularies, a wide variety of different experimental methods to monitor genetic variation, and sometimes widely differing modes of data processing and interpretation. The aim of the present review is to summarize the technological and fundamental concepts used in microbial taxonomy, evolutionary genetics, and epidemiology. Information on the nomenclature used in the different fields of research is provided, descriptions of the diverse genetic typing procedures are presented, and examples of both conceptual and technological research developments for Escherichia coli are included. Recommendations for unification of the different fields through standardization of laboratory techniques are made.
Affiliation(s)
- A van Belkum
- Department of Medical Microbiology & Infectious Diseases, Erasmus University Medical Center Rotterdam, 3015 GD Rotterdam, The Netherlands.
219
Abstract
Ligand-protein docking has been developed and used in facilitating new drug discoveries. In this approach, docking single or multiple small molecules to a receptor site is attempted to find putative ligands. A number of studies have shown that docking algorithms are capable of finding ligands and binding conformations at a receptor site close to experimentally determined structures. These algorithms are expected to be equally applicable to the identification of multiple proteins to which a small molecule can bind or weakly bind. We introduce a ligand-protein inverse-docking approach for finding potential protein targets of a small molecule by the computer-automated docking search of a protein cavity database. This database is developed from protein structures in the Protein Data Bank (PDB). Docking is conducted with a procedure involving multiple-conformer shape-matching alignment of a molecule to a cavity followed by molecular-mechanics torsion optimization and energy minimization on both the molecule and the protein residues at the binding region. Scoring is conducted by the evaluation of molecular-mechanics energy and, when applicable, by the further analysis of binding competitiveness against other ligands that bind to the same receptor site in at least one PDB entry. Testing results on two therapeutic agents, 4H-tamoxifen and vitamin E, showed that 50% of the computer-identified potential protein targets were implicated or confirmed by experiments. The application of this approach may facilitate the prediction of unknown and secondary therapeutic target proteins and those related to the side effects and toxicity of a drug or drug candidate. Proteins 2001;43:217-226.
Affiliation(s)
- Y Z Chen
- Department of Computational Science, National University of Singapore, Blk S17, Level 7, 3 Science Drive 2, Singapore 117543.
|
221
|
Iyer LM, Aravind L, Bork P, Hofmann K, Mushegian AR, Zhulin IB, Koonin EV. Quod erat demonstrandum? The mystery of experimental validation of apparently erroneous computational analyses of protein sequences. Genome Biol 2001; 2:RESEARCH0051. [PMID: 11790254 PMCID: PMC64836 DOI: 10.1186/gb-2001-2-12-research0051] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.4] [Received: 07/03/2001] [Revised: 09/07/2001] [Accepted: 10/04/2001] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND Computational predictions are critical for directing the experimental study of protein functions. Therefore it is paradoxical when an apparently erroneous computational prediction seems to be supported by experiment. RESULTS We analyzed six cases where application of novel or conventional computational methods for protein sequence and structure analysis led to non-trivial predictions that were subsequently supported by direct experiments. We show that, on all six occasions, the original prediction was unjustified, and in at least three cases, an alternative, well-supported computational prediction, incompatible with the original one, could be derived. The most unusual cases involved the identification of an archaeal cysteinyl-tRNA synthetase, a dihydropteroate synthase and a thymidylate synthase, for which experimental verifications of apparently erroneous computational predictions were reported. Using sequence-profile analysis, multiple alignment and secondary-structure prediction, we have identified the unique archaeal 'cysteinyl-tRNA synthetase' as a homolog of extracellular polygalactosaminidases, and the 'dihydropteroate synthase' as a member of the beta-lactamase-like superfamily of metal-dependent hydrolases. CONCLUSIONS In each of the analyzed cases, the original computational predictions could be refuted and, in some instances, alternative strongly supported predictions were obtained. The nature of the experimental evidence that appears to support these predictions remains an open question. Some of these experiments might signify discovery of extremely unusual forms of the respective enzymes, whereas the results of others could be due to artifacts.
Affiliation(s)
- Lakshminarayan M Iyer
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- L Aravind
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Peer Bork
- EMBL, Biocomputing, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- Arcady R Mushegian
- Stowers Institute for Medical Research, 1000 E 50th Street, Kansas City, MO 64410, USA
- Igor B Zhulin
- School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
|
222
|
Xie T, Ding D. Investigating 42 candidate orthologous protein groups by molecular evolutionary analysis on genome scale. Gene 2000; 261:305-10. [PMID: 11167018 DOI: 10.1016/s0378-1119(00)00506-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.4] [Indexed: 11/17/2022]
Abstract
Accurately identifying orthologous genes/proteins is one of the key problems in comparative genomics. Here 42 quartets of human, yeast Saccharomyces cerevisiae, nematode Caenorhabditis elegans, and fruit fly Drosophila melanogaster candidate orthologs, defined by using similarity-based highest hit criteria (Mushegian et al., 1998 Genome Res. 8: 590-598), were reconsidered according to molecular evolutionary analysis. We found that only 14 of the 42 candidate orthologous groups can be identified to have truly one-to-one orthologous relationships, whereas other groups were characterized by one (many)-to-many orthologous relationships or even more complex scenarios involving gene duplications and/or gene losses. The result could imply that the classical one-to-one orthology might not be as common as typically accepted and that automated similarity-based methods should be used with caution when accurate orthology/paralogy discrimination is required.
Affiliation(s)
- T Xie
- Shanghai Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 200031, Shanghai, China
|
223
|
Kyrpides NC, Ouzounis CA, Iliopoulos I, Vonstein V, Overbeek R. Analysis of the Thermotoga maritima genome combining a variety of sequence similarity and genome context tools. Nucleic Acids Res 2000; 28:4573-6. [PMID: 11071948 PMCID: PMC113882 DOI: 10.1093/nar/28.22.4573] [Citation(s) in RCA: 20] [Impact Index Per Article: 0.8] [Received: 06/06/2000] [Revised: 10/03/2000] [Accepted: 10/03/2000] [Indexed: 11/12/2022] Open
Abstract
The proliferation of genome sequence data has led to the development of a number of tools and strategies that facilitate computational analysis. These methods include the identification of motif patterns, membership of the query sequences in family databases, metabolic pathway involvement and gene proximity. We re-examined the completely sequenced genome of Thermotoga maritima by employing the combined use of the above methods. By analyzing all 1877 proteins encoded in this genome, we identified 193 cases of conflicting annotations (10%), of which 164 are new function predictions and 29 are amendments of previously proposed assignments. These results suggest that the combined use of existing computational tools can resolve inconclusive sequence similarities and significantly improve the prediction of protein function from genome sequence.
Affiliation(s)
- N C Kyrpides
- Integrated Genomics Inc., Chicago Technology Park, 2201 West Campbell Park Drive, Chicago, IL 60612, USA. Cambridge CB10 1SD, UK.
|
224
|
Krásný L, Vacík T, Fucík V, Jonák J. Cloning and characterization of the str operon and elongation factor Tu expression in Bacillus stearothermophilus. J Bacteriol 2000; 182:6114-22. [PMID: 11029432 PMCID: PMC94746 DOI: 10.1128/jb.182.21.6114-6122.2000] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.6] [Indexed: 11/20/2022] Open
Abstract
The complete primary structure of the str operon of Bacillus stearothermophilus was determined. It was established that the operon is a five-gene transcriptional unit: 5'-ybxF (unknown function; homology to eukaryotic ribosomal protein L30)-rpsL (S12)-rpsG (S7)-fus (elongation factor G [EF-G])-tuf (elongation factor Tu [EF-Tu])-3'. The main operon promoter (strp) was mapped upstream of ybxF, and its strength was compared with the strength of the tuf-specific promoter (tufp) located in the fus-tuf intergenic region. The strength of the tufp region to initiate transcription is about 20-fold higher than that of the strp region, as determined in chloramphenicol acetyltransferase assays. Deletion mapping experiments revealed that the different strengths of the promoters are the consequence of a combined effect of oppositely acting cis elements, identified upstream of strp (an inhibitory region) and tufp (a stimulatory A/T-rich block). Our results suggest that the oppositely adjusted core promoters significantly contribute to the differential expression of the str operon genes, as monitored by the expression of EF-Tu and EF-G.
Affiliation(s)
- L Krásný
- Department of Protein Biosynthesis, Institute of Molecular Genetics, Academy of Sciences of the Czech Republic, 166 37 Prague 6, Czech Republic
|
225
|
Santucci A, Trabalzini L, Bovalini L, Ferro E, Neri P, Martelli P. Differences between predicted and observed sequences in Saccharomyces cerevisiae. Electrophoresis 2000; 21:3717-23. [PMID: 11271491 DOI: 10.1002/1522-2683(200011)21:17<3717::aid-elps3717>3.0.co;2-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.5] [Indexed: 01/01/2023]
Abstract
We recently studied the protein composition of a Saccharomyces cerevisiae wine yeast strain (K310) of enological interest. About 2,500 spots of 8-250 kDa observed molecular mass were resolved by two-dimensional gel electrophoresis. Experimental molecular masses and isoelectric points were calculated for most of them. Twenty-seven proteins were subjected to Edman microsequencing. N-terminal sequences of 12/27 proteins were determined, whereas internal sequences of 6/27 proteins were obtained following in situ proteolysis. Comparison between the experimental data and those reported in the SWISS-PROT database revealed some differences between genotypic and phenotypic sequences. These are indicative of the changes a protein can undergo with respect to the primary structure coded by the genomic DNA. Our results highlight the need to complement genomic analysis with detailed proteomics in order to refine the vast amount of information provided by DNA sequencing and to find an exact correlation between genome and proteome.
Affiliation(s)
- A Santucci
- Dipartimento di Biologia Molecolare, Università degli Studi di Siena, Italy.
|
226
|
Abstract
Isoprenoid compounds are ubiquitous in living species and diverse in biological function. Isoprenoid side chains of the membrane lipids are biochemical markers distinguishing archaea from the rest of living forms. The mevalonate pathway of isoprenoid biosynthesis has been defined completely in yeast, while the alternative, deoxy-D-xylulose phosphate synthase pathway is found in many bacteria. In archaea, some enzymes of the mevalonate pathway are found, but the orthologs of three yeast proteins, accounting for the route from phosphomevalonate to geranyl pyrophosphate, are missing, as are the enzymes from the alternative pathway. To understand the evolution of isoprenoid biosynthesis, as well as the mechanism of lipid biosynthesis in archaea, sequence motifs in the known enzymes of the two pathways of isoprenoid biosynthesis were analyzed. New sequence relationships were detected, including similarities between diphosphomevalonate decarboxylase and kinases of the galactokinase superfamily, between the metazoan phosphomevalonate kinase and the nucleoside monophosphate kinase superfamily, and between isopentenyl pyrophosphate isomerases and MutT pyrophosphohydrolases. Based on these findings, orphan members of the galactokinase, nucleoside monophosphate kinase, and pyrophosphohydrolase families in archaeal genomes were evaluated as candidate enzymes for the three missing steps. Alternative methods of finding these missing links were explored, including physical linkage of open reading frames and patterns of ortholog distribution in different species. Combining these approaches resulted in the generation of a short list of 13 candidate genes for the three missing functions in archaea, whose participation in isoprenoid biosynthesis is amenable to biochemical and genetic investigation.
Affiliation(s)
- A Smit
- Institute for Systems Biology, Seattle, Washington 98195, USA
|
227
|
Abstract
Operons, co-transcribed and co-regulated contiguous sets of genes, are poorly conserved over short periods of evolutionary time. The gene order, gene content and regulatory mechanisms of operons can be very different, even in closely related species. Here, we present several lines of evidence which suggest that, although an operon and its individual genes and regulatory structures are rearranged when comparing the genomes of different species, this rearrangement is a conservative process. Genomic rearrangements invariably maintain individual genes in very specific functional and regulatory contexts. We call this conserved context an uber-operon.
Affiliation(s)
- W C Lathe
- European Molecular Biology Laboratory, Meyerhofstrasse 1, 69012, Heidelberg, Germany
|
228
|
Dandekar T, Huynen M, Regula JT, Ueberle B, Zimmermann CU, Andrade MA, Doerks T, Sánchez-Pulido L, Snel B, Suyama M, Yuan YP, Herrmann R, Bork P. Re-annotating the Mycoplasma pneumoniae genome sequence: adding value, function and reading frames. Nucleic Acids Res 2000; 28:3278-88. [PMID: 10954595 PMCID: PMC110705 DOI: 10.1093/nar/28.17.3278] [Citation(s) in RCA: 199] [Impact Index Per Article: 8.0] [Indexed: 11/14/2022] Open
Abstract
Four years after the original sequence submission, we have re-annotated the genome of Mycoplasma pneumoniae to incorporate novel data. The total number of ORFs has been increased from 677 to 688 (10 new proteins were predicted in intergenic regions, two further were newly identified by mass spectrometry and one protein ORF was dismissed) and the number of RNAs from 39 to 42 genes. For 19 of the now 35 tRNAs and for six other functional RNAs the exact genome positions were re-annotated and two new tRNA(Leu) and a small 200 nt RNA were identified. Sixteen protein reading frames were extended and eight shortened. For each ORF a consistent annotation vocabulary has been introduced. Annotation reasoning, annotation categories and comparisons to other published data on M. pneumoniae functional assignments are given. Experimental evidence includes 2-dimensional gel electrophoresis in combination with mass spectrometry as well as gene expression data from this study. Compared to the original annotation, we increased the number of proteins with predicted functional features from 349 to 458. The increase includes 36 new predictions and 73 protein assignments confirmed by the published literature. Furthermore, there are 23 reductions and 30 additions with respect to the previous annotation. mRNA expression data support transcription of 184 of the functionally unassigned reading frames.
Affiliation(s)
- T Dandekar
- EMBL, Postfach 102209, D-69012 Heidelberg, Germany, Max Delbrück Centre for Molecular Medicine, Robert-Rössle-Straße 10, 13092 Berlin-Buch, Germany
|
229
|
Huynen M, Snel B, Lathe W, Bork P. Predicting protein function by genomic context: quantitative evaluation and qualitative inferences. Genome Res 2000; 10:1204-10. [PMID: 10958638 PMCID: PMC310926 DOI: 10.1101/gr.10.8.1204] [Citation(s) in RCA: 350] [Impact Index Per Article: 14.0] [Indexed: 11/24/2022]
Abstract
Various new methods have been proposed to predict functional interactions between proteins based on the genomic context of their genes. The types of genomic context that they use are Type I: the fusion of genes; Type II: the conservation of gene-order or co-occurrence of genes in potential operons; and Type III: the co-occurrence of genes across genomes (phylogenetic profiles). Here we compare these types for their coverage, their correlations with various types of functional interaction, and their overlap with homology-based function assignment. We apply the methods to Mycoplasma genitalium, the standard benchmarking genome in computational and experimental genomics. Quantitatively, conservation of gene order is the technique with the highest coverage, applying to 37% of the genes. By combining gene order conservation with gene fusion (6%), the co-occurrence of genes in operons in absence of gene order conservation (8%), and the co-occurrence of genes across genomes (11%), significant context information can be obtained for 50% of the genes (the categories overlap). Qualitatively, we observe that the functional interactions between genes are stronger as the requirements for physical neighborhood on the genome are more stringent, while the fraction of potential false positives decreases. Moreover, only in cases in which gene order is conserved in a substantial fraction of the genomes, in this case six out of twenty-five, does a single type of functional interaction (physical interaction) clearly dominate (>80%). In other cases, complementary function information from homology searches, which is available for most of the genes with significant genomic context, is essential to predict the type of interaction. Using a combination of genomic context and homology searches, new functional features can be predicted for 10% of M. genitalium genes.
Affiliation(s)
- M Huynen
- European Molecular Biology Laboratory, 69117 Heidelberg, Germany.
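As a rough illustration of the third type of genomic context described in the abstract above (co-occurrence of genes across genomes, i.e. phylogenetic profiles), the following minimal Python sketch scores gene pairs by the overlap of the genomes in which they occur. It is not the authors' method: the Jaccard score, the 0.7 threshold, the function names and the toy gene and genome labels are all assumptions made for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def co_occurring_pairs(profiles, threshold=0.7):
    """Return (gene_a, gene_b, score) for gene pairs with similar profiles.

    `profiles` maps a gene name to the set of genomes in which an
    ortholog of that gene is present (its phylogenetic profile).
    The threshold is an arbitrary, illustrative cut-off.
    """
    hits = []
    for gene_a, gene_b in combinations(sorted(profiles), 2):
        score = jaccard(profiles[gene_a], profiles[gene_b])
        if score >= threshold:
            hits.append((gene_a, gene_b, score))
    return hits


# Hypothetical toy data: four genes across four genomes (g1..g4).
profiles = {
    "geneA": {"g1", "g2", "g4"},
    "geneB": {"g1", "g2", "g4"},
    "geneC": {"g2", "g3"},
    "geneD": {"g1", "g2", "g3", "g4"},
}

pairs = co_occurring_pairs(profiles)
for gene_a, gene_b, score in pairs:
    print(f"{gene_a} - {gene_b}: {score:.2f}")
```

Pairs that share most of their genomes are candidates for a functional link; a real analysis would need orthology assignment and a statistical treatment of profile similarity, which this sketch omits.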
|
230
|
Abstract
After our analysis of the distribution of predicted intrinsic curvature along all available complete prokaryotic genomes, the genomes were divided into two groups. Curvature distribution in all prokaryotes of the first group indicated a substantial fraction of promoters characterized by intrinsic DNA curvature located within or upstream of the promoter region. We did not find this peculiar DNA curvature distribution in prokaryotes in the second group. Remarkably, all bacteria of the first group were mesophilic, whereas many prokaryotes of the second group were hyperthermophilic. We hypothesize that DNA curvature plays a biologic role in gene regulation in mesophilic as opposed to hyperthermophilic prokaryotes, i.e., DNA curvature presumably has a functional adaptive significance determined by temperature selection.
Affiliation(s)
- A Bolshoy
- Institute of Evolution, University of Haifa, Haifa, 31905 Israel.
|
231
|
Ponting CP, Schultz J, Copley RR, Andrade MA, Bork P. Evolution of domain families. Adv Protein Chem 2000; 54:185-244. [PMID: 10829229 DOI: 10.1016/s0065-3233(00)54007-8] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.1] [Indexed: 01/08/2023]
Affiliation(s)
- C P Ponting
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
|
232
|
Huynen MA, Snel B. Gene and context: integrative approaches to genome analysis. Adv Protein Chem 2000; 54:345-79. [PMID: 10829232 DOI: 10.1016/s0065-3233(00)54010-8] [Citation(s) in RCA: 45] [Impact Index Per Article: 1.8] [Indexed: 12/04/2022]
Affiliation(s)
- M A Huynen
- European Molecular Biology Laboratory, Heidelberg, Germany
|
233
|
Abstract
Faced with the avalanche of genomic sequences and data on messenger RNA expression, biological scientists are confronting a frightening prospect: piles of information but only flakes of knowledge. How can the thousands of sequences being determined and deposited, and the thousands of expression profiles being generated by the new array methods, be synthesized into useful knowledge? What form will this knowledge take? These are questions being addressed by scientists in the field known as 'functional genomics'.
Affiliation(s)
- D Eisenberg
- Molecular Biology Institute and UCLA-DOE Laboratory of Structural Biology and Molecular Medicine, University of California at Los Angeles, 90095-1570, USA.
|
234
|
Abstract
A number of recent advances have been made in deriving function information from protein structure. A fold relationship to an already characterized protein will often allow general information about function to be deduced. More detailed information can be obtained using sequence relationships to already studied proteins. Methods of deducing function directly from structure, without the use of evolutionary relationships, are developing rapidly. All such methods may be used with models of protein structure, rather than with experimentally determined ones, but model accuracy imposes limitations. The rapid expansion of the structural genomics field has created a new urgency for improved methods of structure-based annotation of function.
Affiliation(s)
- J Moult
- Center for Advanced Research in Biotechnology, University of Maryland, Biotechnology Institute, Rockville, MD 20850, USA.
|
235
|
Galperin MY, Koonin EV. Who's your neighbor? New computational approaches for functional genomics. Nat Biotechnol 2000; 18:609-13. [PMID: 10835597 DOI: 10.1038/76443] [Citation(s) in RCA: 224] [Impact Index Per Article: 9.0] [Indexed: 11/09/2022]
Abstract
Several recently developed computational approaches in comparative genomics go beyond sequence comparison. By analyzing phylogenetic profiles of protein families, domain fusions, gene adjacency in genomes, and expression patterns, these methods predict many functional interactions between proteins and help deduce specific functions for numerous proteins. Although some of the resultant predictions may not be highly specific, these developments herald a new era in genomics in which the benefits of comparative analysis of the rapidly growing collection of complete genomes will become increasingly obvious.
Affiliation(s)
- M Y Galperin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda MD 20894, USA
|
236
|
Abstract
Recently, a number of techniques have been proposed that use completely sequenced genomes for the function prediction of individual proteins encoded therein. They use the fusion of genes, their conserved location in operons or merely their co-occurrence in genomes to predict the existence of functional interactions between the proteins they encode. This type of information complements functional features that are predicted by classical homology-based search techniques.
Affiliation(s)
- M Huynen
- European Molecular Biology Laboratory, Max-Delbrück-Centrum for Molecular Medicine, Heidelberg, Berlin-Buch, 69117, 13122, Germany.
|
237
|
Zweigenbaum J, Henion J. Bioanalytical high-throughput selected reaction monitoring-LC/MS determination of selected estrogen receptor modulators in human plasma: 2000 samples/day. Anal Chem 2000; 72:2446-54. [PMID: 10857619 DOI: 10.1021/ac991413p] [Citation(s) in RCA: 89] [Impact Index Per Article: 3.6] [Indexed: 11/28/2022]
Abstract
The high-throughput determination of small molecules in biological matrixes has become an important part of drug discovery. This work shows that increased throughput LC/MS/MS techniques can be used for the analysis of selected estrogen receptor modulators in human plasma where more than 2000 samples may be analyzed in a 24-h period. The compounds used to demonstrate the high-throughput methodology include tamoxifen, raloxifene, 4-hydroxytamoxifen, nafoxidine, and idoxifene. Tamoxifen and raloxifene are used in both breast cancer therapy and osteoporosis and have shown prophylactic potential for the reduction of the risk of breast cancer. The described strategy provides LC/MS/MS separation and quantitation for each of the five test articles in control human plasma. The method includes sample preparation employing liquid-liquid extraction in the 96-well format, an LC separation of the five compounds in less than 30 s, and selected reaction monitoring detection from low nano- to microgram per milliliter levels. Precision and accuracy are determined where each 96-well plate is considered a typical "tray" having calibration standards and quality control (QC) samples dispersed through each plate. A concept is introduced where 24 96-well plates analyzed in 1 day is considered a "grand tray", and the method is cross-validated with standards placed only at the beginning of the first plate and the end of the last plate. Using idoxifene-d5 as an internal standard, the results obtained for idoxifene and tamoxifen satisfy current bioanalytical method validation criteria on two separate days where 2112 and 2304 samples were run, respectively. Method validation included 24-h autosampler stability and one freeze-thaw cycle stability for the extracts. Idoxifene showed acceptable results with accuracy ranging from 0.3% for the high quality control (QC) to 15.4% for the low QC and precision of 3.6%-13.9% relative standard deviation. Tamoxifen showed accuracy ranging from 1.6% to 13.8% and precision from 7.8% to 15.2%. The linear dynamic range for these compounds was 3 orders of magnitude. The limit of quantification was 5 and 50 ng/mL for tamoxifen and idoxifene, respectively. The other compounds in this study in general satisfy the more relaxed bioanalytical acceptance criteria for modern drug discovery. It is suggested that the quantification levels reported in this high-throughput analysis example are adequate for many drug discovery and related early pharmaceutical studies.
Affiliation(s)
- J Zweigenbaum
- Analytical Toxicology, Department of Population Medicine and Diagnostic Science, New York State College of Veterinary Medicine, Cornell University, Ithaca 14850, USA
|
238
|
Moxon R, Tang C. Challenge of investigating biologically relevant functions of virulence factors in bacterial pathogens. Philos Trans R Soc Lond B Biol Sci 2000; 355:643-56. [PMID: 10874737 PMCID: PMC1692766 DOI: 10.1098/rstb.2000.0605] [Citation(s) in RCA: 21] [Impact Index Per Article: 0.8] [Indexed: 11/12/2022] Open
Abstract
Recent innovations have increased enormously the opportunities for investigating the molecular basis of bacterial pathogenicity, including the availability of whole-genome sequences, techniques for identifying key virulence genes, and the use of microarrays and proteomics. These methods should provide powerful tools for analysing the patterns of gene expression and function required for investigating host-microbe interactions in vivo. But, the challenge is exacting. Pathogenicity is a complex phenotype and the reductionist approach does not adequately address the eclectic and variable outcomes of host-microbe interactions, including evolutionary dynamics and ecological factors. There are difficulties in distinguishing bacterial 'virulence' factors from the many determinants that are permissive for pathogenicity, for example those promoting general fitness. A further practical problem for some of the major bacterial pathogens is that there are no satisfactory animal models or experimental assays that adequately reflect the infection under investigation. In this review, we give a personal perspective on the challenge of characterizing how bacterial pathogens behave in vivo and discuss some of the methods that might be most relevant for understanding the molecular basis of the diseases for which they are responsible. Despite the powerful genomic, molecular, cellular and structural technologies available to us, we are still struggling to come to grips with the question of 'What is a pathogen?'
Affiliation(s)
- R Moxon
- Oxford University, Department of Paediatrics, John Radcliffe Hospital, UK.
|
239
|
Affiliation(s)
- P Bork
- European Molecular Biology Laboratory (EMBL) 69012 Heidelberg; Germany and Max-Delbrück-Centrum, D-13122 Berlin-Buch, Germany.
|
240
|
Abstract
The array format for analyzing peptide and protein function offers an attractive experimental alternative to traditional library screens. Powerful new approaches have recently been described, ranging from synthetic peptide arrays to whole proteins expressed in living cells. Comprehensive sets of purified peptides and proteins permit high-throughput screening for discrete biochemical properties, whereas formats involving living cells facilitate large-scale genetic screening for novel biological activities. In the past year, three major genome-scale studies using yeast as a model organism have investigated different aspects of protein function, including biochemical activities, gene disruption phenotypes, and protein-protein interactions. Such studies show that protein arrays can be used to examine in parallel the functions of thousands of proteins previously known only by their DNA sequence.
Affiliation(s)
- A Q Emili
- Department of Genetics and Medicine, University of Washington, Seattle, WA 98195, USA
|
241
|
Yano N, Endoh M, Fadden K, Yamashita H, Kane A, Sakai H, Rifai A. Comprehensive gene expression profile of the adult human renal cortex: analysis by cDNA array hybridization. Kidney Int 2000; 57:1452-9. [PMID: 10760081 DOI: 10.1046/j.1523-1755.2000.00990.x] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.2] [Indexed: 11/20/2022]
Abstract
BACKGROUND Profiling of gene expression in healthy and diseased renal tissue is important for elucidating the pathogenesis of renal diseases. Comprehensive information about the genes expressed in renal tissue is unavailable. The recently developed cDNA array hybridization methodology allows simultaneous monitoring of thousands of genes expressed renal tissue. METHODS Complex [alpha-33P]-labeled cDNA probes were prepared from histopathologically uninvolved remnants of nine renal tissues obtained by nephrectomy. Each probe was hybridized to a high-density array of 18,326 paired target genes. The radioactive hybridization signals by phosphorimager screens were quantitated by special software. Bioinformatics from public genomic databases were used to assign a chromosomal location of each expressed transcript and gene function. Cluster analysis was used to arrange genes according to the similarity in pattern of gene expression. RESULTS A total of 7563 different gene transcripts was detected in the nine tissue samples. Approximately 870 of these genes were full-length mRNA human transcripts (HT), and the remaining 6693 were expressed sequence tags (ESTs). The full-length transcripts were classified by function of the gene product and were listed with information of their chromosomal positions. To allow a comparison between gene expression in clinical and experimental studies, the mouse genes with known similar function to the human counterpart were included in the bioinformatics analysis. Cluster analysis of 502 full-length genes that are expressed in four or more renal tissues revealed more than 110 genes that are highly expressed in all the renal specimens. CONCLUSIONS The presented data constitute a comprehensive preliminary transcriptional map of the adult human renal cortex. The information may serve as a resource for speeding up the discovery of genes underlying human renal disease. 
The integrated listing of the full-length expressed human and mouse genes is available through e-mail (Abdalla_Rifai@Brown.edu).
Affiliation(s)
- N Yano
- Department of Pathology, Rhode Island Hospital and Brown University School of Medicine, Providence, RI 02903, USA
|
242
|
Abstract
Bioinformatics has, out of necessity, become a key aspect of drug discovery in the genomic revolution, contributing to both target discovery and target validation. The author describes the role that bioinformatics has played and will continue to play in response to the waves of genome-wide data sources that have become available to the industry, including expressed sequence tags, microbial genome sequences, model organism sequences, polymorphisms, gene expression data and proteomics. However, these knowledge sources must be intelligently integrated.
|
243
|
Wilson CA, Kreychman J, Gerstein M. Assessing annotation transfer for genomics: quantifying the relations between protein sequence, structure and function through traditional and probabilistic scores. J Mol Biol 2000; 297:233-49. [PMID: 10704319 DOI: 10.1006/jmbi.2000.3550] [Citation(s) in RCA: 241] [Impact Index Per Article: 9.6] [Indexed: 11/22/2022]
Abstract
Measuring in a quantitative, statistical sense the degree to which structural and functional information can be "transferred" between pairs of related protein sequences at various levels of similarity is an essential prerequisite for robust genome annotation. To this end, we performed pairwise sequence, structure and function comparisons on approximately 30,000 pairs of protein domains with known structure and function. Our domain pairs, which are constructed according to the SCOP fold classification, range in similarity from just sharing a fold, to being nearly identical. Our results show that traditional scores for sequence and structure similarity have the same basic exponential relationship as observed previously, with structural divergence, measured in RMS, being exponentially related to sequence divergence, measured in percent identity. However, as the scale of our survey is much larger than any previous investigations, our results have greater statistical weight and precision. We have been able to express the relationship of sequence and structure similarity using more "modern scores," such as Smith-Waterman alignment scores and probabilistic P-values for both sequence and structure comparison. These modern scores address some of the problems with traditional scores, such as determining a conserved core and correcting for length dependency; they enable us to phrase the sequence-structure relationship in more precise and accurate terms. We found that the basic exponential sequence-structure relationship is very general: the same essential relationship is found in the different secondary-structure classes and is evident in all the scoring schemes. To relate function to sequence and structure we assigned various levels of functional similarity to the domain pairs, based on a simple functional classification scheme. 
This scheme was constructed by combining and augmenting annotations in the enzyme and fly functional classifications and comparing subsets of these to the Escherichia coli and yeast classifications. We found sigmoidal relationships between similarity in function and sequence, with clear thresholds for different levels of functional conservation. For pairs of domains that share the same fold, precise function appears to be conserved down to approximately 40% sequence identity, whereas broad functional class is conserved to approximately 25%. Interestingly, percent identity is more effective at quantifying functional conservation than the more modern scores (e.g. P-values). Results of all the pairwise comparisons and our combined functional classification scheme for protein structures can be accessed from a web database at http://bioinfo.mbb.yale.edu/align. Copyright 2000 Academic Press.
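The identity thresholds reported in this abstract can be turned into a rough annotation-transfer rule. The sketch below is only an illustration: the function name and the returned labels are hypothetical, the cutoffs are the approximate 40% and 25% values quoted above, and the hard steps stand in for what the abstract describes as sigmoidal relationships. It applies only to domain pairs already known to share a fold.

```python
def transferable_annotation(percent_identity: float) -> str:
    """Return the level of function that is plausibly conserved.

    Assumes the two domains share the same fold. The 40 and 25 cutoffs are
    the approximate values from the abstract; using them as sharp steps is a
    simplification of the reported sigmoidal behaviour.
    """
    if not 0.0 <= percent_identity <= 100.0:
        raise ValueError("percent identity must lie between 0 and 100")
    if percent_identity >= 40.0:
        return "precise function"
    if percent_identity >= 25.0:
        return "broad functional class"
    return "no reliable transfer"


for identity in (55.0, 30.0, 15.0):
    print(identity, transferable_annotation(identity))
```

A real annotation pipeline would likely report a confidence value instead of a label, since conservation near either cutoff is uncertain.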
Affiliation(s)
- C A Wilson
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
|
244
|
Schuster S, Fell DA, Dandekar T. A general definition of metabolic pathways useful for systematic organization and analysis of complex metabolic networks. Nat Biotechnol 2000; 18:326-32. [PMID: 10700151 DOI: 10.1038/73786] [Citation(s) in RCA: 578] [Impact Index Per Article: 23.1] [Indexed: 11/10/2022]
Abstract
A set of linear pathways often does not capture the full range of behaviors of a metabolic network. The concept of 'elementary flux modes' provides a mathematical tool to define and comprehensively describe all metabolic routes that are both stoichiometrically and thermodynamically feasible for a group of enzymes. We have used this concept to analyze the interplay between the pentose phosphate pathway (PPP) and glycolysis. The set of elementary modes for this system involves conventional glycolysis, a futile cycle, all the modes of PPP function described in biochemistry textbooks, and additional modes that are a priori equally entitled to pathway status. Applications include maximizing product yield in amino acid and antibiotic synthesis, reconstruction and consistency checks of metabolism from genome data, analysis of enzyme deficiencies, and drug target identification in metabolic networks.
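The conditions behind an elementary flux mode can be sketched on a toy network. Everything in the example is an assumption made for illustration: the five-reaction network, the variable names and the helper functions are hypothetical and are not taken from the paper. A flux vector is treated as feasible when it balances every internal metabolite (the stoichiometric condition) and runs irreversible reactions only forwards (the thermodynamic condition). It is treated as non-elementary when another feasible mode uses a strict subset of its reactions.

```python
# Toy network (hypothetical): uptake -> A, A -> B, A -> C, B -> out, C -> out.
# Rows are the internal metabolites A, B, C; columns are the five reactions.
STOICHIOMETRY = [
    [1, -1, -1, 0, 0],   # A
    [0, 1, 0, -1, 0],    # B
    [0, 0, 1, 0, -1],    # C
]
IRREVERSIBLE = [True, True, True, True, True]


def is_feasible(stoichiometry, flux, irreversible):
    """Steady state (N v = 0) plus non-negative flux through irreversible steps."""
    balanced = all(
        sum(coeff * v for coeff, v in zip(row, flux)) == 0 for row in stoichiometry
    )
    signs_ok = all(v >= 0 for v, irr in zip(flux, irreversible) if irr)
    return balanced and signs_ok


def support(flux):
    """Indices of the reactions that carry non-zero flux."""
    return {i for i, v in enumerate(flux) if v != 0}


def is_decomposable(flux, other_modes):
    """True if another mode uses a strict subset of this flux's reactions."""
    return any(support(mode) < support(flux) for mode in other_modes)


route_via_b = [1, 1, 0, 1, 0]
route_via_c = [1, 0, 1, 0, 1]
mixed_route = [2, 1, 1, 1, 1]   # sum of the two routes above

print(is_feasible(STOICHIOMETRY, mixed_route, IRREVERSIBLE))
print(is_decomposable(mixed_route, [route_via_b, route_via_c]))
print(is_decomposable(route_via_b, [route_via_c]))
```

Here the mixed route is feasible but decomposable, so only the two single routes would count as elementary modes of this toy system. Enumerating all modes of a realistic network requires a dedicated algorithm, which this sketch does not attempt.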
Affiliation(s)
- S Schuster
- Department of Bioinformatics, Max Delbrück Center for Molecular Medicine, D-13092 Berlin-Buch, Germany
|
245
|
Kell DB, King RD. On the optimization of classes for the assignment of unidentified reading frames in functional genomics programmes: the need for machine learning. Trends Biotechnol 2000; 18:93-8. [PMID: 10675895 DOI: 10.1016/s0167-7799(99)01407-9] [Citation(s) in RCA: 46] [Impact Index Per Article: 1.8] [Indexed: 11/29/2022]
Abstract
At present, the assignment of function to novel genes uncovered by the systematic genome-sequencing programmes is a problem. Many studies anticipate that this can be achieved by analysing patterns of gene expression via the transcriptome, proteome and metabolome. Thus, functional genomics is, in part, an exercise in pattern classification. Because many genes have known functional classes, the problem of predicting their functional class is a supervised learning problem. However, most pattern classification methods that have been applied to the problem have been unsupervised clustering methods. Consequently, the best classification tools have not always been used. Furthermore, the present functional classes are suboptimal and new unsupervised clustering methods are needed to improve them. Better-structured functional classes will facilitate the prediction of biochemically testable functions.
Affiliation(s)
- D B Kell
- Institute of Biological Sciences, University of Wales, Aberystwyth, UK SY23 3DD.
|
246
|
Gelfand MS, Koonin EV, Mironov AA. Prediction of transcription regulatory sites in Archaea by a comparative genomic approach. Nucleic Acids Res 2000; 28:695-705. [PMID: 10637320 PMCID: PMC102549 DOI: 10.1093/nar/28.3.695] [Citation(s) in RCA: 136] [Impact Index Per Article: 5.4] [Indexed: 11/13/2022] Open
Abstract
Intragenomic and intergenomic comparisons of upstream nucleotide sequences of archaeal genes were performed with the goal of predicting transcription regulatory sites (operators) and identifying likely regulons. Learning sets for the detection of regulatory sites were constructed using the available experimental data on archaeal transcription regulation or by analogy with known bacterial regulons, and further analysis was performed using iterative profile searches. The information content of the candidate signals detected by this method is insufficient for reliable predictions to be made. Therefore, this approach has to be complemented by examination of evolutionary conservation in different archaeal genomes. This combined strategy resulted in the prediction of a conserved heat shock regulon in all euryarchaea, a nitrogen fixation regulon in the methanogens Methanococcus jannaschii and Methanobacterium thermoautotrophicum and an aromatic amino acid regulon in M. thermoautotrophicum. Unexpectedly, the heat shock regulatory site was detected not only for genes that encode known chaperone proteins but also for archaeal histone genes. This suggests a possible function for archaeal histones in stress-related changes in DNA condensation. In addition, comparative analysis of the genomes of three Pyrococcus species resulted in the prediction of their purine metabolism and transport regulon. The results demonstrate the feasibility of prediction of at least some transcription regulatory sites by comparing poorly characterized prokaryotic genomes, particularly when several closely related genome sequences are available.
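The abstract refers to the information content of a candidate signal without giving a formula. The sketch below uses the common Shannon-based measure for a set of aligned sites, which is an assumption here and may differ from the authors' exact definition. It also assumes a uniform background over the four nucleotides and applies no small-sample correction. The example sites are invented.

```python
import math
from collections import Counter


def information_content(sites, alphabet="ACGT"):
    """Total information content (bits) of equal-length aligned sites.

    Per column: log2(|alphabet|) minus the Shannon entropy of the observed
    letter frequencies. Uniform background, no small-sample correction.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    max_bits = math.log2(len(alphabet))
    total = 0.0
    for column in zip(*sites):
        counts = Counter(column)
        entropy = -sum(
            (n / len(column)) * math.log2(n / len(column)) for n in counts.values()
        )
        total += max_bits - entropy
    return total


# Hypothetical aligned candidate sites, for illustration only.
strongly_conserved = ["TTGACA", "TTGACA", "TTGACA", "TTGACA"]
weakly_conserved = ["TTGACA", "ATGCCA", "TAGACT", "TTCAGA"]

print(information_content(strongly_conserved))
print(information_content(weakly_conserved))
```

A signal with few bits matches many positions in a genome by chance, which is consistent with the abstract's point that profile searches alone are not enough and need support from conservation across genomes.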
Affiliation(s)
- M S Gelfand
- State Scientific Center for Biotechnology NIIGenetika, Moscow 113545, Russia.
|
247
|
Pritchard L, Dufton MJ. Do proteins learn to evolve? The Hopfield network as a basis for the understanding of protein evolution. J Theor Biol 2000; 202:77-86. [PMID: 10623501 DOI: 10.1006/jtbi.1999.1043] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.6] [Indexed: 11/22/2022]
Abstract
Correlations between amino-acid residues can be observed in sets of aligned protein sequences, and the analysis of their statistical and evolutionary significance and distribution has been thoroughly investigated. In this paper, we present a model based on such covariations in protein sequences in which the pairs of residues that have mutual influence combine to produce a system analogous to a Hopfield neural network. The emergent properties of such a network, such as soft failure and the connection between network architecture and stored memory, have close parallels in known proteins. This model suggests that an explanation for observed characters of proteins such as the diminution of function by substitutions distant from the active site, the existence of protein folds (superfolds) that can perform several functions based on one architecture, and structural and functional resilience to destabilizing substitutions might derive from their inherent network-like structure. This model may also provide a basis for mapping the relationship between structure, function and evolutionary history of a protein family, and thus be a powerful tool for rational engineering.
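For readers unfamiliar with the analogy, a generic Hopfield network can be sketched in a few lines. This is a textbook-style illustration and not the authors' protein model: the patterns, function names and network size are hypothetical, and no mapping from residue pairs to weights is attempted. It shows how the pairwise weights hold the stored memories and how a corrupted state still settles back to a stored pattern.

```python
def train(patterns):
    """Hebbian weights for +1/-1 patterns; no self-connections."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for pattern in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += pattern[i] * pattern[j] / n
    return weights


def recall(weights, state, sweeps=5):
    """Asynchronous updates: each unit takes the sign of its input field."""
    state = list(state)
    for _ in range(sweeps):
        for i in range(len(state)):
            field = sum(w * s for w, s in zip(weights[i], state))
            state[i] = 1 if field >= 0 else -1
    return state


# Two hypothetical, mutually orthogonal memories.
memory_a = [1, 1, 1, 1, -1, -1, -1, -1]
memory_b = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train([memory_a, memory_b])

corrupted = list(memory_a)
corrupted[0] = -corrupted[0]          # flip one unit
print(recall(weights, corrupted) == memory_a)
```

The recovery from a single flipped unit is a small-scale picture of the tolerance to perturbation that the abstract compares with the resilience of proteins to destabilizing substitutions.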
Affiliation(s)
- L Pritchard
- Department of Pure and Applied Chemistry, University of Strathclyde, 295 Cathedral Street, Glasgow, Scotland, G1 1XL, U.K
|
248
|
Sánchez R, Pieper U, Mirković N, de Bakker PI, Wittenstein E, Sali A. MODBASE, a database of annotated comparative protein structure models. Nucleic Acids Res 2000; 28:250-3. [PMID: 10592238 PMCID: PMC102433 DOI: 10.1093/nar/28.1.250] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.2] [Received: 09/01/1999] [Revised: 10/11/1999] [Accepted: 10/11/1999] [Indexed: 11/14/2022] Open
Abstract
MODBASE is a queryable database of annotated comparative protein structure models. The models are derived by MODPIPE, an automated modeling pipeline relying on the programs PSI-BLAST and MODELLER. The database currently contains 3D models for substantial portions of approximately 17 000 proteins from 10 complete genomes, including those of Caenorhabditis elegans, Saccharomyces cerevisiae and Escherichia coli, as well as all the available sequences from Arabidopsis thaliana and Homo sapiens. The database also includes fold assignments and alignments on which the models were based. In addition, special care is taken to assess the quality of the models. ModBase is accessible through a web interface at http://guitar.rockefeller.edu/modbase/
Affiliation(s)
- R Sánchez
- Laboratories of Molecular Biophysics, The Pels Family Center for Biochemistry, The Rockefeller University, 1230 York Avenue, New York, NY 10021, USA
|
249
|
King RD, Karwath A, Clare A, Dehaspe L. Accurate prediction of protein functional class from sequence in the Mycobacterium tuberculosis and Escherichia coli genomes using data mining. Yeast 2000; 17:283-93. [PMID: 11119305 PMCID: PMC2448385 DOI: 10.1002/1097-0061(200012)17:4<283::aid-yea52>3.0.co;2-f] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.2] [Indexed: 11/07/2022] Open
Abstract
The analysis of genomics data needs to become as automated as its generation. Here we present a novel data-mining approach to predicting protein functional class from sequence. This method is based on a combination of inductive logic programming clustering and rule learning. We demonstrate the effectiveness of this approach on the M. tuberculosis and E. coli genomes, and identify biologically interpretable rules which predict protein functional class from information only available from the sequence. These rules predict 65% of the ORFs with no assigned function in M. tuberculosis and 24% of those in E. coli, with an estimated accuracy of 60-80% (depending on the level of functional assignment). The rules are founded on a combination of detection of remote homology, convergent evolution and horizontal gene transfer. We identify rules that predict protein functional class even in the absence of detectable sequence or structural homology. These rules give insight into the evolutionary history of M. tuberculosis and E. coli.
Affiliation(s)
- R D King
- Department of Computer Science, University of Wales, Aberystwyth, Penglais, Aberystwyth, Ceredigion SY23 3DB, UK
|
250
|
Natale DA, Shankavaram UT, Galperin MY, Wolf YI, Aravind L, Koonin EV. Towards understanding the first genome sequence of a crenarchaeon by genome annotation using clusters of orthologous groups of proteins (COGs). Genome Biol 2000; 1:RESEARCH0009. [PMID: 11178258 PMCID: PMC15027 DOI: 10.1186/gb-2000-1-5-research0009] [Citation(s) in RCA: 83] [Impact Index Per Article: 3.3] [Received: 06/19/2000] [Revised: 08/25/2000] [Accepted: 09/21/2000] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND Standard archival sequence databases have not been designed as tools for genome annotation and are far from being optimal for this purpose. We used the database of Clusters of Orthologous Groups of proteins (COGs) to reannotate the genomes of two archaea, Aeropyrum pernix, the first member of the Crenarchaea to be sequenced, and Pyrococcus abyssi. RESULTS A. pernix and P. abyssi proteins were assigned to COGs using the COGNITOR program; the results were verified on a case-by-case basis and augmented by additional database searches using the PSI-BLAST and TBLASTN programs. Functions were predicted for over 300 proteins from A. pernix, which could not be assigned a function using conventional methods with a conservative sequence similarity threshold, an approximately 50% increase compared to the original annotation. A. pernix shares most of the conserved core of proteins that were previously identified in the Euryarchaeota. Cluster analysis or distance matrix tree construction based on the co-occurrence of genomes in COGs showed that A. pernix forms a distinct group within the archaea, although grouping with the two species of Pyrococci, indicative of similar repertoires of conserved genes, was observed. No indication of a specific relationship between Crenarchaeota and eukaryotes was obtained in these analyses. Several proteins that are conserved in Euryarchaeota and most bacteria are unexpectedly missing in A. pernix, including the entire set of de novo purine biosynthesis enzymes, the GTPase FtsZ (a key component of the bacterial and euryarchaeal cell-division machinery), and the tRNA-specific pseudouridine synthase, previously considered universal. A. pernix is represented in 48 COGs that do not contain any euryarchaeal members. Many of these proteins are TCA cycle and electron transport chain enzymes, reflecting the aerobic lifestyle of A. pernix. 
CONCLUSIONS Special-purpose databases organized on the basis of phylogenetic analysis and carefully curated with respect to known and predicted protein functions provide for a significant improvement in genome annotation. A differential genome display approach helps in a systematic investigation of common and distinct features of gene repertoires and in some cases reveals unexpected connections that may be indicative of functional similarities between phylogenetically distant organisms and of lateral gene exchange.
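The comparison of genomes by their co-occurrence in COGs can be pictured as a distance between membership sets. The abstract does not say which distance was used, so the Jaccard distance below is an assumption, and the genome names and COG labels are invented placeholders.

```python
def jaccard_distance(cogs_a, cogs_b):
    """1 minus shared COGs over all COGs in which either genome is represented."""
    union = cogs_a | cogs_b
    if not union:
        return 0.0
    return 1.0 - len(cogs_a & cogs_b) / len(union)


def distance_matrix(genome_to_cogs):
    """Pairwise distances keyed by (genome, genome) name pairs."""
    names = sorted(genome_to_cogs)
    return {
        (a, b): jaccard_distance(genome_to_cogs[a], genome_to_cogs[b])
        for a in names
        for b in names
    }


# Hypothetical COG membership profiles, for illustration only.
profiles = {
    "genome_A": {"cog1", "cog2", "cog3", "cog4"},
    "genome_B": {"cog1", "cog2", "cog3", "cog5"},
    "genome_C": {"cog1", "cog6", "cog7", "cog8"},
}
distances = distance_matrix(profiles)
print(distances[("genome_A", "genome_B")])
print(distances[("genome_A", "genome_C")])
```

A matrix of this kind could feed a clustering or tree-building step. Genomes with similar repertoires of conserved genes end up close together, which is the sense in which the abstract reports A. pernix grouping with the Pyrococci.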
Affiliation(s)
- Darren A Natale
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Rockville Pike, Bethesda, MD 20894, USA
- Uma T Shankavaram
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Rockville Pike, Bethesda, MD 20894, USA
- Michael Y Galperin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Rockville Pike, Bethesda, MD 20894, USA
- Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Rockville Pike, Bethesda, MD 20894, USA
- L Aravind
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Rockville Pike, Bethesda, MD 20894, USA
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Rockville Pike, Bethesda, MD 20894, USA
|