151
|
Zhu S, Okuno Y, Tsujimoto G, Mamitsuka H. Application of a new probabilistic model for mining implicit associated cancer genes from OMIM and medline. Cancer Inform 2007; 2:361-71. [PMID: 19458778 PMCID: PMC2675505] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022] Open
Abstract
An important issue in current medical science research is to find the genes that are strongly related to an inherited disease. A particular focus is placed on cancer-gene relations, since some types of cancers are inherited. As biomedical databases have grown rapidly in recent years, an informatics approach to predict such relations from currently available databases should be developed. Our objective is to find implicitly associated cancer genes from biomedical databases, including the literature database. Co-occurrence of biological entities has been shown to be a popular and efficient technique in biomedical text mining. We have applied a new probabilistic model, called mixture aspect model (MAM) [48], to combine different types of co-occurrences of genes and cancer derived from Medline and OMIM (Online Mendelian Inheritance in Man). We trained the probability parameters of MAM using a learning method based on an EM (Expectation and Maximization) algorithm. We examined the performance of MAM by predicting associated cancer-gene pairs. Through cross-validation, prediction accuracy was shown to be improved by adding gene-gene co-occurrences from Medline to cancer-gene co-occurrences in OMIM. Further experiments showed that MAM found new cancer-gene relations which are unknown in the literature. Supplementary information can be found at http://www.bic.kyotou.ac.jp/pathway/zhusf/CancerInformatics/Supplemental2006.html.
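The co-occurrence data that such a model takes as input can be illustrated with a minimal sketch. This is not the MAM model or its EM training; it only counts how many documents mention a gene and a cancer term together. The function name, the whitespace tokenisation, and the toy documents are assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def cooccurrence_counts(documents, genes, cancers):
    """Count documents in which a gene and a cancer term both appear.

    Hypothetical helper: matching is by lower-cased whitespace tokens,
    far cruder than real biomedical entity recognition.
    """
    counts = Counter()
    for text in documents:
        tokens = set(text.lower().split())
        found_genes = [g for g in genes if g.lower() in tokens]
        found_cancers = [c for c in cancers if c.lower() in tokens]
        # Every (gene, cancer) pair seen in the same document counts once.
        for pair in product(found_genes, found_cancers):
            counts[pair] += 1
    return counts


# Toy documents, invented for this example.
docs = [
    "BRCA1 mutations raise breast cancer risk",
    "TP53 and BRCA1 are altered in breast tumours",
    "TP53 loss is frequent in colon cancer",
]
counts = cooccurrence_counts(docs, ["BRCA1", "TP53"], ["breast", "colon"])
for pair, n in sorted(counts.items()):
    print(pair, n)
```

A probabilistic model would then be fitted to tables like this one, rather than using the raw counts directly as predictions.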
Affiliation(s)
- Shanfeng Zhu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University. Correspondence: Shanfeng Zhu, Kyoto University, Gokasho, Uji, 611-0011, Japan. Phone: +81-774-383038, Fax: +81-774-383037
- Yasushi Okuno
- Graduate School of Pharmaceutical Sciences, Kyoto University
- Gozoh Tsujimoto
- Graduate School of Pharmaceutical Sciences, Kyoto University
- Hiroshi Mamitsuka
- Bioinformatics Center, Institute for Chemical Research, Kyoto University; Graduate School of Pharmaceutical Sciences, Kyoto University
|
152
|
Abstract
Breast cancer is the second most common cause of cancer-related death in women in the US and the UK, accounting for 15-17% of all female cancer deaths. Current treatment strategies include hormone therapy, such as anti-estrogens (tamoxifen) and aromatase inhibitors (exemestane, anastrozole, letrozole), as well as cytotoxics, such as the taxanes (paclitaxel, docetaxel). With multiple therapy choices, a method to prospectively screen patients prior to therapy selection is now needed. Pharmacogenetics seeks to develop screening mechanisms to optimise drug therapy. DNA variations in metabolism, transport and drug target genes may contribute to chemotherapy efficacy and toxicities. The status of the identification of genetic markers for breast cancer therapy selection is highlighted in this review.
Affiliation(s)
- Sharon Marsh
- Washington University School of Medicine, Division of Oncology, St Louis, MO 63110, USA.
|
153
|
Grow M, Neff AW, Mescher AL, King MW. Global analysis of gene expression in Xenopus hindlimbs during stage-dependent complete and incomplete regeneration. Dev Dyn 2007; 235:2667-85. [PMID: 16871633 DOI: 10.1002/dvdy.20897] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.8] [Indexed: 12/21/2022] Open
Abstract
Xenopus laevis tadpoles are capable of limb regeneration after amputation, in a process that initially involves the formation of a blastema. However, Xenopus has full regenerative capacity only through premetamorphic stages. We have used the Affymetrix Xenopus laevis Genome Genechip microarray to perform a large-scale screen of gene expression in the regeneration-complete, stage 53 (st53), and regeneration-incomplete, stage 57 (st57), hindlimbs at 1 and 5 days postamputation. Through an exhaustive reannotation of the Genechip and a variety of comparative bioinformatic analyses, we have identified genes that are differentially expressed between the regeneration-complete and -incomplete stages, detected the transcriptional changes associated with the regenerating blastema, and compared these results with those of other regeneration researchers. We focus particular attention on striking transcriptional activity observed in genes associated with patterning, stress response, and inflammation. Overall, this work provides the most comprehensive views yet of a regenerating limb and different transcriptional compositions of regeneration-competent and deficient tissues.
Affiliation(s)
- Matthew Grow
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, Indiana, USA.
|
154
|
Murphy AM, MacHugh DE, Park SDE, Scraggs E, Haley CS, Lynn DJ, Boland MP, Doherty ML. Linkage mapping of the locus for inherited ovine arthrogryposis (IOA) to sheep chromosome 5. Mamm Genome 2007; 18:43-52. [PMID: 17242863 DOI: 10.1007/s00335-006-0016-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 01/31/2006] [Accepted: 09/21/2006] [Indexed: 11/30/2022]
Abstract
Arthrogryposis is a congenital malformation affecting the limbs of newborn animals and infants. Previous work has demonstrated that inherited ovine arthrogryposis (IOA) has an autosomal recessive mode of inheritance. Two affected homozygous recessive (art/art) Suffolk rams were used as founders for a backcross pedigree of half-sib families segregating the IOA trait. A genome scan was performed using 187 microsatellite genetic markers and all backcross animals were phenotyped at birth for the presence and severity of arthrogryposis. Pairwise LOD scores of 1.86, 1.35, and 1.32 were detected for three microsatellites, BM741, JAZ, and RM006, that are located on sheep Chr 5 (OAR5). Additional markers in the region were identified from the genetic linkage map of BTA7 and by in silico analyses of the draft bovine genome sequence, three of which were informative. Interval mapping of all autosomes produced an F value of 21.97 (p < 0.01) for a causative locus in the region of OAR5 previously flagged by pairwise linkage analysis. Inspection of the orthologous region of HSA5 highlighted a previously fine-mapped locus for human arthrogryposis multiplex congenita neurogenic type (AMCN). A survey of the HSA5 genome sequence identified plausible candidate genes for both IOA and human AMCN.
Affiliation(s)
- Angela M Murphy
- Animal Genomics Laboratory, School of Agriculture, Food Science and Veterinary Medicine, College of Life Sciences, University College Dublin, Belfield, Dublin 4, Ireland
|
155
|
Fink RC, Evans MR, Porwollik S, Vazquez-Torres A, Jones-Carson J, Troxell B, Libby SJ, McClelland M, Hassan HM. FNR is a global regulator of virulence and anaerobic metabolism in Salmonella enterica serovar Typhimurium (ATCC 14028s). J Bacteriol 2007; 189:2262-73. [PMID: 17220229 PMCID: PMC1899381 DOI: 10.1128/jb.00726-06] [Citation(s) in RCA: 112] [Impact Index Per Article: 6.6] [Indexed: 12/15/2022] Open
Abstract
Salmonella enterica serovar Typhimurium must successfully transition the broad fluctuations in oxygen concentrations encountered in the host. In Escherichia coli, FNR is one of the main regulatory proteins involved in O2 sensing. To assess the role of FNR in serovar Typhimurium, we constructed an isogenic fnr mutant in the virulent wild-type strain (ATCC 14028s) and compared their transcriptional profiles and pathogenicities in mice. Here, we report that, under anaerobic conditions, 311 genes (6.80% of the genome) are regulated directly or indirectly by FNR; of these, 87 genes (28%) are poorly characterized. Regulation by FNR in serovar Typhimurium is similar to, but distinct from, that in E. coli. Thus, genes/operons involved in aerobic metabolism, NO· detoxification, flagellar biosynthesis, motility, chemotaxis, and anaerobic carbon utilization are regulated by FNR in a fashion similar to that in E. coli. However, genes/operons existing in E. coli but regulated by FNR only in serovar Typhimurium include those coding for ethanolamine utilization, a universal stress protein, a ferritin-like protein, and a phosphotransacetylase. Interestingly, Salmonella-specific genes/operons regulated by FNR include numerous virulence genes within Salmonella pathogenicity island 1 (SPI-1), newly identified flagellar genes (mcpAC, cheV), and the virulence operon (srfABC). Furthermore, the role of FNR as a positive regulator of motility, flagellar biosynthesis, and pathogenesis was confirmed by showing that the mutant is nonmotile, lacks flagella, is attenuated in mice, and does not survive inside macrophages. The inability of the mutant to survive inside macrophages is likely due to its sensitivity to the reactive oxygen species generated by NADPH phagocyte oxidase.
Affiliation(s)
- Ryan C Fink
- Department of Microbiology, North Carolina State University, Raleigh, NC 27695-7615, USA
|
156
|
Benos PV, Corcoran DL, Feingold E. Web-based identification of evolutionary conserved DNA cis-regulatory elements. Methods Mol Biol 2007; 395:425-436. [PMID: 17993689 DOI: 10.1007/978-1-59745-514-5_26] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/25/2023]
Abstract
Transcription regulation on a gene-by-gene basis is achieved through transcription factors, the DNA-binding proteins that recognize short DNA sequences in the proximity of the genes. Unlike other DNA-binding proteins, each transcription factor recognizes a number of sequences, usually variants of a preferred, "consensus" sequence. The degree of dissimilarity of a given target sequence from the consensus is indicative of the binding affinity of the transcription factor-DNA interaction. Because of the short size and the degeneracy of the patterns, it is frequently difficult for a computational algorithm to distinguish between the true sites and the background genomic "noise." One way to overcome this problem of low signal-to-noise ratio is to use evolutionary information to detect signals that are conserved in two or more species. FOOTER is an algorithm that uses this phylogenetic footprinting concept and evaluates putative mammalian transcription factor binding sites in a quantitative way. The user is asked to upload the human and mouse promoter sequences and select the transcription factors to be analyzed. The results' page presents an alignment of the two sequences (color-coded by degree of conservation) and information about the predicted sites and single-nucleotide polymorphisms found around the predicted sites. This chapter presents the main aspects of the underlying method and gives detailed instructions and tips on the use of this web-based tool.
Affiliation(s)
- Panayiotis V Benos
- Department of Computational Biology, University of Pittsburgh School of Medicine, USA
|
157
|
Thielen JL, Volzing KG, Collier LS, Green LE, Largaespada DA, Marker PC. Markers of prostate region-specific epithelial identity define anatomical locations in the mouse prostate that are molecularly similar to human prostate cancers. Differentiation 2007; 75:49-61. [PMID: 17244021 DOI: 10.1111/j.1432-0436.2006.00115.x] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 01/03/2023]
Abstract
Although the basic functions of the prostate gland are conserved among mammals, its morphology varies greatly among species. Comparative studies between mouse and human are important because mice are widely used to study prostate cancer, a disease that occurs in a region-restricted manner within the human prostate. An informatics-based approach was used to identify prostate-specific human genes as candidate markers of region-specific identity that might distinguish prostatic ducts prone to prostate cancer from ducts that rarely give rise to cancer. Subsequent analysis of normal and cancerous human prostates demonstrated that the genes microseminoprotein-beta (MSMB) and transglutaminase 4 (TGM4) were expressed in distinct groups of ducts in the normal human prostate, and only MSMB was detected in areas of prostate cancer. The mouse orthologs of MSMB and TGM4 were then used for expression studies in mice along with the mouse ventrally expressed gene spermine binding protein (SBP). All three genes were informative markers of region-specific epithelial identity with distinct expression patterns that collectively accounted for all ducts in the mouse prostate. Together with the human data, this suggested that MSMB expression defines an anatomical domain in the mouse prostate that is molecularly most similar to human prostate cancers. Computer-assisted serial section reconstruction was used to visualize the complete expression domains for MSMB, SBP, and TGM4 in the mouse prostate. This showed that MSMB is expressed in prostatic ducts that comprise 21% of the mouse dorso-lateral prostate. Finally, the expression of MSMB, SBP, and TGM4 was evaluated in a mouse prostate cancer model created by the prostate epithelium-specific deletion of the tumor suppressor PTEN. MSMB and TGM4 were rapidly and dramatically down-regulated in response to PTEN deletion suggesting that this model of prostate cancer includes a more rapid de-differentiation of the prostatic epithelium than is observed in organ-confined human prostate cancers.
Affiliation(s)
- Joshua L Thielen
- Department of Genetics, University of Minnesota, Minneapolis, MN 55455, USA
|
158
|
Abstract
All protein coding genes have a phylogenetic history that, when understood, can lead to deep insights into the diversification or conservation of function, the evolution of developmental complexity, and the molecular basis of disease. One important part of reconstructing the relationships among genes in different organisms is an accurate method to find orthologs as well as an accurate measure of evolutionary diversification. The present chapter details such a method, called the reciprocal smallest distance algorithm (RSD). This approach improves upon the common procedure of taking reciprocal best Basic Local Alignment Search Tool hits (RBH) in the identification of orthologs by using global sequence alignment and maximum likelihood estimation of evolutionary distances to detect orthologs between two genomes. RSD finds many putative orthologs missed by RBH because it is less likely to be misled by the presence of close paralogs in genomes. The package offers a tremendous amount of flexibility in investigating parameter settings, allowing the user to search for increasingly distant orthologs between highly divergent species, among other advantages. The flexibility of this tool makes it a unique and powerful addition to other available approaches for ortholog detection.
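The reciprocity criterion described in this abstract can be sketched in a few lines. The sketch below is not the RSD package: it assumes the evolutionary distances have already been estimated (RSD itself obtains them from global alignment and maximum likelihood), and the function name, gene labels, and distance values are all hypothetical.

```python
def reciprocal_smallest_distance(dist):
    """Return gene pairs that are each other's closest match.

    `dist` maps (gene_in_genome_a, gene_in_genome_b) to a precomputed
    distance estimate. Ties keep the first pair encountered.
    """
    best_for_a = {}  # closest genome-B gene for each genome-A gene
    best_for_b = {}  # closest genome-A gene for each genome-B gene
    for (a, b), d in dist.items():
        if a not in best_for_a or d < dist[(a, best_for_a[a])]:
            best_for_a[a] = b
        if b not in best_for_b or d < dist[(best_for_b[b], b)]:
            best_for_b[b] = a
    # Keep a pair only when the choice is mutual.
    return sorted(
        (a, b) for a, b in best_for_a.items() if best_for_b.get(b) == a
    )


# Invented distances: a2 is a close paralog-like competitor for b1.
dist = {
    ("a1", "b1"): 0.10, ("a1", "b2"): 0.40,
    ("a2", "b1"): 0.15, ("a2", "b2"): 0.30,
    ("a3", "b2"): 0.20,
}
pairs = reciprocal_smallest_distance(dist)
print(pairs)
```

Here a2 prefers b1, but b1 is closer to a1, so a2 is left without a putative ortholog; this is the kind of competing-paralog case the reciprocity requirement is meant to filter.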
|
159
|
Khan AM, Heiny AT, Lee KX, Srinivasan KN, Tan TW, August JT, Brusic V. Large-scale analysis of antigenic diversity of T-cell epitopes in dengue virus. BMC Bioinformatics 2006; 7 Suppl 5:S4. [PMID: 17254309 PMCID: PMC1764481 DOI: 10.1186/1471-2105-7-s5-s4] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.7] [Indexed: 12/20/2022] Open
Abstract
Background: Antigenic diversity in dengue virus strains has been studied, but large-scale and detailed systematic analyses have not been reported. In this study, we report a bioinformatics method for analyzing viral antigenic diversity in the context of T-cell mediated immune responses. We applied this method to study the relationship between short-peptide antigenic diversity and protein sequence diversity of dengue virus. We also studied the effects of sequence determinants on viral antigenic diversity. Short peptides, principally 9-mers, were studied because they represent the predominant length of binding cores of T-cell epitopes, which are important for formulation of vaccines. Results: Our analysis showed that the number of unique protein sequences required to represent complete antigenic diversity of short peptides in dengue virus is significantly smaller than that required to represent complete protein sequence diversity. Short-peptide antigenic diversity shows an asymptotic relationship to the number of unique protein sequences, indicating that for large sequence sets (~200) the addition of new protein sequences has a marginal effect on antigenic diversity. A near-linear relationship was observed between the extent of antigenic diversity and the length of protein sequences, suggesting that, for the practical purpose of vaccine development, antigenic diversity of short peptides from dengue virus can be represented by short regions of sequences (~<100 aa) within viral antigens that are specific targets of immune responses (such as T-cell epitopes specific to particular human leukocyte antigen alleles). Conclusion: This study provides evidence that there are limited numbers of antigenic combinations in protein sequence variants of a viral species and that short regions of the viral protein are sufficient to capture antigenic diversity of T-cell epitopes. The approach described herein has direct application to the analysis of other viruses, in particular those that show high diversity and/or rapid evolution, such as influenza A virus and human immunodeficiency virus (HIV).
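The relationship between the number of sequences and short-peptide diversity can be illustrated with a minimal sketch that tracks how many distinct 9-mers have been seen as sequences are added one by one. This is an illustration of the idea only, not the authors' pipeline; the function names are hypothetical and the three short strings are illustrative, not curated dengue data.

```python
def kmers(seq, k=9):
    """Return the set of all overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cumulative_unique_kmers(sequences, k=9):
    """Running total of distinct k-mers as each sequence is added."""
    seen = set()
    totals = []
    for seq in sequences:
        seen |= kmers(seq, k)
        totals.append(len(seen))
    return totals


# Illustrative strings: the second repeats the first, the third differs
# from the first by a single residue.
seqs = [
    "MNNQRKKARNTPFNML",
    "MNNQRKKARNTPFNML",
    "MNNQRKKAKNTPFNML",
]
totals = cumulative_unique_kmers(seqs)
print(totals)
```

A duplicate sequence adds no new 9-mers, while a single substitution adds up to nine; plotting such totals against the number of sequences gives the kind of saturation curve the abstract describes.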
Affiliation(s)
- Asif M Khan
- The Division of Biomedical Sciences, Johns Hopkins Singapore, 31 Biopolis Way, #02-01 The Nanos, Singapore 138669, Singapore
- Department of Microbiology, The Yong Loo Lin School of Medicine, National University of Singapore, 5 Science Drive 2, Singapore 117597, Singapore
- AT Heiny
- The Division of Biomedical Sciences, Johns Hopkins Singapore, 31 Biopolis Way, #02-01 The Nanos, Singapore 138669, Singapore
- Department of Biochemistry, The Yong Loo Lin School of Medicine, National University of Singapore, 5 Science Drive 2, Singapore 117597, Singapore
- Kenneth X Lee
- The Division of Biomedical Sciences, Johns Hopkins Singapore, 31 Biopolis Way, #02-01 The Nanos, Singapore 138669, Singapore
- Department of Microbiology, The Yong Loo Lin School of Medicine, National University of Singapore, 5 Science Drive 2, Singapore 117597, Singapore
- KN Srinivasan
- The Division of Biomedical Sciences, Johns Hopkins Singapore, 31 Biopolis Way, #02-01 The Nanos, Singapore 138669, Singapore
- Department of Pharmacology and Molecular Sciences, The Johns Hopkins University School of Medicine, 725 North Wolfe Street, Baltimore, MD 21205, USA
- Tin Wee Tan
- Department of Biochemistry, The Yong Loo Lin School of Medicine, National University of Singapore, 5 Science Drive 2, Singapore 117597, Singapore
- J Thomas August
- The Division of Biomedical Sciences, Johns Hopkins Singapore, 31 Biopolis Way, #02-01 The Nanos, Singapore 138669, Singapore
- Department of Pharmacology and Molecular Sciences, The Johns Hopkins University School of Medicine, 725 North Wolfe Street, Baltimore, MD 21205, USA
- Vladimir Brusic
- Department of Microbiology, The Yong Loo Lin School of Medicine, National University of Singapore, 5 Science Drive 2, Singapore 117597, Singapore
- School of Land and Food Sciences, and Institute for Molecular Biosciences, University of Queensland, Brisbane, QLD 4072, Australia
|
160
|
Abstract
With the explosion in genomic and functional genomics information, methods for disease gene identification are rapidly evolving. Databases are now essential to the process of selecting candidate disease genes. Combining positional information with disease characteristics and functional information is the usual strategy by which candidate disease genes are selected. Enrichment for candidate disease genes, however, depends on the skills of the operating researcher. Over the past few years, a number of bioinformatics methods that enrich for the most likely candidate disease genes have been developed. Such in silico prioritisation methods may further improve by completion of datasets, by development of standardised ontologies across databases and species and, ultimately, by the integration of different strategies.
Affiliation(s)
- Marc A van Driel
- Molecular Biology Department, Nijmegen Centre for Molecular Life Sciences, Radboud University Nijmegen, Nijmegen, The Netherlands
- Han G Brunner
- Department of Human Genetics, University Medical Centre Nijmegen, Geert Grooteplein 10, Nijmegen, The Netherlands
|
161
|
Abstract
Alternative splicing increases transcriptome and proteome diversification. Previous analyses aiming at comparing the rate of alternative splicing between different organisms provided contradicting results. These contradicting results were attributed to the fact that both analyses were dependent on the expressed sequence tag (EST) coverage, which varies greatly between the tested organisms. In this study we compare the level of alternative splicing among eight different organisms. By employing an EST-independent approach we reveal that the percentage of genes and exons undergoing alternative splicing is higher in vertebrates compared with invertebrates. We also find that alternative exons of the skipping type are flanked by longer introns compared to constitutive ones, whereas alternative 5′ and 3′ splice site events are generally not. In addition, although the regulation of alternative splicing and sizes of introns and exons have changed during metazoan evolution, intron retention remained the rarest type of alternative splicing, whereas exon skipping is more prevalent and exhibits a slight increase from invertebrates to vertebrates. The difference in the level of alternative splicing suggests that alternative splicing may contribute greatly to the higher level of phenotypic complexity in mammals, and that accumulation of introns confers an evolutionary advantage as it allows increasing the number of alternative splicing forms.
Affiliation(s)
- Gil Ast
- To whom correspondence should be addressed. Tel: +972 3 640 9900; Fax: +972 3 640 6893
|
162
|
|
163
|
Ren Q, Chen K, Paulsen IT. TransportDB: a comprehensive database resource for cytoplasmic membrane transport systems and outer membrane channels. Nucleic Acids Res 2006; 35:D274-9. [PMID: 17135193 PMCID: PMC1747178 DOI: 10.1093/nar/gkl925] [Citation(s) in RCA: 307] [Impact Index Per Article: 17.1] [Indexed: 11/19/2022] Open
Abstract
TransportDB is a comprehensive database resource of information on cytoplasmic membrane transporters and outer membrane channels in organisms whose complete genome sequences are available. The complete set of membrane transport systems and outer membrane channels of each organism are annotated based on a series of experimental and bioinformatic evidence and classified into different types and families according to their mode of transport, bioenergetics, molecular phylogeny and substrate specificities. User-friendly web interfaces are designed for easy access, query and download of the data. Features of the TransportDB website include text-based and BLAST search tools against known transporter and outer membrane channel proteins; comparison of transporter and outer membrane channel contents from different organisms; known 3D structures of transporters, and phylogenetic trees of transporter families. On individual protein pages, users can find detailed functional annotation, supporting bioinformatic evidence, protein/DNA sequences, publications and cross-referenced external online resource links. TransportDB has now been in existence for over 10 years and continues to be regularly updated with new evidence and data from newly sequenced genomes, as well as having new features added periodically.
Affiliation(s)
- Ian T. Paulsen
- To whom correspondence should be addressed. Tel: +1 301 795 7531; Fax: +1 301 838 0208
|
164
|
Abstract
The GeneSpeed database is an online database and resource tool facilitating the detailed study of protein domain homology in the transcriptomes of Homo sapiens, Mus musculus, Drosophila melanogaster and Caenorhabditis elegans. The population schema for the GeneSpeed database takes advantage of HOWARD™ parallel cluster technology and performs exhaustive tBLASTn searches covering all pre-assigned Pfam domain classes in all species (currently 7973 domain families) against the respective Unigene EST databases of the selected four transcriptomes. The resulting database provides a complete annotation of presumed protein domain presence for each Unigene cluster. To complement this domain annotation we have also performed a custom transcription factor-family curation of all Pfam domains, incorporated the Gene Ontology classifications for these domains as well as integrated the Novartis SymAtlas2 dataset for both human and mouse which provides rapid and easy access to tissue-based expression analysis. Consequently, the GeneSpeed database provides the user with the capability to browse or search the database by any of these specialized criteria as well as more traditional means (gene identifier, gene symbol, etc.), thereby enabling a supervised analysis of gene families through a top-down hierarchical basis defined by domain content, all directly linked to an optimized gene expression dataset.
Affiliation(s)
- Jan Jensen
- To whom correspondence should be addressed. Tel: +1 303 724 6844; Fax: +1 303 724 6830
|
165
|
Wishart DS. Discovering drug targets through the web. COMPARATIVE BIOCHEMISTRY AND PHYSIOLOGY D-GENOMICS & PROTEOMICS 2006; 2:9-17. [PMID: 20483274 DOI: 10.1016/j.cbd.2006.01.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 11/22/2005] [Revised: 01/28/2006] [Accepted: 01/30/2006] [Indexed: 11/25/2022]
Abstract
Traditionally, drug-target discovery is a "wet-bench" experimental process, depending on carefully designed genetic screens, biochemical tests and cellular assays to identify proteins and genes that are associated with a particular disease or condition. However, recent advances in DNA sequencing, transcript profiling, protein identification and protein quantification are leading to a flood of genomic and proteomic data that is, or potentially could be, linked to disease data. The quantity of data generated by these high throughput methods is forcing scientists to re-think the way they do traditional drug-target discovery. In particular it is leading them more and more towards identifying potential drug targets using computers. In fact, drug-target identification is now being done as much on the desk-top as on the bench-top. This review focuses on describing how drug-target discovery can be done in silico (i.e. via computer) using a variety of bioinformatic resources that are freely available on the web. Specifically, it highlights a number of web-accessible sequence databases, automated genome annotation tools, text mining tools, and integrated drug/sequence databases that can be used to identify drug targets for both endogenous (genetic and epigenetic) diseases as well as exogenous (infectious) diseases.
Affiliation(s)
- David S Wishart
- Departments of Computing Science and Biological Sciences, University of Alberta, Edmonton, AB, Canada T6G 2E8
|
166
|
Intra J, Perotti ME, Pavesi G, Horner D. Comparative and phylogenetic analysis of alpha-L-fucosidase genes. Gene 2006; 392:34-46. [PMID: 17175120 DOI: 10.1016/j.gene.2006.11.002] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.8] [Received: 06/27/2006] [Revised: 10/20/2006] [Accepted: 11/06/2006] [Indexed: 12/25/2022]
Abstract
Fucosylated glycoconjugates play a role in a wide variety of biological processes, including immune responses, signal transduction, ontogenic events and pathogenesis of several human diseases. Alpha-L-fucosidases, which are responsible for their processing, have been demonstrated to be involved in lysosomal storage disease, inflammation, cystic fibrosis, cancer development and in the interactions between gametes in vertebrates as well as invertebrates. The sequence and comparative genomic analysis of these glycosyl hydrolases and the study of their evolutionary relationships appear therefore to be of considerable interest. In this work we carried out extensive similarity searches and comparative analyses to identify sequences encoding alpha-L-fucosidases. We have identified novel alpha-L-fucosidase coding sequences in worms, insects, sea urchin, ascidians, fish, chicken, amphibians, mammals and various bacteria resulting in a total of 39 alpha-L-fucosidase sequences. Two alpha-L-fucosidases that are present in all vertebrates likely reflect a distinct biological role for paralogous genes. Comparative sequence analysis of all metazoan alpha-L-fucosidases reveals a broad conservation of features, including the aspartate residue that constitutes the catalytic nucleophile. However, a cysteine which is thought to be part of the active site is also conserved in metazoa but not in arthropods, where it is replaced by an alanine. Phylogenetic analysis suggests a gene duplication event very early in metazoan evolution with the subsequent differential loss of isoforms in various metazoan lineages.
Affiliation(s)
- Jari Intra
- Dipartimento di Scienze Biomolecolari e Biotecnologie, Università di Milano, Via Celoria 26, 20133 Milano, Italy.
|
167
|
Gioia J, Qin X, Jiang H, Clinkenbeard K, Lo R, Liu Y, Fox GE, Yerrapragada S, McLeod MP, McNeill TZ, Hemphill L, Sodergren E, Wang Q, Muzny DM, Homsi FJ, Weinstock GM, Highlander SK. The genome sequence of Mannheimia haemolytica A1: insights into virulence, natural competence, and Pasteurellaceae phylogeny. J Bacteriol 2006; 188:7257-66. [PMID: 17015664 PMCID: PMC1636238 DOI: 10.1128/jb.00675-06] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.8] [Indexed: 11/20/2022] Open
Abstract
The draft genome sequence of Mannheimia haemolytica A1, the causative agent of bovine respiratory disease complex (BRDC), is presented. Strain ATCC BAA-410, isolated from the lung of a calf with BRDC, was the DNA source. The annotated genome includes 2,839 coding sequences, 1,966 of which were assigned a function and 436 of which are unique to M. haemolytica. Through genome annotation many features of interest were identified, including bacteriophages and genes related to virulence, natural competence, and transcriptional regulation. In addition to previously described virulence factors, M. haemolytica encodes adhesins, including the filamentous hemagglutinin FhaB and two trimeric autotransporter adhesins. Two dual-function immunoglobulin-protease/adhesins are also present, as is a third immunoglobulin protease. Genes related to iron acquisition and drug resistance were identified and are likely important for survival in the host and virulence. Analysis of the genome indicates that M. haemolytica is naturally competent, as genes for natural competence and DNA uptake signal sequences (USS) are present. Comparison of competence loci and USS in other species in the family Pasteurellaceae indicates that M. haemolytica, Actinobacillus pleuropneumoniae, and Haemophilus ducreyi form a lineage distinct from other Pasteurellaceae. This observation was supported by a phylogenetic analysis using sequences of predicted housekeeping genes.
Affiliation(s)
- Jason Gioia
- Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, Texas 77030, USA
|
168
|
Abstract
Natural antisense transcripts (NATs) are reverse complementary at least in part to the sequences of other endogenous sense transcripts. Most NATs are transcribed from opposite strands of their sense partners. They regulate sense genes at multiple levels and are implicated in various diseases. Using an improved whole-genome computational pipeline, we identified abundant cis-encoded exon-overlapping sense-antisense (SA) gene pairs in human (7356), mouse (6806), fly (1554), and eight other eukaryotic species (total 6534). We developed NATsDB (Natural Antisense Transcripts DataBase, http://natsdb.cbi.pku.edu.cn/) to enable efficient browsing, searching and downloading of this currently most comprehensive collection of SA genes, grouped into six classes based on their overlapping patterns. NATsDB also includes non-exon-overlapping bidirectional (NOB) genes and non-bidirectional (NBD) genes. To facilitate the study of functions, regulations and possible pathological implications, NATsDB includes extensive information about gene structures, poly(A) signals and tails, phastCons conservation, homologues in other species, repeat elements, expressed sequence tag (EST) expression profiles and OMIM disease association. NATsDB supports interactive graphical display of the alignment of all supporting EST and mRNA transcripts of the SA and NOB genes to the genomic loci. It supports advanced search by species, gene name, sequence accession number, chromosome location, coding potential, OMIM association and sequence similarity.
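The definition of a cis-encoded, exon-overlapping sense-antisense pair given above can be illustrated with a minimal sketch. This is not the NATsDB pipeline, whose details the abstract does not give; the `Gene` record, the half-open exon coordinates, and the example genes are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; exons are half-open (start, end) intervals."""
    name: str
    chrom: str
    strand: str   # "+" or "-"
    exons: tuple  # tuple of (start, end) pairs


def exons_overlap(a: Gene, b: Gene) -> bool:
    """True if any exon of a shares at least one base with any exon of b."""
    return any(s1 < e2 and s2 < e1
               for s1, e1 in a.exons
               for s2, e2 in b.exons)


def sense_antisense_pairs(genes):
    """Pairs on the same chromosome, opposite strands, with overlapping exons."""
    return [(a.name, b.name)
            for a, b in combinations(genes, 2)
            if a.chrom == b.chrom
            and a.strand != b.strand
            and exons_overlap(a, b)]


# Made-up coordinates, not real annotations.
example_genes = [
    Gene("A", "chr1", "+", ((100, 200), (300, 400))),
    Gene("B", "chr1", "-", ((350, 450),)),
    Gene("C", "chr1", "+", ((360, 380),)),
    Gene("D", "chr2", "-", ((100, 200),)),
]
pairs = sense_antisense_pairs(example_genes)
```

The all-pairs comparison is quadratic and would need an interval index for whole-genome use; it is kept here only because it makes the three conditions (same locus, opposite strand, exon overlap) easy to read.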
Affiliation(s)
- Qing-Rong Liu
- Molecular Neurobiology Branch, National Institute on Drug Abuse-Intramural Research Program (NIDA-IRP), NIH, Department of Health and Human Services (DHHS), Box 5180, Baltimore, MD 21224, USA
- Liping Wei
- To whom correspondence should be addressed. Tel: +1 86 10 6276 4970; Fax: +1 86 10 6275 2438;
|
169
|
Abstract
Human tissue-specific genes were reported to be longer than housekeeping genes (both in coding and intronic parts). The competing neutralist and adaptationist models were proposed to explain this observation. Here I show that in the human genome the longest genes are those with an intermediate expression pattern. From the standpoint of information theory, the regulation of such genes should be most complex. In the genomewide context, they are found here to have a higher informational load on all available levels: from participation in protein interaction networks, pathways and modules reflected in Gene Ontology categories, through transcription factor regulatory sets and protein functional domains, to amino acid tuples (words) in encoded proteins and nucleotide tuples in introns and promoter regions. Thus, the intermediately expressed genes have a higher functional and regulatory complexity that is reflected in their greater length (which is consistent with the 'genome design' model). The dichotomy of housekeeping versus tissue-specific entities is more pronounced on the modular level than on the molecular level. There are far fewer intermediate-specific modules (modules overrepresented in the intermediately expressed genes) than housekeeping or tissue-specific modules (normalized to gene number). The dichotomy of housekeeping versus tissue-specific genes and modules in multicellular organisms is probably caused by the burden of regulatory complexity acting on the intermediately expressed genes.
|
170
|
Storchová R, Divina P. Nonrandom representation of sex-biased genes on chicken Z chromosome. J Mol Evol 2006; 63:676-81. [PMID: 17031459 DOI: 10.1007/s00239-006-0022-1] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.4] [Received: 01/30/2006] [Accepted: 07/25/2006] [Indexed: 10/24/2022]
Abstract
Several lines of evidence suggest that the X chromosome of various animal species has an unusual complement of genes with sex-biased or sex-specific expression. However, the study of the X chromosome gene content in different organisms provided conflicting results. The most striking contrast concerns the male-biased genes, which were reported to be almost depleted from the X chromosome in Drosophila but overrepresented on the X chromosome in mammals. To elucidate the reason for these discrepancies, we analysed the gene content of the Z chromosome in chicken. Our analysis of the publicly available expressed sequence tags (EST) data and genome draft sequence revealed a significant underrepresentation of ovary-specific genes on the chicken Z chromosome. For the brain-expressed genes, we found a significant enrichment of male-biased genes but an indication of underrepresentation of female-biased genes on the Z chromosome. This is the first report on the nonrandom gene content in a homogametic sex chromosome of a species with heterogametic female individuals. Further comparison of gene contents of the independently evolved X and Z sex chromosomes may offer new insight into the evolutionary processes leading to the nonrandom genomic distribution of sex-biased and sex-specific genes.
Affiliation(s)
- R Storchová
- Institute of Molecular Genetics, Academy of Sciences of Czech Republic and Center for Applied Genomics, Vídenská 1083, CZ-142 20, Prague 4, Czech Republic.
|
171
|
Abstract
In proteome studies, identification of proteins requires searching protein sequence databases. The public protein sequence databases (e.g., NCBInr, UniProt) each contain millions of entries, and private databases add thousands more. Although much of the sequence information in these databases is redundant, each database uses distinct identifiers for the identical protein sequence and often contains unique annotation information. Users of one database obtain a database-specific sequence identifier that is often difficult to reconcile with the identifiers from a different database. When multiple databases are used for searches or the databases being searched are updated frequently, interpreting the protein identifications and associated annotations can be problematic. We have developed a database of unique protein sequence identifiers called Sequence Globally Unique Identifiers (SEGUID) derived from primary protein sequences. These identifiers serve as a common link between multiple sequence databases and are resilient to annotation changes in either public or private databases throughout the lifetime of a given protein sequence. The SEGUID Database can be downloaded (http://bioinformatics.anl.gov/SEGUID/) or easily generated at any site with access to primary protein sequence databases. Since SEGUIDs are stable, predictions based on the primary sequence information (e.g., pI, Mr) can be calculated just once; we have generated approximately 500 different calculations for more than 2.5 million sequences. SEGUIDs are used to integrate MS and 2-DE data with bioinformatics information and provide the opportunity to search multiple protein sequence databases, thereby providing a higher probability of finding the most valid protein identifications.
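The idea of an identifier derived only from the primary sequence can be sketched as follows. The abstract does not describe the hashing scheme, so this sketch assumes the construction commonly described for SEGUID: a SHA-1 digest of the upper-cased sequence, Base64-encoded with the trailing padding removed. The function name and the example sequences are illustrative only.

```python
import base64
import hashlib


def seguid(sequence: str) -> str:
    """Sequence-derived identifier (assumed scheme: SHA-1 + Base64, no padding)."""
    # Normalise case and surrounding whitespace so that the same residues
    # always yield the same identifier, whatever database they came from.
    normalized = sequence.strip().upper()
    digest = hashlib.sha1(normalized.encode("ascii")).digest()
    # A 20-byte digest encodes to 28 Base64 characters ending in one "=".
    return base64.b64encode(digest).decode("ascii").rstrip("=")


# Hypothetical peptide, written two ways as two databases might store it.
id_from_db_a = seguid("MKTAYIAKQR")
id_from_db_b = seguid("mktayiakqr\n")
```

Because the identifier depends on the residues alone, the two records above map to the same key even though their database-specific accession numbers and annotations could differ, which is the reconciliation property the abstract describes.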
Affiliation(s)
- György Babnigg
- Protein Mapping Group, Biosciences Division, Argonne National Laboratory, IL 60439, USA
|
172
|
Xu Q, Canutescu A, Obradovic Z, Dunbrack RL. ProtBuD: a database of biological unit structures of protein families and superfamilies. Bioinformatics 2006; 22:2876-82. [PMID: 17018535 DOI: 10.1093/bioinformatics/btl490] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.2] [Indexed: 11/15/2022]
Abstract
MOTIVATION Modeling of protein interactions is often possible from known structures of related complexes. It is often time-consuming to find the most appropriate template. Hypothesized biological units (BUs) often differ from the asymmetric units and it is usually preferable to model from the BUs. RESULTS ProtBuD is a database of BUs for all structures in the Protein Data Bank (PDB). We use both the PDB's BUs and those from the Protein Quaternary Server. ProtBuD is searchable by PDB entry, the Structural Classification of Proteins (SCOP) designation or pairs of SCOP designations. The database provides the asymmetric and BU contents of related proteins in the PDB as identified in SCOP and Position-Specific Iterated BLAST (PSI-BLAST). The asymmetric unit is different from PDB and/or Protein Quaternary Server (PQS) BUs for 52% of X-ray structures, and the PDB and PQS BUs disagree on 18% of entries. AVAILABILITY The database is provided as a standalone program and a web server from http://dunbrack.fccc.edu/ProtBuD.php.
Affiliation(s)
- Qifang Xu
- Institute for Cancer Research, Fox Chase Cancer Center 333 Cottman Avenue, Philadelphia, PA 19111 USA
|
173
|
Friedman C, Borlawsky T, Shagina L, Xing HR, Lussier YA. Bio-Ontology and text: bridging the modeling gap. Bioinformatics 2006; 22:2421-9. [PMID: 16870928 PMCID: PMC2879055 DOI: 10.1093/bioinformatics/btl405] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Indexed: 11/13/2022]
Abstract
MOTIVATION Natural language processing (NLP) techniques are increasingly being used in biology to automate the capture of new biological discoveries in text, which are being reported at a rapid rate. Yet, information represented in NLP data structures is classically very different from information organized with ontologies as found in model organisms or genetic databases. To facilitate the computational reuse and integration of information buried in unstructured text with that of genetic databases, we propose and evaluate a translational schema that represents a comprehensive set of phenotypic and genetic entities, as well as their closely related biomedical entities and relations as expressed in natural language. In addition, the schema connects different scales of biological information, and provides mappings from the textual information to existing ontologies, which are essential in biology for integration, organization, dissemination and knowledge management of heterogeneous phenotypic information. A common comprehensive representation for otherwise heterogeneous phenotypic and genetic datasets, such as the one proposed, is critical for advancing systems biology because it enables acquisition and reuse of unprecedented volumes of diverse types of knowledge and information from text. RESULTS A novel representational schema, PGschema, was developed that enables translation of phenotypic, genetic and their closely related information found in textual narratives to a well-defined data structure comprising phenotypic and genetic concepts from established ontologies along with modifiers and relationships. Evaluation for coverage of a selected set of entities showed that 90% of the information could be represented (95% confidence interval: 86-93%; n = 268). Moreover, PGschema can be expressed automatically in an XML format using natural language techniques to process the text. 
To our knowledge, we are providing the first evaluation of a translational schema for NLP that contains declarative knowledge about genes and their associated biomedical data (e.g. phenotypes). AVAILABILITY http://zellig.cpmc.columbia.edu/PGschema
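The coverage figure reported above (90%, 95% confidence interval 86-93%, n = 268) can be checked with a short worked calculation. The abstract states neither the raw count nor the interval method, so two assumptions are made here: that 241 of the 268 entities were representable (241/268 is about 90%), and that a Wilson score interval was used. Under those assumptions the bounds appear to round to the reported 86% and 93%.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# 241 is an assumed count consistent with "90% of n = 268"; it is not in the source.
low, high = wilson_interval(241, 268)
print(f"coverage {241 / 268:.0%}, interval {low:.0%} to {high:.0%}")
```

The Wilson form is used rather than the simpler normal approximation because it behaves better for proportions near 0 or 1; whether the authors used it is not stated.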
Affiliation(s)
- Carol Friedman
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA.
|
174
|
Barrett T, Edgar R. Mining microarray data at NCBI's Gene Expression Omnibus (GEO)*. Methods in Molecular Biology (Clifton, N.J.) 2006; 338:175-90. [PMID: 16888359 PMCID: PMC1619899 DOI: 10.1385/1-59745-097-9:175] [Citation(s) in RCA: 83] [Impact Index Per Article: 4.6] [Indexed: 11/11/2022]
Abstract
The Gene Expression Omnibus (GEO) at the National Center for Biotechnology Information (NCBI) has emerged as the leading fully public repository for gene expression data. This chapter describes how to use Web-based interfaces, applications, and graphics to effectively explore, visualize, and interpret the hundreds of microarray studies and millions of gene expression patterns stored in GEO. Data can be examined from both experiment-centric and gene-centric perspectives using user-friendly tools that do not require specialized expertise in microarray analysis or time-consuming download of massive data sets. The GEO database is publicly accessible through the World Wide Web at http://www.ncbi.nlm.nih.gov/geo.
Affiliation(s)
- Tanya Barrett
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD, USA
|
175
|
Dolan ME, Holden CC, Beard MK, Bult CJ. Genomes as geography: using GIS technology to build interactive genome feature maps. BMC Bioinformatics 2006; 7:416. [PMID: 16984652 PMCID: PMC1599760 DOI: 10.1186/1471-2105-7-416] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Received: 02/02/2006] [Accepted: 09/19/2006] [Indexed: 11/10/2022]
Abstract
BACKGROUND Many commonly used genome browsers display sequence annotations and related attributes as horizontal data tracks that can be toggled on and off according to user preferences. Most genome browsers use only simple keyword searches and limit the display of detailed annotations to one chromosomal region of the genome at a time. We have employed concepts, methodologies, and tools that were developed for the display of geographic data to develop a Genome Spatial Information System (GenoSIS) for displaying genomes spatially, and interacting with genome annotations and related attribute data. In contrast to the paradigm of horizontally stacked data tracks used by most genome browsers, GenoSIS uses the concept of registered spatial layers composed of spatial objects for integrated display of diverse data. In addition to basic keyword searches, GenoSIS supports complex queries, including spatial queries, and dynamically generates genome maps. Our adaptation of the geographic information system (GIS) model in a genome context supports spatial representation of genome features at multiple scales with a versatile and expressive query capability beyond that supported by existing genome browsers. RESULTS We implemented an interactive genome sequence feature map for the mouse genome in GenoSIS, an application that uses ArcGIS, a commercially available GIS software system. The genome features and their attributes are represented as spatial objects and data layers that can be toggled on and off according to user preferences or displayed selectively in response to user queries. GenoSIS supports the generation of custom genome maps in response to complex queries about genome features based on both their attributes and locations. Our example application of GenoSIS to the mouse genome demonstrates the powerful visualization and query capability of mature GIS technology applied in a novel domain. 
CONCLUSION Mapping tools developed specifically for geographic data can be exploited to display, explore and interact with genome data. The approach we describe here is organism independent and is equally useful for linear and circular chromosomes. One of the unique capabilities of GenoSIS compared to existing genome browsers is the capacity to generate genome feature maps dynamically in response to complex attribute and spatial queries.
Affiliation(s)
- Mary E Dolan
- National Center for Geographic Information and Analysis, University of Maine, Orono, ME 04469, USA
- The Jackson Laboratory, Bar Harbor, ME 04609, USA
- Constance C Holden
- National Center for Geographic Information and Analysis, University of Maine, Orono, ME 04469, USA
- M Kate Beard
- National Center for Geographic Information and Analysis, University of Maine, Orono, ME 04469, USA
- Carol J Bult
- National Center for Geographic Information and Analysis, University of Maine, Orono, ME 04469, USA
- The Jackson Laboratory, Bar Harbor, ME 04609, USA
|
176
|
Kastenmayer JP, Ni L, Chu A, Kitchen LE, Au WC, Yang H, Carter CD, Wheeler D, Davis RW, Boeke JD, Snyder MA, Basrai MA. Functional genomics of genes with small open reading frames (sORFs) in S. cerevisiae. Genome Res 2006; 16:365-73. [PMID: 16510898 PMCID: PMC1415214 DOI: 10.1101/gr.4355406] [Citation(s) in RCA: 157] [Impact Index Per Article: 8.7] [Indexed: 11/24/2022]
Abstract
Genes with small open reading frames (sORFs; <100 amino acids) represent an untapped source of important biology. sORFs largely escaped analysis because they were difficult to predict computationally and less likely to be targeted by genetic screens. Thus, the substantial number of sORFs and their potential importance have only recently become clear. To investigate sORF function, we undertook the first functional studies of sORFs in any system, using the model eukaryote Saccharomyces cerevisiae. Based on independent experimental approaches and computational analyses, evidence exists for 299 sORFs in the S. cerevisiae genome, representing approximately 5% of the annotated ORFs. We determined that a similar percentage of sORFs are annotated in other eukaryotes, including humans, and 184 of the S. cerevisiae sORFs exhibit similarity with ORFs in other organisms. To investigate sORF function, we constructed a collection of gene-deletion mutants of 140 newly identified sORFs, each of which contains a strain-specific "molecular barcode," bringing the total number of sORF deletion strains to 247. Phenotypic analyses of the new gene-deletion strains identified 22 sORFs required for haploid growth, growth at high temperature, growth in the presence of a nonfermentable carbon source, or growth in the presence of DNA damage and replication-arrest agents. We provide a collection of sORF deletion strains that can be integrated into the existing deletion collection as a resource for the yeast community for elucidating gene function. Moreover, our analyses of the S. cerevisiae sORFs establish that sORFs are conserved across eukaryotes and have important biological functions.
Affiliation(s)
- James P Kastenmayer
- Genetics Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20889, USA
|
177
|
Shiue YL, Chen LR, Chen CF, Chen YL, Ju JP, Chao CH, Lin YP, Kuo YM, Tang PC, Lee YP. Identification of transcripts related to high egg production in the chicken hypothalamus and pituitary gland. Theriogenology 2006; 66:1274-83. [PMID: 16725186 DOI: 10.1016/j.theriogenology.2006.03.037] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.9] [Received: 11/18/2005] [Accepted: 03/05/2006] [Indexed: 11/27/2022]
Abstract
To identify transcripts related to high egg production expressed specifically in the hypothalamus and pituitary gland of the chicken, two subtracted cDNA libraries were constructed. Two divergently selected strains of Taiwan Country Chickens (TCCs), B (sire line) and L2 (dam line) were used; they had originated from a single population and were further subjected (since 1982) to selection for egg production to 40 wk of age and body weight/comb size, respectively. A total of 324 and 370 clones were identified from the L2-B (L2-subtract-B) and the B-L2 subtracted cDNA libraries, respectively. After sequencing and annotation, 175 and 136 transcripts that represented 53 known and 65 unknown non-redundant sequences were characterized in the L2-B subtracted cDNA library. Quantitative reverse-transcription (RT)-PCR was used to screen the mRNA expression levels of 32 randomly selected transcripts in another 78 laying hens from five different strains. These strains included the two original strains (B and L2) used to construct the subtracted cDNA libraries and an additional three commercial strains, i.e., Black- and Red-feather TCCs and Single-Comb White Leghorn (WL) layer. The mRNA expression levels of 16 transcripts were significantly higher in the L2 than in the B strain, whereas the mRNA expression levels of nine transcripts, BDH, NCAM1, PCDHA@, PGDS, PLAG1, PRL, SAR1A, SCG2 and STMN2, were significantly higher in two high egg production strains, L2 and Single-Comb WL; this indicated their usefulness as molecular markers of high egg production.
Affiliation(s)
- Yow-Ling Shiue
- Institute of Biomedical Science, National Sun Yat-sen University, Kaohsiung, Taiwan
|
178
|
Ko WY, Piao S, Akashi H. Strong regional heterogeneity in base composition evolution on the Drosophila X chromosome. Genetics 2006; 174:349-62. [PMID: 16547109 PMCID: PMC1569809 DOI: 10.1534/genetics.105.054346] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 12/05/2005] [Accepted: 05/08/2006] [Indexed: 11/18/2022]
Abstract
Fluctuations in base composition appear to be prevalent in Drosophila and mammal genome evolution, but their timescale, genomic breadth, and causes remain obscure. Here, we study base composition evolution within the X chromosomes of Drosophila melanogaster and five of its close relatives. Substitutions were inferred on six extant and two ancestral lineages for 14 near-telomeric and 9 nontelomeric genes. GC content evolution is highly variable both within the genome and within the phylogenetic tree. In the lineages leading to D. yakuba and D. orena, GC content at silent sites has increased rapidly near telomeres, but has decreased in more proximal (nontelomeric) regions. D. orena shows a 17-fold excess of GC-increasing vs. AT-increasing synonymous changes within a small (approximately 130-kb) region close to the telomeric end. Base composition changes within introns are consistent with changes in mutation patterns, but stronger GC elevation at synonymous sites suggests contributions of natural selection or biased gene conversion. The Drosophila yakuba lineage shows a less extreme elevation of GC content distributed over a wider genetic region (approximately 1.2 Mb). A lack of change in GC content for most introns within this region suggests a role of natural selection in localized base composition fluctuations.
Affiliation(s)
- Wen-Ya Ko
- Institute of Molecular Evolutionary Genetics and Department of Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA
|
179
|
Penkett CJ, Morris JA, Wood V, Bähler J. YOGY: a web-based, integrated database to retrieve protein orthologs and associated Gene Ontology terms. Nucleic Acids Res 2006; 34:W330-4. [PMID: 16845020 PMCID: PMC1538793 DOI: 10.1093/nar/gkl311] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.2] [Indexed: 11/12/2022]
Abstract
We present YOGY, a web-based resource for orthologous proteins from nine eukaryotic organisms: Homo sapiens, Mus musculus, Rattus norvegicus, Arabidopsis thaliana, Drosophila melanogaster, Caenorhabditis elegans, Plasmodium falciparum, Schizosaccharomyces pombe and Saccharomyces cerevisiae. Using a gene name from any of these organisms as a query, this database provides comprehensive, combined information on orthologs in other species using data from five independent resources: KOGs, Inparanoid, HomoloGene, OrthoMCL and a table of curated fission and budding yeast orthologs. Associated Gene Ontology (GO) terms of orthologs can also be retrieved for functional inference. Integrating these different and complementary datasets provides a straightforward tool to identify known and predicted orthologs of proteins from a variety of species. This resource should be useful for bench scientists looking for functional clues for their genes of interest as well as for curators looking for information that can be transferred based on orthology and for rapidly identifying the relevant GO terms as an aid to literature curation. YOGY is accessible online at http://www.sanger.ac.uk/PostGenomics/S_pombe/YOGY/.
Affiliation(s)
- Jürg Bähler
- To whom correspondence should be addressed. Tel: +44 0 1223 496948; Fax: +44 0 1223 496802;
|
180
|
Pollard DA, Iyer VN, Moses AM, Eisen MB. Widespread discordance of gene trees with species tree in Drosophila: evidence for incomplete lineage sorting. PLoS Genet 2006; 2:e173. [PMID: 17132051 PMCID: PMC1626107 DOI: 10.1371/journal.pgen.0020173] [Citation(s) in RCA: 254] [Impact Index Per Article: 14.1] [Received: 11/16/2005] [Accepted: 08/28/2006] [Indexed: 11/19/2022]
Abstract
The phylogenetic relationship of the now fully sequenced species Drosophila erecta and D. yakuba with respect to the D. melanogaster species complex has been a subject of controversy. All three possible groupings of the species have been reported in the past, though recent multi-gene studies suggest that D. erecta and D. yakuba are sister species. Using the whole genomes of each of these species as well as the four other fully sequenced species in the subgenus Sophophora, we set out to investigate the placement of D. erecta and D. yakuba in the D. melanogaster species group and to understand the cause of the past incongruence. Though we find that the phylogeny grouping D. erecta and D. yakuba together is the best supported, we also find widespread incongruence in nucleotide and amino acid substitutions, insertions and deletions, and gene trees. The time inferred to span the two key speciation events is short enough that under the coalescent model, the incongruence could be the result of incomplete lineage sorting. Consistent with the lineage-sorting hypothesis, substitutions supporting the same tree were spatially clustered. Support for the different trees was found to be linked to recombination such that adjacent genes support the same tree most often in regions of low recombination and substitutions supporting the same tree are most enriched roughly on the same scale as linkage disequilibrium, also consistent with lineage sorting. The incongruence was found to be statistically significant and robust to model and species choice. No systematic biases were found. We conclude that phylogenetic incongruence in the D. melanogaster species complex is the result, at least in part, of incomplete lineage sorting. Incomplete lineage sorting will likely cause phylogenetic incongruence in many comparative genomics datasets. 
Methods to infer the correct species tree, the history of every base in the genome, and comparative methods that control for and/or utilize this information will be valuable advancements for the field of comparative genomics. To take full advantage of the growing number of genome sequences from different organisms, it is necessary to understand the evolutionary relationships (phylogeny) between organisms. Unfortunately, phylogenies inferred from individual genes often conflict, reflecting either poor inferences or real variation in the history of genes. In this study, the authors examine relationships within the Drosophila melanogaster species subgroup, a group of flies with three fully sequenced species in which phylogeny has been a source of controversy. Although the bulk of the data support a phylogeny with Drosophila melanogaster as an outgroup to sister species Drosophila erecta and Drosophila yakuba, large portions of their genes support alternative phylogenies. According to the authors, the most plausible explanation for these observations is that polymorphisms in the ancestral population were maintained during the two rapid speciation events that led to these species. Subsequent to speciation, polymorphisms were randomly fixed in each species, and in some cases non-sister species fixed the same ancestral polymorphisms, while sister species did not. In these cases the genes are correctly inferred to have conflicting phylogenies. The authors note that rapid speciation events will often lead to such conflict, which needs to be accounted for in evolutionary analyses.
Affiliation(s)
- Daniel A Pollard
- Graduate Group in Biophysics, University of California Berkeley, Berkeley, California, United States of America
- Venky N Iyer
- Department of Molecular and Cell Biology, University of California Berkeley, Berkeley, California, United States of America
- Alan M Moses
- Graduate Group in Biophysics, University of California Berkeley, Berkeley, California, United States of America
- Michael B Eisen
- Graduate Group in Biophysics, University of California Berkeley, Berkeley, California, United States of America
- Department of Molecular and Cell Biology, University of California Berkeley, Berkeley, California, United States of America
- Department of Genome Sciences, Genomics Division, Ernest Orlando Lawrence Berkeley National Lab, Berkeley, California, United States of America
- Center for Integrative Genomics, University of California Berkeley, Berkeley, California, United States of America
- * To whom correspondence should be addressed. E-mail:
|
181
|
Liu BA, Jablonowski K, Raina M, Arcé M, Pawson T, Nash PD. The human and mouse complement of SH2 domain proteins-establishing the boundaries of phosphotyrosine signaling. Mol Cell 2006; 22:851-868. [PMID: 16793553 DOI: 10.1016/j.molcel.2006.06.001] [Citation(s) in RCA: 222] [Impact Index Per Article: 12.3] [Received: 02/28/2006] [Revised: 05/19/2006] [Accepted: 06/02/2006] [Indexed: 01/07/2023]
Abstract
SH2 domains are interaction modules uniquely dedicated to the recognition of phosphotyrosine sites and are embedded in proteins that couple protein-tyrosine kinases to intracellular signaling pathways. Here, we report a comprehensive bioinformatics, structural, and functional view of the human and mouse complement of SH2 domain proteins. This information delimits the set of SH2-containing effectors available for PTK signaling and will facilitate the systems-level analysis of pTyr-dependent protein-protein interactions and PTK-mediated signal transduction. The domain-based architecture of SH2-containing proteins is of more general relevance for understanding the large family of protein interaction domains and the modular organization of the majority of human proteins.
Affiliation(s)
- Bernard A Liu
- Ben May Institute for Cancer Research and the Committee on Cancer Biology, The University of Chicago, Chicago, Illinois 60637
- Karl Jablonowski
- Ben May Institute for Cancer Research and the Committee on Cancer Biology, The University of Chicago, Chicago, Illinois 60637
- Monica Raina
- Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto M5G 1X5, Canada
- Michael Arcé
- Ben May Institute for Cancer Research and the Committee on Cancer Biology, The University of Chicago, Chicago, Illinois 60637
- Tony Pawson
- Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto M5G 1X5, Canada.
- Piers D Nash
- Ben May Institute for Cancer Research and the Committee on Cancer Biology, The University of Chicago, Chicago, Illinois 60637.
|
182
|
Vider-Shalit T, Raffaeli S, Louzoun Y. Virus-epitope vaccine design: informatic matching the HLA-I polymorphism to the virus genome. Mol Immunol 2006; 44:1253-61. [PMID: 16930710 DOI: 10.1016/j.molimm.2006.06.003] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.7] [Received: 02/05/2006] [Revised: 06/07/2006] [Accepted: 06/08/2006] [Indexed: 12/01/2022]
Abstract
Attempts to develop peptide vaccines based on a limited number of peptides face two problems: HLA polymorphism and the high mutation rate of viral epitopes. We have developed a new genomic method that ensures maximal coverage and thus maximal applicability of the peptide vaccine. The same method also promises a large number of epitopes per HLA to prevent escape via mutations. Our design can be applied swiftly in order to face rapidly emerging viral diseases. We use a genomic scan of all candidate peptides and join them optimally. For a given virus, we use algorithms that compute peptide cleavage probability, transfer through TAP, and MHC binding for a large number of HLA alleles. The resulting peptide libraries are pruned of peptides that are not conserved or are too similar to self peptides. We then use a genetic algorithm to produce an optimal protein composed of peptides from this list, properly ordered for cleavage. The selected peptides represent an optimal combination to cover all HLA alleles and all viral proteins. We have applied this method to HCV and found that some HCV proteins (mainly envelope proteins) yield far fewer peptides than expected. A more detailed analysis of the peptide variability shows a balance between the attempts of the immune system to detect less mutating peptides and the attempts of viruses to mutate peptides and avoid detection by the immune system. To show the applicability of our method, we have further used it on HIV-1, influenza H3N2 and the avian flu viruses.
|
183
|
Chen LR, Chao CH, Chen CF, Lee YP, Chen YL, Shiue YL. Expression of 25 high egg production related transcripts that identified from hypothalamus and pituitary gland in red-feather Taiwan country chickens. Anim Reprod Sci 2006; 100:172-85. [PMID: 16919900 DOI: 10.1016/j.anireprosci.2006.07.005] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 01/02/2006] [Revised: 06/19/2006] [Accepted: 07/07/2006] [Indexed: 10/24/2022]
Abstract
Expression levels of 33 high egg production candidate transcripts in Red-feather Taiwan country chickens (TCCs) were examined by quantitative reverse-transcription (RT) polymerase chain reaction (PCR) in this study. Candidate transcripts were previously identified from an L2-B (L2-subtract-B) hypothalamus/pituitary gland subtractive cDNA library. Two divergently selected strains of TCCs, B and L2, were used for this subtractive cDNA library. These two strains originated from a single population and have been selected since 1982 for body weight/comb size (B) and for eggs to 40 wk of age (L2), respectively. Hypothalamuses and pituitary glands sampled from Red-feather TCCs were previously grouped into high (Red-high; n=20) and low (Red-low; n=20) egg production groups based on the rate of lay after the first egg (hen-day laying rate; %). Rates of lay after the first egg (mean ± S.E.) in the Red-high and the Red-low subpopulations were 72.2 ± 0.6 and 23.0 ± 3.5, respectively (P<0.01). Quantitative RT-PCR validated that 25 candidate transcripts were expressed at significantly higher levels in the Red-high than in the Red-low hens. These transcripts were ANP32A, BDH, CDC42, CNTN1, COMT, CPE, CTNNB1, DIO2, EIF4E, GARNL1, HSPCA, LAPTM4B, MBP, NAP1L4, NCAM1, PARK7, PCDHA@, PGDS, PLAG1, PRL, RAD21, SAR1A, SCG2, STMN1 and UFM1. Among these transcripts, 15 (79.0%), 13 (68.4%), and 12 (63.2%) genes were annotated as involved in cellular physiological process (GO:0050875), metabolism (GO:0008152) and cell communication (GO:0007154), respectively. The identified transcripts related to high egg production are most active in focal adhesion, adherens junction, MAPK signaling, tight junction and cell adhesion pathways.
Affiliation(s)
- Lih-Ren Chen
- Division of Physiology, Livestock Research Institute, Council of Agriculture, Tainan, Taiwan
|
184
|
Lenzi L, Frabetti F, Facchin F, Casadei R, Vitale L, Canaider S, Carinci P, Zannotti M, Strippoli P. UniGene Tabulator: a full parser for the UniGene format. Bioinformatics 2006; 22:2570-1. [PMID: 16895929 DOI: 10.1093/bioinformatics/btl425] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Indexed: 11/12/2022] Open
Abstract
UniGene Tabulator 1.0 provides a solution for full parsing of the UniGene flat file format; it implements a structured graphical representation of each data field present in UniGene following import into a common database management system usable on a personal computer. This database includes related tables for sequence, protein similarity, sequence-tagged site (STS) and transcript map interval (TXMAP) data, plus a summary table where each record represents a UniGene cluster. UniGene Tabulator enables full local management of UniGene data, allowing parsing, querying, indexing, retrieving, exporting and analysis of UniGene data in a relational database form, usable on computers running Macintosh (OS X 10.3.9 or later) and Windows (2000 with Service Pack 4; XP with Service Pack 2 or later) operating systems. AVAILABILITY The current release, including both the FileMaker runtime applications, is freely available at http://apollo11.isto.unibo.it/software/
Affiliation(s)
- Luca Lenzi
- Department of Histology, Embryology and Applied Biology University of Bologna, 40126 Bologna, Italy
|
185
|
Brown JT, Lahey C, Laosinchai-Wolf W, Hadd AG. Polymorphisms in the glucocerebrosidase gene and pseudogene urge caution in clinical analysis of Gaucher disease allele c.1448T>C (L444P). BMC Med Genet 2006; 7:69. [PMID: 16887033 PMCID: PMC1559599 DOI: 10.1186/1471-2350-7-69] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 03/17/2006] [Accepted: 08/03/2006] [Indexed: 11/24/2022]
Abstract
Background Gaucher disease is a potentially severe lysosomal storage disorder caused by mutations in the human glucocerebrosidase gene (GBA). We have developed a multiplexed genetic assay for eight diseases prevalent in the Ashkenazi population: Tay-Sachs, Gaucher type I, Niemann-Pick types A and B, mucolipidosis type IV, familial dysautonomia, Canavan, Bloom syndrome, and Fanconi anemia type C. This assay includes an allelic determination for GBA allele c.1448T>C (L444P). The goal of this study was to clinically evaluate this assay. Methods Biotinylated, multiplex PCR products were directly hybridized to capture probes immobilized on fluorescently addressed microspheres. After incubation with streptavidin-conjugated fluorophore, the reactions were analyzed by Luminex IS100. Clinical evaluations were conducted using de-identified patient DNA samples. Results We evaluated a multiplexed suspension array assay that includes wild-type and mutant genetic determinations for Gaucher disease allele c.1448T>C. Two percent of samples reported to be wild-type by conventional methods were observed to be c.1448T>C heterozygous using our assay. Sequence analysis suggested that this phenomenon was due to co-amplification of the functional gene and a paralogous pseudogene (ΨGBA) due to a polymorphism in the primer-binding site of the latter. Primers for the amplification of this allele were then repositioned to span an upstream deletion in the pseudogene, yielding a much longer amplicon. Although it is widely reported that long amplicons negatively impact amplification or detection efficiency in recently adopted multiplex techniques, this assay design functioned properly and resolved the occurrence of false heterozygosity. 
Conclusion Although previously available sequence information suggested GBA gene/pseudogene discrimination capabilities with a short amplified product, we identified common single-nucleotide polymorphisms in the pseudogene that required amplification of a larger region for effective discrimination.
|
186
|
Bult CJ. From information to understanding: the role of model organism databases in comparative and functional genomics. Anim Genet 2006; 37 Suppl 1:28-40. [PMID: 16887000 DOI: 10.1111/j.1365-2052.2006.01475.x] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/28/2022]
Abstract
Data integration is key to functional and comparative genomics because integration allows diverse data types to be evaluated in new contexts. To achieve data integration in a scalable and sensible way, semantic standards are needed, both for naming things (standardized nomenclatures, use of key words) and also for knowledge representation. The Mouse Genome Informatics database and other model organism databases help to close the gap between information and understanding of biological processes because these resources enforce well-defined nomenclature and knowledge representation standards. Model organism databases have a critical role to play in ensuring that diverse kinds of data, especially genome-scale data sets and information, remain useful to the biological community in the long-term. The efforts of model organism database groups ensure not only that organism-specific data are integrated, curated and accessible but also that the information is structured in such a way that comparison of biological knowledge across model organisms is facilitated.
Affiliation(s)
- C J Bult
- The Jackson Laboratory, Bar Harbor, ME 04609, USA.
|
187
|
Abstract
The sequence of the human genome provides a scaffold on which numerous annotations, such as the locations of genes, can be laid. Genome browsers have been created to allow the simultaneous display of multiple annotations within a graphical interface. In addition, they provide the ability to search for markers and sequences, to extract annotations for specific regions or for the whole genome and to act as a central starting point for genomic research. This review describes the basic functionality of genome browsers and compares three of them: the University of California Santa Cruz (UCSC) Genome Browser, the Ensembl Genome Browser and the NCBI MapViewer.
Affiliation(s)
- Terrence S Furey
- Institute for Genome Sciences and Policy, Duke University, Box 3382, Durham, NC 27708, USA.
|
188
|
Jiang C, Zhao Z. Mutational spectrum in the recent human genome inferred by single nucleotide polymorphisms. Genomics 2006; 88:527-34. [PMID: 16860534 DOI: 10.1016/j.ygeno.2006.06.003] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.0] [Received: 04/03/2006] [Revised: 06/01/2006] [Accepted: 06/06/2006] [Indexed: 01/09/2023]
Abstract
So far, there is no genome-wide estimation of the mutational spectrum in humans. In this study, we systematically examined the directionality of the point mutations and maintenance of GC content in the human genome using approximately 1.8 million high-quality human single nucleotide polymorphisms and their ancestral sequences in chimpanzees. The frequency of C-->T (G-->A) changes was the highest among all mutation types and the frequency of each type of transition was approximately fourfold that of each type of transversion. In intergenic regions, when the GC content increased, the frequency of changes from G or C increased. In exons, the frequency of G:C-->A:T was the highest among the genomic categories and contributed mainly by the frequent mutations at the CpG sites. In contrast, mutations at the CpG sites, or CpG-->TpG/CpA mutations, occurred less frequently in the CpG islands relative to intergenic regions with similar GC content. Our results suggest that the GC content is overall not in equilibrium in the human genome, with a trend toward shifting the human genome to be AT rich and shifting the GC content of a region to approach the genome average. Our results, which differ from previous estimates based on limited loci or on the rodent lineage, provide the first representative and reliable mutational spectrum in the recent human genome and categorized genomic regions.
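The classification this abstract relies on (transitions versus transversions, with a change and its reverse complement counted together, as in "C-->T (G-->A)") can be illustrated with a short sketch. This is not the authors' pipeline: the function names, the choice to report each change with a pyrimidine as the ancestral base, and the toy input are all assumptions made for illustration.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def classify(ancestral, derived):
    """Return 'transition' or 'transversion' for a single-base change."""
    if ancestral == derived:
        raise ValueError("ancestral and derived bases must differ")
    if ancestral not in COMPLEMENT or derived not in COMPLEMENT:
        raise ValueError("bases must be one of A, C, G, T")
    pair = {ancestral, derived}
    # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def fold_strand(ancestral, derived):
    """Merge a change with its reverse complement (G>A is reported as C>T).

    Assumed convention: always report the pyrimidine as the ancestral base.
    """
    if ancestral in PURINES:
        ancestral, derived = COMPLEMENT[ancestral], COMPLEMENT[derived]
    return f"{ancestral}>{derived}"


def mutation_spectrum(changes):
    """Count strand-folded change types from (ancestral, derived) pairs."""
    return Counter(fold_strand(a, d) for a, d in changes)


# Hypothetical toy input: (chimpanzee/ancestral base, human/derived base).
toy_changes = [("C", "T"), ("G", "A"), ("A", "G"), ("C", "A"), ("T", "C")]
spectrum = mutation_spectrum(toy_changes)
```

With strand folding there are six change types in total, two transitions (C>T, T>C) and four transversions, which is the sense in which the abstract compares the frequency of "each type" of transition with "each type" of transversion.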
Affiliation(s)
- Cizhong Jiang
- Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA 23298-0126, USA
|
189
|
Zhang Y, Liu XS, Liu QR, Wei L. Genome-wide in silico identification and analysis of cis natural antisense transcripts (cis-NATs) in ten species. Nucleic Acids Res 2006; 34:3465-75. [PMID: 16849434 PMCID: PMC1524920 DOI: 10.1093/nar/gkl473] [Citation(s) in RCA: 134] [Impact Index Per Article: 7.4] [Indexed: 12/29/2022] Open
Abstract
We developed a fast, integrative pipeline to identify cis natural antisense transcripts (cis-NATs) at genome scale. The pipeline mapped mRNAs and ESTs in UniGene to genome sequences in GoldenPath to find overlapping transcripts and combined information from coding sequence, poly(A) signal, poly(A) tail and splicing sites to deduce transcription orientation. We identified cis-NATs in 10 eukaryotic species, including 7830 candidate sense–antisense (SA) genes in 3915 SA pairs in human. The abundance of SA genes is remarkably low in worm and does not seem to be caused by the prevalence of operons. Hundreds of SA pairs are conserved across different species, even maintaining the same overlapping patterns. The convergent SA class is prevalent in fly, worm and sea squirt, but not in human or mouse as reported previously. The percentage of SA genes among imprinted genes in human and mouse is 24–47%, a range between the two previous reports. There is a significant shortage of SA genes on Chromosome X in human and mouse but not in fly or worm, supporting X-inactivation in mammals as a possible cause. SA genes are over-represented in the catalytic activities and basic metabolism functions. All candidate cis-NATs can be downloaded from .
Affiliation(s)
- X. Shirley Liu
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Harvard School of Public Health, 44 Binney Street, M1B22, Boston, MA 02115, USA
- Qing-Rong Liu
- Molecular Neurobiology Branch, National Institute on Drug Abuse-Intramural Research Program (NIDA-IRP), NIH, Department of Health and Human Services (DHHS), Box 5180, Baltimore, MD 21224, USA
- Liping Wei
- To whom correspondence should be addressed. Tel: +86 10 6276 4970; Fax: +86 10 6275 2438;
|
190
|
Xiang Z, Zheng W, He Y. BBP: Brucella genome annotation with literature mining and curation. BMC Bioinformatics 2006; 7:347. [PMID: 16842628 PMCID: PMC1539029 DOI: 10.1186/1471-2105-7-347] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.3] [Received: 05/18/2006] [Accepted: 07/16/2006] [Indexed: 12/25/2022] Open
Abstract
BACKGROUND Brucella species are Gram-negative, facultative intracellular bacteria that cause brucellosis in humans and animals. Sequences of four Brucella genomes have been published, and various Brucella gene and genome data and analysis resources exist. A web gateway to integrate these resources will greatly facilitate Brucella research. Brucella genome data in current databases is largely derived from computational analysis without experimental validation typically found in peer-reviewed publications. It is partially due to the lack of a literature mining and curation system able to efficiently incorporate the large amount of literature data into genome annotation. It is further hypothesized that literature-based Brucella gene annotation would increase understanding of complicated Brucella pathogenesis mechanisms. RESULTS The Brucella Bioinformatics Portal (BBP) is developed to integrate existing Brucella genome data and analysis tools with literature mining and curation. The BBP InterBru database and Brucella Genome Browser allow users to search and analyze genes of 4 currently available Brucella genomes and link to more than 20 existing databases and analysis programs. Brucella literature publications in PubMed are extracted and can be searched by a TextPresso-powered natural language processing method, a MeSH browser, a keywords search, and an automatic literature update service. To efficiently annotate Brucella genes using the large amount of literature publications, a literature mining and curation system coined Limix is developed to integrate computational literature mining methods with a PubSearch-powered manual curation and management system. The Limix system is used to quickly find and confirm 107 Brucella gene mutations including 75 genes shown to be essential for Brucella virulence. The 75 genes are further clustered using COG. In addition, 62 Brucella genetic interactions are extracted from literature publications. 
These results make possible more comprehensive investigation of Brucella pathogenesis. Other BBP features include publication email alert service, Brucella researchers' contact database, and discussion forum. CONCLUSION BBP is a gateway for Brucella researchers to search, analyze, and curate Brucella genome data originated from public databases and literature. Brucella gene mutations and genetic interactions are annotated using Limix leading to better understanding of Brucella pathogenesis.
Affiliation(s)
- Zuoshuang Xiang
- Unit for Laboratory Animal Medicine, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Yongqun He
- Unit for Laboratory Animal Medicine, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Department of Microbiology and Immunology, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Bioinformatics Program, University of Michigan Medical School, Ann Arbor, MI 48109, USA
|
191
|
Walraven JM, Doll MA, Hein DW. Identification and Characterization of Functional Rat Arylamine N-Acetyltransferase 3: Comparisons with Rat Arylamine N-Acetyltransferases 1 and 2. J Pharmacol Exp Ther 2006; 319:369-75. [PMID: 16829624 DOI: 10.1124/jpet.106.108399] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.6] [Indexed: 11/22/2022] Open
Abstract
Arylamine N-acetyltransferases (NATs; EC 2.3.1.5) catalyze both the N-acetylation and O-acetylation of arylamines and N-hydroxyarylamines. Humans possess two functional N-acetyltransferase genes, NAT1 and NAT2, as well as a nonfunctional pseudogene, NATP. Previous studies have identified Nat1 and Nat2 genes in the rat. In this study, we identified and characterized a third rat N-acetyltransferase gene (Nat3) consisting of a single open reading frame of 870 base pairs encoding a 290-amino acid protein, analogous to the previously identified human and rat N-acetyltransferase genes. Rat Nat3 nucleotide sequence was 77.2 and 75.9% identical to human NAT1 and NAT2, respectively. Rat Nat3 amino acid sequence was 68.6 and 67.2% identical to human NAT1 and NAT2, respectively. Rat Nat1, Nat2, and Nat3 were each cloned and recombinantly expressed in Escherichia coli. Recombinant rat Nat3 exhibited thermostability intermediate between recombinant rat Nat1 and Nat2. Recombinant rat Nat3 was functional and catalyzed the N-acetylation of several arylamine substrates, including 3-ethylaniline, 3,5-dimethylaniline, 5-aminosalicylic acid, 4-aminobiphenyl, 4,4'-methylenedianiline, 4,4'-methylenebis(2-chloroaniline), and 2-aminofluorene, and the O-acetylation of N-hydroxy-4-aminobiphenyl. The relative affinities of arylamine carcinogens such as 4-aminobiphenyl, N-hydroxy-4-aminobiphenyl, and 2-aminofluorene for N- and O-acetylation via recombinant rat Nat3 were comparable with recombinant rat Nat1 and higher than for recombinant rat Nat2. This study is the first to report a third arylamine N-acetyltransferase isozyme with significant functional capacity.
Affiliation(s)
- Jason M Walraven
- Department of Pharmacology and Toxicology, University of Louisville School of Medicine, Louisville, KY 40292, USA
|
192
|
Hecht J, Kuhl H, Haas SA, Bauer S, Poustka AJ, Lienau J, Schell H, Stiege AC, Seitz V, Reinhardt R, Duda GN, Mundlos S, Robinson PN. Gene identification and analysis of transcripts differentially regulated in fracture healing by EST sequencing in the domestic sheep. BMC Genomics 2006; 7:172. [PMID: 16822315 PMCID: PMC1578570 DOI: 10.1186/1471-2164-7-172] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Received: 03/08/2006] [Accepted: 07/05/2006] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The sheep is an important model animal for testing novel fracture treatments and other medical applications. Despite these medical uses and the well known economic and cultural importance of the sheep, relatively little research has been performed into sheep genetics, and DNA sequences are available for only a small number of sheep genes. RESULTS In this work we have sequenced over 47 thousand expressed sequence tags (ESTs) from libraries developed from healing bone in a sheep model of fracture healing. These ESTs were clustered with the previously available 10 thousand sheep ESTs to a total of 19087 contigs with an average length of 603 nucleotides. We used the newly identified sequences to develop RT-PCR assays for 78 sheep genes and measured differential expression during the course of fracture healing between days 7 and 42 postfracture. All genes showed significant shifts at one or more time points. 23 of the genes were differentially expressed between postfracture days 7 and 10, which could reflect an important role for these genes for the initiation of osteogenesis. CONCLUSION The sequences we have identified in this work are a valuable resource for future studies on musculoskeletal healing and regeneration using sheep and represent an important head-start for genomic sequencing projects for Ovis aries, with partial or complete sequences being made available for over 5,800 previously unsequenced sheep genes.
Affiliation(s)
- Jochen Hecht
- Max Planck Institute for Molecular Genetics, Berlin, Germany.
|
193
|
Fire A, Alcazar R, Tan F. Unusual DNA structures associated with germline genetic activity in Caenorhabditis elegans. Genetics 2006; 173:1259-73. [PMID: 16648589 PMCID: PMC1526662 DOI: 10.1534/genetics.106.057364] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.1] [Received: 02/17/2006] [Accepted: 04/21/2006] [Indexed: 11/18/2022] Open
Abstract
We describe a surprising long-range periodicity that underlies a substantial fraction of C. elegans genomic sequence. Extended segments (up to several hundred nucleotides) of the C. elegans genome show a strong bias toward occurrence of AA/TT dinucleotides along one face of the helix while little or no such constraint is evident on the opposite helical face. Segments with this characteristic periodicity are highly overrepresented in intron sequences and are associated with a large fraction of genes with known germline expression in C. elegans. In addition to altering the path and flexibility of DNA in vitro, sequences of this character have been shown by others to constrain DNA::nucleosome interactions, potentially producing a structure that could resist the assembly of highly ordered (phased) nucleosome arrays that have been proposed as a precursor to heterochromatin. We propose a number of ways that the periodic occurrence of An/Tn clusters could reflect evolution and function of genes that express in the germ cell lineage of C. elegans.
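One simple way to look for a bias of AA/TT dinucleotides toward one face of the helix is to tally their start positions modulo the helical repeat. The sketch below assumes a repeat of 10 bp, which is only a round-number approximation and is not taken from the abstract; the function name and the toy sequence are likewise hypothetical, and this is not the analysis the authors performed.

```python
def aa_tt_phase_counts(seq, period=10):
    """Count AA/TT dinucleotides by start position modulo `period`.

    A strongly uneven distribution across phases suggests that AA/TT
    dinucleotides recur on the same helical face; a flat distribution
    suggests no such periodicity. `period` is an assumed helical repeat.
    """
    seq = seq.upper()
    counts = [0] * period
    for i in range(len(seq) - 1):
        if seq[i:i + 2] in ("AA", "TT"):
            counts[i % period] += 1
    return counts


# Hypothetical toy sequence: one AA every 10 bp, GC-only filler between.
toy_seq = ("AA" + "GCGCGCGC") * 5
phases = aa_tt_phase_counts(toy_seq)
```

In the toy sequence every AA falls at phase 0, so all counts land in a single bin. On real genomic sequence one would compare the phase distribution against shuffled sequence before drawing any conclusion.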
Affiliation(s)
- Andrew Fire
- Department of Pathology, Stanford University School of Medicine, Stanford, California 94305-5324, USA.
|
194
|
Perco P, Rapberger R, Siehs C, Lukas A, Oberbauer R, Mayer G, Mayer B. Transforming omics data into context: Bioinformatics on genomics and proteomics raw data. Electrophoresis 2006; 27:2659-75. [PMID: 16739231 DOI: 10.1002/elps.200600064] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.8] [Indexed: 11/09/2022]
Abstract
Differential gene expression analysis and proteomics have exerted significant impact on the elucidation of concerted cellular processes, as simultaneous measurement of hundreds to thousands of individual objects on the level of RNA and protein ensembles became technically feasible. The availability of such data sets has promised a profound understanding of phenomena on an aggregate level, expressed as the phenotypic response (observables) of cells, e.g., in the presence of drugs, or characterization of cells and tissue displaying distinct patho-physiological states. However, the step of transforming these data into context, i.e., linking distinct expression or abundance patterns with phenotypic observables - and furthermore enabling a sound biological interpretation on the level of reaction networks and concerted pathways, is still a major shortcoming. This finding is certainly based on the enormous complexity embedded in cellular reaction networks, but a variety of computational approaches have been developed over the last few years to overcome these issues. This review provides an overview on computational procedures for analysis of genomic and proteomic data introducing a sequential analysis workflow: Explorative statistics for deriving a first, from the purely statistical viewpoint, relevant candidate gene/protein list, followed by co-regulation and network analysis to biologically expand this core list toward functional networks and pathways. The review on these procedures is complemented by example applications tailored at identification of disease-associated proteins. Optimization of computational procedures involved, in conjunction with the continuous increase in additional biological data, clearly has the potential of boosting our understanding of processes on a cell-wide level.
Affiliation(s)
- Paul Perco
- Department of Nephrology, Medical University of Vienna, Austria
|
195
|
Liu H, Kho AT, Kohane IS, Sun Y. Predicting survival within the lung cancer histopathological hierarchy using a multi-scale genomic model of development. PLoS Med 2006; 3:e232. [PMID: 16800721 PMCID: PMC1483910 DOI: 10.1371/journal.pmed.0030232] [Citation(s) in RCA: 64] [Impact Index Per Article: 3.6] [Received: 09/07/2005] [Accepted: 03/02/2006] [Indexed: 01/16/2023] Open
Abstract
BACKGROUND The histopathologic heterogeneity of lung cancer remains a significant confounding factor in its diagnosis and prognosis, spurring numerous recent efforts to find a molecular classification of the disease that has clinical relevance. METHODS AND FINDINGS Molecular profiles of tumors from 186 patients representing four different lung cancer subtypes (and 17 normal lung tissue samples) were compared with a mouse lung development model using principal component analysis in both temporal and genomic domains. An algorithm for the classification of lung cancers using a multi-scale developmental framework was developed. Kaplan-Meier survival analysis was conducted for lung adenocarcinoma patient subgroups identified via their developmental association. We found multi-scale genomic similarities between four human lung cancer subtypes and the developing mouse lung that are prognostically meaningful. Significant association was observed between the localization of human lung cancer cases along the principal mouse lung development trajectory and the corresponding patient survival rate at three distinct levels of classical histopathologic resolution: among different lung cancer subtypes, among patients within the adenocarcinoma subtype, and within the stage I adenocarcinoma subclass. The earlier the genomic association between a human tumor profile and the mouse lung development sequence, the poorer the patient's prognosis. Furthermore, decomposing this principal lung development trajectory identified a gene set that was significantly enriched for pyrimidine metabolism and cell-adhesion functions specific to lung development and oncogenesis. CONCLUSIONS From a multi-scale disease modeling perspective, the molecular dynamics of murine lung development provide an effective framework that is not only data driven but also informed by the biology of development for elucidating the mechanisms of human lung cancer biology and its clinical outcome.
MESH Headings
- Adenocarcinoma/chemistry
- Adenocarcinoma/classification
- Adenocarcinoma/genetics
- Adenocarcinoma/mortality
- Adenocarcinoma/pathology
- Algorithms
- Animals
- Carcinoid Tumor/chemistry
- Carcinoid Tumor/genetics
- Carcinoid Tumor/mortality
- Carcinoid Tumor/pathology
- Carcinoma, Non-Small-Cell Lung/chemistry
- Carcinoma, Non-Small-Cell Lung/genetics
- Carcinoma, Non-Small-Cell Lung/mortality
- Carcinoma, Non-Small-Cell Lung/pathology
- Carcinoma, Small Cell/chemistry
- Carcinoma, Small Cell/genetics
- Carcinoma, Small Cell/mortality
- Carcinoma, Small Cell/pathology
- Cell Adhesion/genetics
- Cell Transformation, Neoplastic/genetics
- Gene Expression Profiling
- Gene Expression Regulation, Developmental
- Gene Expression Regulation, Neoplastic
- Genes, cdc
- Genomics
- Humans
- Kaplan-Meier Estimate
- Lung/chemistry
- Lung/embryology
- Lung/growth & development
- Lung Neoplasms/chemistry
- Lung Neoplasms/classification
- Lung Neoplasms/genetics
- Lung Neoplasms/mortality
- Lung Neoplasms/pathology
- Mice
- Models, Biological
- Neoplasm Metastasis/genetics
- Neoplasm Staging
- Prognosis
- Pyrimidines/metabolism
- RNA, Messenger/biosynthesis
- RNA, Messenger/genetics
- RNA, Neoplasm/biosynthesis
- RNA, Neoplasm/genetics
- Species Specificity
Affiliation(s)
- Hongye Liu
- Children's Hospital Informatics Program, Children's Hospital Boston, Boston, Massachusetts, United States of America.
|
196
|
Messersmith DJ, Benson DA, Geer RC. A Web-based assessment of bioinformatics end-user support services at US universities. J Med Libr Assoc 2006; 94:299-305, E156-87. [PMID: 16888663 PMCID: PMC1525314] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023] Open
Abstract
OBJECTIVES This study was conducted to gauge the availability of bioinformatics end-user support services at US universities and to identify the providers of those services. The study primarily focused on the availability of short-term workshops that introduce users to molecular biology databases and analysis software. METHODS Websites of selected US universities were reviewed to determine if bioinformatics educational workshops were offered, and, if so, what organizational units in the universities provided them. RESULTS Of 239 reviewed universities, 72 (30%) offered bioinformatics educational workshops. These workshops were located at libraries (N = 15), bioinformatics centers (N = 38), or other facilities (N = 35). No such training was noted on the sites of 167 universities (70%). Of the 115 bioinformatics centers identified, two-thirds did not offer workshops. CONCLUSIONS This analysis of university Websites indicates that a gap may exist in the availability of workshops and related training to assist researchers in the use of bioinformatics resources, representing a potential opportunity for libraries and other facilities to provide training and assistance for this growing user group.
Affiliation(s)
- Dennis A. Benson
- National Center for Biotechnology Information, National Library of Medicine, 8600 Rockville Pike, Building 38A, Room 3N307, Bethesda, Maryland 20894
- Renata C. Geer
- National Center for Biotechnology Information, National Library of Medicine, 8600 Rockville Pike, Building 38A, Room 35314, Bethesda, Maryland 20894
|
197
|
Shin JH, Krapfenbauer K, Lubec G. Large-scale identification of cytosolic mouse brain proteins by chromatographic prefractionation. Electrophoresis 2006; 27:2799-813. [PMID: 16739224 DOI: 10.1002/elps.200500804] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Indexed: 12/11/2022]
Abstract
Proteomic studies of mouse brain protein expression still hold center stage, because a reference database for the brain proteome is needed to design expression studies at the protein level. We therefore decided to extend the number of identified brain proteins by the use of prefractionation. To reduce the complexity of the mouse brain proteome, we applied chromatographic prefractionation (ion-exchange and hydrophobic interaction chromatography) prior to 2-DE, followed by mass spectrometric identification (2-DE MALDI-MS). We analyzed about 17,000 protein spots in cytosolic fractions of mouse brain and identified about 10,000 spots. A total of 1841 proteins showing different pI or M(r), probably representing post-translational modifications or splice variants, were products of 789 different genes. Numerous proteins were clearly identified as metabolic, antioxidant, cytoskeleton, signaling, transcription/translation, nucleic acid-binding, and proteolysis-related proteins. We additionally provided evidence for the existence of hypothetical proteins predicted from nucleic acid sequences. Moreover, observed pIs of proteins are listed, thus enabling localization of proteins in a gel, information that cannot be obtained from theoretical pIs in databases. The results represent the largest database of mouse brain proteins so far and provide valuable information for the design of proteomic studies in the mouse.
Affiliation(s)
- Joo-Ho Shin
- Department of Pediatrics, Medical University of Vienna, Austria
|
198
|
Grimes GR, Wen TQ, Mewissen M, Baxter RM, Moodie S, Beattie JS, Ghazal P. PDQ Wizard: automated prioritization and characterization of gene and protein lists using biomedical literature. Bioinformatics 2006; 22:2055-7. [PMID: 16809392 DOI: 10.1093/bioinformatics/btl342] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Indexed: 11/13/2022] Open
Abstract
SUMMARY PDQ Wizard automates the process of interrogating biomedical references using large lists of genes, proteins or free text. Using the principle of linkage through co-citation, biologists can mine PubMed with these proteins or genes to identify relationships within a biological field of interest. In addition, PDQ Wizard provides novel features to define more specific relationships, highlight key publications describing those activities and relationships, and enhance protein queries. PDQ Wizard also outputs a metric that can be used for prioritization of genes and proteins for further research. AVAILABILITY PDQ Wizard is freely available from http://www.gti.ed.ac.uk/pdqwizard/.
Affiliation(s)
- G R Grimes
- The Scottish Centre for Genomic Technology and Informatics, University of Edinburgh, 49 Little France Crescent, Edinburgh EH16 4SB, UK.
|
199
|
Tiffin N, Adie E, Turner F, Brunner HG, van Driel MA, Oti M, Lopez-Bigas N, Ouzounis C, Perez-Iratxeta C, Andrade-Navarro MA, Adeyemo A, Patti ME, Semple CAM, Hide W. Computational disease gene identification: a concert of methods prioritizes type 2 diabetes and obesity candidate genes. Nucleic Acids Res 2006; 34:3067-81. [PMID: 16757574 PMCID: PMC1475747 DOI: 10.1093/nar/gkl381] [Citation(s) in RCA: 101] [Impact Index Per Article: 5.6] [Indexed: 02/06/2023] Open
Abstract
Genome-wide experimental methods to identify disease genes, such as linkage analysis and association studies, generate increasingly large candidate gene sets for which comprehensive empirical analysis is impractical. Computational methods employ data from a variety of sources to identify the most likely candidate disease genes from these gene sets. Here, we review seven independent computational disease gene prioritization methods, and then apply them in concert to the analysis of 9556 positional candidate genes for type 2 diabetes (T2D) and the related trait obesity. We generate and analyse a list of nine primary candidate genes for T2D and five for obesity. Two genes, LPL and BCKDHA, are common to these two sets. We also present a set of secondary candidates for T2D (94 genes) and for obesity (116 genes), with 58 genes common to both diseases.
Affiliation(s)
- Nicki Tiffin
- South African National Bioinformatics Institute, University of the Western Cape, Bellville, 7535, South Africa.
|
200
|
Whitfield EJ, Pruess M, Apweiler R. Bioinformatics database infrastructure for biotechnology research. J Biotechnol 2006; 124:629-39. [PMID: 16757051 DOI: 10.1016/j.jbiotec.2006.04.006] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 10/14/2005] [Revised: 03/06/2006] [Accepted: 04/03/2006] [Indexed: 10/24/2022]
Abstract
Many databases are available that provide valuable data resources for the biotechnological researcher. According to their core data, they can be divided into different types. Some databases provide primary data, such as all published nucleotide sequences; others deal with protein sequences. In addition to these two basic types of databases, a huge number of more specialized resources are available, such as databases about protein structures, protein identification, special features of genes and/or proteins, or certain organisms. Furthermore, some resources offer integrated views of different types of data, allowing the user to run easy customized queries over large datasets and to compare different types of data.
Affiliation(s)
- Eleanor J Whitfield
- EMBL-EBI, Wellcome Trust Genome Campus, Hinxton Hall, Hinxton, Cambs CB10 1SD, UK.
|