301
Grün D, Wang YL, Langenberger D, Gunsalus KC, Rajewsky N. microRNA target predictions across seven Drosophila species and comparison to mammalian targets. PLoS Comput Biol 2005; 1:e13. [PMID: 16103902 PMCID: PMC1183519 DOI: 10.1371/journal.pcbi.0010013] [Citation(s) in RCA: 368] [Impact Index Per Article: 19.4] [Received: 04/07/2005] [Accepted: 06/02/2005] [Indexed: 12/19/2022]
Abstract
microRNAs are small noncoding genes that regulate the protein production of genes by binding to partially complementary sites in the mRNAs of targeted genes. Here, using our algorithm PicTar, we exploit cross-species comparisons to predict, on average, 54 targeted genes per microRNA above noise in Drosophila melanogaster. Analysis of the functional annotation of target genes furthermore suggests specific biological functions for many microRNAs. We also predict combinatorial targets for clustered microRNAs and find that some clustered microRNAs are likely to coordinately regulate target genes. Furthermore, we compare microRNA regulation between insects and vertebrates. We find that the widespread extent of gene regulation by microRNAs is comparable between flies and mammals but that certain microRNAs may function in clade-specific modes of gene regulation. One of these microRNAs (miR-210) is predicted to contribute to the regulation of fly oogenesis. We also list specific regulatory relationships that appear to be conserved between flies and mammals. Our findings provide the most extensive microRNA target predictions in Drosophila to date, suggest specific functional roles for most microRNAs, indicate the existence of coordinate gene regulation executed by clustered microRNAs, and shed light on the evolution of microRNA function across large evolutionary distances. All predictions are freely accessible at our searchable Web site http://pictar.bio.nyu.edu.
MicroRNA genes are a recently discovered large class of small noncoding genes. These genes have been shown to regulate the expression of target genes by binding to partially complementary sites in the mRNAs of the targets. To understand microRNA function it is thus important to identify their targets. Here, the authors use their bioinformatic method, PicTar, and cross-species comparisons of several newly sequenced fly species to predict, genome wide, targets of microRNAs in Drosophila. They find that known fly microRNAs control at least 15% of all genes in D. melanogaster. They also show that genomic clusters of microRNAs are likely to coordinately regulate target genes. Analysis of the functional annotation of target genes furthermore suggests specific biological functions for many microRNAs. All predictions are freely accessible at http://pictar.bio.nyu.edu. Finally, Grün et al. compare the function of microRNAs across flies and mammals. They find that (a) the overall extent of microRNA gene regulation is comparable between both clades, (b) the number of targets for a conserved microRNA in flies correlates with the number of targets in mammals, (c) some conserved microRNAs may function in clade-specific modes of gene regulation, and (d) some specific microRNA–target regulatory relationships may be conserved between both clades.
Affiliation(s)
- Dominic Grün
- Center for Comparative Functional Genomics, Department of Biology, New York University, New York, New York, United States of America
- Yi-Lu Wang
- Center for Comparative Functional Genomics, Department of Biology, New York University, New York, New York, United States of America
- David Langenberger
- Center for Comparative Functional Genomics, Department of Biology, New York University, New York, New York, United States of America
- Kristin C Gunsalus
- Center for Comparative Functional Genomics, Department of Biology, New York University, New York, New York, United States of America
- Nikolaus Rajewsky
- Center for Comparative Functional Genomics, Department of Biology, New York University, New York, New York, United States of America
- *To whom correspondence should be addressed. E-mail:
302
Reiss DJ, Avila-Campillo I, Thorsson V, Schwikowski B, Galitski T. Tools enabling the elucidation of molecular pathways active in human disease: application to Hepatitis C virus infection. BMC Bioinformatics 2005; 6:154. [PMID: 15967031 PMCID: PMC1181626 DOI: 10.1186/1471-2105-6-154] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 02/25/2005] [Accepted: 06/20/2005] [Indexed: 11/23/2022]
Abstract
BACKGROUND: The extraction of biological knowledge from genome-scale data sets requires its analysis in the context of additional biological information. The importance of integrating experimental data sets with molecular interaction networks has been recognized and applied to the study of model organisms, but its systematic application to the study of human disease has lagged behind due to the lack of tools for performing such integration.
RESULTS: We have developed techniques and software tools for simplifying and streamlining the process of integration of diverse experimental data types in molecular networks, as well as for the analysis of these networks. We applied these techniques to extract, from genomic expression data from Hepatitis C virus-infected liver tissue, potentially useful hypotheses related to the onset of this disease. Our integration of the expression data with large-scale molecular interaction networks and subsequent analyses identified molecular pathways that appear to be induced or repressed in the response to Hepatitis C viral infection.
CONCLUSION: The methods and tools we have implemented allow for the efficient dynamic integration and analysis of diverse data in a major human disease system. This integrated data set in turn enabled simple analyses to yield hypotheses related to the response to Hepatitis C viral infection.
Affiliation(s)
- David J Reiss
- Institute for Systems Biology, 1441 N. 34th Street, Seattle, WA 98103, USA
- Vesteinn Thorsson
- Institute for Systems Biology, 1441 N. 34th Street, Seattle, WA 98103, USA
- Benno Schwikowski
- Institute for Systems Biology, 1441 N. 34th Street, Seattle, WA 98103, USA
- Institut Pasteur, 25–28 Rue du Dr. Roux, 75724 Paris CEDEX 15, France
- Timothy Galitski
- Institute for Systems Biology, 1441 N. 34th Street, Seattle, WA 98103, USA
303
Smith M, Kunin V, Goldovsky L, Enright AJ, Ouzounis CA. MagicMatch--cross-referencing sequence identifiers across databases. Bioinformatics 2005; 21:3429-30. [PMID: 15961438 DOI: 10.1093/bioinformatics/bti548] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Indexed: 11/13/2022]
Abstract
MOTIVATION: At present, mapping of sequence identifiers across databases is a daunting, time-consuming and computationally expensive process, usually achieved by sequence similarity searches with strict threshold values.
SUMMARY: We present a rapid and efficient method to map sequence identifiers across databases. The method uses the MD5 checksum algorithm for message integrity to generate sequence fingerprints and uses these fingerprints as hash strings to map sequences across databases. The program, called MagicMatch, is able to cross-link any of the major sequence databases within a few seconds on a modest desktop computer.
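The fingerprint idea described in this abstract can be illustrated with a minimal Python sketch. This is not MagicMatch's actual code: the function names `fingerprint` and `cross_reference`, the case and whitespace normalization step, and the toy identifiers and sequences are all assumptions made for illustration. Only the use of MD5 digests as lookup keys comes from the abstract.

```python
import hashlib
from collections import defaultdict


def fingerprint(sequence: str) -> str:
    """Return the MD5 hex digest of a sequence.

    Normalizing case and stripping whitespace is an assumption of this
    sketch, not something the abstract specifies.
    """
    normalized = "".join(sequence.split()).upper()
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


def cross_reference(db_a: dict, db_b: dict) -> dict:
    """Map each identifier in db_a to the db_b identifiers whose
    sequences have the same fingerprint (i.e. identical sequences)."""
    index = defaultdict(list)
    for ident, seq in db_b.items():
        index[fingerprint(seq)].append(ident)
    return {
        ident: sorted(index.get(fingerprint(seq), []))
        for ident, seq in db_a.items()
    }


# Toy records; identifiers and sequences are invented for illustration.
db_a = {"A1": "MKTAYIAKQR", "A2": "MSLLTEVETY"}
db_b = {"B7": "mktayiakqr", "B8": "MGSSHHHHHH", "B9": "MKTAY IAKQR"}

links = cross_reference(db_a, db_b)
print(links)
```

Because each lookup is a dictionary access on a fixed-length digest, the cost grows with the number of records rather than with pairwise sequence comparisons. The trade-off is that only identical sequences (after normalization) are linked; near matches are missed.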
Affiliation(s)
- Mike Smith
- Computational Genomics Group, The European Bioinformatics Institute, EMBL Cambridge Outstation, Cambridge CB10 1SD, UK
304
Agarwal SM, Gupta J. Comparative analysis of human intronless proteins. Biochem Biophys Res Commun 2005; 331:512-9. [PMID: 15850789 DOI: 10.1016/j.bbrc.2005.03.209] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Received: 03/20/2005] [Indexed: 11/24/2022]
Abstract
The availability of the complete genome sequences of Homo sapiens together with those of taxonomically diverse organisms provides an opportunity to carry out cross-species comparison. Comparisons of protein sequences from different organisms are a significant source of information as these could help in answering questions regarding the fraction of proteins that are shared by humans and organisms representing the three domains of life, viz., archaea, bacteria, and eukaryota. In the present study, a comparative analysis of the proteins encoded by intronless genes in humans was undertaken. We identified 1125 human intronless proteins that are solely present in the eukaryotic lineage. More than two-thirds of these eukaryotic-specific proteins appear to be mammalia specific while a small fraction of proteins are conserved in bilateria and coelomata, indicating that diversification of these proteins occurred after the divergence of the major lineages of the eukaryotic crown group. A large fraction of mammalia-specific proteins are enriched in proteins responsible for transport and binding, cell envelope, and housekeeping function, particularly translation. Another 228 intronless proteins are observed that do not exhibit homology to any of the proteins in the database. The distribution of human intronless proteins suggests that lineage-specific expansion is one of the most important sources of organizational diversity in crown-group eukaryotes. The presence of these eukaryotic as well as human-specific intronless proteins provides the foundation for rapid analysis of some of the basic processes involved in the human genome.
Affiliation(s)
- Subhash Mohan Agarwal
- Department of Chemistry, Jamia Millia Islamia, Jamia Nagar, New Delhi 110025, India.
305
Suh Y, Vijg J. SNP discovery in associating genetic variation with human disease phenotypes. Mutat Res 2005; 573:41-53. [PMID: 15829236 DOI: 10.1016/j.mrfmmm.2005.01.005] [Citation(s) in RCA: 127] [Impact Index Per Article: 6.7] [Received: 11/10/2004] [Revised: 01/10/2005] [Accepted: 01/11/2005] [Indexed: 11/24/2022]
Abstract
With the completion of the human genome project, attention is now rapidly shifting towards the study of individual genetic variation. The most abundant source of genetic variation in the human genome is represented by single nucleotide polymorphisms (SNPs), which can account for heritable inter-individual differences in complex phenotypes. Identification of SNPs that contribute to susceptibility to common diseases will provide highly accurate diagnostic information that will facilitate early diagnosis, prevention, and treatment of human diseases. Over the past several years, the advancement of increasingly high-throughput and cost-effective methods to discover and measure SNPs has begun to open the door towards this endeavor. Genetic association studies are considered to be an effective approach towards the detection of SNPs with moderate effects, as in most common diseases with complex phenotypes. This requires careful study design, analysis and interpretation. In this review, we discuss genetic association studies and address the prospect for candidate gene association studies, comparing the strengths and weaknesses of indirect and direct study designs. Our focus is on the continuous need for SNP discovery methods and the use of currently available prescreening methods for large-scale genetic epidemiological research until more advanced sequencing methods currently under development become available.
Affiliation(s)
- Yousin Suh
- Department of Physiology, Barshop Institute for Longevity and Aging Studies, University of Texas Health Science Center, 15355 Lambda Drive, San Antonio, TX 78245, USA.
306
Abstract
More than ever, life science researchers depend on information from multiple sources. The Semantic Web offers a powerful new strategy for consolidating both text and structured data into comprehensive collections and views. In addition, these aggregates are readable by both humans and machines and could be the basis of information management and knowledge exchange.
Affiliation(s)
- Eric Neumann
- Sanofi-Aventis Pharmaceuticals, 1041 Route 202-206 Bridgewater, NJ 08807, USA.
307
Abstract
Entrez Gene (www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene) is NCBI's database for gene-specific information. It does not include all known or predicted genes; instead Entrez Gene focuses on the genomes that have been completely sequenced, that have an active research community to contribute gene-specific information, or that are scheduled for intense sequence analysis. The content of Entrez Gene represents the result of curation and automated integration of data from NCBI's Reference Sequence project (RefSeq), from collaborating model organism databases, and from many other databases available from NCBI. Records are assigned unique, stable and tracked integers as identifiers. The content (nomenclature, map location, gene products and their attributes, markers, phenotypes, and links to citations, sequences, variation details, maps, expression, homologs, protein domains and external databases) is updated as new information becomes available. Entrez Gene is a step forward from NCBI's LocusLink, with both a major increase in taxonomic scope and improved access through the many tools associated with NCBI Entrez.
Affiliation(s)
- Donna Maglott
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Room 5AS.13B, 45 Center Drive, Bethesda, MD 20892-6510, USA.
308
Pruitt KD, Tatusova T, Maglott DR. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 2005; 33:D501-4. [PMID: 15608248 PMCID: PMC539979 DOI: 10.1093/nar/gki025] [Citation(s) in RCA: 1193] [Impact Index Per Article: 62.8] [Indexed: 11/30/2022]
Abstract
The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) provides a non-redundant collection of sequences representing genomic data, transcripts and proteins. Although the goal is to provide a comprehensive dataset representing the complete sequence information for any given species, the database pragmatically includes sequence data that are currently publicly available in the archival databases. The database incorporates data from over 2400 organisms and includes over one million proteins representing significant taxonomic diversity spanning prokaryotes, eukaryotes and viruses. Nucleotide and protein sequences are explicitly linked, and the sequences are linked to other resources including the NCBI Map Viewer and Gene. Sequences are annotated to include coding regions, conserved domains, variation, references, names, database cross-references, and other features using a combined approach of collaboration and other input from the scientific community, automated annotation, propagation from GenBank and curation by NCBI staff.
Affiliation(s)
- Kim D Pruitt
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Rm 6An.12J, 45 Center Drive, Bethesda, MD 20892-6510, USA.
309
Abstract
GenBank® is a comprehensive database that contains publicly available DNA sequences for more than 165 000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the web-based BankIt or standalone Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the EMBL Data Library in the UK and the DNA Data Bank of Japan helps to ensure worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, go to the NCBI Homepage at http://www.ncbi.nlm.nih.gov.
Affiliation(s)
- Dennis A Benson
- Department of Health and Human Services, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
310
Brooksbank C, Cameron G, Thornton J. The European Bioinformatics Institute's data resources: towards systems biology. Nucleic Acids Res 2005; 33:D46-53. [PMID: 15608238 PMCID: PMC539980 DOI: 10.1093/nar/gki026] [Citation(s) in RCA: 63] [Impact Index Per Article: 3.3] [Indexed: 11/14/2022]
Abstract
Genomic and post-genomic biological research has provided fine-grain insights into the molecular processes of life, but also threatens to drown biomedical researchers in data. Moreover, as new high-throughput technologies are developed, the types of data that are gathered en masse are diversifying. The need to collect, store and curate all this information in ways that allow its efficient retrieval and exploitation is greater than ever. The European Bioinformatics Institute's (EBI's) databases and tools have evolved to meet the changing needs of molecular biologists: since we last wrote about our services in the 2003 issue of Nucleic Acids Research, we have launched new databases covering protein–protein interactions (IntAct), pathways (Reactome) and small molecules (ChEBI). Our existing core databases have continued to evolve to meet the changing needs of biomedical researchers, and we have developed new data-access tools that help biologists to move intuitively through the different data types, thereby helping them to put the parts together to understand biology at the systems level. The EBI's data resources are all available on our website at http://www.ebi.ac.uk.
Affiliation(s)
- Catherine Brooksbank
- EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
311
Perez-Iratxeta C, Palidwor G, Porter CJ, Sanche NA, Huska MR, Suomela BP, Muro EM, Krzyzanowski PM, Hughes E, Campbell PA, Rudnicki MA, Andrade MA. Study of stem cell function using microarray experiments. FEBS Lett 2005; 579:1795-801. [PMID: 15763554 DOI: 10.1016/j.febslet.2005.02.020] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.2] [Received: 02/08/2005] [Revised: 02/10/2005] [Accepted: 02/10/2005] [Indexed: 11/17/2022]
Abstract
DNA microarrays are used to simultaneously measure the levels of thousands of mRNAs in a sample. We illustrate here that a collection of such measurements in different cell types and states is a sound source of functional predictions, provided the microarray experiments are analogous and the cell samples are appropriately diverse. We have used this approach to study stem cells, whose identity and mechanisms of control are not well understood, generating Affymetrix microarray data from more than 200 samples, including stem cells and their derivatives, from human and mouse. The data can be accessed online (StemBase; http://www.scgp.ca:8080/StemBase/).
Affiliation(s)
- Carolina Perez-Iratxeta
- Ontario Genomics Innovation Centre, Ottawa Health Research Institute, Molecular Medicine Program, 501 Smyth Road, Ottawa, Canada K1H 8L6
312
Fiehn O, Wohlgemuth G, Scholz M. Setup and Annotation of Metabolomic Experiments by Integrating Biological and Mass Spectrometric Metadata. Lecture Notes in Computer Science 2005. [DOI: 10.1007/11530084_18] [Citation(s) in RCA: 111] [Impact Index Per Article: 5.8] [Indexed: 12/05/2022]