351
Edwards YJK, Bryson K, Jones DT. A meta-analysis of microarray gene expression in mouse stem cells: redefining stemness. PLoS One 2008; 3:e2712. [PMID: 18628962 PMCID: PMC2444034 DOI: 10.1371/journal.pone.0002712] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 04/23/2008] [Accepted: 05/27/2008] [Indexed: 11/21/2022]
Abstract
Background: While much progress has been made in understanding stem cell (SC) function, a complete description of the molecular mechanisms regulating SCs is not yet established. This lack of knowledge is a major barrier holding back the discovery of therapeutic uses of SCs. We investigated the value of a novel meta-analysis of microarray gene expression in mouse SCs to aid the elucidation of regulatory mechanisms common to SCs and particular SC types. Methodology/Principal Findings: We added value to previously published microarray gene expression data by characterizing the promoter type likely to regulate transcription. Promoters of up-regulated genes in SCs were characterized in terms of alternative promoter (AP) usage and CpG-richness, with the aim of correlating features known to affect transcriptional control with SC function. We found that SCs have a higher proportion of up-regulated genes using CpG-rich promoters compared with the negative controls. Comparing subsets of SC type with the controls, a slightly different story unfolds: the differences between the proliferating adult SCs and the embryonic SCs versus the negative controls are statistically significant, whereas the difference between the quiescent adult SCs and the negative controls is not. On examination of AP usage, no difference was observed between SCs and the controls. However, comparing the subsets of SC type with the controls, the quiescent adult SCs are found to up-regulate a larger proportion of genes that have APs compared to the controls, and the converse is true for the proliferating adult SCs and the embryonic SCs. Conclusions/Significance: These findings suggest that looking at features associated with control of transcription is a promising future approach for characterizing “stemness” and that further investigations of stemness could benefit from separate considerations of different SC states. For example, “proliferating-stemness” is shown here, in terms of promoter usage, to be distinct from “quiescent-stemness”.
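The abstract does not say how promoters were scored as CpG-rich. As an illustration only, one widely used measure is the observed/expected CpG ratio of a sequence; the function name and the choice of this particular measure are assumptions, not the authors' stated procedure.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G).

    Illustrative measure of CpG-richness; the threshold used to call a
    promoter "CpG-rich" is left to the caller (assumption, not from the paper).
    """
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0  # no CpG dinucleotide is possible
    return seq.count("CG") * len(seq) / (c_count * g_count)


print(cpg_obs_exp("CGCG"))  # 2.0
print(cpg_obs_exp("AAAA"))  # 0.0
```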
Affiliation(s)
- Yvonne J. K. Edwards
- Bioinformatics Group, Department of Computer Science, University College London, London, United Kingdom
- Kevin Bryson
- Bioinformatics Group, Department of Computer Science, University College London, London, United Kingdom
- David T. Jones
- Bioinformatics Group, Department of Computer Science, University College London, London, United Kingdom
352
Cameron J, Holla ØL, Berge KE, Kulseth MA, Ranheim T, Leren TP, Laerdahl JK. Investigations on the evolutionary conservation of PCSK9 reveal a functionally important protrusion. FEBS J 2008; 275:4121-33. [DOI: 10.1111/j.1742-4658.2008.06553.x] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Indexed: 11/30/2022]
353
Ortutay C, Vihinen M. PseudoGeneQuest - service for identification of different pseudogene types in the human genome. BMC Bioinformatics 2008; 9:299. [PMID: 18597685 PMCID: PMC2453144 DOI: 10.1186/1471-2105-9-299] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 02/18/2008] [Accepted: 07/02/2008] [Indexed: 01/29/2023]
Abstract
Background: Pseudogenes, nonfunctional copies of genes, evolve fast due to the lack of evolutionary pressures and thus appear in several different forms. PseudoGeneQuest is an online tool to search the human genome for a given query sequence and to identify different types of pseudogenes as well as novel genes and gene fragments. Description: The service can detect pseudogenes that have arisen either by retrotransposition or segmental genome duplication, many of which are not listed in the public pseudogene databases. The service has a user-friendly web interface and uses a powerful computer cluster in order to perform parallel searches and provide relatively fast runtimes despite exhaustive database searches and analyses. Conclusion: PseudoGeneQuest is a versatile tool for detecting novel pseudogene candidates from the human genome. The service searches human genome sequences for five types of pseudogenes and provides an output that allows easy further analysis of observations. In addition to the result file, the system provides visualization of the results linked to Ensembl Genome Browser. PseudoGeneQuest service is freely available.
Affiliation(s)
- Csaba Ortutay
- Institute of Medical Technology, University of Tampere, FI-33014 Tampere, Finland.
354
Schmidt T, Frishman D. Assignment of isochores for all completely sequenced vertebrate genomes using a consensus. Genome Biol 2008; 9:R104. [PMID: 18590563 PMCID: PMC2481423 DOI: 10.1186/gb-2008-9-6-r104] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 03/13/2008] [Revised: 05/22/2008] [Accepted: 06/30/2008] [Indexed: 11/16/2022]
Abstract
A new consensus isochore assignment method and a database of isochore maps for all completely sequenced vertebrate genomes are presented. We show that although the currently available isochore mapping methods agree on the isochore classification of about two-thirds of the human DNA, they produce significantly different results with regard to the location of isochore boundaries and isochore length distribution. We present a new consensus isochore assignment method based on majority voting and provide IsoBase, a comprehensive on-line database of isochore maps for all completely sequenced vertebrate genomes.
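The majority-voting idea can be sketched in a few lines. This is a minimal illustration, assuming that each mapping method has already labelled the same fixed genomic windows with an isochore class; the windowing, the class labels and the tie rule (no call) are assumptions, since the abstract does not give these details.

```python
from collections import Counter


def consensus_classes(assignments):
    """Majority vote per window across several isochore mapping methods.

    assignments: one list of class labels per method, all of equal length
    (one label per genomic window). Returns one label per window, or None
    where the top vote is tied (hypothetical tie rule).
    """
    consensus = []
    for votes in zip(*assignments):
        ranked = Counter(votes).most_common()
        top_label, top_count = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == top_count
        consensus.append(None if tied else top_label)
    return consensus


method_a = ["L1", "H1", "H2"]
method_b = ["L1", "H2", "H2"]
method_c = ["L2", "H1", "H3"]
print(consensus_classes([method_a, method_b, method_c]))  # ['L1', 'H1', 'H2']
```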
Affiliation(s)
- Thorsten Schmidt
- Department of Genome-Oriented Bioinformatics, Wissenschaftszentrum Weihenstephan, Technische Universität München, D-85350 Freising, Germany
355
Jordan JJ, Menendez D, Inga A, Nourredine M, Bell D, Resnick MA. Noncanonical DNA motifs as transactivation targets by wild type and mutant p53. PLoS Genet 2008; 4:e1000104. [PMID: 18714371 PMCID: PMC2518093 DOI: 10.1371/journal.pgen.1000104] [Citation(s) in RCA: 84] [Impact Index Per Article: 5.3] [Received: 01/20/2008] [Accepted: 05/22/2008] [Indexed: 12/31/2022]
Abstract
Sequence-specific binding by the human p53 master regulator is critical to its tumor suppressor activity in response to environmental stresses. p53 binds as a tetramer to two decameric half-sites separated by 0–13 nucleotides (nt), originally defined by the consensus RRRCWWGYYY (n = 0–13) RRRCWWGYYY. To better understand the role of sequence, organization, and level of p53 on transactivation at target response elements (REs) by wild type (WT) and mutant p53, we deconstructed the functional p53 canonical consensus sequence using budding yeast and human cell systems. Contrary to early reports on binding in vitro, small increases in distance between decamer half-sites greatly reduce p53 transactivation, as demonstrated for the natural TIGER RE. This was confirmed with human cell extracts using a newly developed, semi–in vitro microsphere binding assay. These results contrast with the synergistic increase in transactivation from a pair of weak, full-site REs in the MDM2 promoter that are separated by an evolutionarily conserved 17 bp spacer. Surprisingly, there can be substantial transactivation at noncanonical ½-sites (a single decamer) and ¾-sites, some of which were originally classified as biologically relevant canonical consensus sequences, including PIDD and Apaf-1. p53 family members p63 and p73 yielded similar results. Efficient transactivation from noncanonical elements requires tetrameric p53, and the presence of the carboxy terminal, non-specific DNA binding domain enhanced transactivation from noncanonical sequences. Our findings demonstrate that RE sequence, organization, and level of p53 can strongly impact p53-mediated transactivation, thereby changing the view of what constitutes a functional p53 target. Importantly, inclusion of ½- and ¾-site REs greatly expands the p53 master regulatory network.
Within human cells, the tumor suppressor p53 is the central node of regulation required to elicit multiple biological responses that include cell cycle arrest and death in response to stress or DNA damage, where mutations in p53 are a hallmark of cancer. As a master regulatory gene, p53 controls the action of target genes within its network by directly interacting with a widely accepted consensus DNA binding sequence, composed of two decamer ½-sites that can be separated by up to 13 bases. While mismatches from consensus sequence are frequent, the canonical consensus sequence places a limitation upon the organization and number of target genes within the p53 transcriptional network. Using yeast and human cell systems, our goal was to further understand how the DNA sequence, DNA organization, and level of p53 expression might influence the inclusion of genes within the p53 regulatory network. We found that increases in spacer beyond a few bases greatly reduce responsiveness to p53. Importantly, we established that p53 can function from noncanonical sequences comprising only a decamer ½-site or a ¾-site. These findings further define and expand the universe of potential downstream target genes which may be regulated by p53 and bring further diversity into the p53 regulatory network.
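The canonical consensus quoted above, RRRCWWGYYY (n = 0–13) RRRCWWGYYY, translates directly into a regular expression once the IUPAC codes are expanded (R = A/G, W = A/T, Y = C/T). The sketch below only scans one strand for exact matches; the function name is hypothetical, and mismatch tolerance, reverse-strand search and the ½- and ¾-site cases discussed in the abstract are deliberately left out.

```python
import re

# IUPAC expansion of the decamer half-site RRRCWWGYYY.
HALF_SITE = "[AG]{3}C[AT]{2}G[CT]{3}"
# Two half-sites separated by a 0-13 nt spacer. The lookahead lets
# overlapping candidates be reported; the lazy spacer prefers the shortest.
FULL_SITE = re.compile(f"(?=({HALF_SITE})([ACGT]{{0,13}}?)({HALF_SITE}))")


def find_full_sites(seq):
    """Return (start, spacer_length) for each exact canonical full site."""
    seq = seq.upper()
    return [(m.start(), len(m.group(2))) for m in FULL_SITE.finditer(seq)]


# Made-up sequence: half-site, 3 nt spacer, half-site.
print(find_full_sites("GGGCATGCCCACGAGACATGTCT"))  # [(0, 3)]
```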
Affiliation(s)
- Jennifer J. Jordan
- Laboratory of Molecular Genetics, National Institute of Environmental Health Sciences, NIH, Research Triangle Park, North Carolina, United States of America
- Curriculum in Genetics and Molecular Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Daniel Menendez
- Laboratory of Molecular Genetics, National Institute of Environmental Health Sciences, NIH, Research Triangle Park, North Carolina, United States of America
- Alberto Inga
- Laboratory of Molecular Genetics, National Institute of Environmental Health Sciences, NIH, Research Triangle Park, North Carolina, United States of America
- Unit of Molecular Mutagenesis and DNA Repair, National Institute for Cancer Research, IST, Genoa, Italy
- Maher Nourredine
- Laboratory of Molecular Genetics, National Institute of Environmental Health Sciences, NIH, Research Triangle Park, North Carolina, United States of America
- Douglas Bell
- Laboratory of Molecular Genetics, National Institute of Environmental Health Sciences, NIH, Research Triangle Park, North Carolina, United States of America
- Michael A. Resnick
- Laboratory of Molecular Genetics, National Institute of Environmental Health Sciences, NIH, Research Triangle Park, North Carolina, United States of America
- Curriculum in Genetics and Molecular Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
356
Desai S, Heffelfinger AK, Orcutt TM, Litman GW, Yoder JA. The medaka novel immune-type receptor (NITR) gene clusters reveal an extraordinary degree of divergence in variable domains. BMC Evol Biol 2008; 8:177. [PMID: 18565225 PMCID: PMC2442602 DOI: 10.1186/1471-2148-8-177] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Received: 04/02/2008] [Accepted: 06/19/2008] [Indexed: 02/08/2023]
Abstract
BACKGROUND Novel immune-type receptor (NITR) genes are members of diversified multigene families that are found in bony fish and encode type I transmembrane proteins containing one or two extracellular immunoglobulin (Ig) domains. The majority of NITRs can be classified as inhibitory receptors that possess cytoplasmic immunoreceptor tyrosine-based inhibition motifs (ITIMs). A much smaller number of NITRs can be classified as activating receptors by the lack of cytoplasmic ITIMs and presence of a positively charged residue within their transmembrane domain, which permits partnering with an activating adaptor protein. RESULTS Forty-four NITR genes in medaka (Oryzias latipes) are located in three gene clusters on chromosomes 10, 18 and 21 and can be organized into 24 families including inhibitory and activating forms. The particularly large dataset acquired in medaka makes direct comparison possible to another complete dataset acquired in zebrafish in which NITRs are localized in two clusters on different chromosomes. The two largest medaka NITR gene clusters share conserved synteny with the two zebrafish NITR gene clusters. Shared synteny between NITRs and CD8A/CD8B is limited but consistent with a potential common ancestry. CONCLUSION Comprehensive phylogenetic analyses between the complete datasets of NITRs from medaka and zebrafish indicate multiple species-specific expansions of different families of NITRs. The patterns of sequence variation among gene family members are consistent with recent birth-and-death events. Similar effects have been observed with mammalian immunoglobulin (Ig), T cell antigen receptor (TCR) and killer cell immunoglobulin-like receptor (KIR) genes. NITRs likely diverged along an independent pathway from that of the somatically rearranging antigen binding receptors but have undergone parallel evolution of V family diversity.
Affiliation(s)
- Salil Desai
- Department of Molecular Biomedical Sciences and Center for Comparative Medicine and Translational Research, College of Veterinary Medicine, North Carolina State University, 4700 Hillsborough Street, Raleigh, NC 27606, USA.
357
van Dijk ADJ, Bosch D, ter Braak CJF, van der Krol AR, van Ham RCHJ. Predicting sub-Golgi localization of type II membrane proteins. Bioinformatics 2008; 24:1779-86. [PMID: 18562268 PMCID: PMC7110242 DOI: 10.1093/bioinformatics/btn309] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.9] [Indexed: 11/17/2022]
Abstract
Motivation: Recent research underlines the importance of fine-grained knowledge on protein localization. In particular, subcompartmental localization in the Golgi apparatus is important, for example, for the order of reactions performed in glycosylation pathways or the sorting functions of SNAREs, but is currently poorly understood. Results: We assemble a dataset of type II transmembrane proteins with experimentally determined sub-Golgi localizations and use this information to develop a predictor based on the transmembrane domain of these proteins, making use of a dedicated protein-structure-based kernel in an SVM. Various applications demonstrate the power of our approach. In particular, comparison with a large set of glycan structures illustrates the applicability of our predictions on a ‘glycomic’ scale and demonstrates a significant correlation between sub-Golgi localization and the ordering of different steps in glycan biosynthesis. Contact: roeland.vanham@wur.nl Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- A D J van Dijk
- Applied Bioinformatics, PRI, Wageningen UR, Wageningen, The Netherlands
358
Abstract
MOTIVATION Limited availability of data has hindered the development of algorithms that can identify functionally meaningful regulatory single nucleotide polymorphisms (rSNPs). Given the large number of common polymorphisms known to reside in the human genome, the identification of functional rSNPs via laboratory assays will be costly and time-consuming. Therefore appropriate bioinformatics strategies for predicting functional rSNPs are necessary. Recent data from the Encyclopedia of DNA Elements (ENCODE) Project has significantly expanded the amount of available functional information relevant to non-coding regions of the genome, and, importantly, led to the conclusion that many functional elements in the human genome are not conserved. RESULTS In this article we describe how ENCODE data can be leveraged to probabilistically determine the functional and phenotypic significance of non-coding SNPs (ncSNPs). The method achieves excellent sensitivity (approximately 80%) and specificity (approximately 99%) based on a set of known phenotypically relevant and non-functional SNPs. In addition, we show that our method is not overtrained through the use of cross-validation analyses. AVAILABILITY The software platforms used in our analyses are freely available (http://www.cs.waikato.ac.nz/ml/weka/). In addition, we provide the training dataset (Supplementary Table 3), and our predictions (Supplementary Table 6), in the Supplementary Material. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
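For readers less familiar with the two figures quoted in the Results, sensitivity and specificity are computed from a confusion matrix as shown below. The counts in the example are invented solely to reproduce ratios of roughly 80% and 99%; they are not the study's data.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts: 100 functional SNPs, 100 non-functional SNPs.
sens, spec = sensitivity_specificity(tp=80, fn=20, tn=99, fp=1)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```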
Affiliation(s)
- Ali Torkamani
- Department of Molecular and Experimental Medicine, Scripps Genomic Medicine and the Scripps Translational Science Institute, The Scripps Research Institute, La Jolla, CA 92037, USA
359
Keller O, Odronitz F, Stanke M, Kollmar M, Waack S. Scipio: using protein sequences to determine the precise exon/intron structures of genes and their orthologs in closely related species. BMC Bioinformatics 2008; 9:278. [PMID: 18554390 PMCID: PMC2442105 DOI: 10.1186/1471-2105-9-278] [Citation(s) in RCA: 91] [Impact Index Per Article: 5.7] [Received: 02/08/2008] [Accepted: 06/13/2008] [Indexed: 11/10/2022]
Abstract
Background: For many types of analyses, data about gene structure and locations of non-coding regions of genes are required. Although a vast amount of genomic sequence data is available, precise annotation of genes is lagging behind. Finding the corresponding gene of a given protein sequence by means of conventional tools is error prone, and cannot be completed without manual inspection, which is time consuming and requires considerable experience. Results: Scipio is a tool based on the alignment program BLAT to determine the precise gene structure given a protein sequence and a genome sequence. It identifies intron-exon borders and splice sites and is able to cope with sequencing errors and genes spanning several contigs in genomes that have not yet been assembled to supercontigs or chromosomes. Instead of producing a set of hits with varying confidence, Scipio gives the user a coherent summary of locations on the genome that code for the query protein. The output contains information about discrepancies that may result from sequencing errors. Scipio has also successfully been used to find homologous genes in closely related species. Scipio was tested with 979 protein queries against 16 arthropod genomes (intra-species search). For cross-species annotation, Scipio was used to annotate 40 genes from Homo sapiens in the primates Pongo pygmaeus abelii and Callithrix jacchus. The prediction quality of Scipio was tested in a comparative study against that of BLAT and the well established program Exonerate. Conclusion: Scipio is able to precisely map a protein query onto a genome. Even in cases when there are many sequencing errors, or when incomplete genome assemblies lead to hits that stretch across multiple target sequences, it very often provides the user with the correct determination of intron-exon borders and splice sites, showing an improved prediction accuracy compared to BLAT and Exonerate. Apart from being able to find genes in the genome that encode the query protein, Scipio can also be used to annotate genes in closely related species.
Affiliation(s)
- Oliver Keller
- Universität Göttingen, Institut für Informatik, Lotzestr. 16-18, 37083 Göttingen, Germany.
360
Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y. RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res 2008; 18:1509-17. [PMID: 18550803 DOI: 10.1101/gr.079558.108] [Citation(s) in RCA: 1971] [Impact Index Per Article: 123.2] [Indexed: 02/06/2023]
Abstract
Ultra-high-throughput sequencing is emerging as an attractive alternative to microarrays for genotyping, analysis of methylation patterns, and identification of transcription factor binding sites. Here, we describe an application of the Illumina sequencing (formerly Solexa sequencing) platform to study mRNA expression levels. Our goals were to estimate technical variance associated with Illumina sequencing in this context and to compare its ability to identify differentially expressed genes with existing array technologies. To do so, we estimated gene expression differences between liver and kidney RNA samples using multiple sequencing replicates, and compared the sequencing data to results obtained from Affymetrix arrays using the same RNA samples. We find that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane). The information in a single lane of Illumina sequencing data appears comparable to that in a single array in enabling identification of differentially expressed genes, while allowing for additional analyses such as detection of low-expressed genes, alternative splice variants, and novel transcripts. Based on our observations, we propose an empirical protocol and a statistical framework for the analysis of gene expression using ultra-high-throughput sequencing technology.
Affiliation(s)
- John C Marioni
- Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA
361
Angiuoli SV, Gussman A, Klimke W, Cochrane G, Field D, Garrity G, Kodira CD, Kyrpides N, Madupu R, Markowitz V, Tatusova T, Thomson N, White O. Toward an online repository of Standard Operating Procedures (SOPs) for (meta)genomic annotation. OMICS: A Journal of Integrative Biology 2008; 12:137-41. [PMID: 18416670 PMCID: PMC3196215 DOI: 10.1089/omi.2008.0017] [Citation(s) in RCA: 548] [Impact Index Per Article: 34.3] [Indexed: 11/13/2022]
Abstract
The methodologies used to generate genome and metagenome annotations are diverse and vary between groups and laboratories. Descriptions of the annotation process are helpful in interpreting genome annotation data. Some groups have produced Standard Operating Procedures (SOPs) that describe the annotation process, but standards are lacking for structure and content of these descriptions. In addition, there is no central repository to store and disseminate procedures and protocols for genome annotation. We highlight the importance of SOPs for genome annotation and endorse an online repository of SOPs.
Affiliation(s)
- Samuel V Angiuoli
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland 21201, USA.
362
Al-Shahrour F, Carbonell J, Minguez P, Goetz S, Conesa A, Tárraga J, Medina I, Alloza E, Montaner D, Dopazo J. Babelomics: advanced functional profiling of transcriptomics, proteomics and genomics experiments. Nucleic Acids Res 2008; 36:W341-6. [PMID: 18515841 PMCID: PMC2447758 DOI: 10.1093/nar/gkn318] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.2] [Indexed: 01/14/2023]
Abstract
We present a new version of Babelomics, a complete suite of web tools for the functional profiling of genome scale experiments, with new and improved methods as well as more types of functional definitions. Babelomics includes different flavours of conventional functional enrichment methods as well as more advanced gene set analysis methods, which makes it a unique tool among the similar resources available. In addition to the well-known functional definitions (GO, KEGG), Babelomics includes new ones such as Biocarta pathways or text mining-derived functional terms. Regulatory modules implemented include transcriptional control (Transfac, CisRed) and other levels of regulation such as miRNA-mediated interference. Moreover, Babelomics allows for sub-selection of terms in order to test more focused hypotheses. Gene annotation correspondence tables can also be imported, which allows testing with user-defined functional modules. Finally, a tool for the ‘de novo’ functional annotation of sequences has been included in the system. This allows the use of as yet unannotated organisms in the program. Babelomics has been extensively re-engineered and now includes the use of web services and Web 2.0 technology features, a new user interface with persistent sessions and a new extended database of gene identifiers. Babelomics is available at http://www.babelomics.org
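A conventional functional enrichment test of the kind mentioned above is often framed as a one-sided hypergeometric test: given a background of genes, some of which carry a functional term, how surprising is the number of annotated genes in the selected list? The sketch below uses only the standard library; it is a generic illustration, and the abstract does not state which test statistics Babelomics itself implements.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """One-sided hypergeometric tail P(X >= hits).

    population: genes in the background set
    annotated:  background genes carrying the functional term
    sample:     genes in the selected list
    hits:       selected genes carrying the term
    """
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / comb(population, sample)


# Toy numbers: 10 genes, 5 annotated, 5 selected, all 5 selected are annotated.
print(enrichment_p_value(10, 5, 5, 5))  # 1/252, about 0.00397
```

In practice a multiple-testing correction across all tested terms would follow; that step is omitted here.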
Affiliation(s)
- Fátima Al-Shahrour
- Department of Bioinformatics, Centro de Investigación Príncipe Felipe (CIPF), Autopista del Saler 16, E46013 Valencia, Spain
363
Tárraga J, Medina I, Carbonell J, Huerta-Cepas J, Minguez P, Alloza E, Al-Shahrour F, Vegas-Azcárate S, Goetz S, Escobar P, Garcia-Garcia F, Conesa A, Montaner D, Dopazo J. GEPAS, a web-based tool for microarray data analysis and interpretation. Nucleic Acids Res 2008; 36:W308-14. [PMID: 18508806 PMCID: PMC2447723 DOI: 10.1093/nar/gkn303] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.3] [Indexed: 12/02/2022]
Abstract
Gene Expression Profile Analysis Suite (GEPAS) is one of the most complete and extensively used web-based packages for microarray data analysis. During its more than 5 years of activity it has continuously been updated to keep pace with the state-of-the-art in the changing microarray data analysis arena. GEPAS offers diverse analysis options that include well established as well as novel algorithms for normalization, gene selection, class prediction, clustering and functional profiling of the experiment. New options for time-course (or dose-response) experiments, microarray-based class prediction, new clustering methods and new tests for differential expression have been included. The new pipeliner module allows automating the execution of sequential analysis steps by means of a simple but powerful graphic interface. An extensive re-engineering of GEPAS has been carried out which includes the use of web services and Web 2.0 technology features, a new user interface with persistent sessions and a new extended database of gene identifiers. GEPAS is nowadays the most quoted web tool in its field; it is extensively used by researchers in many countries, and its records indicate an average usage rate of 500 experiments per day. GEPAS is available at http://www.gepas.org.
Affiliation(s)
- Joaquín Tárraga
- Bioinformatics Department, Centro de Investigación Príncipe Felipe (CIPF), Autopista del Saler 16, E46013, Valencia, Spain
364
Tranchevent LC, Barriot R, Yu S, Van Vooren S, Van Loo P, Coessens B, De Moor B, Aerts S, Moreau Y. ENDEAVOUR update: a web resource for gene prioritization in multiple species. Nucleic Acids Res 2008; 36:W377-84. [PMID: 18508807 PMCID: PMC2447805 DOI: 10.1093/nar/gkn325] [Citation(s) in RCA: 179] [Impact Index Per Article: 11.2] [Indexed: 12/11/2022]
Abstract
Endeavour (http://www.esat.kuleuven.be/endeavourweb; this web site is free and open to all users and there is no login requirement) is a web resource for the prioritization of candidate genes. Using a training set of genes known to be involved in a biological process of interest, our approach consists of (i) inferring several models (based on various genomic data sources), (ii) applying each model to the candidate genes to rank those candidates against the profile of the known genes and (iii) merging the several rankings into a global ranking of the candidate genes. In the present article, we describe the latest developments of Endeavour. First, we provide a web-based user interface, besides our Java client, to make Endeavour more universally accessible. Second, we support multiple species: in addition to Homo sapiens, we now provide gene prioritization for three major model organisms: Mus musculus, Rattus norvegicus and Caenorhabditis elegans. Third, Endeavour makes use of additional data sources and is now including numerous databases: ontologies and annotations, protein–protein interactions, cis-regulatory information, gene expression data sets, sequence information and text-mining data. We tested the novel version of Endeavour on 32 recent disease gene associations from the literature. Additionally, we describe a number of recent independent studies that made use of Endeavour to prioritize candidate genes for obesity and Type II diabetes, cleft lip and cleft palate, and pulmonary fibrosis.
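Step (iii) above, merging several per-source rankings into one global ranking, can be illustrated with a very simple fusion rule. The abstract does not specify the merging rule, so the mean rank ratio used below is a stand-in chosen for brevity, and the function name and gene labels are hypothetical.

```python
def fuse_rankings(rankings):
    """Merge several rankings (best candidate first) into a global ranking.

    Each gene's rank in a source is divided by that source's list length
    (rank ratio); genes are then ordered by their mean rank ratio.
    NOTE: stand-in fusion rule for illustration, not Endeavour's own.
    """
    ratios = {}
    for ranking in rankings:
        n = len(ranking)
        for position, gene in enumerate(ranking, start=1):
            ratios.setdefault(gene, []).append(position / n)
    mean_ratio = {gene: sum(r) / len(r) for gene, r in ratios.items()}
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(mean_ratio, key=lambda gene: (mean_ratio[gene], gene))


expression = ["geneA", "geneB", "geneC"]
interactions = ["geneB", "geneA", "geneC"]
text_mining = ["geneA", "geneC", "geneB"]
print(fuse_rankings([expression, interactions, text_mining]))
# ['geneA', 'geneB', 'geneC']
```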
365
Approaches to comparative sequence analysis: towards a functional view of vertebrate genomes. Nat Rev Genet 2008; 9:303-13. [PMID: 18347593 DOI: 10.1038/nrg2185] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.9] [Indexed: 12/23/2022]
Abstract
The comparison of genomic sequences is now a common approach to identifying and characterizing functional regions in vertebrate genomes. However, for theoretical reasons and because of practical issues, the generation of these data sets is non-trivial and can have many pitfalls. We are currently seeing an explosion of comparative sequence data, the benefits and limitations of which need to be disseminated to the scientific community. This Review provides a critical overview of the different types of sequence data that are available for analysis and of contemporary comparative sequence analysis methods, highlighting both their strengths and limitations. Approaches to determining the biological significance of constrained sequence are also explored.
366
The methylome: approaches for global DNA methylation profiling. Trends Genet 2008; 24:231-7. [PMID: 18325624 DOI: 10.1016/j.tig.2008.01.006] [Citation(s) in RCA: 201] [Impact Index Per Article: 12.6] [Received: 12/10/2007] [Revised: 01/31/2008] [Accepted: 01/31/2008] [Indexed: 12/27/2022]
Abstract
DNA methylation plays a critical role in genome function both in health and disease. Almost 60 years after the discovery of 5-methyl cytosine and approximately 25 years since the discovery that altered DNA methylation plays a role in disease, the first high-resolution DNA methylation profile (or methylome) of any genome--Arabidopsis thaliana--was determined. Although only approximately 20% of the typical size of mammalian genomes, this milestone demonstrated that the methylomes of the human and similarly large genomes are now within reach. Here, we review current and emerging technologies that hold promise to deliver the first mammalian methylome and to facilitate comprehensive profiling of essentially any cell type in the context of development, disease and the environment.
|
367
|
Lucitt MB, Price TS, Pizarro A, Wu W, Yocum AK, Seiler C, Pack MA, Blair IA, Fitzgerald GA, Grosser T. Analysis of the zebrafish proteome during embryonic development. Mol Cell Proteomics 2008; 7:981-94. [PMID: 18212345 DOI: 10.1074/mcp.m700382-mcp200] [Citation(s) in RCA: 101] [Impact Index Per Article: 6.3] [Indexed: 01/05/2023] Open
Abstract
The model organism zebrafish (Danio rerio) is particularly amenable to studies deciphering regulatory genetic networks in vertebrate development, biology, and pharmacology. Unraveling the functional dynamics of such networks requires precise quantitation of protein expression during organismal growth, which is incrementally challenging with progressive complexity of the systems. In an approach toward such quantitative studies of dynamic network behavior, we applied mass spectrometric methodology and rigorous statistical analysis to create comprehensive, high quality profiles of proteins expressed at two stages of zebrafish development. Proteins of embryos 72 and 120 h postfertilization (hpf) were isolated and analyzed both by two-dimensional (2D) LC followed by ESI-MS/MS and by 2D PAGE followed by MALDI-TOF/TOF protein identification. We detected 1384 proteins from 327,906 peptide sequence identifications at 72 and 120 hpf with false identification rates of less than 1% using 2D LC-ESI-MS/MS. These included only approximately 30% of proteins that were identified by 2D PAGE-MALDI-TOF/TOF. Roughly 10% of all detected proteins were derived from hypothetical or predicted gene models or were entirely unannotated. Comparison of protein expression by 2D DIGE revealed that proteins involved in energy production and transcription/translation were relatively more abundant at 72 hpf, consistent with faster synthesis of cellular proteins during organismal growth at this time compared with 120 hpf. The data are accessible in a database that links protein identifications to existing resources including the Zebrafish Information Network database. This new resource should facilitate the selection of candidate proteins for targeted quantitation and refine systematic genetic network analysis in vertebrate development and biology.
Affiliation(s)
- Margaret B Lucitt
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
|
368
|
Miller NA, Kingsmore SF, Farmer A, Langley RJ, Mudge J, Crow JA, Gonzalez AJ, Schilkey FD, Kim RJ, van Velkinburgh J, May GD, Black CF, Myers MK, Utsey JP, Frost NS, Sugarbaker DJ, Bueno R, Gullans SR, Baxter SM, Day SW, Retzel EF. Management of High-Throughput DNA Sequencing Projects: Alpheus. J Comput Sci Syst Biol 2008; 1:132. [PMID: 20151039 DOI: 10.4172/jcsb.1000013] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.1] [Indexed: 02/04/2023]
Abstract
High-throughput DNA sequencing has enabled systems biology to begin to address areas in health, agricultural and basic biological research. Concomitant with the opportunities is an absolute necessity to manage significant volumes of high-dimensional and inter-related data and analysis. Alpheus is an analysis pipeline, database and visualization software for use with massively parallel DNA sequencing technologies that feature multi-gigabase throughput characterized by relatively short reads, such as Illumina-Solexa (sequencing-by-synthesis), Roche-454 (pyrosequencing) and Applied Biosystems' SOLiD (sequencing-by-ligation). Alpheus enables alignment to reference sequence(s), detection of variants and enumeration of sequence abundance, including expression levels in transcriptome sequence. Alpheus is able to detect several types of variants, including non-synonymous and synonymous single nucleotide polymorphisms (SNPs), insertions/deletions (indels), premature stop codons, and splice isoforms. Variant detection is aided by the ability to filter variant calls based on consistency, expected allele frequency, sequence quality, coverage, and variant type in order to minimize false positives while maximizing the identification of true positives. Alpheus also enables comparisons of genes with variants between cases and controls or bulk segregant pools. Sequence-based differential expression comparisons can be developed, with data export to SAS JMP Genomics for statistical analysis.
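The kind of variant-call filtering described above (by allele frequency, sequence quality, coverage and variant type) can be sketched as follows. This is not Alpheus code or its interface: the record fields, thresholds and example calls are hypothetical, and the consistency criterion is left out because the abstract does not define it.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    position: int
    variant_type: str     # e.g. "SNP" or "indel"
    coverage: int         # reads spanning the site
    variant_reads: int    # reads supporting the variant allele
    mean_quality: float   # mean base quality of the supporting reads

    @property
    def allele_frequency(self):
        return self.variant_reads / self.coverage if self.coverage else 0.0


def passes_filters(call, min_coverage=8, min_quality=20.0,
                   min_allele_freq=0.3, allowed_types=("SNP", "indel")):
    """Keep a call only if it clears every threshold (all values are assumptions)."""
    return (call.coverage >= min_coverage
            and call.mean_quality >= min_quality
            and call.allele_frequency >= min_allele_freq
            and call.variant_type in allowed_types)


calls = [
    VariantCall(101, "SNP", 30, 14, 32.0),    # clears every threshold
    VariantCall(250, "SNP", 4, 4, 35.0),      # rejected: coverage too low
    VariantCall(377, "indel", 40, 3, 30.0),   # rejected: allele frequency too low
    VariantCall(512, "indel", 22, 11, 12.0),  # rejected: quality too low
]
kept = [c.position for c in calls if passes_filters(c)]
print(kept)
```

Raising the thresholds trades sensitivity for specificity, which is the balance between false positives and true positives that the abstract mentions.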
Affiliation(s)
- Neil A Miller
- National Center for Genome Resources, 2935 Rodeo Park Drive East, Santa Fe, NM 87505, USA
|
369
|
Cochrane G, Akhtar R, Aldebert P, Althorpe N, Baldwin A, Bates K, Bhattacharyya S, Bonfield J, Bower L, Browne P, Castro M, Cox T, Demiralp F, Eberhardt R, Faruque N, Hoad G, Jang M, Kulikova T, Labarga A, Leinonen R, Leonard S, Lin Q, Lopez R, Lorenc D, McWilliam H, Mukherjee G, Nardone F, Plaister S, Robinson S, Sobhany S, Vaughan R, Wu D, Zhu W, Apweiler R, Hubbard T, Birney E. Priorities for nucleotide trace, sequence and annotation data capture at the Ensembl Trace Archive and the EMBL Nucleotide Sequence Database. Nucleic Acids Res 2008; 36:D5-12. [PMID: 18039715 PMCID: PMC2238915 DOI: 10.1093/nar/gkm1018] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.4] [Received: 09/24/2007] [Revised: 10/23/2007] [Accepted: 10/27/2007] [Indexed: 11/29/2022] Open
Abstract
The Ensembl Trace Archive (http://trace.ensembl.org/) and the EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/), known together as the European Nucleotide Archive, continue to see growth in data volume and diversity. Selected major developments of 2007 are presented briefly, along with data submission and retrieval information. In the face of increasing requirements for nucleotide trace, sequence and annotation data archiving, data capture priority decisions have been taken at the European Nucleotide Archive. Priorities are discussed in terms of how reliably information can be captured, the long-term benefits of its capture and the ease with which it can be captured.
Affiliation(s)
- Guy Cochrane
- EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
|
370
|
Kim DH, Shreenivasaiah PK, Hong S, Kim T, Song HK. Current research trends in systems biology. Anim Cells Syst (Seoul) 2008. [DOI: 10.1080/19768354.2008.9647172] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 12/20/2022] Open
|
371
|
DNA methylation profiling of the human major histocompatibility complex: a pilot study for the human epigenome project. PLoS Biol 2004; 18:1518-29. [PMID: 15550986 DOI: 10.1101/gr.077479.108] [Citation(s) in RCA: 283] [Impact Index Per Article: 14.2] [Indexed: 01/13/2023] Open
Abstract
The Human Epigenome Project aims to identify, catalogue, and interpret genome-wide DNA methylation phenomena. Occurring naturally on cytosine bases at cytosine-guanine dinucleotides, DNA methylation is intimately involved in diverse biological processes and the aetiology of many diseases. Differentially methylated cytosines give rise to distinct profiles, thought to be specific for gene activity, tissue type, and disease state. The identification of such methylation variable positions will significantly improve our understanding of genome biology and our ability to diagnose disease. Here, we report the results of the pilot study for the Human Epigenome Project entailing the methylation analysis of the human major histocompatibility complex. This study involved the development of an integrated pipeline for high-throughput methylation analysis using bisulphite DNA sequencing, discovery of methylation variable positions, epigenotyping by matrix-assisted laser desorption/ionisation mass spectrometry, and development of an integrated public database available at http://www.epigenome.org. Our analysis of DNA methylation levels within the major histocompatibility complex, including regulatory exonic and intronic regions associated with 90 genes in multiple tissues and individuals, reveals a bimodal distribution of methylation profiles (i.e., the vast majority of the analysed regions were either hypo- or hypermethylated), tissue specificity, inter-individual variation, and correlation with independent gene expression data.
|