201
Tiffin N, Andrade-Navarro MA, Perez-Iratxeta C. Linking genes to diseases: it's all in the data. Genome Med 2009; 1:77. [PMID: 19678910 PMCID: PMC2768963 DOI: 10.1186/gm77] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.9] [Indexed: 11/10/2022]
Abstract
Genome-wide association analyses on large patient cohorts are generating large sets of candidate disease genes. This is coupled with the availability of ever-increasing genomic databases and a rapidly expanding repository of biomedical literature. Computational approaches to disease-gene association attempt to harness these data sources to identify the most likely disease gene candidates for further empirical analysis by translational researchers, resulting in efficient identification of genes of diagnostic, prognostic and therapeutic value. Existing computational methods analyze gene structure and sequence, functional annotation of candidate genes, characteristics of known disease genes, gene regulatory networks, protein-protein interactions, data from animal models and disease phenotype. To date, a few studies have successfully applied computational analysis of clinical phenotype data for specific diseases and shown genetic associations. In the near future, computational strategies will be facilitated by improved integration of clinical and computational research, and by increased availability of clinical phenotype data in a format accessible to computational approaches.
Affiliation(s)
- Nicki Tiffin
- MRC/UWC/SANBI Bioinformatics Capacity Development Unit, South African National Bioinformatics Institute, University of the Western Cape, Bellville 7535, South Africa.
202
Kosmrlj A, Chakraborty AK, Kardar M, Shakhnovich EI. Thymic selection of T-cell receptors as an extreme value problem. Phys Rev Lett 2009; 103:068103. [PMID: 19792616 DOI: 10.1103/physrevlett.103.068103] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Received: 03/30/2009] [Indexed: 05/28/2023]
Abstract
T lymphocytes (T cells) orchestrate adaptive immune responses upon activation. T-cell activation requires sufficiently strong binding of T-cell receptors on their surface to short peptides (p) derived from foreign proteins, which are bound to major histocompatibility gene products (displayed on antigen-presenting cells). A diverse and self-tolerant T-cell repertoire is selected in the thymus. We map thymic selection processes to an extreme value problem and provide an analytic expression for the amino acid compositions of selected T-cell receptors (which enable their recognition functions).
Affiliation(s)
- Andrej Kosmrlj
- Department of Physics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
203
Ying M, Zhan Z, Wang W, Chen D. Origin and evolution of ubiquitin-conjugating enzymes from Guillardia theta nucleomorph to hominoid. Gene 2009; 447:72-85. [PMID: 19664694 DOI: 10.1016/j.gene.2009.07.021] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 05/14/2009] [Revised: 07/24/2009] [Accepted: 07/29/2009] [Indexed: 11/19/2022]
Abstract
The origin of eukaryotic ubiquitin-conjugating enzymes (E2s) can be traced back to the Guillardia theta nucleomorph about 2500 million years ago (Mya). E2s have been largely vertically inherited over eukaryotic evolution [Lespinet, O., Wolf, Y.I., Koonin, E.V., Aravind, L., 2002. The role of lineage-specific gene family expansion in the evolution of eukaryotes. Genome Res. 1048-1059], whereas mammalian E2s evolved into multigene families through gene duplications that accompanied the increase in species complexity. Through alternative splicing, primate-specific expansions of E2s occurred once again at the transcriptional level. Both processes increased the genomic complexity and the functional diversity of the primate E2 proteome. During these expansions, the evolution of human E2 gene structure was accompanied by exon duplication and exonization of intronic sequences. Exonizations of transposable elements (TEs) in the UBE2D3, UBE2L3 and UBE2V1 genes of primates indicate that exaptation of TEs also plays an important role in the structural innovation of primate-specific E2s and may create alternative splicing isoforms at the transcriptional level. Estimates of the dN/dS ratio suggest that strong purifying selection has acted on the protein-coding sequences of the orthologous UBE2D2, UBE2A, UBE2N, UBE2I and Rbx1 genes from animals, plants and fungi. The similar rates of synonymous substitution are in accordance with the neutral mutation-random drift hypothesis of molecular evolution. Systematically tracing the origin and evolution of E2s, analyzing the evolution of E2 multigene families by gene duplication and the evolutionary processes of E2s during expansions, and testing the evolutionary forces acting on E2s from distant phylogenetic lineages may help distinguish ancestral E2s from newly created ones and reveal previously unknown relationships between E2s and metazoan complexity.
Analysis of these conserved proteins provides strong support for a close relationship between social amoebae and eukaryotes and between choanoflagellates and metazoans, and for the central roles of social amoebae and choanoflagellates in the origin and evolution of eukaryotes and metazoans. Retracing the different stages of primate E2 exonization by monitoring genomic events over 63 Myr of primate evolution will advance our understanding of how TEs have dynamically modified the primate transcriptome and proteome in the past, and continue to do so.
Affiliation(s)
- Muying Ying
- State Key Laboratory of Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, PR China
204
Polak P, Arndt PF. Long-range bidirectional strand asymmetries originate at CpG islands in the human genome. Genome Biol Evol 2009; 1:189-97. [PMID: 20333189 PMCID: PMC2817419 DOI: 10.1093/gbe/evp024] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Accepted: 07/22/2009] [Indexed: 12/24/2022]
Abstract
In the human genome, CpG islands (CGIs), which are GC- and CpG-rich sequences, are associated with transcription start sites (TSSs); in addition, there is evidence that CGIs harbor origins of bidirectional replication (OBRs) and are preferred sites for heteroduplex formation during recombination. Transcription, replication, and recombination processes are known to induce specific mutational patterns in various genomes; therefore, these patterns are expected to be found around CGIs. We use triple alignments of human, chimp, and macaque to compute the rates of nucleotide substitutions in intergenic regions up to 1 Mbp long on both sides of CGIs. Our analysis revealed that around a CGI there is an asymmetry between complementary substitution rates that is similar to the one found around the OBR in bacteria. We hypothesize that these asymmetries are induced by differences in the replication of the leading and lagging strands and that a significant number of CGIs overlap OBRs. Within CGIs, we observed a mutational signature of GC-biased gene conversion, which is associated with recombination. We suggest that recombination has played a major role in the creation of CGIs.
Affiliation(s)
- Paz Polak
- Max Planck Institute for Molecular Genetics, Berlin, Germany.
205
Hilger M, Bonaldi T, Gnad F, Mann M. Systems-wide analysis of a phosphatase knock-down by quantitative proteomics and phosphoproteomics. Mol Cell Proteomics 2009; 8:1908-20. [PMID: 19429919 PMCID: PMC2722773 DOI: 10.1074/mcp.m800559-mcp200] [Citation(s) in RCA: 86] [Impact Index Per Article: 5.7] [Received: 12/08/2008] [Revised: 03/26/2009] [Indexed: 12/11/2022]
Abstract
Signal transduction in metazoans regulates almost all aspects of biological function, and aberrant signaling is involved in many diseases. Perturbations in phosphorylation-based signaling networks are typically studied in a hypothesis-driven approach, using phospho-specific antibodies. Here we apply quantitative, high-resolution mass spectrometry to determine the system-wide response to the depletion of one signaling component. Drosophila cells were metabolically labeled using stable isotope labeling by amino acids in cell culture (SILAC), and the phosphatase Ptp61F, the ortholog of mammalian PTP1B, a drug target for diabetes, was knocked down by RNAi. In total, we detected more than 10,000 phosphorylation sites in the phosphoproteome of Drosophila Schneider cells and used these data to train a phosphorylation site predictor. SILAC-based quantitation after phosphatase knock-down showed that, apart from the phosphatase itself, the proteome was minimally affected, whereas 288 of 6,478 high-confidence phosphorylation sites changed significantly. Responses at the phosphotyrosine level included the previously described Ptp61F substrates Stat92E and Abi. Our analysis highlights a connection of Ptp61F to cytoskeletal regulation through GTPase-regulating proteins and focal adhesion components.
Affiliation(s)
- Maximiliane Hilger
- Proteomics and Signal Transduction, Max-Planck Institute for Biochemistry, Am Klopferspitz 18, D-82152 Martinsried, Germany
- Tiziana Bonaldi
- Proteomics and Signal Transduction, Max-Planck Institute for Biochemistry, Am Klopferspitz 18, D-82152 Martinsried, Germany
- Experimental Oncology, European Institute of Oncology, Via Adamello 16, 20139 Milano, Italy
- Florian Gnad
- Proteomics and Signal Transduction, Max-Planck Institute for Biochemistry, Am Klopferspitz 18, D-82152 Martinsried, Germany
- Matthias Mann
- Proteomics and Signal Transduction, Max-Planck Institute for Biochemistry, Am Klopferspitz 18, D-82152 Martinsried, Germany
206
Schmidt T, Mewes HW, Stümpflen V. A novel putative miRNA target enhancer signal. PLoS One 2009; 4:e6473. [PMID: 19649282 PMCID: PMC2714067 DOI: 10.1371/journal.pone.0006473] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 04/14/2009] [Accepted: 06/29/2009] [Indexed: 11/21/2022]
Abstract
It is known that miRNA target sites are very short and that the effect of the miRNA-target site interaction alone appears to be nonspecific. Recent experiments suggest that further context signals are involved in miRNA target site recognition and regulation. Here, we present a novel GC-rich RNA motif downstream of experimentally supported miRNA target sites in human mRNAs with no similarity to previously reported functional motifs. We demonstrate that the novel motif can be found in at least one third of all transcripts regulated by miRNAs. Furthermore, we show that motif occurrence correlates with the frequency of miRNA target sites and with the stability of their duplex structures. The finding that the novel motif is significantly associated with miRNA target sites suggests a functional role of the motif in miRNA target site biology. Moreover, the novel motif can significantly improve the prediction of miRNA target sites.
Affiliation(s)
- Thorsten Schmidt
- Helmholtz Zentrum München - German Research Center for Environmental Health, Institute of Bioinformatics and Systems Biology (MIPS), Neuherberg, Germany
- Hans-Werner Mewes
- Helmholtz Zentrum München - German Research Center for Environmental Health, Institute of Bioinformatics and Systems Biology (MIPS), Neuherberg, Germany
- Chair for Genome-oriented Bioinformatics, Technische Universität München, Life and Food Science Center Weihenstephan, Freising-Weihenstephan, Germany
- Volker Stümpflen
- Helmholtz Zentrum München - German Research Center for Environmental Health, Institute of Bioinformatics and Systems Biology (MIPS), Neuherberg, Germany
207
Tan CSH, Bodenmiller B, Pasculescu A, Jovanovic M, Hengartner MO, Jørgensen C, Bader GD, Aebersold R, Pawson T, Linding R. Comparative analysis reveals conserved protein phosphorylation networks implicated in multiple diseases. Sci Signal 2009; 2:ra39. [PMID: 19638616 DOI: 10.1126/scisignal.2000316] [Citation(s) in RCA: 151] [Impact Index Per Article: 10.1] [Indexed: 11/02/2022]
Abstract
Protein kinases enable cellular information processing. Although numerous human phosphorylation sites and their dynamics have been characterized, the evolutionary history and physiological importance of many signaling events remain unknown. Using target phosphoproteomes determined with a similar experimental and computational pipeline, we investigated the conservation of human phosphorylation events in distantly related model organisms (fly, worm, and yeast). With a sequence-alignment approach, we identified 479 phosphorylation events in 344 human proteins that appear to be positionally conserved over approximately 600 million years of evolution and hence are likely to be involved in fundamental cellular processes. This sequence-alignment analysis suggested that many phosphorylation sites evolve rapidly and therefore do not display strong evolutionary conservation in terms of sequence position in distantly related organisms. Thus, we devised a network-alignment approach to reconstruct conserved kinase-substrate networks, which identified 778 phosphorylation events in 698 human proteins. Both methods identified proteins tightly regulated by phosphorylation as well as signal integration hubs, and both types of phosphoproteins were enriched in proteins encoded by disease-associated genes. We analyzed the cellular functions and structural relationships for these conserved signaling events, noting the incomplete nature of current phosphoproteomes. Assessing phosphorylation conservation at both site and network levels proved useful for exploring both fast-evolving and ancient signaling events. We reveal that multiple complex diseases seem to converge within the conserved networks, suggesting that disease development might rely on common molecular networks.
Affiliation(s)
- Chris Soon Heng Tan
- Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Ontario, Canada M5G 1X5
208
Van Raamsdonk CD, Barsh GS, Wakamatsu K, Ito S. Independent regulation of hair and skin color by two G protein-coupled pathways. Pigment Cell Melanoma Res 2009; 22:819-26. [PMID: 19627560 DOI: 10.1111/j.1755-148x.2009.00609.x] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Indexed: 01/09/2023]
Abstract
Hair color and skin color are frequently coordinated in mammalian species. To explore this, we have studied mutations in two different G protein-coupled pathways, each of which affects the darkness of both hair and skin color. In each mouse mutant (Gnaq(Dsk1), Gna11(Dsk7), and Mc1r(e)), we analyzed the melanocyte density and the concentrations of eumelanin (black pigment) and pheomelanin (yellow pigment) in the hair or skin to determine the mechanisms regulating pigmentation. Surprisingly, we discovered that each mutation affects hair and skin color differently. Furthermore, we have found that in the epidermis the melanocortin signaling pathway does not couple the synthesis of eumelanin with that of pheomelanin, as it does in hair follicles. Even when they share signaling pathways, hair and skin melanocytes are regulated quite independently.
209
Neerincx PB, Casel P, Prickett D, Nie H, Watson M, Leunissen JA, Groenen MA, Klopp C. Comparison of three microarray probe annotation pipelines: differences in strategies and their effect on downstream analysis. BMC Proc 2009; 3 Suppl 4:S1. [PMID: 19615109 PMCID: PMC2712739 DOI: 10.1186/1753-6561-3-s4-s1] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022]
Abstract
Background Reliable annotation linking oligonucleotide probes to target genes is essential for functional biological analysis of microarray experiments. We used the IMAD, OligoRAP and sigReannot pipelines to update the annotation for the ARK-Genomics Chicken 20 K array as part of a joint EADGENE/SABRE workshop. In this manuscript we compare their annotation strategies and results. Furthermore, we analyse the effect of differences in updated annotation on functional analysis for an experiment involving Eimeria-infected chickens, and finally we propose guidelines for optimal annotation strategies. Results IMAD, OligoRAP and sigReannot update both annotation and estimated target specificity. The three pipelines can assign oligos to target specificity categories, although with varying degrees of resolution. Target specificity is judged based on the number and type of oligo versus target-gene alignments (hits), which are determined by filter thresholds that users can adjust based on their experimental conditions. Linking oligos to annotation, on the other hand, is based on rigid rules, which differ between pipelines. For 52.7% of the oligos from a subset selected for in-depth comparison, all pipelines linked the oligo to one or more Ensembl genes, with consensus on 44.0%. In 31.0% of the cases none of the pipelines could assign an Ensembl gene to an oligo, and for the remaining 16.3% the coverage differed between pipelines. Differences in updated annotation were mainly due to different thresholds for hybridisation potential filtering of oligo versus target-gene alignments and different policies for expanding annotation using indirect links. The differences in updated annotation packages had a significant effect on GO term enrichment analysis, with consensus on only 67.2% of the enriched terms. Conclusion In addition to flexible thresholds to determine target specificity, annotation tools should provide metadata describing the relationships between oligos and the annotation assigned to them.
These relationships can then be used to judge the varying degrees of reliability, allowing users to fine-tune the balance between reliability and coverage. This is important because it can have a significant effect on functional microarray analysis, as exemplified by the lack of consensus on almost one third of the terms found with GO term enrichment analysis based on updated IMAD, OligoRAP or sigReannot annotation.
Affiliation(s)
- Pieter BT Neerincx
- Laboratory of Bioinformatics, Wageningen University and Research centre (WUR), P.O. Box 569, 6700 AN Wageningen, The Netherlands
- Pierrot Casel
- Sigenae UR875 Biométrie et Intelligence Artificielle/Génétique Cellulaire, Institut National de la Recherche Agronomique (INRA), BP 52627, 31326 Castanet-Tolosan Cedex, France
- Dennis Prickett
- Institute for Animal Health (IAH), Compton, nr Newbury, RG20 7NN, UK
- Haisheng Nie
- Animal Breeding and Genomics Centre, Wageningen University and Research centre (WUR), P.O. Box 338, 6700 AH Wageningen, The Netherlands
- Michael Watson
- Institute for Animal Health (IAH), Compton, nr Newbury, RG20 7NN, UK
- Jack AM Leunissen
- Laboratory of Bioinformatics, Wageningen University and Research centre (WUR), P.O. Box 569, 6700 AN Wageningen, The Netherlands
- Martien AM Groenen
- Animal Breeding and Genomics Centre, Wageningen University and Research centre (WUR), P.O. Box 338, 6700 AH Wageningen, The Netherlands
- Christophe Klopp
- Sigenae UR875 Biométrie et Intelligence Artificielle/Génétique Cellulaire, Institut National de la Recherche Agronomique (INRA), BP 52627, 31326 Castanet-Tolosan Cedex, France
210
Casel P, Moreews F, Lagarrigue S, Klopp C. sigReannot: an oligo-set re-annotation pipeline based on similarities with the Ensembl transcripts and Unigene clusters. BMC Proc 2009; 3 Suppl 4:S3. [PMID: 19615116 PMCID: PMC2712746 DOI: 10.1186/1753-6561-3-s4-s3] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.9] [Indexed: 11/29/2022]
Abstract
Background Microarrays are a powerful technology for monitoring tens of thousands of genes in a single experiment. Most microarrays now use oligo-sets. The design of the oligonucleotides is time-consuming and error-prone. Genome-wide microarray oligo-sets are designed using as large a set of transcripts as possible in order to monitor as many genes as possible. Depending on the state of genome sequencing and assembly, knowledge of the existing transcripts can vary greatly. This knowledge evolves with successive genome builds and gene builds. Once designed, microarrays are often used for several years. The biologists working in EADGENE expressed the need for up-to-date annotation files for the oligo-sets they share, including information about the orthologous genes of model species, the Gene Ontology, the corresponding pathways and the chromosomal location. Results Compared with the initial annotations, the results of sigReannot on a chicken microarray used in the EADGENE project show that 23% of the oligonucleotide gene annotations were not confirmed, 2% were modified and 1% were added. The value of this up-to-date annotation procedure is demonstrated through the analysis of previously published real data. Conclusion sigReannot uses the criteria of the oligonucleotide design procedure to validate the probe-gene link and uses the Ensembl transcripts as the reference for annotation. It therefore produces high-quality annotation based on reference gene sets.
Affiliation(s)
- Pierrot Casel
- Sigenae UR875 Biométrie et Intelligence Artificielle, Institut National de la Recherche Agronomique (INRA), BP 52627, 31326 Castanet-Tolosan Cedex, France.
211
Neerincx PBT, Rauwerda H, Nie H, Groenen MAM, Breit TM, Leunissen JAM. OligoRAP - an Oligo Re-Annotation Pipeline to improve annotation and estimate target specificity. BMC Proc 2009; 3 Suppl 4:S4. [PMID: 19615117 PMCID: PMC2712747 DOI: 10.1186/1753-6561-3-s4-s4] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 01/22/2023]
Abstract
Background High-throughput gene expression studies using oligonucleotide microarrays depend on the specificity of each oligonucleotide (oligo or probe) for its target gene. However, target-specific probes can only be designed when the reference genome of the species at hand has been completely sequenced, when this genome has been completely annotated, and when the genetic variation of the sampled individuals is completely known. Unfortunately, there is not a single species for which such a complete data set is available. Therefore, it is important that probe annotation can be updated frequently for optimal interpretation of microarray experiments. Results In this paper we present OligoRAP, a pipeline to automatically update the annotation of oligo libraries and estimate oligo target specificity. OligoRAP uses a reference genome assembly with Ensembl and Entrez Gene annotation, supplemented with a set of unmapped transcripts derived from RefSeq and UniGene to handle assembly gaps. OligoRAP produces alignments of each oligo with the reference assembly as well as with unmapped transcripts. These alignments are re-mapped to the annotation sources, which results in a concise, as complete as possible and up-to-date annotation of the oligo library. The building blocks of this pipeline are BioMoby web services, creating a highly modular and distributed system with a robust, remote programmatic interface. OligoRAP was used to update the annotation for a subset of 791 oligos from the ARK-Genomics 20 K chicken array, which were selected as starting material for the oligo annotation session of the EADGENE/SABRE Post-analysis workshop. Based on the updated annotation, about one third of these oligos are problematic with regard to target specificity. In addition, for almost half of the oligos, the accession numbers or IDs the oligos were originally designed for no longer exist in the updated annotation.
Conclusion As microarrays are designed on incomplete data, it is important to update probe annotation and check target specificity regularly. OligoRAP provides both, and because it is built on BioMoby web services, it can easily be embedded as an oligo annotation engine in customised applications for microarray data analysis. The dramatic difference in updated annotation and target specificity for the ARK-Genomics 20 K chicken array as compared to the original data emphasises the need for regular updates.
Affiliation(s)
- Pieter B T Neerincx
- Laboratory of Bioinformatics, Wageningen University and Research centre (WUR), P.O. Box 569, 6700 AN Wageningen, The Netherlands.
212
Burt DW, Carré W, Fell M, Law AS, Antin PB, Maglott DR, Weber JA, Schmidt CJ, Burgess SC, McCarthy FM. The Chicken Gene Nomenclature Committee report. BMC Genomics 2009; 10 Suppl 2:S5. [PMID: 19607656 PMCID: PMC2966335 DOI: 10.1186/1471-2164-10-s2-s5] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Indexed: 11/15/2022]
Abstract
Comparative genomics is an essential component of the post-genomic era. The chicken genome is the first avian genome to be sequenced, and it will serve as a model for other avian species. Moreover, due to its unique evolutionary niche, the chicken genome can be used to understand the evolution of functional elements and gene regulation in mammalian species. However, comparative biology both within avian species and within amniotes is hampered by the difficulty of recognising functional orthologs. This problem is compounded as different databases and sequence repositories proliferate and the names they assign to functional elements proliferate along with them. Currently, genes can be published under more than one name, and one name sometimes refers to unrelated genes. Standardized gene nomenclature is necessary to facilitate communication between scientists and genomic resources. Moreover, it is important that this nomenclature be based on existing nomenclature efforts where possible, to truly facilitate studies between different species. We report here the formation of the Chicken Gene Nomenclature Committee (CGNC), an international and centralized effort to provide standardized nomenclature for chicken genes. The CGNC works in conjunction with public resources such as NCBI and Ensembl and in consultation with existing nomenclature committees for human and mouse. The CGNC will develop standardized nomenclature in consultation with the research community and relies on the support of the research community to ensure that the nomenclature facilitates comparative and genomic studies.
Affiliation(s)
- David W Burt
- Department of Genomics and Genetics, Roslin Institute and Royal (Dick) School of Veterinary Studies, Midlothian, UK.
213
Essack M, Radovanovic A, Schaefer U, Schmeier S, Seshadri SV, Christoffels A, Kaur M, Bajic VB. DDEC: Dragon database of genes implicated in esophageal cancer. BMC Cancer 2009; 9:219. [PMID: 19580656 PMCID: PMC2711974 DOI: 10.1186/1471-2407-9-219] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Received: 12/12/2008] [Accepted: 07/06/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND Esophageal cancer ranks eighth in order of cancer occurrence. Its lethality primarily stems from the inability to detect the disease during the early organ-confined stage and the lack of effective therapies for advanced-stage disease. Moreover, the understanding of the molecular processes involved in esophageal cancer is incomplete, hampering the development of efficient diagnostics and therapy. Efforts by the scientific community to improve the survival rate of esophageal cancer have resulted in a wealth of scattered information that is difficult to find and not easily amenable to data mining. To reduce this gap and to complement available cancer-related bioinformatic resources, we have developed a comprehensive database of esophageal cancer-related information, the Dragon Database of Genes Implicated in Esophageal Cancer (DDEC), as an integrated knowledge database aimed at representing a gateway to esophageal cancer-related data. DESCRIPTION The database contains 529 manually curated genes differentially expressed in esophageal cancer (EC). We extracted and analyzed the promoter regions of these genes and complemented gene-related information with transcription factors that potentially control them. We further precompiled text-mined and data-mined reports about each of these genes to allow easy exploration of associations of EC-implicated genes with other human genes and proteins, metabolites and enzymes, toxins, chemicals with pharmacological effects, disease concepts and human anatomy. A useful feature of DDEC is that it can display potential associations that are rarely reported and thus difficult to identify. Moreover, DDEC enables inspection of potentially new 'association hypotheses' generated from the precompiled reports. CONCLUSION We hope that this resource will serve as a useful complement to the existing public resources and as a good starting point for researchers and physicians interested in EC genetics.
DDEC is freely accessible to academic and non-profit users at http://apps.sanbi.ac.za/ddec/. DDEC will be updated twice a year.
Affiliation(s)
- Magbubah Essack
- South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa.
214
Clustering of codons with rare cognate tRNAs in human genes suggests an extra level of expression regulation. PLoS Genet 2009; 5:e1000548. [PMID: 19578405 PMCID: PMC2697378 DOI: 10.1371/journal.pgen.1000548] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.1] [Received: 12/03/2008] [Accepted: 06/03/2009] [Indexed: 12/31/2022]
Abstract
In species with large effective population sizes, highly expressed genes tend to be encoded by codons with highly abundant cognate tRNAs to maximize translation rate. However, there has been little evidence for a similar bias of synonymous codons in highly expressed human genes. Here, we ask instead whether there is evidence of selection for codons associated with low-abundance tRNAs. Rather than averaging the codon usage of complete genes, we scan the genes for windows with deviating codon usage. We show that there is a significant overrepresentation of human genes that contain clusters of codons with low-abundance cognate tRNAs. We call these regions RTS (rare tRNA score) clusters; on average, they have 50% less cognate tRNA available than the remainder of the gene. We observed a significantly reduced substitution rate between the human RTS clusters and their orthologous chimp sequences compared with non-RTS cluster sequences. Overall, the genes with an RTS cluster have higher tissue specificity than the non-RTS cluster genes. Furthermore, these genes are functionally enriched for transcription regulation. As genes that regulate transcription in lower eukaryotes are known to be involved in translation on demand, this suggests that this mechanism of translation-level expression regulation also exists within the human genome.
215
Ahmed S, Valen E, Sandelin A, Matthews J. Dioxin increases the interaction between aryl hydrocarbon receptor and estrogen receptor alpha at human promoters. Toxicol Sci 2009; 111:254-66. [PMID: 19574409 DOI: 10.1093/toxsci/kfp144] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.1] [Indexed: 01/27/2023]
Abstract
Recent studies have shown that activated aryl hydrocarbon receptor (AHR) induces the recruitment of estrogen receptor-alpha (ERalpha) to AHR-regulated genes and that AHR is recruited to ERalpha-regulated genes. However, these findings were limited to a small number of well-characterized AHR- or ERalpha-responsive genes, with little knowledge of what was occurring at other genomic regions. In this study, using chromatin immunoprecipitation followed by hybridization to promoter-focused microarrays (ChIP-chip), we showed that 2,3,7,8-tetrachlorodibenzo-p-dioxin treatment significantly increased the overlap of genomic regions bound by both AHR and ERalpha. Conventional and sequential ChIPs confirmed the recruitment of AHR and ERalpha to many of the identified regions. Transcription factor binding site analysis revealed an overrepresentation of aryl hydrocarbon receptor response elements in regions bound by both AHR and ERalpha, suggesting that AHR was the important factor determining the recruitment of ERalpha to these regions. RNA interference-mediated knockdown of AHR confirmed its requirement for the recruitment of ERalpha to some, but not all, of the shared regions. Our findings demonstrate not only that dioxin induces the recruitment of ERalpha to AHR target genes but also that AHR is recruited to estrogen-responsive regions in a gene-specific manner, suggesting that AHR uses both of these mechanisms to modulate estrogen-dependent signaling.
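Counting regions bound by both factors is, at its core, an interval-overlap problem. The sketch below is a generic sorted sweep, not the ChIP-chip analysis pipeline used in the study; it assumes bound regions are given as hypothetical `(chromosome, start, end)` tuples with half-open coordinates.

```python
from typing import List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), half-open [start, end)


def overlaps(x: Region, y: Region) -> bool:
    """True if two regions lie on the same chromosome and share at least one base."""
    return x[0] == y[0] and x[1] < y[2] and y[1] < x[2]


def co_bound(a: List[Region], b: List[Region]) -> List[Region]:
    """Regions of `a` that overlap at least one region of `b`, found with a sorted sweep."""
    b_sorted = sorted(b)
    hits: List[Region] = []
    j = 0
    for region in sorted(a):
        chrom, start, end = region
        # Advance past a prefix of b regions that end at or before this start;
        # they cannot overlap this or any later (sorted) region of a.
        while j < len(b_sorted) and (b_sorted[j][0], b_sorted[j][2]) <= (chrom, start):
            j += 1
        k = j
        # Only b regions that start before this region ends can overlap it.
        while k < len(b_sorted) and (b_sorted[k][0], b_sorted[k][1]) < (chrom, end):
            if overlaps(region, b_sorted[k]):
                hits.append(region)
                break
            k += 1
    return hits


# Hypothetical bound regions for the two factors.
ahr_regions = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
er_regions = [("chr1", 150, 160), ("chr2", 10, 40), ("chr2", 70, 90)]
print(co_bound(ahr_regions, er_regions))  # -> [('chr1', 100, 200), ('chr2', 50, 80)]
```

Sorting both lists once keeps the scan close to linear for typical, non-nested peak sets.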
Affiliation(s)
- Shaimaa Ahmed
- Department of Pharmacology & Toxicology, University of Toronto, Toronto, Ontario M5S 1A8, Canada
216
Taft RJ, Glazov EA, Lassmann T, Hayashizaki Y, Carninci P, Mattick JS. Small RNAs derived from snoRNAs. RNA (NEW YORK, N.Y.) 2009; 15:1233-40. [PMID: 19474147 PMCID: PMC2704076 DOI: 10.1261/rna.1528909] [Citation(s) in RCA: 337] [Impact Index Per Article: 22.5] [Indexed: 05/03/2023]
Abstract
Small nucleolar RNAs (snoRNAs) guide RNA modification and are localized in nucleoli and Cajal bodies in eukaryotic cells. Components of the RNA silencing pathway associate with these structures, and two recent reports have revealed that a human and a protozoan snoRNA can be processed into miRNA-like RNAs. Here we show that small RNAs with evolutionary conservation of size and position are derived from the vast majority of snoRNA loci in animals (human, mouse, chicken, fruit fly), Arabidopsis, and fission yeast. In animals, sno-derived RNAs (sdRNAs) from H/ACA snoRNAs are predominantly 20-24 nucleotides (nt) in length and originate from the 3' end. Those derived from C/D snoRNAs show a bimodal size distribution at approximately 17-19 nt and >27 nt and predominantly originate from the 5' end. SdRNAs are associated with AGO7 in Arabidopsis and Ago1 in fission yeast with characteristic 5' nucleotide biases and show altered expression patterns in fly loquacious and Dicer-2 and mouse Dicer1 and Dgcr8 mutants. These findings indicate that there is interplay between the RNA silencing and snoRNA-mediated RNA processing systems, and that sdRNAs comprise a novel and ancient class of small RNAs in eukaryotes.
Affiliation(s)
- Ryan J Taft
- Australian Research Council Special Research Centre for Functional and Applied Genomics, Institute for Molecular Bioscience, The University of Queensland, St. Lucia, QLD 4072, Australia
217
Mochida K, Yoshida T, Sakurai T, Ogihara Y, Shinozaki K. TriFLDB: a database of clustered full-length coding sequences from Triticeae with applications to comparative grass genomics. PLANT PHYSIOLOGY 2009; 150:1135-46. [PMID: 19448038 PMCID: PMC2705016 DOI: 10.1104/pp.109.138214] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.8] [Received: 03/07/2009] [Accepted: 05/08/2009] [Indexed: 05/19/2023]
Abstract
The Triticeae Full-Length CDS Database (TriFLDB) contains available information regarding full-length coding sequences (CDSs) of the Triticeae crops wheat (Triticum aestivum) and barley (Hordeum vulgare) and includes functional annotations and comparative genomics features. TriFLDB provides a search interface using keywords for gene function and related Gene Ontology terms and a similarity search for DNA and deduced translated amino acid sequences to access annotations of Triticeae full-length CDS (TriFLCDS) entries. Annotations consist of similarity search results against several sequence databases and domain structure predictions by InterProScan. The deduced amino acid sequences in TriFLDB are grouped with the proteome datasets for Arabidopsis (Arabidopsis thaliana), rice (Oryza sativa), and sorghum (Sorghum bicolor) by hierarchical clustering in stepwise thresholds of sequence identity, providing hierarchical clustering results based on full-length protein sequences. The database also provides sequence similarity results based on comparative mapping of TriFLCDSs onto the rice and sorghum genome sequences, which together with current annotations can be used to predict gene structures for TriFLCDS entries. To provide the possible genetic locations of full-length CDSs, TriFLCDS entries are also assigned to the genetically mapped cDNA sequences of barley and diploid wheat, which are currently accommodated in the Triticeae Mapped EST Database. These relational data are searchable from the search interfaces of both databases. The current TriFLDB contains 15,871 full-length CDSs from barley and wheat and includes putative full-length cDNAs for barley and wheat, which are publicly accessible. This informative content provides an informatics gateway for Triticeae genomics and grass comparative genomics. TriFLDB is publicly available at http://TriFLDB.psc.riken.jp/.
218
Ng MP, Vergara IA, Frech C, Chen Q, Zeng X, Pei J, Chen N. OrthoClusterDB: an online platform for synteny blocks. BMC Bioinformatics 2009; 10:192. [PMID: 19549318 PMCID: PMC2711082 DOI: 10.1186/1471-2105-10-192] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.9] [Received: 02/10/2009] [Accepted: 06/23/2009] [Indexed: 02/07/2023]
Abstract
BACKGROUND The recent availability of an expanding collection of genome sequences driven by technological advances has facilitated comparative genomics and in particular the identification of synteny among multiple genomes. However, the development of effective and easy-to-use methods for identifying such conserved gene clusters among multiple genomes (synteny blocks), as well as of databases that host synteny blocks from various groups of species (especially eukaryotes) and also allow users to run synteny-identification programs, lags behind. DESCRIPTIONS OrthoClusterDB is a new online platform for the identification and visualization of synteny blocks. OrthoClusterDB consists of two key web pages: Run OrthoCluster and View Synteny. The Run OrthoCluster page serves as a web front-end to OrthoCluster, a recently developed program for synteny block detection. Run OrthoCluster offers full control over the functionalities of OrthoCluster, such as specifying synteny block size, considering order and strandedness of genes within synteny blocks, including or excluding nested synteny blocks, handling one-to-many orthologous relationships, and comparing multiple genomes. In contrast, the View Synteny page gives access to perfect and imperfect synteny blocks precomputed for a large number of genomes, without the need for users to retrieve and format input data. Additionally, genes are cross-linked with public databases for effective browsing. For both Run OrthoCluster and View Synteny, identified synteny blocks can be browsed at the whole genome, chromosome, and individual gene level. OrthoClusterDB is freely accessible. CONCLUSION We have developed an online system for the identification and visualization of synteny blocks among multiple genomes. The system is freely available at http://genome.sfu.ca/orthoclusterdb/.
Affiliation(s)
- Man-Ping Ng
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, Canada.
219
Eggle D, Debey-Pascher S, Beyer M, Schultze JL. The development of a comparison approach for Illumina bead chips unravels unexpected challenges applying newest generation microarrays. BMC Bioinformatics 2009; 10:186. [PMID: 19538710 PMCID: PMC2711080 DOI: 10.1186/1471-2105-10-186] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 12/08/2008] [Accepted: 06/18/2009] [Indexed: 01/21/2023]
Abstract
Background The MAQC project demonstrated that microarrays with comparable content show inter- and intra-platform reproducibility. However, since the content of gene databases continues to grow, the development of new generations of microarrays covering new content is necessary. To better understand the potential challenges that updated microarray content might pose for clinical and biological projects, we developed a methodology consisting of in silico analyses combined with performance analysis using real biological samples. Results Here we clearly demonstrate that not only oligonucleotide design but also database content and annotation strongly influence the comparability and performance of successive generations of microarrays. Additionally, using human blood samples and purified T lymphocyte subsets as two independent examples, we show that a performance analysis using biological samples is crucial for assessing consistency and differences. Conclusion This study provides an important resource to assist investigators in comparing microarrays with updated content, especially when working in a clinical or regulatory setting.
Affiliation(s)
- Daniela Eggle
- Molecular Tumor Biology and Tumor Immunology, Department of Internal Medicine I, University of Cologne, Cologne, Germany.
220
Freeman JD, Warren RL, Webb JR, Nelson BH, Holt RA. Profiling the T-cell receptor beta-chain repertoire by massively parallel sequencing. Genome Res 2009; 19:1817-24. [PMID: 19541912 DOI: 10.1101/gr.092924.109] [Citation(s) in RCA: 287] [Impact Index Per Article: 19.1] [Indexed: 12/20/2022]
Abstract
T-cell receptor (TCR) genomic loci undergo somatic V(D)J recombination, plus the addition/subtraction of nontemplated bases at recombination junctions, in order to generate the repertoire of structurally diverse T cells necessary for antigen recognition. TCR beta subunits can be unambiguously identified by their hypervariable CDR3 (Complementarity Determining Region 3) sequence. This is the site of V(D)J recombination encoding the principal site of antigen contact. The complexity and dynamics of the T-cell repertoire remain unknown because the potential repertoire size has made conventional sequence analysis intractable. Here, we use 5'-RACE, Illumina sequencing, and a novel short read assembly strategy to sample CDR3(beta) diversity in human T lymphocytes from peripheral blood. Assembly of 40.5 million short reads identified 33,664 distinct TCR(beta) clonotypes and provided precise measurements of CDR3(beta) length diversity, usage of nontemplated bases, sequence convergence, and preferences for TRBV (T-cell receptor beta variable gene) and TRBJ (T-cell receptor beta joining gene) gene usage and pairing. CDR3 length between conserved residues of TRBV and TRBJ ranged from 21 to 81 nucleotides (nt). TRBV gene usage ranged from 0.01% for TRBV17 to 24.6% for TRBV20-1. TRBJ gene usage ranged from 1.6% for TRBJ2-6 to 17.2% for TRBJ2-1. We identified 1573 examples of convergence where the same amino acid translation was specified by distinct CDR3(beta) nucleotide sequences. Direct sequence-based immunoprofiling will likely prove to be a useful tool for understanding repertoire dynamics in response to immune challenge, without a priori knowledge of antigen.
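Convergence, as defined above, means that distinct CDR3(beta) nucleotide sequences translate to the same amino acid sequence. A minimal sketch of how such cases might be counted, assuming in-frame CDR3 nucleotide strings as input (the 9-nt sequences below are hypothetical toy fragments, far shorter than real CDR3s); it is not the assembly pipeline used in the study.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, Iterable, Set

# Standard genetic code, codons enumerated in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(nt: str) -> str:
    """Translate an in-frame nucleotide sequence; unrecognized codons become 'X'."""
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X")
                   for i in range(0, len(nt) - 2, 3))


def convergent_cdr3s(cdr3_nt: Iterable[str]) -> Dict[str, Set[str]]:
    """Amino acid CDR3s encoded by more than one distinct nucleotide sequence."""
    by_protein: Dict[str, Set[str]] = defaultdict(set)
    for nt in set(cdr3_nt):  # each distinct nucleotide clonotype is counted once
        by_protein[translate(nt)].add(nt)
    return {aa: nts for aa, nts in by_protein.items() if len(nts) > 1}


reads = ["TGTGCCAGC", "TGCGCTAGT", "TGTGCCAGC", "TGTGCCTGG"]
print(convergent_cdr3s(reads))  # "CAS" is encoded by two distinct nucleotide sequences
```

The number of keys in the returned dictionary is the count of convergent amino acid CDR3s in the sample.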
Affiliation(s)
- J Douglas Freeman
- Michael Smith Genome Sciences Centre, Vancouver, British Columbia, Canada
221
Penel S, Arigon AM, Dufayard JF, Sertier AS, Daubin V, Duret L, Gouy M, Perrière G. Databases of homologous gene families for comparative genomics. BMC Bioinformatics 2009; 10 Suppl 6:S3. [PMID: 19534752 PMCID: PMC2697650 DOI: 10.1186/1471-2105-10-s6-s3] [Citation(s) in RCA: 102] [Impact Index Per Article: 6.8] [Indexed: 12/22/2022]
Abstract
Background Comparative genomics is a central step in many sequence analysis studies, from gene annotation and the identification of new functional regions in genomes, to the study of evolutionary processes at the molecular level (speciation, single gene or whole genome duplications, etc.) and phylogenetics. In that context, databases providing users with high-quality homologous families and sequence alignments, as well as phylogenetic trees based on state-of-the-art algorithms, are becoming indispensable. Methods We developed an automated procedure allowing massive all-against-all similarity searches, gene clustering, multiple alignment computation, and phylogenetic tree construction and reconciliation. The application of this procedure to a very large set of sequences is possible through parallel computing on a large computer cluster. Results Three databases were developed using this procedure: HOVERGEN, HOGENOM and HOMOLENS. These databases share the same architecture but differ in their content. HOVERGEN contains sequences from vertebrates, HOGENOM is mainly devoted to completely sequenced microbial organisms, and HOMOLENS is devoted to metazoan genomes from Ensembl. Access to the databases is provided through Web query forms, a general retrieval system and a client-server graphical interface. The latter can be used to perform tree-pattern based searches, which allow, among other uses, the retrieval of sets of orthologous genes. The three databases, as well as the software required to build and query them, can be used or downloaded from the PBIL (Pôle Bioinformatique Lyonnais) site at .
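The clustering step that turns all-against-all similarity hits into gene families can be illustrated with single-linkage clustering over a union-find structure. This is a generic sketch, not necessarily the procedure used to build HOVERGEN, HOGENOM or HOMOLENS; the hit tuples, gene identifiers and the `max_evalue` threshold are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def cluster_families(genes: Iterable[str],
                     hits: Iterable[Tuple[str, str, float]],
                     max_evalue: float = 1e-10) -> List[List[str]]:
    """Single-linkage families: genes joined by any hit with E-value <= max_evalue."""
    parent: Dict[str, str] = {g: g for g in genes}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for query, subject, evalue in hits:
        if evalue <= max_evalue:  # weak hits (hypothetical threshold) are ignored
            root_q, root_s = find(query), find(subject)
            if root_q != root_s:
                parent[root_s] = root_q  # merge the two families

    families: Dict[str, List[str]] = {}
    for g in list(parent):
        families.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in families.values())


genes = ["g1", "g2", "g3", "g4", "g5"]
hits = [("g1", "g2", 1e-30), ("g2", "g3", 1e-20), ("g4", "g5", 1e-3)]
print(cluster_families(genes, hits))  # -> [['g1', 'g2', 'g3'], ['g4'], ['g5']]
```

Single linkage is cheap but can chain unrelated multidomain proteins together, which is why production pipelines usually add coverage or other filters before merging.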
Affiliation(s)
- Simon Penel
- Laboratoire de Biométrie et Biologie Evolutive, CNRS, Université Claude Bernard - Lyon 1, 43 bd du 11 Novembre 1918, 69622 Villeurbanne Cedex, France.
222
Sebestyén E, Nagy T, Suhai S, Barta E. DoOPSearch: a web-based tool for finding and analysing common conserved motifs in the promoter regions of different chordate and plant genes. BMC Bioinformatics 2009; 10 Suppl 6:S6. [PMID: 19534755 PMCID: PMC2697653 DOI: 10.1186/1471-2105-10-s6-s6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/14/2022]
Abstract
Background The comparative genomic analysis of a large number of orthologous promoter regions of the chordate and plant genes from the DoOP databases shows thousands of conserved motifs. Most of these motifs differ from any known transcription factor binding site (TFBS). To identify common conserved motifs, we need a specific tool to be able to search amongst them. Since conserved motifs from the DoOP databases are linked to genes, the result of such a search can give a list of genes that are potentially regulated by the same transcription factor(s). Results We have developed a new tool called DoOPSearch for the analysis of the conserved motifs in the promoter regions of chordate or plant genes. We used the orthologous promoters of the DoOP database to extract thousands of conserved motifs from different taxonomic groups. The advantage of this approach is that different sets of conserved motifs might be found depending on how broad the taxonomic coverage of the underlying orthologous promoter sequence collection is (consider e.g. primates vs. mammals or Brassicaceae vs. Viridiplantae). The DoOPSearch tool allows the users to search these motif collections or the promoter regions of DoOP with user supplied query sequences or any of the conserved motifs from the DoOP database. To find overrepresented gene ontologies, the gene lists obtained can be analysed further using a modified version of the GeneMerge program. Conclusion We present here a comparative genomics based promoter analysis tool. Our system is based on a unique collection of conserved promoter motifs characteristic of different taxonomic groups. We offer both a command line and a web-based tool for searching in these motif collections using user specified queries. These can be either short promoter sequences or consensus sequences of known transcription factor binding sites. 
The GeneMerge analysis of the search results allows the user to identify statistically overrepresented Gene Ontology terms that might provide a clue to the function of the motifs and genes.
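Over-representation of a Gene Ontology term in a gene list is commonly scored with a hypergeometric upper-tail test followed by a multiple-testing correction. The sketch below shows that statistic with a Bonferroni correction; we assume, rather than confirm, that the modified GeneMerge used here behaves similarly, and the gene counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k): probability of drawing at least k annotated genes in a list of n,
    when K of the N background genes carry the annotation."""
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def bonferroni(p: float, n_tests: int) -> float:
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p * n_tests)


# Hypothetical counts: 3 of 10 background genes carry a GO term, and all 3
# appear in a search-result list of 3 genes; 20 terms are tested in total.
p = hypergeom_upper_tail(k=3, n=3, K=3, N=10)
print(p, bonferroni(p, n_tests=20))
```

With real genome-sized backgrounds the same function applies unchanged, because `math.comb` works with exact integers.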
Affiliation(s)
- Endre Sebestyén
- Agricultural Research Institute of the Hungarian Academy of Sciences, Martonvásár, Brunszvik u. 2, H-2462, Hungary.
223
Zeng J, Zhu S, Yan H. Towards accurate human promoter recognition: a review of currently used sequence features and classification methods. Brief Bioinform 2009; 10:498-508. [PMID: 19531545 DOI: 10.1093/bib/bbp027] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.7] [Indexed: 11/14/2022]
Abstract
This review describes important advances that have been made during the past decade for genome-wide human promoter recognition. Interest in promoter recognition algorithms on a genome-wide scale is worldwide and touches on a number of practical systems that are important in analysis of gene regulation and in genome annotation without experimental support of ESTs, cDNAs or mRNAs. The main focus of this review is on feature extraction and model selection for accurate human promoter recognition, with descriptions of what they are, what has been accomplished, and what remains to be done.
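Two sequence features frequently used for promoter recognition are k-mer (word) frequencies and CpG content. The sketch below computes both as a feature vector that could be passed to a classifier. It is only an illustration: the review's catalogue of features and models is broader, and the choice of k and the test sequences are assumptions.

```python
from itertools import product
from typing import List


def kmer_profile(seq: str, k: int = 2) -> List[float]:
    """Relative frequencies of all 4**k DNA k-mers; windows with non-ACGT bases are skipped."""
    seq = seq.upper()
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0] * len(index)
    total = 0
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts]


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: count(CG) * length / (count(C) * count(G))."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    cg = seq.count("CG")  # "CG" cannot overlap itself, so str.count is exact here
    return cg * len(seq) / (c * g) if c and g else 0.0


features = kmer_profile("ACGTNACGT") + [cpg_obs_exp("ACGTNACGT")]
print(len(features))  # -> 17 (16 dinucleotide frequencies plus the CpG ratio)
```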
Affiliation(s)
- Jia Zeng
- Department of Computer Science, Hong Kong Baptist University, Kowloon, Hong Kong.
224
Natarajan S, Jakobsson E. Functional equivalency inferred from "authoritative sources" in networks of homologous proteins. PLoS One 2009; 4:e5898. [PMID: 19521530 PMCID: PMC2690840 DOI: 10.1371/journal.pone.0005898] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 07/09/2008] [Accepted: 04/29/2009] [Indexed: 11/18/2022]
Abstract
A one-on-one mapping of protein functionality across different species is a critical component of comparative analysis. This paper presents a heuristic algorithm for discovering the Most Likely Functional Counterparts (MoLFunCs) of a protein, based on simple concepts from network theory. A key feature of our algorithm is the use of the user's knowledge to assign high confidence to selected functional identifications. We demonstrate the algorithm by retrieving functional equivalents for 7 membrane proteins from an exploration of almost 40 genomes from multiple online resources. We verify the functional equivalency of our dataset through a series of tests that include sequence, structure and function comparisons. Comparison is made to the OMA methodology, which also identifies one-on-one mappings between proteins from different species. Based on that comparison, we believe that incorporating the user's knowledge as a key aspect of the technique adds value to purely statistical formal methods.
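The phrase "authoritative sources" in the title echoes Kleinberg's hub/authority (HITS) scores from network theory. The sketch below computes those scores on a small directed similarity graph; it is offered only as background for that concept, not as the MoLFunCs algorithm itself, and the protein names and edges are hypothetical.

```python
from typing import Dict, List, Tuple

Scores = Dict[str, float]


def hub_authority_scores(edges: List[Tuple[str, str]],
                         iterations: int = 50) -> Tuple[Scores, Scores]:
    """Kleinberg-style hub and authority scores for directed (source, target) edges."""
    nodes = sorted({u for u, _ in edges} | {v for _, v in edges})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}

    def normalise(scores: Scores) -> Scores:
        norm = sum(x * x for x in scores.values()) ** 0.5 or 1.0
        return {n: x / norm for n, x in scores.items()}

    for _ in range(iterations):
        # A node is authoritative if good hubs point to it ...
        new_auth = {n: 0.0 for n in nodes}
        for u, v in edges:
            new_auth[v] += hub[u]
        auth = normalise(new_auth)
        # ... and a good hub if it points to authoritative nodes.
        new_hub = {n: 0.0 for n in nodes}
        for u, v in edges:
            new_hub[u] += auth[v]
        hub = normalise(new_hub)
    return hub, auth


# Hypothetical similarity graph: queries A, B and C all hit X; only A also hits Y.
edges = [("A", "X"), ("B", "X"), ("C", "X"), ("A", "Y")]
hub, auth = hub_authority_scores(edges)
print(max(auth, key=auth.get), max(hub, key=hub.get))  # -> X A
```

A protein that many independent queries converge on (here X) receives the highest authority score, which is the intuition behind treating it as a likely functional counterpart.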
Affiliation(s)
- Shreedhar Natarajan
- Biophysics and Computational Biology, University of Illinois, Urbana-Champaign, Illinois, United States of America
- Eric Jakobsson
- Biophysics and Computational Biology, University of Illinois, Urbana-Champaign, Illinois, United States of America
- National Center for Supercomputing Applications, University of Illinois, Urbana-Champaign, Illinois, United States of America
- Department of Molecular and Integrative Physiology, University of Illinois, Urbana-Champaign, Illinois, United States of America
225
Sato Y, Hashiguchi Y, Nishida M. Temporal pattern of loss/persistence of duplicate genes involved in signal transduction and metabolic pathways after teleost-specific genome duplication. BMC Evol Biol 2009; 9:127. [PMID: 19500364 PMCID: PMC2702319 DOI: 10.1186/1471-2148-9-127] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.4] [Received: 11/13/2008] [Accepted: 06/05/2009] [Indexed: 11/10/2022]
Abstract
Background Recent genomic studies have revealed that a teleost-specific third-round whole genome duplication (3R-WGD) event occurred in a common ancestor of teleost fishes. However, it is unclear how the genes duplicated in this event were lost or persisted during the diversification of teleosts, and therefore, how many of the duplicated genes contribute to the genetic differences among teleosts. This subject is also important for understanding the process of vertebrate evolution through WGD events. We applied a comparative evolutionary approach to this question by focusing on the genes involved in long-term potentiation, taste and olfactory transduction, and the tricarboxylic acid cycle, based on the whole genome sequences of four teleosts: zebrafish, medaka, stickleback, and green spotted puffer fish. Results We applied a state-of-the-art method of maximum-likelihood phylogenetic inference and conserved synteny analyses to each of 130 human genes involved in the above biological systems. These analyses identified 116 orthologous gene groups between teleosts and tetrapods, and 45 pairs of 3R-WGD-derived duplicate genes among them. This suggests that more than half [(45×2)/(116+45) = 55.9%] of the loci, probably more than ten thousand genes, present in a common ancestor of the four teleosts were still duplicated after the 3R-WGD. The estimated temporal pattern of gene loss suggested that, after the 3R-WGD, many (71/116) of the duplicated genes were rapidly lost during the initial 75 million years (MY), whereas on average more than half (27.3/45) of the duplicated genes remaining in the ancestor of the four teleosts (45/116) have persisted for about 275 MY. The 3R-WGD-derived duplicates that have persisted for long evolutionary periods had a significantly larger number of interacting partners and longer protein-coding sequences, implying that they tend to be more multifunctional than the singletons after the 3R-WGD.
Conclusion We have shown, for the first time, the temporal pattern of the gene loss process after the 3R-WGD on the basis of teleost phylogeny and divergence time frameworks. The 3R-WGD-derived duplicates have not undergone constant exponential decay, suggesting that selection favoured the long-term persistence of a subset of duplicates that tend to be multifunctional. On the basis of these results obtained from the analysis of 116 orthologous gene groups, we propose that more than ten thousand 3R-WGD-derived duplicates have experienced lineage-specific evolution, that is, differential sub-/neo-functionalization or secondary loss between lineages, and have contributed to teleost diversity.
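The ratios quoted in the Results follow directly from the reported counts. Each of the 45 retained duplicate pairs contributes two loci, and the 116 orthologous groups plus the 45 extra copies give the number of loci in the common ancestor of the four teleosts; the groups without a retained pair are those that lost one copy.

```latex
\begin{aligned}
\frac{45 \times 2}{116 + 45} &= \frac{90}{161} \approx 0.559 \quad (55.9\%) \\
116 - 45 &= 71 \quad \text{(groups in which one duplicate copy was lost)} \\
\frac{27.3}{45} &\approx 0.61 \quad \text{(share of retained pairs persisting about 275 MY)}
\end{aligned}
```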
Affiliation(s)
- Yukuto Sato
- Division of Molecular Marine Biology, Ocean Research Institute, The University of Tokyo, 1-15-1 Minamidai, Nakano-ku, Tokyo 164-8639, Japan.
226
Kuzniar A, Lin K, He Y, Nijveen H, Pongor S, Leunissen JAM. ProGMap: an integrated annotation resource for protein orthology. Nucleic Acids Res 2009; 37:W428-34. [PMID: 19494185 PMCID: PMC2703891 DOI: 10.1093/nar/gkp462] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/20/2022]
Abstract
Current protein sequence databases employ different classification schemes that often provide conflicting annotations, especially for poorly characterized proteins. ProGMap (Protein Group Mappings, http://www.bioinformatics.nl/progmap) is a web-tool designed to help researchers and database annotators to assess the coherence of protein groups defined in various databases and thereby facilitate the annotation of newly sequenced proteins. ProGMap is based on a non-redundant dataset of over 6.6 million protein sequences which is mapped to 240 000 protein group descriptions collected from UniProt, RefSeq, Ensembl, COG, KOG, OrthoMCL-DB, HomoloGene, TRIBES and PIRSF. ProGMap combines the underlying classification schemes via a network of links constructed by a fast and fully automated mapping approach originally developed for document classification. The web interface enables queries to be made using sequence identifiers, gene symbols, protein functions or amino acid and nucleotide sequences. For the latter query type BLAST similarity search and QuickMatch identity search services have been incorporated, for finding sequences similar (or identical) to a query sequence. ProGMap is meant to help users of high throughput methodologies who deal with partially annotated genomic data.
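Assessing whether protein groups from two classification schemes describe the same set of proteins can be illustrated with a simple membership-overlap (Jaccard) mapping. This is a stand-in for illustration only: ProGMap's own mapping approach, adapted from document classification, is not reproduced here, and the group names, members and `min_jaccard` threshold are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def map_groups(scheme_a: Dict[str, Set[str]], scheme_b: Dict[str, Set[str]],
               min_jaccard: float = 0.5) -> List[Tuple[str, str, float]]:
    """Link groups from two schemes whose member sets overlap with Jaccard >= min_jaccard."""
    # Inverted index: protein -> groups of scheme_b that contain it.
    member_of: Dict[str, Set[str]] = {}
    for group_b, members in scheme_b.items():
        for protein in members:
            member_of.setdefault(protein, set()).add(group_b)

    links = []
    for group_a, members_a in scheme_a.items():
        # Only groups that share at least one protein are compared.
        candidates = set().union(*(member_of.get(p, set()) for p in members_a))
        for group_b in sorted(candidates):
            members_b = scheme_b[group_b]
            jaccard = len(members_a & members_b) / len(members_a | members_b)
            if jaccard >= min_jaccard:
                links.append((group_a, group_b, jaccard))
    return sorted(links)


scheme_a = {"famA1": {"p1", "p2", "p3"}, "famA2": {"p4"}}
scheme_b = {"famB1": {"p1", "p2"}, "famB2": {"p4", "p5", "p6"}}
print(map_groups(scheme_a, scheme_b))  # famA1 links to famB1 (Jaccard 2/3); famA2 stays unlinked
```

Groups that map to several partners, or to none, are the ones whose annotations would merit a closer look.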
Affiliation(s)
- Arnold Kuzniar
- Laboratory of Bioinformatics, Wageningen University and Research Centre (WUR), Dreijenlaan 3, 6703 HA Wageningen, The Netherlands
227
Kiss HJM, Mihalik Á, Nánási T, Őry B, Spiró Z, Sőti C, Csermely P. Ageing as a price of cooperation and complexity. Bioessays 2009; 31:651-64. [DOI: 10.1002/bies.200800224] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Indexed: 01/25/2023]
228
Lawson MJ, Zhang L. Sexy gene conversions: locating gene conversions on the X-chromosome. Nucleic Acids Res 2009; 37:4570-9. [PMID: 19487239 PMCID: PMC2724270 DOI: 10.1093/nar/gkp421] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022]
Abstract
Gene conversion can have a profound impact on both the short- and long-term evolution of genes and genomes. Here, we examined the gene families that are located on the X-chromosomes of human (Homo sapiens), chimpanzee (Pan troglodytes), mouse (Mus musculus) and rat (Rattus norvegicus) for evidence of gene conversion. We identified seven gene families (WD repeat protein family, Ferritin Heavy Chain family, RAS-related Protein RAB-40 family, Diphosphoinositol polyphosphate phosphohydrolase family, Transcription Elongation Factor A family, LDOC1-related family, and Zinc Finger Protein ZIC and GLI family) that show evidence of gene conversion. Through phylogenetic analyses and synteny evidence, we show that gene conversion has played an important role in the evolution of these gene families and that gene conversion has occurred independently in both primates and rodents. Comparing the results with those of two gene conversion prediction programs (GENECONV and Partimatrix), we found that both programs have very high false negative rates (i.e. they failed to predict gene conversions), which leaves many gene conversions undetected. The combination of phylogenetic analyses with physical synteny evidence provides high resolution in the detection of gene conversions.
229
Cai JJ, Borenstein E, Chen R, Petrov DA. Similarly strong purifying selection acts on human disease genes of all evolutionary ages. Genome Biol Evol 2009; 1:131-44. [PMID: 20333184 PMCID: PMC2817408 DOI: 10.1093/gbe/evp013] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.8] [Accepted: 05/22/2009] [Indexed: 12/20/2022]
Abstract
A number of studies have shown that recently created genes differ in many respects from genes created in the deep evolutionary past. Here, we determined the age of emergence and propensity for gene loss (PGL) of all human protein-coding genes and compared disease genes with non-disease genes in terms of their evolutionary rate, strength of purifying selection, mRNA expression, and genetic redundancy. Older non-disease genes, and those less prone to loss, have been evolving 1.5- to 3-fold more slowly between humans and chimps than young non-disease genes, whereas Mendelian disease genes have been evolving very slowly regardless of their age and PGL. Complex disease genes showed an intermediate pattern. Disease genes also have higher mRNA expression heterogeneity across multiple tissues than non-disease genes, regardless of age and PGL. Young and middle-aged disease genes have fewer similar paralogs than non-disease genes of the same age. We reasoned that genes were more likely to be involved in human disease if they were under strong functional constraint, expressed heterogeneously across tissues, and lacked genetic redundancy. Young human genes that have been evolving under strong constraint between humans and chimps might also be enriched for genes that encode important primate-specific or even human-specific functions.
Affiliation(s)
- James J Cai
- Department of Biology, Stanford University, CA, USA
230
Nugent T, Jones DT. Transmembrane protein topology prediction using support vector machines. BMC Bioinformatics 2009; 10:159. [PMID: 19470175 PMCID: PMC2700806 DOI: 10.1186/1471-2105-10-159] [Citation(s) in RCA: 305] [Impact Index Per Article: 20.3] [Received: 09/26/2008] [Accepted: 05/26/2009] [Indexed: 12/02/2022]
Abstract
Background Alpha-helical transmembrane (TM) proteins are involved in a wide range of important biological processes such as cell signaling, transport of membrane-impermeable molecules, cell-cell communication, cell recognition and cell adhesion. Many are also prime drug targets, and it has been estimated that more than half of all drugs currently on the market target membrane proteins. However, due to the experimental difficulties involved in obtaining high-quality crystals, this class of protein is severely under-represented in structural databases. In the absence of structural data, sequence-based prediction methods allow TM protein topology to be investigated. Results We present a support vector machine-based (SVM) TM protein topology predictor that integrates both signal peptide and re-entrant helix prediction, benchmarked with full cross-validation on a novel data set of 131 sequences with known crystal structures. The method achieves topology prediction accuracy of 89%, while signal peptides and re-entrant helices are predicted with 93% and 44% accuracy respectively. An additional SVM trained to discriminate between globular and TM proteins detected zero false positives, with a low false negative rate of 0.4%. We present the results of applying these tools to a number of complete genomes. Source code, data sets and a web server are freely available from . Conclusion The high accuracy of TM topology prediction, which includes detection of both signal peptides and re-entrant helices, combined with the ability to effectively discriminate between TM and globular proteins, makes this method ideally suited to whole genome annotation of alpha-helical transmembrane proteins.
Affiliation(s)
- Timothy Nugent
- Bioinformatics Group, Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK.
231
Zhang Z, Xin D, Wang P, Zhou L, Hu L, Kong X, Hurst LD. Noisy splicing, more than expression regulation, explains why some exons are subject to nonsense-mediated mRNA decay. BMC Biol 2009; 7:23. [PMID: 19442261 PMCID: PMC2697156 DOI: 10.1186/1741-7007-7-23] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.9] [Received: 04/21/2009] [Accepted: 05/14/2009] [Indexed: 01/23/2023]
Abstract
Background Nonsense-mediated decay is a mechanism that degrades mRNAs with a premature termination codon. That some exons have premature termination codons at fixation is paradoxical: why make a transcript if it is only to be destroyed? One model supposes that splicing is inherently noisy and spurious transcripts are common. The evolution of a premature termination codon in a regularly made unwanted transcript can be a means to prevent costly translation. Alternatively, nonsense-mediated decay can be regulated under certain conditions, so the presence of a premature termination codon can be a means to up-regulate transcripts needed when nonsense-mediated decay is suppressed. Results To resolve this issue, we examined the properties of putative nonsense-mediated decay targets in humans and mice. We started with a well-annotated set of protein-coding genes and found that 2 to 4% of genes are probably subject to nonsense-mediated decay, and that the premature termination codon reflects neither rare mutations nor sequencing artefacts. Several lines of evidence suggested that the noisy splicing model has considerable relevance. Exons found uniquely in nonsense-mediated decay transcripts (nonsense-mediated decay-specific exons) 1) tend to be newly created; 2) have low inclusion levels; 3) tend not to be a multiple of three long; 4) belong to genes with multiple splice isoforms more often than expected; and 5) belong to genes that are neither obviously enriched for any functional class nor conserved as nonsense-mediated decay candidates in other species. However, nonsense-mediated decay-specific exons for which distant orthologous exons can be found tend to have been under purifying selection, consistent with the regulation model. Conclusion We conclude that for recently evolved exons the noisy splicing model better explains their properties, while for ancient exons nonsense-mediated decay-regulated gene expression is a viable explanation.
Affiliation(s)
- Zhenguo Zhang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS) & Shanghai Jiao Tong University School of Medicine (SJTUSM), Shanghai, PR China.
232
Insights into the regulation of intrinsically disordered proteins in the human proteome by analyzing sequence and gene expression data. Genome Biol 2009; 10:R50. [PMID: 19432952 PMCID: PMC2718516 DOI: 10.1186/gb-2009-10-5-r50] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.9] [Received: 12/16/2008] [Revised: 03/23/2009] [Accepted: 05/11/2009] [Indexed: 11/10/2022]
Abstract
Signals for microRNA targeting and ubiquitination are enriched in intrinsically disordered proteins, but some highly disordered proteins can escape rapid degradation. Background Disordered proteins need to be expressed to carry out specified functions; however, their accumulation in the cell can potentially cause major problems through protein misfolding and aggregation. Gene expression levels, mRNA decay rates, microRNA (miRNA) targeting and ubiquitination have critical roles in the degradation and disposal of human proteins and transcripts. Here, we describe a study examining these features to gain insights into the regulation of disordered proteins. Results In comparison with ordered proteins, disordered proteins have a greater proportion of predicted ubiquitination sites. The transcripts encoding disordered proteins also have higher proportions of predicted miRNA target sites and higher mRNA decay rates, both of which are indicative of the observed lower gene expression levels. The results suggest that the disordered proteins and their transcripts are present in the cell at low levels and/or for a short time before being targeted for disposal. Surprisingly, we find that for a significant proportion of highly disordered proteins, all four of these trends are reversed. Predicted estimates for miRNA targets, ubiquitination and mRNA decay rate are low in the highly disordered proteins that are constitutively and/or highly expressed. Conclusions Mechanisms are in place to protect the cell from these potentially dangerous proteins. The evidence suggests that the enrichment of signals for miRNA targeting and ubiquitination may help prevent the accumulation of disordered proteins in the cell. Our data also provide evidence for a mechanism by which a significant proportion of highly disordered proteins (with high expression levels) can escape rapid degradation to allow them to successfully carry out their function.
233
Imamura H, Karro JE, Chuang JH. Weak preservation of local neutral substitution rates across mammalian genomes. BMC Evol Biol 2009; 9:89. [PMID: 19416516 PMCID: PMC2689173 DOI: 10.1186/1471-2148-9-89] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 11/20/2008] [Accepted: 05/05/2009] [Indexed: 01/06/2023]
Abstract
Background The rate at which neutral (non-functional) bases undergo substitution is highly dependent on their location within a genome. However, it is not clear how fast these location-dependent rates change, or to what extent the substitution rate patterns are conserved between lineages. To address this question, which is critical not only for understanding the substitution process but also for evaluating phylogenetic footprinting algorithms, we examine ancestral repeats: a predominantly neutral dataset with a significantly higher genomic density than other datasets commonly used to study substitution rate variation. Using this repeat data, we measure the extent to which orthologous ancestral repeat sequences exhibit similar substitution patterns in separate mammalian lineages, allowing us to ascertain how well local substitution rates have been preserved across species. Results We calculated substitution rates for each ancestral repeat in each of three independent mammalian lineages (primate – from human/macaque alignments, rodent – from mouse/rat alignments, and laurasiatheria – from dog/cow alignments). We then measured the correlation of local substitution rates among these lineages. Overall we found the correlations between lineages to be statistically significant, but too weak to have much predictive power (r² < 5%). These correlations were found to be primarily driven by regional effects at the scale of several hundred kb or larger. A few repeat classes (e.g. 7SK, Charlie8, and MER121) also exhibited stronger conservation of rate patterns, likely due to the effect of repeat-specific purifying selection. These classes should be excluded when estimating local neutral substitution rates. Conclusion Although local neutral substitution rates have some correlations among mammalian species, these correlations have little predictive power on the scale of individual repeats.
This indicates that local substitution rates have changed significantly among the lineages we have studied, and are likely to have changed even more for more diverged lineages. The correlations that do persist are too weak to be responsible for many of the highly conserved elements found by phylogenetic footprinting algorithms, leading us to conclude that such elements must be conserved due to selective forces.
Affiliation(s)
- Hideo Imamura
- Boston College, Department of Biology, Chestnut Hill, MA 02467, USA.
234
Marz M, Kirsten T, Stadler PF. Evolution of spliceosomal snRNA genes in metazoan animals. J Mol Evol 2009; 67:594-607. [PMID: 19030770 DOI: 10.1007/s00239-008-9149-6] [Citation(s) in RCA: 72] [Impact Index Per Article: 4.8] [Received: 03/21/2008] [Accepted: 07/14/2008] [Indexed: 11/28/2022]
Abstract
While studies of the evolutionary histories of protein families are commonplace, little is known about noncoding RNAs beyond microRNAs and some snoRNAs. Here we investigate in detail the evolutionary history of the nine spliceosomal snRNA families (U1, U2, U4, U5, U6, U11, U12, U4atac, and U6atac) across the completely or partially sequenced genomes of metazoan animals. Representatives of the five major spliceosomal snRNAs were found in all genomes. None of the minor spliceosomal snRNAs were detected in nematodes or in the shotgun traces of Oikopleura dioica, while in all other animal genomes at most one of them is missing. Although snRNAs are present in multiple copies in most genomes, distinguishable paralogue groups are not stable over long evolutionary times, even though they appear independently in several clades. In general, animal snRNA secondary structures are highly conserved, although U11 and U12 in insects, in particular, exhibit dramatic variations. An analysis of the genomic context of snRNAs reveals that they behave like mobile elements, exhibiting very little syntenic conservation.
Affiliation(s)
- Manuela Marz
- Bioinformatics Group, Department of Computer Science, University of Leipzig, Härtelstrasse 16-18, 04107 Leipzig, Germany.
235
van der Burgt A, Fiers MWJE, Nap JP, van Ham RCHJ. In silico miRNA prediction in metazoan genomes: balancing between sensitivity and specificity. BMC Genomics 2009; 10:204. [PMID: 19405940 PMCID: PMC2688010 DOI: 10.1186/1471-2164-10-204] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Received: 09/12/2008] [Accepted: 04/30/2009] [Indexed: 12/29/2022]
Abstract
BACKGROUND MicroRNAs (miRNAs), short RNA molecules of approximately 21 nucleotides, play an important role in post-transcriptional regulation of gene expression. The number of known miRNA hairpins registered in the miRBase database is rapidly increasing, but recent reports suggest that many miRNAs with restricted temporal or tissue-specific expression remain undiscovered. Various strategies for in silico miRNA identification have been proposed to facilitate miRNA discovery. Notably, support vector machine (SVM) methods have recently gained popularity. However, a drawback of these methods is that they do not provide insight into the biological properties of miRNA sequences. RESULTS Here we propose a new strategy for miRNA hairpin prediction in which the likelihood that a genomic hairpin is a true miRNA hairpin is evaluated based on statistical distributions of observed biological variation of properties (descriptors) of known miRNA hairpins. These distributions are transformed into a single, continuous outcome classifier called the L score. Using a dataset of known miRNA hairpins from the miRBase database and an exhaustive set of genomic hairpins identified in the genome of Caenorhabditis elegans, a subset of the 18 most informative descriptors was selected after detailed analysis of the correlation among, and the discriminative power of, individual descriptors. We show that the majority of previously identified miRNA hairpins have high L scores, that the method outperforms miRNA prediction by threshold filtering and that it is more transparent than SVM classifiers. CONCLUSION The L score is applicable as a prediction classifier with high sensitivity for novel miRNA hairpins.
The L-score approach can be used to rank and select interesting miRNA hairpin candidates for downstream experimental analysis when coupled to a genome-wide set of in silico-identified hairpins, or to facilitate the analysis of large sets of putative miRNA hairpin loci obtained in deep-sequencing efforts of small RNAs. Moreover, the in-depth analyses of miRNA hairpin descriptors preceding and determining the L score outcome could be used as an extension to miRBase entries to help increase the reliability and biological relevance of the miRNA registry.
Affiliation(s)
- Ate van der Burgt
- Applied Bioinformatics, Plant Research International, Wageningen University & Research Centre, PO Box 16, 6700 AA Wageningen, The Netherlands
- Laboratory of Bioinformatics, Wageningen University, Dreijenlaan 3, 6703 HA Wageningen, The Netherlands
- Mark WJE Fiers
- Applied Bioinformatics, Plant Research International, Wageningen University & Research Centre, PO Box 16, 6700 AA Wageningen, The Netherlands
- Current address: New Zealand Institute for Plant & Food Research Ltd, Private Bag 4704, Christchurch, New Zealand
- Jan-Peter Nap
- Applied Bioinformatics, Plant Research International, Wageningen University & Research Centre, PO Box 16, 6700 AA Wageningen, The Netherlands
- Centre for BioSystems Genomics 2012 (CBSG2012), PO Box 98, 6700 AB Wageningen, The Netherlands
- Roeland CHJ van Ham
- Applied Bioinformatics, Plant Research International, Wageningen University & Research Centre, PO Box 16, 6700 AA Wageningen, The Netherlands
- Laboratory of Bioinformatics, Wageningen University, Dreijenlaan 3, 6703 HA Wageningen, The Netherlands
236
Glez-Peña D, Gómez-López G, Pisano DG, Fdez-Riverola F. WhichGenes: a web-based tool for gathering, building, storing and exporting gene sets with application in gene set enrichment analysis. Nucleic Acids Res 2009; 37:W329-34. [PMID: 19406925 PMCID: PMC2703947 DOI: 10.1093/nar/gkp263] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Indexed: 11/16/2022]
Abstract
WhichGenes is a web-based interactive gene set building tool offering a very simple interface to extract always-updated gene lists from multiple databases and unstructured biological data sources. Users specify new gene sets of interest by following a simple four-step wizard, and the tool can run several queries in parallel. Every time a new set is generated, it is automatically added to the private gene-set cart, and the user is notified by an e-mail containing a direct link to the new set stored on the server. WhichGenes provides functionality to edit, delete and rename existing sets, as well as the capability to generate new ones by combining previously existing sets (intersection, union and difference operators). Users can export their sets, configuring the output format and selecting among multiple gene identifiers. In addition to the user-friendly environment, WhichGenes allows programmers to access its functionality programmatically through a Representational State Transfer (REST) web service. The WhichGenes front end is freely available at http://www.whichgenes.org/, and the WhichGenes API is accessible at http://www.whichgenes.org/api/.
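The set-combination feature described in this abstract (intersection, union and difference of existing sets) corresponds to ordinary set algebra. The following is a minimal Python sketch, assuming gene sets are plain collections of gene symbols. The gene symbols, set names and the `combine` helper are hypothetical illustrations. They do not reproduce the WhichGenes implementation or its REST API.

```python
# Minimal sketch of combining gene sets; NOT the WhichGenes implementation.
# Gene symbols and set names below are hypothetical examples.


def combine(a: frozenset, b: frozenset, op: str) -> frozenset:
    """Combine two gene sets with one of the three operators named in the abstract."""
    operators = {
        "intersection": a & b,  # genes present in both sets
        "union": a | b,         # genes present in either set
        "difference": a - b,    # genes in a but not in b
    }
    if op not in operators:
        raise ValueError(f"unknown operator: {op!r}")
    return operators[op]


set_a = frozenset({"TP53", "BRCA1", "EGFR", "MYC"})
set_b = frozenset({"EGFR", "MYC", "KRAS"})

for op in ("intersection", "union", "difference"):
    print(op, sorted(combine(set_a, set_b, op)))
```

Python's built-in set operators keep each combination to a single expression. A web tool would additionally need persistence, notification and gene-identifier mapping, all of which this sketch omits.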
Affiliation(s)
- Daniel Glez-Peña
- Higher Technical School of Computer Engineering, University of Vigo, Ourense, Spain
237
Lemay DG, Lynn DJ, Martin WF, Neville MC, Casey TM, Rincon G, Kriventseva EV, Barris WC, Hinrichs AS, Molenaar AJ, Pollard KS, Maqbool NJ, Singh K, Murney R, Zdobnov EM, Tellam RL, Medrano JF, German JB, Rijnkels M. The bovine lactation genome: insights into the evolution of mammalian milk. Genome Biol 2009; 10:R43. [PMID: 19393040 PMCID: PMC2688934 DOI: 10.1186/gb-2009-10-4-r43] [Citation(s) in RCA: 150] [Impact Index Per Article: 10.0] [Received: 09/12/2008] [Revised: 12/17/2008] [Accepted: 04/24/2009] [Indexed: 11/25/2022]
Abstract
Comparison of milk protein and mammary genes in the bovine genome with those from other mammals gives insights into the evolution of lactation. Background The newly assembled Bos taurus genome sequence enables the linkage of bovine milk and lactation data with other mammalian genomes. Results Using publicly available milk proteome data and mammary expressed sequence tags, 197 milk protein genes and over 6,000 mammary genes were identified in the bovine genome. Intersection of these genes with 238 milk production quantitative trait loci curated from the literature decreased the search space for milk trait effectors by more than an order of magnitude. Genome location analysis revealed a tendency for milk protein genes to be clustered with other mammary genes. Using the genomes of a monotreme (platypus), a marsupial (opossum), and five placental mammals (bovine, human, dog, mouse, rat), gene loss and duplication, phylogeny, sequence conservation, and evolution were examined. Compared with other genes in the bovine genome, milk and mammary genes are more likely to be present in all mammals, more likely to be duplicated in therians, more highly conserved across Mammalia, and evolving more slowly along the bovine lineage. The most divergent proteins in milk were associated with nutritional and immunological components of milk, whereas highly conserved proteins were associated with secretory processes. Conclusions Although both copy number and sequence variation contribute to the diversity of milk protein composition across species, our results suggest that this diversity is primarily due to other mechanisms. Our findings support the essentiality of milk to the survival of mammalian neonates and the establishment of milk secretory mechanisms more than 160 million years ago.
Affiliation(s)
- Danielle G Lemay
- Department of Food Science and Technology, University of California Davis, One Shields Avenue, Davis, CA 95616, USA.
238
Vega VB, Woo XY, Hamidi H, Yeo HC, Yeo ZX, Bourque G, Clarke ND. Inferring direct regulatory targets of a transcription factor in the DREAM2 challenge. Ann N Y Acad Sci 2009; 1158:215-23. [PMID: 19348643 DOI: 10.1111/j.1749-6632.2008.03759.x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/29/2022]
Abstract
In the DREAM2 community-wide experiment on regulatory network inference, one of the challenges was to identify which genes, in a list of 200, are direct regulatory targets of the transcription factor BCL6. The organizers of the challenge defined targets based on gene expression and chromatin immunoprecipitation experiments (ChIP-chip). The expression data were publicly available; the ChIP-chip data were not. In order to assess the likelihood that a gene is a BCL6 target, we used three classes of information: expression-level differences, over-representation of sequence motifs in promoter regions, and gene ontology annotations. A weight was attached to each analysis based on how well it identified BCL6-bound genes as defined by publicly available ChIP-chip data. By the organizers' criteria, our group, GenomeSingapore, performed best. However, our retrospective analysis indicates that this success was dominated by a gene expression analysis that was predicated on a regulatory model known to be favored by the organizers. We also noted that the 200-gene test set was enriched only in genes that are upregulated, while genes bound by BCL6 are enriched in both upregulated and downregulated genes. Together, these observations suggest possible model biases in the selection of the gold-standard gene set and imply that our success was attained in part by adhering to the same assumptions. We argue that model biases of this type are unavoidable in the inference of regulatory networks and, for that reason, we suggest that future community-wide experiments of this type should focus on the prediction of data, rather than models.
239
A practical guide to the MaxQuant computational platform for SILAC-based quantitative proteomics. Nat Protoc 2009; 4:698-705. [DOI: 10.1038/nprot.2009.36] [Citation(s) in RCA: 645] [Impact Index Per Article: 43.0] [Indexed: 11/08/2022]
240
Hariharan M, Scaria V, Brahmachari SK. dbSMR: a novel resource of genome-wide SNPs affecting microRNA mediated regulation. BMC Bioinformatics 2009; 10:108. [PMID: 19371411 PMCID: PMC2676258 DOI: 10.1186/1471-2105-10-108] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.9] [Received: 11/06/2008] [Accepted: 04/16/2009] [Indexed: 11/24/2022]
Abstract
Background MicroRNAs (miRNAs) regulate several biological processes through post-transcriptional gene silencing. The efficiency of miRNA binding to target transcripts depends on both the sequence and the intramolecular structure of the transcript. Single Nucleotide Polymorphisms (SNPs) can alter the structure of the regions flanking them, thereby influencing accessibility for miRNA binding. Description The entire human genome was analyzed for SNPs in and around predicted miRNA target sites. Polymorphisms within 200 nucleotides that could alter the intramolecular structure at the target site, and thereby alter regulation, were annotated. The collated information was ported to a MySQL database with a user-friendly interface accessible through the URL: . Conclusion The database has a user-friendly interface through which the information can be queried using the gene name, microRNA name, polymorphism ID or transcript ID. Combination queries using 'AND' or 'OR' are also possible, along with specifying the degree of change in intramolecular bonding with and without the polymorphism. Such a resource would enable researchers to address questions such as the role of regulatory SNPs in 3' UTRs and population-specific regulatory modulation in the context of microRNA targets.
Affiliation(s)
- Manoj Hariharan
- GN Ramachandran Knowledge Center for Genome Informatics, Institute of Genomics and Integrative Biology (CSIR), Delhi, India.
241
Schäche M, Chen CY, Pertile KK, Richardson AJ, Dirani M, Mitchell P, Baird PN. Fine mapping linkage analysis identifies a novel susceptibility locus for myopia on chromosome 2q37 adjacent to but not overlapping MYP12. Mol Vis 2009; 15:722-30. [PMID: 19365569 PMCID: PMC2666771] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2009] [Accepted: 04/05/2009] [Indexed: 11/09/2022]
Abstract
PURPOSE Myopia (shortsightedness) is one of the most common ocular conditions worldwide and results in blurred distance vision. It is a complex trait influenced by both genetic and environmental factors. We have previously reported linkage of myopia, in three large multigenerational Australian families, to a 13.01 cM region of chromosome 2q37 that initially overlapped with the known myopia locus, MYP12. The purpose of this study was to perform fine mapping of this region and identify single nucleotide polymorphisms (SNPs) associated with myopia. METHODS Fine mapping linkage analysis was performed on three multigenerational families with common myopia to refine the previously mapped critical interval. SNPs in the region were also genotyped to assess association with myopia in an independent case-control cohort. RESULTS The disease interval was refined to a 1.83 cM region that is adjacent to, rather than overlapping with, the MYP12 locus. Subsequent sequencing of all known and hypothetical genes, as well as an association study using an independent myopia case-control cohort, showed suggestive but not statistically significant association with two intronic SNPs. CONCLUSIONS We have identified a novel locus for common myopia on chromosome 2q37.
Affiliation(s)
- Maria Schäche
- Centre for Eye Research Australia, University of Melbourne, Royal Victorian Eye & Ear Hospital, Melbourne, Australia
- Vision Cooperative Research Centre, Sydney, Australia
- Christine Y. Chen
- Centre for Eye Research Australia, University of Melbourne, Royal Victorian Eye & Ear Hospital, Melbourne, Australia
- Vision Cooperative Research Centre, Sydney, Australia
- Kelly Kathleen Pertile
- Centre for Eye Research Australia, University of Melbourne, Royal Victorian Eye & Ear Hospital, Melbourne, Australia
- Vision Cooperative Research Centre, Sydney, Australia
- Andrea Jane Richardson
- Centre for Eye Research Australia, University of Melbourne, Royal Victorian Eye & Ear Hospital, Melbourne, Australia
- Vision Cooperative Research Centre, Sydney, Australia
- Mohamed Dirani
- Centre for Eye Research Australia, University of Melbourne, Royal Victorian Eye & Ear Hospital, Melbourne, Australia
- Vision Cooperative Research Centre, Sydney, Australia
- Paul Mitchell
- Vision Cooperative Research Centre, Sydney, Australia
- Centre for Vision Research, Department of Ophthalmology, Westmead Millennium Institute, University of Sydney, Westmead, Australia
- Paul Nigel Baird
- Centre for Eye Research Australia, University of Melbourne, Royal Victorian Eye & Ear Hospital, Melbourne, Australia
- Vision Cooperative Research Centre, Sydney, Australia
242
Nett IRE, Martin DMA, Miranda-Saavedra D, Lamont D, Barber JD, Mehlert A, Ferguson MAJ. The phosphoproteome of bloodstream form Trypanosoma brucei, causative agent of African sleeping sickness. Mol Cell Proteomics 2009; 8:1527-38. [PMID: 19346560 PMCID: PMC2716717 DOI: 10.1074/mcp.m800556-mcp200] [Citation(s) in RCA: 130] [Impact Index Per Article: 8.7] [Indexed: 12/28/2022]
Abstract
The protozoan parasite Trypanosoma brucei is the causative agent of human African sleeping sickness and related animal diseases, and it has over 170 predicted protein kinases. Protein phosphorylation is a key regulatory mechanism for cellular function that, thus far, has been studied in T. brucei principally through putative kinase mRNA knockdown and observation of the resulting phenotype. However, despite the relatively large kinome of this organism and the demonstrated essentiality of several T. brucei kinases, very few specific phosphorylation sites have been determined in this organism. Using a gel-free, phosphopeptide enrichment-based proteomics approach, we performed the first large-scale phosphorylation site analyses for T. brucei. Serine, threonine, and tyrosine phosphorylation sites were determined for a cytosolic protein fraction of the bloodstream form of the parasite, resulting in the identification of 491 phosphoproteins based on 852 unique phosphopeptides and 1204 phosphorylation sites. The phosphoproteins detected in this study are predicted from their genome annotations to participate in a wide variety of biological processes, including signal transduction, processing of DNA and RNA, protein synthesis and degradation, and, to a minor extent, metabolic pathways. The analysis of phosphopeptides and phosphorylation sites was facilitated by in-house developed software, and this automated approach was validated by manual annotation of spectra of the kinase subset of proteins. Analysis of the cytosolic bloodstream form T. brucei kinome revealed the presence of 44 phosphorylated protein kinases in our data set that could be classified into the major eukaryotic protein kinase groups by applying a multilevel hidden Markov model library of the kinase catalytic domain. Identification of the kinase phosphorylation sites showed conserved phosphorylation sequence motifs in several kinase activation segments, supporting the view that phosphorylation-based signaling is a general and fundamental regulatory process that extends to this highly divergent lower eukaryote.
Affiliation(s)
- Isabelle R E Nett
- Division of Biological Chemistry and Drug Discovery, College of Life Sciences, University of Dundee, Dundee, Scotland, United Kingdom
243
Uhlén M, Hober S. Generation and validation of affinity reagents on a proteome-wide level. J Mol Recognit 2009; 22:57-64. [PMID: 18546091 DOI: 10.1002/jmr.891] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.5] [Indexed: 01/09/2023]
Abstract
There is a need for protein-specific affinity reagents to explore the gene products encoded by the genome. Recently, systematic efforts to generate validated affinity reagents on a whole human proteome level have been initiated. Such efforts raise several issues, including the choice of antigen, the type of affinity reagent, and the subsequent validation of the generated protein-specific binders. The advantages and disadvantages of the different approaches are discussed, and the problems related to quality assessment of antibodies to be used in multi-platform applications are addressed. This review also describes the efforts to create a virtual resource of validated antibodies using a community-based portal and summarizes the status and visions for the publicly available Human Protein Atlas (http://www.proteinatlas.org), which shows human protein profiles in a large number of normal and cancer tissues as well as a large set of human cell lines.
Affiliation(s)
- Mathias Uhlén
- Department of Proteomics, School of Biotechnology, Royal Institute of Technology (KTH), AlbaNova University Center, Stockholm, Sweden.
244
DECIPHER: Database of Chromosomal Imbalance and Phenotype in Humans Using Ensembl Resources. Am J Hum Genet 2009; 84:524-33. [PMID: 19344873 DOI: 10.1016/j.ajhg.2009.03.010] [Citation(s) in RCA: 1414] [Impact Index Per Article: 94.3] [Received: 12/23/2008] [Revised: 03/03/2009] [Accepted: 03/13/2009] [Indexed: 01/08/2023]
Abstract
Many patients suffering from developmental disorders harbor submicroscopic deletions or duplications that, by affecting the copy number of dosage-sensitive genes or disrupting normal gene expression, lead to disease. However, many aberrations are novel or extremely rare, making clinical interpretation problematic and genotype-phenotype correlations uncertain. Identification of patients sharing a genomic rearrangement and having phenotypic features in common leads to greater certainty in the pathogenic nature of the rearrangement and enables new syndromes to be defined. To facilitate the analysis of these rare events, we have developed an interactive web-based database called DECIPHER (Database of Chromosomal Imbalance and Phenotype in Humans Using Ensembl Resources) which incorporates a suite of tools designed to aid the interpretation of submicroscopic chromosomal imbalance, inversions, and translocations. DECIPHER catalogs common copy-number changes in normal populations and thus, by exclusion, enables changes that are novel and potentially pathogenic to be identified. DECIPHER enhances genetic counseling by retrieving relevant information from a variety of bioinformatics resources. Known and predicted genes within an aberration are listed in the DECIPHER patient report, and genes of recognized clinical importance are highlighted and prioritized. DECIPHER enables clinical scientists worldwide to maintain records of phenotype and chromosome rearrangement for their patients and, with informed consent, share this information with the wider clinical research community through display in the genome browser Ensembl. By sharing cases worldwide, clusters of rare cases having phenotype and structural rearrangement in common can be identified, leading to the delineation of new syndromes and furthering understanding of gene function.
245
Desmet FO, Hamroun D, Lalande M, Collod-Béroud G, Claustres M, Béroud C. Human Splicing Finder: an online bioinformatics tool to predict splicing signals. Nucleic Acids Res 2009; 37:e67. [PMID: 19339519 PMCID: PMC2685110 DOI: 10.1093/nar/gkp215] [Citation(s) in RCA: 2013] [Impact Index Per Article: 134.2] [Indexed: 01/29/2023]
Abstract
Thousands of mutations are identified yearly. Although many directly affect protein expression, an increasing proportion of mutations is now believed to influence mRNA splicing. They mostly affect existing splice sites, but synonymous, non-synonymous or nonsense mutations can also create or disrupt splice sites or auxiliary cis-splicing sequences. To facilitate the analysis of the different mutations, we designed Human Splicing Finder (HSF), a tool to predict the effects of mutations on splicing signals or to identify splicing motifs in any human sequence. It contains all available matrices for auxiliary sequence prediction as well as new ones for binding sites of the 9G8 and Tra2-β Serine-Arginine proteins and the hnRNP A1 ribonucleoprotein. We also developed new Position Weight Matrices to assess the strength of 5′ and 3′ splice sites and branch points. We evaluated HSF efficiency using a set of 83 intronic and 35 exonic mutations known to result in splicing defects. We showed that the mutation effect was correctly predicted in almost all cases. HSF could thus represent a valuable resource for research, diagnostic and therapeutic (e.g. therapeutic exon skipping) purposes as well as for global studies, such as the GEN2PHEN European Project or the Human Variome Project.
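The HSF abstract mentions position weight matrices. These score a candidate site by summing, position by position, the log-odds of the observed base's frequency against a background frequency. The sketch below makes these assumptions:

* the background is uniform;
* the input sequences contain only A, C, G and T;
* the four-column matrix is made up and centred on a GT donor dinucleotide.

`TOY_PWM`, `score_window` and `best_site` are hypothetical names. Neither the matrix values nor the scoring scale are those used by HSF.

```python
import math

# Uniform background base frequencies (an assumption for this sketch).
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

# Hypothetical toy frequency matrix: two exonic positions followed by the
# intronic "GT" of a 5' splice site. Illustrative values only, NOT HSF's matrices.
TOY_PWM = [
    {"A": 0.60, "C": 0.10, "G": 0.20, "T": 0.10},
    {"A": 0.10, "C": 0.10, "G": 0.70, "T": 0.10},
    {"A": 0.01, "C": 0.01, "G": 0.97, "T": 0.01},
    {"A": 0.01, "C": 0.01, "G": 0.01, "T": 0.97},
]


def score_window(window: str, pwm: list[dict[str, float]]) -> float:
    """Log2-odds score of a window of exactly len(pwm) bases (A/C/G/T only)."""
    if len(window) != len(pwm):
        raise ValueError("window length must equal matrix width")
    return sum(math.log2(pwm[i][base] / BACKGROUND[base])
               for i, base in enumerate(window.upper()))


def best_site(seq: str, pwm: list[dict[str, float]]) -> tuple[int, float]:
    """Return (start position, score) of the highest-scoring window in seq."""
    width = len(pwm)
    if len(seq) < width:
        raise ValueError("sequence shorter than matrix")
    scored = [(i, score_window(seq[i:i + width], pwm))
              for i in range(len(seq) - width + 1)]
    return max(scored, key=lambda item: item[1])


if __name__ == "__main__":
    pos, score = best_site("CCAGGTCC", TOY_PWM)
    print(pos, round(score, 2))
    # A G>A substitution at the donor G strongly lowers the site score.
    print(score_window("AGGT", TOY_PWM) - score_window("AGAT", TOY_PWM))
```

You can compare `score_window` on a reference window and a mutant window, for example `AGGT` versus `AGAT`. The difference gives a crude measure of how much a substitution weakens the site. HSF is designed to answer this kind of question with its own calibrated matrices.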
246
Wang T, Furey TS. Analysis of complex disease association and linkage studies using the University of California Santa Cruz Genome Browser. Circ Cardiovasc Genet 2009; 2:199-204. [PMID: 20031585 PMCID: PMC2798134 DOI: 10.1161/circgenetics.108.843946] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 01/22/2023]
Affiliation(s)
- Tianyuan Wang
- Center for Human Genetics, Duke University Medical Center
247
Greco D, Volpicelli F, Di Lieto A, Leo D, Perrone-Capano C, Auvinen P, di Porzio U. Comparison of gene expression profile in embryonic mesencephalon and neuronal primary cultures. PLoS One 2009; 4:e4977. [PMID: 19305503 PMCID: PMC2654915 DOI: 10.1371/journal.pone.0004977] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 08/02/2008] [Accepted: 02/26/2009] [Indexed: 11/24/2022]
Abstract
In the mammalian central nervous system (CNS), an important contingent of dopaminergic neurons is localized in the substantia nigra and in the ventral tegmental area of the ventral midbrain. They constitute an anatomically and functionally heterogeneous group of cells involved in a variety of regulatory mechanisms, from locomotion to emotional/motivational behavior. Midbrain dopaminergic neuron (mDA) primary cultures represent a useful tool to study the molecular mechanisms involved in their development and maintenance. Considerable information has been gathered on the development and maturation of mDA neurons in vivo, as well as on the molecular features of mDA primary cultures. Here we investigated in detail the gene expression differences between the tissue of origin and ventral midbrain primary cultures enriched in mDA neurons, using microarrays. We integrated the results based on different re-annotations of the microarray probes. By using knowledge-based gene network techniques and promoter sequence analysis, we also uncovered mechanisms that might regulate the expression of CNS genes involved in defining the identity of specific cell types in the ventral midbrain. This work integrates bioinformatics, functional genomics and developmental neurobiology. Moreover, we propose guidelines for the computational analysis of microarray gene expression data. Our findings help to clarify some molecular aspects of the development and differentiation of DA neurons within the midbrain.
Affiliation(s)
- Dario Greco
- Institute of Biotechnology, University of Helsinki, Helsinki, Finland.
248
From cancer genomes to cancer models: bridging the gaps. EMBO Rep 2009; 10:359-66. [PMID: 19305388 DOI: 10.1038/embor.2009.46] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.9] [Received: 11/13/2008] [Accepted: 02/23/2009] [Indexed: 11/08/2022]
Abstract
Cancer genome projects are now being expanded in an attempt to provide complete landscapes of the mutations that exist in tumours. Although the importance of cataloguing genome variations is well recognized, there are obvious difficulties in bridging the gaps between high-throughput resequencing information and the molecular mechanisms of cancer evolution. Here, we describe the current status of the high-throughput genomic technologies, and the current limitations of the associated computational analysis and experimental validation of cancer genetic variants. We emphasize how the current cancer-evolution models will be influenced by the high-throughput approaches, in particular through efforts devoted to monitoring tumour progression, and how, in turn, the integration of data and models will be translated into mechanistic knowledge and clinical applications.
249
Reeves GA, Talavera D, Thornton JM. Genome and proteome annotation: organization, interpretation and integration. J R Soc Interface 2009; 6:129-47. [PMID: 19019817 DOI: 10.1098/rsif.2008.0341] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.5] [Indexed: 11/12/2022]
Abstract
Recent years have seen a huge increase in the generation of genomic and proteomic data. This has been due to improvements in current biological methodologies, the development of new experimental techniques and the use of computers as support tools. All these raw data are useless if they cannot be properly analysed, annotated, stored and displayed. Consequently, a vast number of resources have been created to present the data to the wider community. Annotation tools and databases provide the means to disseminate these data and to comprehend their biological importance. This review examines the various aspects of annotation: type, methodology and availability. Moreover, it places special emphasis on novel annotation fields, such as phenotypes, and highlights recent efforts focused on integrating annotations.
Affiliation(s)
- Gabrielle A Reeves
- EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
250
Recent developments in StemBase: a tool to study gene expression in human and murine stem cells. BMC Res Notes 2009; 2:39. [PMID: 19284540 PMCID: PMC2660910 DOI: 10.1186/1756-0500-2-39] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 10/14/2008] [Accepted: 03/10/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Currently one of the largest online repositories for human and mouse stem cell gene expression data, StemBase was first designed as a simple web interface to DNA microarray data generated by the Canadian Stem Cell Network to facilitate the discovery of gene functions relevant to stem cell control and differentiation. FINDINGS: Since its creation, StemBase has grown in both size and scope into a system with analysis tools that examine either the whole database at once, or slices of data based on tissue type, cell type or gene of interest. As of September 1, 2008, StemBase contains gene expression data (microarray and Serial Analysis of Gene Expression) from 210 stem cell samples in 60 different experiments. CONCLUSION: StemBase can be used to study gene expression in human and murine stem cells and is available at http://www.stembase.ca.