51
Nabholz B, Sarah G, Sabot F, Ruiz M, Adam H, Nidelet S, Ghesquière A, Santoni S, David J, Glémin S. Transcriptome population genomics reveals severe bottleneck and domestication cost in the African rice (Oryza glaberrima). Mol Ecol 2014; 23:2210-27. [PMID: 24684265 DOI: 10.1111/mec.12738] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.3] [Received: 11/25/2013] [Accepted: 03/19/2014] [Indexed: 12/17/2022]
Abstract
The African cultivated rice (Oryza glaberrima) was domesticated in West Africa 3000 years ago. Although less cultivated than the Asian rice (O. sativa), O. glaberrima landraces often display interesting adaptations to rustic environments (e.g. drought). Here, using RNA-seq technology, we were able to compare more than 12,000 transcripts among 9 O. glaberrima, 10 wild O. barthii and one O. meridionalis individuals. With a synonymous nucleotide diversity πs = 0.0006 per site, O. glaberrima appears as the least genetically diverse crop grass ever documented. Using approximate Bayesian computation, we estimated that O. glaberrima experienced a severe bottleneck during domestication. This demographic scenario almost fully accounts for the pattern of genetic diversity across the O. glaberrima genome, as we detected very few outlier regions where positive selection may have further impacted genetic diversity. Moreover, the large excess of derived nonsynonymous substitutions that we detected suggests that the O. glaberrima population suffered from the 'cost of domestication'. In addition, we used this genome-scale data set to demonstrate that (i) O. barthii genetic diversity is positively correlated with recombination rate and negatively with gene density, (ii) expression level is negatively correlated with evolutionary constraint, and (iii) one region on chromosome 5 (position 4-6 Mb) exhibits a clear signature of introgression with an as yet unidentified Oryza species. This work represents the first genome-wide survey of the African rice genetic diversity and paves the way for further comparison between the African and the Asian rice, notably regarding the genetics underlying domestication traits.
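As a rough illustration of the diversity statistic quoted in this abstract, nucleotide diversity (π) is the average number of pairwise differences per site among aligned sequences. The sketch below is a toy calculation on made-up sequences. The function name `nucleotide_diversity` is hypothetical, the code does not restrict the count to synonymous sites as the πs estimate does, and it is not the authors' pipeline.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise differences per site among equal-length aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")

    # Count mismatching sites for every unordered pair of sequences.
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    # Normalise by the number of pairs and by the alignment length.
    return total_diffs / (len(pairs) * length)


# Toy alignment (illustrative data only, not from the study).
toy_alignment = ["ACGT", "ACGA", "ACGT"]
print(nucleotide_diversity(toy_alignment))
```

In this toy alignment two of the three pairs differ at one of four sites, so the statistic is 2 / (3 × 4). A real analysis would also need to handle missing data and to classify sites as synonymous or nonsynonymous.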
Affiliation(s)
- Benoit Nabholz
- Institut des Sciences de l'Evolution-Montpellier, UMR CNRS-UM2 5554, University Montpellier II, Montpellier, France; UMR AGAP 1334, Montpellier SupAgro, Montpellier, France
52
Coman D, Altenhoff A, Zoller S, Gruissem W, Vranová E. Distinct evolutionary strategies in the GGPPS family from plants. Frontiers in Plant Science 2014; 5:230. [PMID: 24904625 PMCID: PMC4034038 DOI: 10.3389/fpls.2014.00230] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Received: 03/14/2014] [Accepted: 05/09/2014] [Indexed: 05/07/2023]
Abstract
Multiple geranylgeranyl diphosphate synthases (GGPPS) for biosynthesis of geranylgeranyl diphosphate (GGPP) exist in plants. GGPP is produced in the isoprenoid pathway and is a central precursor for various primary and specialized plant metabolites. Therefore, its biosynthesis is an essential regulatory point in the isoprenoid pathway. We selected 119 GGPPSs from 48 species representing all major plant lineages, based on stringent homology criteria. After the diversification of land plants, the number of GGPPS paralogs per species increases. Already in the moss Physcomitrella patens, GGPPS appears to be encoded by multiple paralogous genes. In gymnosperms, neofunctionalization of GGPPS may have enabled optimized biosynthesis of primary and specialized metabolites. Notably, lineage-specific expansion of GGPPS occurred in land plants. As a representative species we focused here on Arabidopsis thaliana, which retained the highest number of GGPPS paralogs (twelve) among the 48 species we considered in this study. Our results show that the A. thaliana GGPPS gene family is an example of evolution involving neo- and subfunctionalization as well as pseudogenization. We propose subfunctionalization as one of the main mechanisms allowing the maintenance of multiple GGPPS paralogs in the A. thaliana genome. Accordingly, the changes in the expression patterns of the GGPPS paralogs occurring after gene duplication led to developmental and/or condition-specific functional evolution.
Affiliation(s)
- Diana Coman
- Department of Biology, ETH Zurich, Zurich, Switzerland
- Adrian Altenhoff
- Department of Computer Science, ETH Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Zurich, Switzerland
- Stefan Zoller
- Department of Computer Science, ETH Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Zurich, Switzerland
- Eva Vranová
- Department of Biology, ETH Zurich, Zurich, Switzerland
- Institute of Biology and Ecology, Pavol Jozef Šafárik University, Košice, Slovakia
- *Correspondence: Eva Vranová, Faculty of Science, Institute of Biology and Ecology, Pavol Jozef Šafárik University in Košice, Mánesova 23, Košice, 04154, Slovakia
53
Lovell PV, Wirthlin M, Wilhelm L, Minx P, Lazar NH, Carbone L, Warren WC, Mello CV. Conserved syntenic clusters of protein coding genes are missing in birds. Genome Biol 2014; 15:565. [PMID: 25518852 PMCID: PMC4290089 DOI: 10.1186/s13059-014-0565-1] [Citation(s) in RCA: 87] [Impact Index Per Article: 8.7] [Received: 10/29/2014] [Accepted: 12/08/2014] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND: Birds are one of the most highly successful and diverse groups of vertebrates, having evolved a number of distinct characteristics, including feathers and wings, a sturdy lightweight skeleton and unique respiratory and urinary/excretion systems. However, the genetic basis of these traits is poorly understood. RESULTS: Using comparative genomics based on extensive searches of 60 avian genomes, we have found that birds lack approximately 274 protein coding genes that are present in the genomes of most vertebrate lineages and are for the most part organized in conserved syntenic clusters in non-avian sauropsids and in humans. These genes are located in regions associated with chromosomal rearrangements, and are largely present in crocodiles, suggesting that their loss occurred subsequent to the split of dinosaurs/birds from crocodilians. Many of these genes are associated with lethality in rodents, human genetic disorders, or biological functions targeting various tissues. Functional enrichment analysis combined with orthogroup analysis and paralog searches revealed enrichments that were shared by non-avian species, present only in birds, or shared between all species. CONCLUSIONS: Together these results provide a clearer definition of the genetic background of extant birds, extend the findings of previous studies on missing avian genes, and provide clues about molecular events that shaped avian evolution. They also have implications for fields that largely benefit from avian studies, including development, immune system, oncogenesis, and brain function and cognition. With regard to the missing genes, birds can be considered ‘natural knockouts’ that may become invaluable model organisms for several human diseases.
Affiliation(s)
- Peter V Lovell
- Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR USA
- Morgan Wirthlin
- Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR USA
- Larry Wilhelm
- Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR USA
- Oregon National Primate Research Center, West Campus, Oregon Health and Science University, Portland, OR USA
- Patrick Minx
- The Genome Institute, Washington University School of Medicine, St. Louis, MO USA
- Nathan H Lazar
- Oregon National Primate Research Center, West Campus, Oregon Health and Science University, Portland, OR USA
- Bioinformatics and Computational Biology Division, Oregon Health & Science University, Portland, OR USA
- Lucia Carbone
- Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR USA
- Oregon National Primate Research Center, West Campus, Oregon Health and Science University, Portland, OR USA
- Wesley C Warren
- The Genome Institute, Washington University School of Medicine, St. Louis, MO USA
- Claudio V Mello
- Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR USA
54
Shahzad Z, Ranwez V, Fizames C, Marquès L, Le Martret B, Alassimone J, Godé C, Lacombe E, Castillo T, Saumitou-Laprade P, Berthomieu P, Gosti F. Plant Defensin type 1 (PDF1): protein promiscuity and expression variation within the Arabidopsis genus shed light on zinc tolerance acquisition in Arabidopsis halleri. The New Phytologist 2013; 200:820-833. [PMID: 23865749 DOI: 10.1111/nph.12396] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.0] [Received: 03/18/2013] [Accepted: 05/28/2013] [Indexed: 05/11/2023]
Abstract
Plant defensins are recognized for their antifungal properties. However, a few type 1 defensins (PDF1s) were identified for their cellular zinc (Zn) tolerance properties after a study of the metal extremophile Arabidopsis halleri. In order to investigate whether different paralogues would display specialized functions, the A. halleri PDF1 family was characterized at the functional and genomic levels. Eleven PDF1s were isolated from A. halleri. Their ability to provide Zn tolerance in yeast cells, their activity against Fusarium oxysporum f. sp. melonii, and their level of expression in planta were compared with those of the seven A. thaliana PDF1s. The genomic organization of the PDF1 family was comparatively analysed within the Arabidopsis genus. AhPDF1s and AtPDF1s were able to confer Zn tolerance and AhPDF1s also displayed antifungal activity. PDF1 transcripts were constitutively more abundant in A. halleri than in A. thaliana. Within the Arabidopsis genus, the PDF1 family is evolutionarily dynamic, in terms of gain and loss of gene copy. Arabidopsis halleri PDF1s display no superior abilities to provide Zn tolerance. A constitutive increase in AhPDF1 transcript accumulation is proposed to be an evolutionary innovation co-opting the promiscuous PDF1 protein for its contribution to Zn tolerance in A. halleri.
Affiliation(s)
- Zaigham Shahzad
- Biochimie et Physiologie Moléculaire des Plantes, Unité Mixte de Recherche Montpellier, SupAgro/CNRS/INRA/Université Montpellier II, 2 Place Viala, F-34060, Montpellier Cedex 1, France
- Vincent Ranwez
- Montpellier SupAgro, UMR AGAP, F-34060, Montpellier, France
- Cécile Fizames
- Biochimie et Physiologie Moléculaire des Plantes, Unité Mixte de Recherche Montpellier, SupAgro/CNRS/INRA/Université Montpellier II, 2 Place Viala, F-34060, Montpellier Cedex 1, France
- Laurence Marquès
- Biochimie et Physiologie Moléculaire des Plantes, Unité Mixte de Recherche Montpellier, SupAgro/CNRS/INRA/Université Montpellier II, 2 Place Viala, F-34060, Montpellier Cedex 1, France
- Bénédicte Le Martret
- Biochimie et Physiologie Moléculaire des Plantes, Unité Mixte de Recherche Montpellier, SupAgro/CNRS/INRA/Université Montpellier II, 2 Place Viala, F-34060, Montpellier Cedex 1, France
- Julien Alassimone
- Biochimie et Physiologie Moléculaire des Plantes, Unité Mixte de Recherche Montpellier, SupAgro/CNRS/INRA/Université Montpellier II, 2 Place Viala, F-34060, Montpellier Cedex 1, France
- Cécile Godé
- Laboratoire de Génétique et Evolution des Populations Végétales, UMR CNRS 8016, Université des Sciences et Technologies de Lille, Lille1, F-59655, Villeneuve d'Ascq Cedex, France
- Eric Lacombe
- Biochimie et Physiologie Moléculaire des Plantes, Unité Mixte de Recherche Montpellier, SupAgro/CNRS/INRA/Université Montpellier II, 2 Place Viala, F-34060, Montpellier Cedex 1, France
- Teddy Castillo
- Biochimie et Physiologie Moléculaire des Plantes, Unité Mixte de Recherche Montpellier, SupAgro/CNRS/INRA/Université Montpellier II, 2 Place Viala, F-34060, Montpellier Cedex 1, France
- Pierre Saumitou-Laprade
- Laboratoire de Génétique et Evolution des Populations Végétales, UMR CNRS 8016, Université des Sciences et Technologies de Lille, Lille1, F-59655, Villeneuve d'Ascq Cedex, France
- Pierre Berthomieu
- Biochimie et Physiologie Moléculaire des Plantes, Unité Mixte de Recherche Montpellier, SupAgro/CNRS/INRA/Université Montpellier II, 2 Place Viala, F-34060, Montpellier Cedex 1, France
- Françoise Gosti
- Biochimie et Physiologie Moléculaire des Plantes, Unité Mixte de Recherche Montpellier, SupAgro/CNRS/INRA/Université Montpellier II, 2 Place Viala, F-34060, Montpellier Cedex 1, France
55
Livesay SB, Collier SE, Bitton DA, Bähler J, Ohi MD. Structural and functional characterization of the N terminus of Schizosaccharomyces pombe Cwf10. Eukaryotic Cell 2013; 12:1472-89. [PMID: 24014766 PMCID: PMC3837936 DOI: 10.1128/ec.00140-13] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 05/31/2013] [Accepted: 09/03/2013] [Indexed: 01/10/2023]
Abstract
The spliceosome is a dynamic macromolecular machine that catalyzes the removal of introns from pre-mRNA, yielding mature message. Schizosaccharomyces pombe Cwf10 (homolog of Saccharomyces cerevisiae Snu114 and human U5-116K), an integral member of the U5 snRNP, is a GTPase that has multiple roles within the splicing cycle. Cwf10/Snu114 family members are highly homologous to eukaryotic translation elongation factor EF2, and they contain a conserved N-terminal extension (NTE) to the EF2-like portion, predicted to be an intrinsically unfolded domain. Using S. pombe as a model system, we show that the NTE is not essential, but cells lacking this domain are defective in pre-mRNA splicing. Genetic interactions between cwf10-ΔNTE and other pre-mRNA splicing mutants are consistent with a role for the NTE in spliceosome activation and second-step catalysis. Characterization of Cwf10-NTE by various biophysical techniques shows that in solution the NTE contains regions of both structure and disorder. The first 23 highly conserved amino acids of the NTE are essential for its role in splicing but when overexpressed are not sufficient to restore pre-mRNA splicing to wild-type levels in cwf10-ΔNTE cells. When the entire NTE is overexpressed in the cwf10-ΔNTE background, it can complement the truncated Cwf10 protein in trans, and it immunoprecipitates a complex similar in composition to the late-stage U5.U2/U6 spliceosome. These data show that the structurally flexible NTE is capable of independently incorporating into the spliceosome and improving splicing function, possibly indicating a role for the NTE in stabilizing conformational rearrangements during a splice cycle.
Affiliation(s)
- S. Brent Livesay
- Department of Cell and Developmental Biology, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Scott E. Collier
- Department of Cell and Developmental Biology, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Danny A. Bitton
- Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- Jürg Bähler
- Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- Melanie D. Ohi
- Department of Cell and Developmental Biology, Vanderbilt University Medical Center, Nashville, Tennessee, USA
56
Zimmer AD, Lang D, Buchta K, Rombauts S, Nishiyama T, Hasebe M, Van de Peer Y, Rensing SA, Reski R. Reannotation and extended community resources for the genome of the non-seed plant Physcomitrella patens provide insights into the evolution of plant gene structures and functions. BMC Genomics 2013; 14:498. [PMID: 23879659 PMCID: PMC3729371 DOI: 10.1186/1471-2164-14-498] [Citation(s) in RCA: 136] [Impact Index Per Article: 12.4] [Received: 03/06/2013] [Accepted: 07/19/2013] [Indexed: 11/24/2022] Open
Abstract
Background: The moss Physcomitrella patens as a model species provides an important reference for early-diverging lineages of plants and the release of the genome in 2008 opened the doors to genome-wide studies. The usability of a reference genome greatly depends on the quality of the annotation and the availability of centralized community resources. Therefore, in the light of accumulating evidence for missing genes, fragmentary gene structures, false annotations and a low rate of functional annotations on the original release, we decided to improve the moss genome annotation. Results: Here, we report the complete moss genome re-annotation (designated V1.6) incorporating the increased transcript availability from a multitude of developmental stages and tissue types. We demonstrate the utility of the improved P. patens genome annotation for comparative genomics and new extensions to the cosmoss.org resource as a central repository for this plant “flagship” genome. The structural annotation of 32,275 protein-coding genes results in 8387 additional loci including 1456 loci with known protein domains or homologs in Plantae. This is the first release to include information on transcript isoforms, suggesting alternative splicing events for at least 10.8% of the loci. Furthermore, this release now also provides information on non-protein-coding loci. Functional annotations were improved regarding quality and coverage, resulting in 58% annotated loci (previously: 41%) that comprise also 7200 additional loci with GO annotations. Access and manual curation of the functional and structural genome annotation is provided via the http://www.cosmoss.org model organism database. Conclusions: Comparative analysis of gene structure evolution along the green plant lineage provides novel insights, such as a comparatively high number of loci with 5’-UTR introns in the moss. Comparative analysis of functional annotations reveals expansions of moss house-keeping and metabolic genes and further possibly adaptive, lineage-specific expansions and gains including at least 13% orphan genes.
Affiliation(s)
- Andreas D Zimmer
- Plant Biotechnology, Faculty of Biology, University of Freiburg, Schaenzlestrasse 1, 79104, Freiburg, Germany
57
Kumar A, Kamaraj B, Sethumadhavan R, Purohit R. Evolution driven structural changes in CENP-E motor domain. Interdiscip Sci 2013; 5:102-11. [DOI: 10.1007/s12539-013-0137-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 08/14/2012] [Revised: 10/19/2012] [Accepted: 10/29/2012] [Indexed: 12/13/2022]
58
Abstract
MOTIVATION The standard genetic code translates 61 codons into 20 amino acids using fewer than 61 transfer RNAs (tRNAs). This is possible because of the tRNA's ability to 'wobble' at the third base to decode more than one codon. Although the anticodon-codon mapping of tRNA to mRNA is a prerequisite for certain codon usage indices and can contribute to the understanding of the evolution of alternative genetic codes, it is usually not determined experimentally because such assays are prohibitively expensive and elaborate. Instead, the codon reading is approximated from theoretical inferences of nucleotide binding, the wobble rules. Unfortunately, these rules fail to capture all of the nuances of codon reading. This study addresses the codon reading properties of tRNAs and their evolutionary impact on codon usage bias. RESULTS Using three different computational methods, the signal of tRNA decoding in codon usage bias is identified. The predictions by the methods generally agree with each other and compare well with experimental evidence of codon reading. This analysis suggests a revised codon reading for cytosolic tRNA in the yeast genome (Saccharomyces cerevisiae) that is more accurate than the common assignment by wobble rules. The results confirm the earlier observation that the wobble rules are not sufficient for a complete description of codon reading, because they depend on genome-specific factors. The computational methods presented here are applicable to any fully sequenced genome. AVAILABILITY By request from the author. CONTACT alexander.roth@isb-sib.ch.
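The wobble rules this abstract refers to can be written as a small lookup table. The sketch below assumes Crick's classical pairing rules and uses a hypothetical helper, `codons_read`, to list the codons that a given anticodon would be predicted to read. As the abstract itself stresses, such rules only approximate real codon reading, and this is not the author's method.

```python
# Watson-Crick pairing for the first two codon positions (RNA alphabet).
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Classical wobble pairing: first anticodon base (5' end) -> third codon bases.
# "I" denotes inosine. These are the textbook rules, an approximation only.
WOBBLE = {"G": "CU", "C": "G", "A": "U", "U": "AG", "I": "UCA"}


def codons_read(anticodon):
    """Return the codons (5'->3') predicted for an anticodon written 5'->3'."""
    first, second, third = anticodon.upper()
    # Codon positions 1 and 2 pair strictly with anticodon positions 3 and 2.
    prefix = WATSON_CRICK[third] + WATSON_CRICK[second]
    # Codon position 3 pairs with anticodon position 1 under the wobble rules.
    return sorted(prefix + base for base in WOBBLE[first])


print(codons_read("GAA"))  # anticodon of a phenylalanine tRNA
print(codons_read("IGC"))  # inosine at the wobble position
```

Comparing such predicted assignments with observed codon usage is one way to see where the simple rules and genome-specific behaviour diverge.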
Affiliation(s)
- Alexander C Roth
- Swiss Institute of Bioinformatics, and Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland.
59
McWilliam H, Li W, Uludag M, Squizzato S, Park YM, Buso N, Cowley AP, Lopez R. Analysis Tool Web Services from the EMBL-EBI. Nucleic Acids Res 2013; 41:W597-600. [PMID: 23671338 PMCID: PMC3692137 DOI: 10.1093/nar/gkt376] [Citation(s) in RCA: 1206] [Impact Index Per Article: 109.6] [Indexed: 01/25/2023] Open
Abstract
Since 2004 the European Bioinformatics Institute (EMBL-EBI) has provided access to a wide range of databases and analysis tools via Web Services interfaces. This comprises services to search across the databases available from the EMBL-EBI and to explore the network of cross-references present in the data (e.g. EB-eye), services to retrieve entry data in various data formats and to access the data in specific fields (e.g. dbfetch), and analysis tool services, for example, sequence similarity search (e.g. FASTA and NCBI BLAST), multiple sequence alignment (e.g. Clustal Omega and MUSCLE), pairwise sequence alignment and protein functional analysis (e.g. InterProScan and Phobius). The REST/SOAP Web Services (http://www.ebi.ac.uk/Tools/webservices/) interfaces to these databases and tools allow their integration into other tools, applications, web sites, pipeline processes and analytical workflows. To get users started using the Web Services, sample clients are provided covering a range of programming languages and popular Web Service tool kits, and a brief guide to Web Services technologies, including a set of tutorials, is available for those wishing to learn more and develop their own clients. Users of the Web Services are informed of improvements and updates via a range of methods.
Affiliation(s)
- Hamish McWilliam
- EMBL Outstation-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SD Cambridge, UK
60
Gu W, Wang X, Zhai C, Zhou T, Xie X. Biological basis of miRNA action when their targets are located in human protein coding region. PLoS One 2013; 8:e63403. [PMID: 23671676 PMCID: PMC3646042 DOI: 10.1371/journal.pone.0063403] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 02/05/2013] [Accepted: 03/30/2013] [Indexed: 01/09/2023] Open
Abstract
Recent analyses have revealed many functional microRNA (miRNA) targets in mammalian protein coding regions. However, the mechanisms that ensure miRNA function when target sites are located in protein coding regions of mammalian mRNA transcripts are largely unknown. In this paper, we investigate some potential biological factors, such as target site accessibility and local translation efficiency. We computationally analyze these two factors using experimentally identified miRNA targets in human protein coding regions. We find that site accessibility is significantly increased in miRNA target regions to facilitate miRNA binding. At the same time, local translation efficiency is selectively decreased near miRNA target regions. GC-poor codons are preferred in the flanking regions of miRNA target sites to ease access to miRNA targets. Within-genome analysis shows substantial variation in site accessibility and local translation efficiency among different miRNA targets in the genome. Further analyses suggest that a target gene's GC content and conservation level could explain some of the differences in site accessibility. On the other hand, a target gene's functional importance and conservation level can affect local translation efficiency near miRNA target regions. We hence propose that both site accessibility and local translation efficiency are important in miRNA action when miRNA target sites are located in mammalian protein coding regions.
Affiliation(s)
- Wanjun Gu
- Research Center of Learning Sciences, Southeast University, Nanjing, Jiangsu, China
- * E-mail: (WG); (TZ); (XX)
- Xiaofei Wang
- Research Center of Learning Sciences, Southeast University, Nanjing, Jiangsu, China
- Chuanying Zhai
- Research Center of Learning Sciences, Southeast University, Nanjing, Jiangsu, China
- Tong Zhou
- Institute for Personalized Respiratory Medicine, The University of Illinois at Chicago, Chicago, Illinois, United States of America
- Section of Pulmonary, Critical Care, Sleep & Allergy, Department of Medicine, The University of Illinois at Chicago, Chicago, Illinois, United States of America
- * E-mail: (WG); (TZ); (XX)
- Xueying Xie
- Research Center of Learning Sciences, Southeast University, Nanjing, Jiangsu, China
- * E-mail: (WG); (TZ); (XX)
61
Talavera D, Robertson DL, Lovell SC. The role of protein interactions in mediating essentiality and synthetic lethality. PLoS One 2013; 8:e62866. [PMID: 23638160 PMCID: PMC3639263 DOI: 10.1371/journal.pone.0062866] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 09/07/2012] [Accepted: 03/27/2013] [Indexed: 11/18/2022] Open
Abstract
Genes are characterized as essential if their knockout is associated with a lethal phenotype, and these "essential genes" play a central role in biological function. In addition, some genes are only essential when deleted in pairs, a phenomenon known as synthetic lethality. Here we consider genes displaying synthetic lethality as "essential pairs" of genes, and analyze the properties of yeast essential genes and synthetic lethal pairs together. As gene duplication initially produces an identical pair or sets of genes, it is often invoked as an explanation for synthetic lethality. However, we find that duplication explains only a minority of cases of synthetic lethality. Similarly, disruption of metabolic pathways leads to relatively few examples of synthetic lethality. By contrast, the vast majority of synthetic lethal gene pairs code for proteins with related functions that share interaction partners. We also find that essential genes and synthetic lethal pairs cluster in the protein-protein interaction network. These results suggest that synthetic lethality is strongly dependent on the formation of protein-protein interactions. Compensation usually occurs not because the genes involved are recent duplicates, but because of functional similarity that permits preservation of essential protein complexes. This unified view, combining genes that are individually essential with those that form essential pairs, suggests that essentiality is a feature of physical interactions between proteins, rather than being inherent in gene and protein products themselves.
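One simple way to quantify the idea that two proteins "share interaction partners" is the Jaccard index of their neighbour sets in a protein-protein interaction network. The sketch below is an illustrative assumption, not the measure used in the paper; the function `shared_partner_fraction` and the gene and partner names are hypothetical.

```python
def shared_partner_fraction(network, gene_a, gene_b):
    """Jaccard index of the interaction partners of two genes.

    `network` maps each gene to the set of its interaction partners.
    A direct interaction between the two genes themselves is ignored.
    """
    partners_a = network.get(gene_a, set()) - {gene_b}
    partners_b = network.get(gene_b, set()) - {gene_a}
    union = partners_a | partners_b
    if not union:
        return 0.0  # neither gene has any other partner
    return len(partners_a & partners_b) / len(union)


# Toy network (made-up identifiers, not data from the study).
toy_network = {
    "g1": {"p1", "p2", "p3"},
    "g2": {"p2", "p3", "p4"},
    "g3": {"p5"},
}
print(shared_partner_fraction(toy_network, "g1", "g2"))
print(shared_partner_fraction(toy_network, "g1", "g3"))
```

Under this toy measure, a candidate synthetic lethal pair with largely overlapping partners scores close to 1, while an unrelated pair scores close to 0.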
Affiliation(s)
- David Talavera
- Faculty of Life Sciences, University of Manchester, Manchester, United Kingdom
- David L. Robertson
- Faculty of Life Sciences, University of Manchester, Manchester, United Kingdom
- Simon C. Lovell
- Faculty of Life Sciences, University of Manchester, Manchester, United Kingdom
- * E-mail:
62
Large-scale event extraction from literature with multi-level gene normalization. PLoS One 2013; 8:e55814. [PMID: 23613707 PMCID: PMC3629104 DOI: 10.1371/journal.pone.0055814] [Citation(s) in RCA: 73] [Impact Index Per Article: 6.6] [Received: 11/02/2012] [Accepted: 01/02/2013] [Indexed: 11/19/2022] Open
Abstract
Text mining for the life sciences aims to aid database curation, knowledge summarization and information retrieval through the automated processing of biomedical texts. To provide comprehensive coverage and enable full integration with existing biomolecular database records, it is crucial that text mining tools scale up to millions of articles and that their analyses can be unambiguously linked to information recorded in resources such as UniProt, KEGG, BioGRID and NCBI databases. In this study, we investigate how fully automated text mining of complex biomolecular events can be augmented with a normalization strategy that identifies biological concepts in text, mapping them to identifiers at varying levels of granularity, ranging from canonicalized symbols to unique genes and proteins and broad gene families. To this end, we have combined two state-of-the-art text mining components, previously evaluated on two community-wide challenges, and have extended and improved upon these methods by exploiting their complementary nature. Using these systems, we perform normalization and event extraction to create a large-scale resource that is publicly available, unique in semantic scope, and covers all 21.9 million PubMed abstracts and 460 thousand PubMed Central open access full-text articles. This dataset contains 40 million biomolecular events involving 76 million gene/protein mentions, linked to 122 thousand distinct genes from 5032 species across the full taxonomic tree. Detailed evaluations and analyses reveal promising results for application of this data in database and pathway curation efforts. The main software components used in this study are released under an open-source license. Further, the resulting dataset is freely accessible through a novel API, providing programmatic and customized access (http://www.evexdb.org/api/v001/). Finally, to allow for large-scale bioinformatic analyses, the entire resource is available for bulk download from http://evexdb.org/download/, under the Creative Commons – Attribution – Share Alike (CC BY-SA) license.
63
Challis RJ, Hepworth J, Mouchel C, Waites R, Leyser O. A role for more axillary growth1 (MAX1) in evolutionary diversity in strigolactone signaling upstream of MAX2. Plant Physiology 2013; 161:1885-902. [PMID: 23424248 PMCID: PMC3613463 DOI: 10.1104/pp.112.211383] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Indexed: 05/08/2023]
Abstract
Strigolactones (SLs) are carotenoid-derived phytohormones with diverse roles. They are secreted from roots as attractants for arbuscular mycorrhizal fungi and have a wide range of endogenous functions, such as regulation of root and shoot system architecture. To date, six genes associated with SL synthesis and signaling have been molecularly identified using the shoot-branching mutants more axillary growth (max) of Arabidopsis (Arabidopsis thaliana) and dwarf (d) of rice (Oryza sativa). Here, we present a phylogenetic analysis of the MAX/D genes to clarify the relationships of each gene with its wider family and to allow the correlation of events in the evolution of the genes with the evolution of SL function. Our analysis suggests that the notion of a distinct SL pathway is inappropriate. Instead, there may be a diversity of SL-like compounds, the response to which requires a D14/D14-like protein. This ancestral system could have been refined toward distinct ligand-specific pathways channeled through MAX2, the most downstream known component of SL signaling. MAX2 is tightly conserved among land plants and is more diverged from its nearest sister clade than any other SL-related gene, suggesting a pivotal role in the evolution of SL signaling. By contrast, the evidence suggests much greater flexibility upstream of MAX2. The MAX1 gene is a particularly strong candidate for contributing to diversification of inputs upstream of MAX2. Our functional analysis of the MAX1 family demonstrates the early origin of its catalytic function and both redundancy and functional diversification associated with its duplication in angiosperm lineages.
|
64
|
Stewart AJ, Seymour RM, Pomiankowski A, Reuter M. Under-dominance constrains the evolution of negative autoregulation in diploids. PLoS Comput Biol 2013; 9:e1002992. [PMID: 23555226 PMCID: PMC3605092 DOI: 10.1371/journal.pcbi.1002992] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 05/25/2012] [Accepted: 02/04/2013] [Indexed: 11/19/2022]
Abstract
Regulatory networks have evolved to allow gene expression to rapidly track changes in the environment as well as to buffer perturbations and maintain cellular homeostasis in the absence of change. Theoretical work and empirical investigation in Escherichia coli have shown that negative autoregulation confers both rapid response times and reduced intrinsic noise, which is reflected in the fact that almost half of Escherichia coli transcription factors are negatively autoregulated. However, negative autoregulation is rare amongst the transcription factors of Saccharomyces cerevisiae. This difference is surprising because E. coli and S. cerevisiae otherwise have similar profiles of network motifs. In this study we investigate regulatory interactions amongst the transcription factors of Drosophila melanogaster and humans, and show that they have a similar dearth of negative autoregulation to that seen in S. cerevisiae. We then present a model demonstrating that this striking difference in the noise reduction strategies used amongst species can be explained by constraints on the evolution of negative autoregulation in diploids. We show that regulatory interactions between pairs of homologous genes within the same cell can lead to under-dominance: mutations that result in stronger autoregulation, and decrease noise in homozygotes, can paradoxically cause increased noise in heterozygotes. This severely limits a diploid's ability to evolve negative autoregulation as a noise reduction mechanism. Our work offers a simple and general explanation for a previously unexplained difference between the regulatory architectures of E. coli and yeast, Drosophila and humans. It also demonstrates that the effects of diploidy in gene networks can have counter-intuitive consequences that may profoundly influence the course of evolution.
All genes have to deal with intrinsic noise, and a variety of mechanisms have evolved to reduce it. One important mechanism of noise reduction for transcription factors is negative autoregulation, in which a gene product represses its own rate of transcription. Negative autoregulation occurs frequently in E. coli but, we find, occurs much more rarely in S. cerevisiae, D. melanogaster and humans. Whilst there are a great many important differences in the genetic architectures of these organisms, they tend to share, with the exception of negative autoregulation, similar profiles of network motifs. This makes the discrepancy in the degree of negative autoregulation all the more striking, as it lacks any obvious explanation. Our study presents a potential explanation, by comparing the evolvability of negative autoregulation as a noise reduction mechanism in haploids and diploids. We show that, in diploids, mutations that increase the strength of negative autoregulation at one gene copy often increase overall noise in gene expression. This results in under-dominance, in which heterozygotes are less fit than homozygotes. The result is that the evolution of negative autoregulation in diploids is significantly constrained. We verify our results using a combination of detailed molecular simulations and evolutionary simulations.
Affiliation(s)
- Alexander J Stewart
- Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America.
|
65
|
Vakhrusheva OA, Bazykin GA, Kondrashov AS. Genome-Level Analysis of Selective Constraint without Apparent Sequence Conservation. Genome Biol Evol 2013; 5:532-41. [PMID: 23418180 PMCID: PMC3622294 DOI: 10.1093/gbe/evt023] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 12/21/2022]
Abstract
Conservation of function can be accompanied by obvious similarity of homologous sequences which may persist for billions of years (Iyer LM, Leipe DD, Koonin EV, Aravind L. 2004. Evolutionary history and higher order classification of AAA+ ATPases. J Struct Biol. 146:11–31.). However, presumably homologous segments of noncoding DNA can also retain their ancestral function even after their sequences diverge beyond recognition (Fisher S, Grice EA, Vinton RM, Bessling SL, McCallion AS. 2006. Conservation of RET regulatory function from human to zebrafish without sequence similarity. Science 312:276–279.). To investigate this phenomenon at the genomic scale, we studied homologous introns in a quartet of insect species, and in a quartet of vertebrate species. Each quartet consisted of two pairs of moderately distant genomes, with a much larger evolutionary distance between the pairs. In both quartets, we found that introns that carry a regulatory segment or a conserved segment in the first pair tend to carry a conserved segment in the second pair, even though no similarity of these segments could be detected between the two pairs. Furthermore, introns from one pair that are preserved in the other pair tend to carry a conserved segment within the first pair, and be longer in the first pair, compared with the introns that were lost between pairs, even though no similarity between pairs could be detected in such preserved introns. These results indicate that selective constraint, presumably caused by conservation of the ancestral function, often persists even after the homologous DNA segments become unalignable.
Affiliation(s)
- Olga A Vakhrusheva
- Department of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia
|
66
|
Bianco AM, Marcuzzi A, Zanin V, Girardelli M, Vuch J, Crovella S. Database tools in genetic diseases research. Genomics 2013; 101:75-85. [DOI: 10.1016/j.ygeno.2012.11.001] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 03/19/2012] [Revised: 10/26/2012] [Accepted: 11/01/2012] [Indexed: 01/22/2023]
|
67
|
Mao Y, Wang W, Cheng N, Li Q, Tao S. Universally increased mRNA stability downstream of the translation initiation site in eukaryotes and prokaryotes. Gene 2013; 517:230-5. [PMID: 23313297 DOI: 10.1016/j.gene.2012.12.062] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 11/14/2012] [Accepted: 12/03/2012] [Indexed: 11/26/2022]
Abstract
Local secondary structures in coding sequences have important functions across various translational processes. To date, however, the local structures and their functions in the early stage of translation elongation remain poorly understood. Here, we surveyed the structural stability in the first 180 nucleotides of the coding sequence of 27 species using a computational method. We found that the structural stability in the 30-80 nucleotide interval was significantly higher than that in other regions in eukaryotes and most prokaryotes. No significant correlation between local translation efficiency and structural stability was observed, suggesting that this structural region has been under direct selection pressure to maintain high stability. Furthermore, ribosomes were blocked by this region, providing an opportunity for co-translational regulation. Remarkably, in eukaryotes, we found that mRNAs with higher structural stability in the 30-80 nucleotide interval tended to encode secreted proteins. Overall, our results revealed a previously unappreciated correlation between structural stability and protein localization.
Affiliation(s)
- Yuanhui Mao
- State Key Laboratory of Crop Stress Biology in Arid Areas and College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
|
68
|
Wang P, Ning S, Wang Q, Li R, Ye J, Zhao Z, Li Y, Huang T, Li X. mirTarPri: improved prioritization of microRNA targets through incorporation of functional genomics data. PLoS One 2013; 8:e53685. [PMID: 23326485 PMCID: PMC3541237 DOI: 10.1371/journal.pone.0053685] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 07/31/2012] [Accepted: 12/03/2012] [Indexed: 11/18/2022]
Abstract
MicroRNAs (miRNAs) are a class of small (19-25 nt) non-coding RNAs. This important class of gene regulator downregulates gene expression through sequence-specific binding to the 3' untranslated regions (3'UTRs) of target mRNAs. Several computational target prediction approaches have been developed for predicting miRNA targets. However, the predicted target lists often have high false positive rates. To construct a workable target list for subsequent experimental studies, we need novel approaches to properly rank the candidate targets from traditional methods. We performed a systematic analysis of experimentally validated miRNA targets using functional genomics data, and found significant functional associations between genes that were targeted by the same miRNA. Based on this finding, we developed a miRNA target prioritization method named mirTarPri to rank the predicted target lists from commonly used target prediction methods. Leave-one-out cross validation proved successful in identifying known targets, achieving an AUC score of up to 0.84. Validation in high-throughput data proved that mirTarPri was an unbiased method. Applying mirTarPri to prioritize results of six commonly used target prediction methods allowed us to find more positive targets at the top of the prioritized candidate list. In comparison with other methods, mirTarPri had an outstanding performance in gold standard and CLIP data. mirTarPri was a valuable method to improve the efficacy of current miRNA target prediction methods. We have also developed a web-based server implementing the mirTarPri method, which is freely accessible at http://bioinfo.hrbmu.edu.cn/mirTarPri.
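As a reading aid for the AUC figure quoted in this abstract, the sketch below shows one common way an AUC can be computed from ranked scores: the probability that a randomly chosen known target scores higher than a randomly chosen non-target. It is an illustration only, not the mirTarPri implementation; the function name `rank_auc` and the toy scores are assumptions.

```python
from typing import Sequence


def rank_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of a positive > score of a negative); ties count as 0.5.

    Hypothetical helper for illustration; labels are 1 (known target) or 0.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: two known targets (label 1) and two non-targets (label 0).
example_auc = rank_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
print(example_auc)
```

In a leave-one-out setting, each held-out known target would be scored against the remaining candidates and the resulting scores pooled before a calculation of this kind; how the abstract's method pools them is not stated here.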
Affiliation(s)
- Peng Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Shangwei Ning
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Qianghu Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Ronghong Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Jingrun Ye
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Zuxianglan Zhao
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Yan Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Teng Huang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Xia Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
|
69
|
Szcześniak MW, Kabza M, Pokrzywa R, Gudyś A, Makałowska I. ERISdb: A Database of Plant Splice Sites and Splicing Signals. Plant Cell Physiol 2013; 54:e10. [DOI: 10.1093/pcp/pct001] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Indexed: 12/22/2022]
|
70
|
Fey P, Dodson RJ, Basu S, Chisholm RL. One stop shop for everything Dictyostelium: dictyBase and the Dicty Stock Center in 2012. Methods Mol Biol 2013; 983:59-92. [PMID: 23494302 DOI: 10.1007/978-1-62703-302-2_4] [Citation(s) in RCA: 121] [Impact Index Per Article: 11.0] [Indexed: 12/31/2022]
Abstract
dictyBase (http://dictybase.org), the model organism database for Dictyostelium discoideum, includes the complete genome sequence and expression data for this organism. Relevant literature is integrated into the database, and gene models and functional annotation are manually curated from experimental results and comparative multigenome analyses. dictyBase has recently expanded to include the genome sequences of three additional Dictyostelids and has added new software tools to facilitate multigenome comparisons. The Dicty Stock Center, a strain and plasmid repository for Dictyostelium research, relocated to Northwestern University in 2009. This allowed us to integrate all Dictyostelium resources to better serve the research community. In this chapter, we will describe how to navigate the Web site and highlight some of our newer improvements.
Affiliation(s)
- Petra Fey
- dictyBase and the Dicty Stock Center, Center for Genetic Medicine, Northwestern University, Chicago, IL, USA.
|
71
|
Discovery of microRNA Regulatory Networks by Integrating Multidimensional High-Throughput Data. Adv Exp Med Biol 2013; 774:251-66. [DOI: 10.1007/978-94-007-5590-1_13] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 12/25/2022]
|
72
|
Mannige RV, Brooks CL, Shakhnovich EI. A universal trend among proteomes indicates an oily last common ancestor. PLoS Comput Biol 2012; 8:e1002839. [PMID: 23300421 PMCID: PMC3531291 DOI: 10.1371/journal.pcbi.1002839] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 05/03/2012] [Accepted: 10/28/2012] [Indexed: 11/19/2022]
Abstract
Despite progress in ancestral protein sequence reconstruction, much needs to be unraveled about the nature of the putative last common ancestral proteome that served as the prototype of all extant lifeforms. Here, we present data that indicate a steady decline (oil escape) in proteome hydrophobicity over species evolvedness (node number) evident in 272 diverse proteomes, which indicates a highly hydrophobic (oily) last common ancestor (LCA). This trend, obtained from simple considerations (free from sequence reconstruction methods), was corroborated by regression studies within homologous and orthologous protein clusters as well as phylogenetic estimates of the ancestral oil content. While indicating an inherent irreversibility in molecular evolution, oil escape also serves as a rare and universal reaction-coordinate for evolution (reinforcing Darwin's principle of Common Descent), and may prove important in matters such as (i) explaining the emergence of intrinsically disordered proteins, (ii) developing composition- and speciation-based "global" molecular clocks, and (iii) improving the statistical methods for ancestral sequence reconstruction.
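To make "proteome hydrophobicity" concrete, the sketch below computes a residue-weighted mean hydropathy over a set of protein sequences. It assumes the Kyte-Doolittle hydropathy scale; the abstract does not say which scale or "oil content" definition the authors used, so the scale choice and the function names are assumptions for illustration.

```python
from typing import Iterable

# Kyte-Doolittle hydropathy values (assumed scale; higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def proteome_hydropathy(proteins: Iterable[str]) -> float:
    """Mean hydropathy per residue across all sequences.

    Residues outside the 20 standard amino acids (e.g. X, *) are skipped.
    """
    total = 0.0
    count = 0
    for seq in proteins:
        for residue in seq.upper():
            value = KYTE_DOOLITTLE.get(residue)
            if value is not None:
                total += value
                count += 1
    if count == 0:
        raise ValueError("no standard amino acids found")
    return total / count


# Toy "proteomes": an oily one and a charged one.
print(proteome_hydropathy(["IVLL", "AV"]))
print(proteome_hydropathy(["KRDE", "NQ"]))
```

Comparing such a per-proteome value across species ordered by a measure of branching depth would be one simple way to look for a trend of the kind the abstract describes.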
Affiliation(s)
- Ranjan V Mannige
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts, United States of America.
|
73
|
Sifrim A, Van Houdt JKJ, Tranchevent LC, Nowakowska B, Sakai R, Pavlopoulos GA, Devriendt K, Vermeesch JR, Moreau Y, Aerts J. Annotate-it: a Swiss-knife approach to annotation, analysis and interpretation of single nucleotide variation in human disease. Genome Med 2012; 4:73. [PMID: 23013645 PMCID: PMC3580443 DOI: 10.1186/gm374] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Received: 08/31/2012] [Revised: 09/14/2012] [Accepted: 09/26/2012] [Indexed: 12/18/2022]
Abstract
The increasing size and complexity of exome/genome sequencing data requires new tools for clinical geneticists to discover disease-causing variants. Bottlenecks in identifying the causative variation include poor cross-sample querying, constantly changing functional annotation and not considering existing knowledge concerning the phenotype. We describe a methodology that facilitates exploration of patient sequencing data towards identification of causal variants under different genetic hypotheses. Annotate-it facilitates handling, analysis and interpretation of high-throughput single nucleotide variant data. We demonstrate our strategy using three case studies. Annotate-it is freely available and test data are accessible to all users at http://www.annotate-it.org.
Affiliation(s)
- Alejandro Sifrim
- KU Leuven, Department of Electrical Engineering-ESAT, SCD-SISTA, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium
- IBBT Future Health Department, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium
- Jeroen KJ Van Houdt
- KU Leuven, Centre for Human Genetics, University Hospital Gasthuisberg, Herestraat 49, 3000 Leuven, Belgium
- Leon-Charles Tranchevent
- KU Leuven, Department of Electrical Engineering-ESAT, SCD-SISTA, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium
- IBBT Future Health Department, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium
- Beata Nowakowska
- KU Leuven, Centre for Human Genetics, University Hospital Gasthuisberg, Herestraat 49, 3000 Leuven, Belgium
- Ryo Sakai
- KU Leuven, Department of Electrical Engineering-ESAT, SCD-SISTA, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium
- IBBT Future Health Department, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium
- Georgios A Pavlopoulos
- KU Leuven, Department of Electrical Engineering-ESAT, SCD-SISTA, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium
- IBBT Future Health Department, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium
- Koen Devriendt
- KU Leuven, Centre for Human Genetics, University Hospital Gasthuisberg, Herestraat 49, 3000 Leuven, Belgium
- Joris R Vermeesch
- KU Leuven, Centre for Human Genetics, University Hospital Gasthuisberg, Herestraat 49, 3000 Leuven, Belgium
- Yves Moreau
- KU Leuven, Department of Electrical Engineering-ESAT, SCD-SISTA, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium
- IBBT Future Health Department, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium
- Jan Aerts
- KU Leuven, Department of Electrical Engineering-ESAT, SCD-SISTA, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium
- IBBT Future Health Department, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium
|
74
|
Gu W, Zhai C, Wang X, Xie X, Parinandi G, Zhou T. Translation Efficiency in Upstream Region of microRNA Targets in Arabidopsis thaliana. Evol Bioinform Online 2012; 8:565-74. [PMID: 23071387 PMCID: PMC3469488 DOI: 10.4137/ebo.s10362] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 01/05/2023]
Abstract
With respect to upstream regions of microRNA (miRNA) target sites located in protein coding sequences, experimental studies have suggested rare codons, rather than frequent codons, are important for miRNA function, because they slow down the local translational process. But, whether there is a trend of reduced translation efficiency near miRNA targets is still unknown. Using Arabidopsis thaliana, we perform genome-wide analysis of synonymous codon usage in upstream regions of miRNA target sites. At the whole genome level, we find no significant selection signals for decreased translational efficiency. However, the same genome analyses do show substantial variations of translation efficiency reduction among miRNA targets. We find that miRNA conservation level, gene codon usage bias, and the mechanism of miRNA action can account for the differences in translation efficiency. But gene's GC content, gene expression level, and miRNA target's conservation level have no effect on local translation efficiency of miRNA targets. Although local translation efficiency in the upstream region of miRNA targets is related to miRNA function in A. thaliana, the selection signal of rare codon usage in that region is weak. We propose some other biological factors are more important than local translation efficiency in miRNA action when miRNA targets are located in protein coding sequences.
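The kind of measurement this abstract refers to can be pictured with a small sketch: take the codons immediately upstream of a miRNA target site inside a coding sequence and ask what fraction of them are rare. This is not the authors' procedure; the window size, the frequency table, the rarity threshold and all function names are hypothetical, and real analyses typically use an established translation-efficiency index rather than a simple cutoff.

```python
from typing import Dict, List


def split_codons(cds: str) -> List[str]:
    """Split an in-frame coding sequence into codons, dropping any trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def upstream_codons(cds: str, site_start: int, n_codons: int) -> List[str]:
    """Codons in the window of n_codons directly upstream of the codon containing site_start.

    site_start is a 0-based nucleotide index into the coding sequence.
    """
    site_codon = site_start // 3
    lo = max(0, site_codon - n_codons)
    return split_codons(cds)[lo:site_codon]


def rare_fraction(window: List[str], codon_freq: Dict[str, float], threshold: float) -> float:
    """Fraction of codons in the window whose frequency falls below the threshold.

    codon_freq is a hypothetical genome-wide frequency table; unknown codons count as rare.
    """
    if not window:
        return 0.0
    rare = sum(1 for codon in window if codon_freq.get(codon, 0.0) < threshold)
    return rare / len(window)


# Toy example: a target site starting at nucleotide 12 (the TTT codon).
cds = "ATGGCTGCAGCGTTTTAA"
freq = {"GCT": 0.30, "GCA": 0.05, "GCG": 0.02}
window = upstream_codons(cds, site_start=12, n_codons=3)
print(window, rare_fraction(window, freq, threshold=0.10))
```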
Affiliation(s)
- Wanjun Gu
- Key Laboratory of Child Development and Learning Science of Ministry of Education of China, Southeast University, Nanjing, Jiangsu 210096, China
|
75
|
Comparative genomics of eukaryotic small nucleolar RNAs reveals deep evolutionary ancestry amidst ongoing intragenomic mobility. BMC Evol Biol 2012; 12:183. [PMID: 22978381 PMCID: PMC3511168 DOI: 10.1186/1471-2148-12-183] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Received: 06/20/2012] [Accepted: 09/04/2012] [Indexed: 01/28/2023]
Abstract
BACKGROUND Small nucleolar (sno)RNAs are required for posttranscriptional processing and modification of ribosomal, spliceosomal and messenger RNAs. Their presence in both eukaryotes and archaea indicates that snoRNAs are evolutionarily ancient. The location of some snoRNAs within the introns of ribosomal protein genes has been suggested to belie an RNA world origin, with the exons of the earliest protein-coding genes having evolved around snoRNAs after the advent of templated protein synthesis. Alternatively, this intronic location may reflect more recent selection for coexpression of snoRNAs and ribosomal components, ensuring rRNA modification by snoRNAs during ribosome synthesis. To gain insight into the evolutionary origins of this genetic organization, we examined the antiquity of snoRNA families and the stability of their genomic location across 44 eukaryote genomes. RESULTS We report that dozens of snoRNA families are traceable to the Last Eukaryotic Common Ancestor (LECA), but find only weak similarities between the oldest eukaryotic snoRNAs and archaeal snoRNA-like genes. Moreover, many of these LECA snoRNAs are located within the introns of host genes independently traceable to the LECA. Comparative genomic analyses reveal the intronic location of LECA snoRNAs is not ancestral however, suggesting the pattern we observe is the result of ongoing intragenomic mobility. Analysis of human transcriptome data indicates that the primary requirement for hosting intronic snoRNAs is a broad expression profile. Consistent with ongoing mobility across broadly-expressed genes, we report a case of recent migration of a non-LECA snoRNA from the intron of a ubiquitously expressed non-LECA host gene into the introns of two LECA genes during the evolution of primates. CONCLUSIONS Our analyses show that snoRNAs were a well-established family of RNAs at the time when eukaryotes began to diversify. 
While many are intronic, this association is not evolutionarily stable across the eukaryote tree; ongoing intragenomic mobility has erased signal of their ancestral gene organization, and neither introns-first nor evolved co-expression adequately explain our results. We therefore present a third model - constrained drift - whereby individual snoRNAs are intragenomically mobile and may occupy any genomic location from which expression satisfies phenotype.
|
76
|
Valenzuela J, Mazurie A, Carlson RP, Gerlach R, Cooksey KE, Peyton BM, Fields MW. Potential role of multiple carbon fixation pathways during lipid accumulation in Phaeodactylum tricornutum. Biotechnol Biofuels 2012; 5:40. [PMID: 22672912 PMCID: PMC3457861 DOI: 10.1186/1754-6834-5-40] [Citation(s) in RCA: 121] [Impact Index Per Article: 10.1] [Received: 02/27/2012] [Accepted: 06/06/2012] [Indexed: 05/04/2023]
Abstract
BACKGROUND Phaeodactylum tricornutum is a unicellular diatom in the class Bacillariophyceae. The full genome has been sequenced (<30 Mb), and approximately 20 to 30% triacylglyceride (TAG) accumulation on a dry cell basis has been reported under different growth conditions. To elucidate P. tricornutum gene expression profiles during nutrient-deprivation and lipid-accumulation, cell cultures were grown with a nitrate to phosphate ratio of 20:1 (N:P) and whole-genome transcripts were monitored over time via RNA-sequence determination. RESULTS The specific Nile Red (NR) fluorescence (NR fluorescence per cell) increased over time; however, the increase in NR fluorescence was initiated before external nitrate was completely exhausted. Exogenous phosphate was depleted before nitrate, and these results indicated that the depletion of exogenous phosphate might be an early trigger for lipid accumulation that is magnified upon nitrate depletion. As expected, many of the genes associated with nitrate and phosphate utilization were up-expressed. The diatom-specific cyclins cyc7 and cyc10 were down-expressed during the nutrient-deplete state, and cyclin B1 was up-expressed during lipid-accumulation after growth cessation. While many of the genes associated with the C3 pathway for photosynthetic carbon reduction were not significantly altered, genes involved in a putative C4 pathway for photosynthetic carbon assimilation were up-expressed as the cells depleted nitrate, phosphate, and exogenous dissolved inorganic carbon (DIC) levels. P. tricornutum has multiple, putative carbonic anhydrases, but only two were significantly up-expressed (2-fold and 4-fold) at the last time point when exogenous DIC levels had increased after the cessation of growth. Alternative pathways that could utilize HCO3- were also suggested by the gene expression profiles (e.g., putative propionyl-CoA and methylmalonyl-CoA decarboxylases). CONCLUSIONS The results indicate that P. tricornutum continued carbon dioxide reduction when population growth was arrested and different carbon-concentrating mechanisms were used dependent upon exogenous DIC levels. Based upon overall low gene expression levels for fatty acid synthesis, the results also suggest that the build-up of precursors to the acetyl-CoA carboxylases may play a more significant role in TAG synthesis rather than the actual enzyme levels of acetyl-CoA carboxylases per se. The presented insights into the types and timing of cellular responses to inorganic carbon will help maximize photoautotrophic carbon flow to lipid accumulation.
Affiliation(s)
- Jacob Valenzuela
- Department of Biochemistry and Chemistry, Bozeman, USA
- Center for Biofilm Engineering, Bozeman, USA
- Aurelien Mazurie
- Department of Microbiology, Bozeman, USA
- Bioinformatics Core, Bozeman, USA
- Ross P Carlson
- Center for Biofilm Engineering, Bozeman, USA
- Department of Chemical and Biological Engineering, Montana State University, Bozeman, MT, 59717, USA
- Robin Gerlach
- Center for Biofilm Engineering, Bozeman, USA
- Department of Chemical and Biological Engineering, Montana State University, Bozeman, MT, 59717, USA
- Brent M Peyton
- Center for Biofilm Engineering, Bozeman, USA
- Department of Chemical and Biological Engineering, Montana State University, Bozeman, MT, 59717, USA
- Matthew W Fields
- Center for Biofilm Engineering, Bozeman, USA
- Department of Microbiology, Bozeman, USA
- Center for Biofilm Engineering, 366 EPS Building, Montana State University, Bozeman, MT, 59717, USA
|
77
|
Van Landeghem S, Hakala K, Rönnqvist S, Salakoski T, Van de Peer Y, Ginter F. Exploring Biomolecular Literature with EVEX: Connecting Genes through Events, Homology, and Indirect Associations. Adv Bioinformatics 2012; 2012:582765. [PMID: 22719757 PMCID: PMC3375141 DOI: 10.1155/2012/582765] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Received: 11/22/2011] [Revised: 03/16/2012] [Accepted: 03/28/2012] [Indexed: 01/20/2023]
Abstract
Technological advancements in the field of genetics have led not only to an abundance of experimental data, but also caused an exponential increase of the number of published biomolecular studies. Text mining is widely accepted as a promising technique to help researchers in the life sciences deal with the amount of available literature. This paper presents a freely available web application built on top of 21.3 million detailed biomolecular events extracted from all PubMed abstracts. These text mining results were generated by a state-of-the-art event extraction system and enriched with gene family associations and abstract generalizations, accounting for lexical variants and synonymy. The EVEX resource locates relevant literature on phosphorylation, regulation targets, binding partners, and several other biomolecular events and assigns confidence values to these events. The search function accepts official gene/protein symbols as well as common names from all species. Finally, the web application is a powerful tool for generating homology-based hypotheses as well as novel, indirect associations between genes and proteins such as coregulators.
Affiliation(s)
- Sofie Van Landeghem
- Department of Plant Systems Biology, VIB, Technologiepark 927, 9052 Gent, Belgium
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 927, 9052 Gent, Belgium
- Kai Hakala
- Department of Information Technology, University of Turku, Joukahaisenkatu 3-5, 20520 Turku, Finland
- Samuel Rönnqvist
- Department of Information Technology, University of Turku, Joukahaisenkatu 3-5, 20520 Turku, Finland
- Tapio Salakoski
- Department of Information Technology, University of Turku, Joukahaisenkatu 3-5, 20520 Turku, Finland
- Turku BioNLP Group, Turku Centre for Computer Science (TUCS), Joukahaisenkatu 3-5, 20520 Turku, Finland
- Yves Van de Peer
- Department of Plant Systems Biology, VIB, Technologiepark 927, 9052 Gent, Belgium
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 927, 9052 Gent, Belgium
- Filip Ginter
- Department of Information Technology, University of Turku, Joukahaisenkatu 3-5, 20520 Turku, Finland
|
78
|
Guerra-Assunção JA, Enright AJ. Large-scale analysis of microRNA evolution. BMC Genomics 2012; 13:218. [PMID: 22672736 PMCID: PMC3497579 DOI: 10.1186/1471-2164-13-218] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.1] [Received: 09/06/2011] [Accepted: 02/17/2012] [Indexed: 11/17/2022]
Abstract
BACKGROUND In animals, microRNAs (miRNA) are important genetic regulators. Animal miRNAs appear to have expanded in conjunction with an escalation in complexity during early bilaterian evolution. Their small size and high degree of similarity make them challenging for phylogenetic approaches. Furthermore, genomic locations encoding miRNAs are not clearly defined in many species. A number of studies have looked at the evolution of individual miRNA families. However, we currently lack resources for large-scale analysis of miRNA evolution. RESULTS We addressed some of these issues in order to analyse the evolution of miRNAs. We perform syntenic and phylogenetic analysis for miRNAs from 80 animal species. We present synteny maps, phylogenies and functional data for miRNAs across these species. These data represent the basis of our analyses and also act as a resource for the community. CONCLUSIONS We use these data to explore the distribution of miRNAs across phylogenetic space, characterise their birth and death, and examine functional relationships between miRNAs and other genes. These data confirm a number of previously reported findings on a larger scale and also offer novel insights into the evolution of the miRNA repertoire in animals, and its genomic organization.
Collapse
Affiliation(s)
- José Afonso Guerra-Assunção
- EMBL - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- PDBC, Instituto Gulbenkian de Ciência, Rua da Quinta Grande, 6, 2780-156, Oeiras, Portugal
| | - Anton J Enright
- EMBL - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
|
79
|
Wu Y, Punta M, Xiao R, Acton TB, Sathyamoorthy B, Dey F, Fischer M, Skerra A, Rost B, Montelione GT, Szyperski T. NMR structure of lipoprotein YxeF from Bacillus subtilis reveals a calycin fold and distant homology with the lipocalin Blc from Escherichia coli. PLoS One 2012; 7:e37404. [PMID: 22693626 PMCID: PMC3367933 DOI: 10.1371/journal.pone.0037404] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 02/13/2012] [Accepted: 04/19/2012] [Indexed: 11/18/2022] Open
Abstract
The soluble monomeric domain of lipoprotein YxeF from the Gram-positive bacterium B. subtilis was selected by the Northeast Structural Genomics Consortium (NESG) as a target of a biomedical theme project focusing on the structure determination of the soluble domains of bacterial lipoproteins. The solution NMR structure of YxeF reveals a calycin fold and distant homology with the lipocalin Blc from the Gram-negative bacterium E. coli. In particular, the characteristic β-barrel, which is open to the solvent at one end, is extremely well conserved in YxeF with respect to Blc. The identification of YxeF as the first lipocalin homologue occurring in a Gram-positive bacterium suggests that lipocalins emerged before the evolutionary divergence of Gram-positive and Gram-negative bacteria. Since YxeF is devoid of the α-helix that, in all lipocalins with known structure, packs against the β-barrel to form a second hydrophobic core, we propose to introduce a new lipocalin sub-family named ‘slim lipocalins’, with YxeF and the other members of Pfam family PF11631, to which YxeF belongs, constituting the first representatives. The results presented here exemplify the impact of structural genomics in enhancing our understanding of biology and generating new biological hypotheses.
Affiliation(s)
- Yibing Wu
- Department of Chemistry, State University of New York at Buffalo, Buffalo, New York, United States of America
- Northeast Structural Genomics Consortium
| | - Marco Punta
- Department of Computer Science and Institute for Advanced Study, Technical University of Munich, Munich, Germany
- Northeast Structural Genomics Consortium
| | - Rong Xiao
- Center of Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Robert Wood Johnson Medical School, The State University of New Jersey, Piscataway, New Jersey, United States of America
- Northeast Structural Genomics Consortium
| | - Thomas B. Acton
- Center of Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Robert Wood Johnson Medical School, The State University of New Jersey, Piscataway, New Jersey, United States of America
- Northeast Structural Genomics Consortium
| | - Bharathwaj Sathyamoorthy
- Department of Chemistry, State University of New York at Buffalo, Buffalo, New York, United States of America
| | - Fabian Dey
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
- Northeast Structural Genomics Consortium
| | - Markus Fischer
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
- Northeast Structural Genomics Consortium
| | - Arne Skerra
- Munich Center for Integrated Protein Science, CIPS-M, and Lehrstuhl für Biologische Chemie, Technische Universität München, Freising-Weihenstephan, Germany
| | - Burkhard Rost
- Department of Computer Science and Institute for Advanced Study, Technical University of Munich, Munich, Germany
- Northeast Structural Genomics Consortium
| | - Gaetano T. Montelione
- Center of Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Robert Wood Johnson Medical School, The State University of New Jersey, Piscataway, New Jersey, United States of America
- Northeast Structural Genomics Consortium
| | - Thomas Szyperski
- Department of Chemistry, State University of New York at Buffalo, Buffalo, New York, United States of America
- Northeast Structural Genomics Consortium
|
80
|
Selection on Synonymous Sites for Increased Accessibility around miRNA Binding Sites in Plants. Mol Biol Evol 2012; 29:3037-44. [DOI: 10.1093/molbev/mss109] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.3] [Indexed: 12/18/2022] Open
|
81
|
|
82
|
Armani R, Archer H, Clarke A, Vasudevan P, Zweier C, Ho G, Williamson S, Cloosterman D, Yang N, Christodoulou J. Transcription factor 4 and myocyte enhancer factor 2C mutations are not common causes of Rett syndrome. Am J Med Genet A 2012; 158A:713-9. [PMID: 22383159 DOI: 10.1002/ajmg.a.34206] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 03/26/2010] [Accepted: 06/22/2011] [Indexed: 01/04/2023]
Abstract
The systematic screening of Rett syndrome (RTT) patients for pathogenetic sequence variations has focused on three genes that have been associated with RTT or related clinical phenotypes, namely MECP2, CDKL5, and FOXG1. More recently, it has been suggested that phenotypes associated with TCF4 and MEF2C mutations may represent a form of RTT. Here we report on the screening of the TCF4 and MEF2C genes in a cohort of 81 classical, atypical, and incomplete atypical RTT patients harboring no known mutations in MECP2, CDKL5, and FOXG1 genes. No pathogenetic sequence variations were identified in the MEF2C gene in our cohort. However, a frameshift mutation in TCF4 was identified in a patient with a clinical diagnosis of "variant" RTT, in whom the clinical evolution later raised the possibility of Pitt-Hopkins syndrome. Although our results suggest that these genes are not commonly associated with RTT, we note the clinical similarity between RTT and Pitt-Hopkins syndrome, and suggest that RTT patients with no mutation identified in MECP2 be considered for molecular screening of the TCF4 gene.
Affiliation(s)
- Roksana Armani
- NSW Centre for Rett Syndrome Research, Kids Research Institute, The Children's Hospital at Westmead, Sydney, NSW, Australia
|
83
|
Spooner W, Youens-Clark K, Staines D, Ware D. GrameneMart: the BioMart data portal for the Gramene project. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2012; 2012:bar056. [PMID: 22374386 PMCID: PMC3289142 DOI: 10.1093/database/bar056] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Indexed: 11/13/2022]
Abstract
Gramene is a well-established resource for plant comparative genome analysis. Data are generated through automated and curated analyses and made available through web interfaces such as GrameneMart. The Gramene project was an early adopter of the BioMart software, which remains an integral and well-used component of the Gramene website. BioMart accessible data sets include plant gene annotations, plant variation catalogues, genetic markers, physical mapping entities, public DNA/mRNA sequences of various types and curated quantitative trait loci for various species. Database URL: http://www.gramene.org/biomart/martview
Affiliation(s)
- William Spooner
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA and European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Ken Youens-Clark
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA and European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Daniel Staines
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA and European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Doreen Ware
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA and European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- *Corresponding author
|
84
|
Ramazzotti M, Monsellier E, Kamoun C, Degl'Innocenti D, Melki R. Polyglutamine repeats are associated to specific sequence biases that are conserved among eukaryotes. PLoS One 2012; 7:e30824. [PMID: 22312432 PMCID: PMC3270027 DOI: 10.1371/journal.pone.0030824] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 10/06/2011] [Accepted: 12/23/2011] [Indexed: 12/20/2022] Open
Abstract
Nine human neurodegenerative diseases, including Huntington's disease and several spinocerebellar ataxias, are associated with the aggregation of proteins comprising an extended tract of consecutive glutamine residues (polyQs) once it exceeds a certain length threshold. This event is believed to be the consequence of the expansion of polyCAG codons during the replication process. This is in apparent contradiction with the fact that many polyQ-containing proteins remain soluble and are encoded by invariant genes in a number of eukaryotes. The latter suggests that polyQ expansion and/or aggregation might be counter-selected through a genetic and/or protein context. To identify this context, we designed software that scrutinizes entire proteomes in search of imperfect polyQs. The nature of residues flanking the polyQs and that of residues other than Gln within polyQs (insertions) were assessed. We discovered strong amino acid residue biases robustly associated with polyQs in the 15 eukaryotic proteomes we examined, with an over-representation of Pro, Leu and His and an under-representation of Asp, Cys and Gly amino acid residues. These biases are conserved amongst unrelated proteins and are independent of specific functional classes. Our findings suggest that specific residues have been co-selected with polyQs during evolution. We discuss the possible selective pressures responsible for the observed biases.
Affiliation(s)
- Matteo Ramazzotti
- Dipartimento di Scienze Biochimiche, Università degli Studi di Firenze, Florence, Italy
- * E-mail: (MR); (EM)
| | - Elodie Monsellier
- Laboratoire d'Enzymologie et de Biochimie Structurales, UPR 3082 CNRS, Gif sur Yvette, France
- * E-mail: (MR); (EM)
| | - Choumouss Kamoun
- Laboratoire d'Enzymologie et de Biochimie Structurales, UPR 3082 CNRS, Gif sur Yvette, France
| | | | - Ronald Melki
- Laboratoire d'Enzymologie et de Biochimie Structurales, UPR 3082 CNRS, Gif sur Yvette, France
|
85
|
Van Bel M, Proost S, Wischnitzki E, Movahedi S, Scheerlinck C, Van de Peer Y, Vandepoele K. Dissecting plant genomes with the PLAZA comparative genomics platform. PLANT PHYSIOLOGY 2012; 158:590-600. [PMID: 22198273 PMCID: PMC3271752 DOI: 10.1104/pp.111.189514] [Citation(s) in RCA: 192] [Impact Index Per Article: 16.0] [Received: 10/18/2011] [Accepted: 12/22/2011] [Indexed: 05/17/2023]
Abstract
With the arrival of low-cost, next-generation sequencing, a multitude of new plant genomes are being publicly released, providing unseen opportunities and challenges for comparative genomics studies. Here, we present PLAZA 2.5, a user-friendly online research environment to explore genomic information from different plants. This new release features updates to previous genome annotations and a substantial number of newly available plant genomes as well as various new interactive tools and visualizations. Currently, PLAZA hosts 25 organisms covering a broad taxonomic range, including 13 eudicots, five monocots, one lycopod, one moss, and five algae. The available data consist of structural and functional gene annotations, homologous gene families, multiple sequence alignments, phylogenetic trees, and colinear regions within and between species. A new Integrative Orthology Viewer, combining information from different orthology prediction methodologies, was developed to efficiently investigate complex orthology relationships. Cross-species expression analysis revealed that the integration of complementary data types extended the scope of complex orthology relationships, especially between more distantly related species. Finally, based on phylogenetic profiling, we propose a set of core gene families within the green plant lineage that will be instrumental to assess the gene space of draft or newly sequenced plant genomes during the assembly or annotation phase.
|
86
|
Parida L, Haiminen N. Discovering patterns in gene order. Methods Mol Biol 2012; 855:431-455. [PMID: 22407719 DOI: 10.1007/978-1-61779-582-4_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2023]
Abstract
Various genetic events during the process of natural evolution shape the landscape of the genomes. In this chapter, we explore an approach to investigating multiple genomes in order to unravel their complex relationships that go beyond their placement on a phylogeny. To this end, we treat genes as the smallest syntactic unit on the genome and explore their relative organization across multiple genomes. In the first half of the chapter, we discuss mathematical models to capture the combinatorial structures of this relative organization and statistical models to study their distributions. In the second half of the chapter, we apply these models to analyze the relationship between three closely related plant genomes.
Affiliation(s)
- Laxmi Parida
- IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA.
|
87
|
Bitton DA, Grallert A, Scutt PJ, Yates T, Li Y, Bradford JR, Hey Y, Pepper SD, Hagan IM, Miller CJ. Programmed fluctuations in sense/antisense transcript ratios drive sexual differentiation in S. pombe. Mol Syst Biol 2011; 7:559. [PMID: 22186733 PMCID: PMC3738847 DOI: 10.1038/msb.2011.90] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Received: 11/18/2010] [Accepted: 11/07/2011] [Indexed: 12/31/2022] Open
Abstract
Strand-specific RNA sequencing of S. pombe reveals a highly structured programme of ncRNA expression at over 600 loci. Functional investigations show that this extensive ncRNA landscape controls the complex programme of sexual differentiation in S. pombe. The model eukaryote S. pombe features substantial numbers of ncRNAs, many of which are antisense regulatory transcripts (ARTs): ncRNAs expressed on the opposing strand to coding sequences. Individual ARTs are generated during the mitotic cycle, or at discrete stages of sexual differentiation, to downregulate the levels of proteins that drive and coordinate sexual differentiation. Antisense transcription occurring from events such as bidirectional transcription is not simply artefactual ‘chatter’; it performs a critical role in regulating gene expression.
Regulation of the RNA profile is a principal control driving sexual differentiation in the fission yeast Schizosaccharomyces pombe. Before transcription, RNAi-mediated formation of heterochromatin is used to suppress expression, while post-transcription, regulation is achieved via the active stabilisation or destruction of transcripts, and through at least two distinct types of splicing control (Mata et al, 2002; Shimoseki and Shimoda, 2001; Averbeck et al, 2005; Mata and Bähler, 2006; Xue-Franzen et al, 2006; Moldon et al, 2008; Djupedal et al, 2009; Amorim et al, 2010; Grewal, 2010; Cremona et al, 2011). Around 94% of the S. pombe genome is transcribed (Wilhelm et al, 2008). While many of these transcripts encode proteins (Wood et al, 2002; Bitton et al, 2011), the majority have no known function. We used a strand-specific protocol to sequence total RNA extracts taken from vegetatively growing cells, and at different points during a time course of sexual differentiation. The resulting data redefined existing gene coordinates and identified additional transcribed loci. The frequency of reads at each of these was used to monitor transcript abundance. Transcript levels at 6599 loci changed in at least one sample (G-statistic; False Discovery Rate <5%). 4231 (72.3%), of which 4011 map to protein-coding genes, while 809 loci were antisense to a known gene. Comparisons between haploid and diploid strains identified changes in transcript levels at over 1000 loci. At 354 loci, greater antisense abundance was observed relative to sense, in at least one sample (putative antisense regulatory transcripts—ARTs). Since antisense mechanisms are known to modulate sense transcript expression through a variety of inhibitory mechanisms (Faghihi and Wahlestedt, 2009), we postulated that the waves of antisense expression activated at different stages during meiosis might be regulating protein expression. 
To ask whether transcription factors that drive sense-transcript levels influenced ART production, we performed RNA-seq of a pat1.114 diploid meiosis in the absence of the transcription factors Atf21 and Atf31 (responsible for late meiotic transcription; Mata et al, 2002). Transcript levels at 185 ncRNA loci showed significant changes in the knockout backgrounds. Although meiotic progression is largely unaffected by removal of Atf21 and Atf31, viability of the resulting spores was significantly diminished, indicating that Atf21- and Atf31-mediated events are critical to efficient sexual differentiation. If changes to relative antisense/sense transcript levels during a particular phase of sexual differentiation were to regulate protein expression, then the continued presence of the antisense at points in the differentiation programme where it would normally be absent should abolish protein function during this phase. We tested this hypothesis at four loci representing the three means of antisense production: convergent gene expression, improper termination and nascent transcription from an independent locus. Induction of the natural antisense transcripts that opposed spo4+, spo6+ and dis1+ (Figures 3 and 7) in trans from a heterologous locus phenocopied a loss of function of the target protein. ART overexpression decreased Dis1 protein levels. Antisense transcription opposing spk1+ originated from improper termination of the sense ups1+ transcript on the opposite strand (Figure 3B, left locus). Expression of either the natural full-length ups1+ transcript or a truncated version, restricted to the portion of ups1+ overlapping spk1+ (Figure 3, orange transcripts) in trans from a heterologous locus phenocopied the spk1.Δ differentiation deficiency. Convergent transcription from a neighbouring gene on the opposing strand is, therefore, an effective mechanism to generate RNAi-mediated (below) silencing in fission yeast. 
Further analysis of the data revealed, for many loci, substantial changes in UTR length over the course of meiosis, suggesting that UTR dynamics may have an active role in regulating gene expression by controlling the transcriptional overlap between convergent adjacent gene pairs. The RNAi machinery (Grewal, 2010) was required for antisense suppression at each of the dis1, spk1, spo4 and spo6 loci, as antisense to each locus had no impact in ago1.Δ, dcr1.Δ and rdp1.Δ backgrounds. We conclude that RNAi control has a key role in maintaining the fidelity of sexual differentiation in fission yeast. The histone H3 methyl transferase Clr4 was required for antisense control from a heterologous locus. Thus, a significant portion of the impact of ncRNA upon sexual differentiation arises from antisense gene silencing. Importantly, in contrast to the extensively characterised ability of the RNAi machinery to operate in cis at a target locus in S. pombe (Grewal, 2010), each case of gene silencing generated here could be achieved in trans by expression of the antisense transcript from a single heterologous locus elsewhere in the genome. Integration of an antibiotic marker gene immediately downstream of the dis1+ locus instigated antisense control in an orientation-dependent manner. PCR-based gene tagging approaches are widely used to fuse the coding sequences of epitope or protein tags to a gene of interest. Not only do these tagging approaches disrupt normal 3′UTR controls, but the insertion of a heterologous marker gene immediately downstream of an ORF can clearly have a significant impact upon transcriptional control of the resulting fusion protein. Thus, PCR tagging approaches can no longer be viewed as benign manipulations of a locus that only result in the production of a tagged protein product. 
Repression of Dis1 function by gene deletion or antisense control revealed a key role for this conserved microtubule regulator in driving the horsetail nuclear migrations that promote recombination during meiotic prophase. Non-coding transcripts have often been viewed as simple ‘chatter’, maintained solely because evolutionary pressures have not been strong enough to force their elimination from the system. Our data show that phenomena such as improper termination and bidirectional transcription are not simply interesting artifacts arising from the complexities of transcription or genome history, but have a critical role in regulating gene expression in the current genome. Given the widespread use of RNAi, it is reasonable to anticipate that future analyses will establish ARTs to have equal importance in other organisms, including vertebrates. These data highlight the need to modify our concept of a gene from that of a spatially distinct locus. This view is becoming increasingly untenable. Not only are the 5′ and 3′ ends of many genes indistinct, but this lack of a hard and fast boundary is actively used by cells to control the transcription of adjacent and overlapping loci, and thus to regulate critical events in the life of a cell. Strand-specific RNA sequencing of S. pombe revealed a highly structured programme of ncRNA expression at over 600 loci. Waves of antisense transcription accompanied sexual differentiation. A substantial proportion of ncRNA arose from mechanisms previously considered to be largely artefactual, including improper 3′ termination and bidirectional transcription. Constitutive induction of the entire spk1+, spo4+, dis1+ and spo6+ antisense transcripts from an integrated, ectopic, locus disrupted their respective meiotic functions. This ability of antisense transcripts to disrupt gene function when expressed in trans suggests that cis production at native loci during sexual differentiation may also control gene function.
Consistently, insertion of a marker gene adjacent to the dis1+ antisense start site mimicked ectopic antisense expression in reducing the levels of this microtubule regulator and abolishing the microtubule-dependent ‘horsetail’ stage of meiosis. Antisense production had no impact at any of these loci when the RNA interference (RNAi) machinery was removed. Thus, far from being simply ‘genome chatter’, this extensive ncRNA landscape constitutes a fundamental component in the controls that drive the complex programme of sexual differentiation in S. pombe.
Affiliation(s)
- Danny A Bitton
- CRUK Applied Computational Biology and Bioinformatics Group, Cancer Research UK, Paterson Institute for Cancer Research, The University of Manchester, Manchester, UK
|
88
|
Azevedo H, Silva-Correia J, Oliveira J, Laranjeira S, Barbeta C, Amorim-Silva V, Botella MA, Lino-Neto T, Tavares RM. A strategy for the identification of new abiotic stress determinants in Arabidopsis using web-based data mining and reverse genetics. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2011; 15:935-47. [PMID: 22136640 DOI: 10.1089/omi.2011.0083] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 02/04/2023]
Abstract
Since the sequencing of the Arabidopsis thaliana genome in 2000, plant researchers have faced the complex challenge of assigning function to thousands of genes. Functional discovery by in silico prediction or homology search resolved a significant number of genes, but only a minor part has been experimentally validated. Arabidopsis entry into the post-genomic era signified a massive increase in high-throughput approaches to functional discovery, which have since become available through publicly-available web-based resources. The present work focuses on an easy and straightforward strategy that couples data-mining to reverse genetics principles, to allow for the identification of new abiotic stress determinant genes. The strategy explores systematic microarray-based transcriptomics experiments, involving Arabidopsis abiotic stress responses. An overview of the most significant resources and databases for functional discovery in Arabidopsis is presented. The successful application of the outlined strategy is illustrated by the identification of a new abiotic stress determinant gene, HRR, which displays a heat-stress-related phenotype after a loss-of-function reverse genetics approach.
Affiliation(s)
- Herlânder Azevedo
- Center for Biodiversity, Functional & Integrative Genomics (BioFIG), CBFP/Department of Biology, University of Minho, Campus de Gualtar, Braga, Portugal.
|
89
|
Lees J, Yeats C, Perkins J, Sillitoe I, Rentzsch R, Dessailly BH, Orengo C. Gene3D: a domain-based resource for comparative genomics, functional annotation and protein network analysis. Nucleic Acids Res 2011; 40:D465-71. [PMID: 22139938 PMCID: PMC3245158 DOI: 10.1093/nar/gkr1181] [Citation(s) in RCA: 67] [Impact Index Per Article: 5.2] [Indexed: 12/31/2022] Open
Abstract
Gene3D (http://gene3d.biochem.ucl.ac.uk) is a comprehensive database of protein domain assignments for sequences from the major sequence databases. Domains are directly mapped from structures in the CATH database or predicted using a library of representative profile HMMs derived from CATH superfamilies. As previously described, Gene3D integrates many other protein family and function databases. These facilitate complex associations of molecular function, structure and evolution. Gene3D now includes a domain functional family (FunFam) level below the homologous superfamily level assignments. Additions have also been made to the interaction data. More significantly, to help with the visualization and interpretation of multi-genome scale data sets, we have developed a new, revamped website. Searching has been simplified with more sophisticated filtering of results, along with new tools based on Cytoscape Web, for visualizing protein–protein interaction networks, differences in domain composition between genomes and the taxonomic distribution of individual superfamilies.
Affiliation(s)
- Jonathan Lees
- Institute of Structural and Molecular Biology, University College London, Darwin Building, Gower St, London WC1E 6BT, UK.
|
90
|
Punta M, Coggill PC, Eberhardt RY, Mistry J, Tate J, Boursnell C, Pang N, Forslund K, Ceric G, Clements J, Heger A, Holm L, Sonnhammer ELL, Eddy SR, Bateman A, Finn RD. The Pfam protein families database. Nucleic Acids Res 2011; 40:D290-301. [PMID: 22127870 PMCID: PMC3245129 DOI: 10.1093/nar/gkr1065] [Citation(s) in RCA: 2883] [Impact Index Per Article: 221.8] [Indexed: 12/28/2022] Open
Abstract
Pfam is a widely used database of protein families, currently containing more than 13,000 manually curated protein families as of release 26.0. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/). Here, we report on changes that have occurred since our 2010 NAR paper (release 24.0). Over the last 2 years, we have generated 1840 new families and increased coverage of the UniProt Knowledgebase (UniProtKB) to nearly 80%. Notably, we have taken the step of opening up the annotation of our families to the Wikipedia community, by linking Pfam families to relevant Wikipedia pages and encouraging the Pfam and Wikipedia communities to improve and expand those pages. We continue to improve the Pfam website and add new visualizations, such as the 'sunburst' representation of taxonomic distribution of families. In this work we additionally address two topics that will be of particular interest to the Pfam community. First, we explain the definition and use of family-specific, manually curated gathering thresholds. Second, we discuss some of the features of domains of unknown function (also known as DUFs), which constitute a rapidly growing class of families within Pfam.
Affiliation(s)
- Marco Punta
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK.
|
91
|
Kerrien S, Aranda B, Breuza L, Bridge A, Broackes-Carter F, Chen C, Duesbury M, Dumousseau M, Feuermann M, Hinz U, Jandrasits C, Jimenez RC, Khadake J, Mahadevan U, Masson P, Pedruzzi I, Pfeiffenberger E, Porras P, Raghunath A, Roechert B, Orchard S, Hermjakob H. The IntAct molecular interaction database in 2012. Nucleic Acids Res 2011; 40:D841-6. [PMID: 22121220 PMCID: PMC3245075 DOI: 10.1093/nar/gkr1088] [Citation(s) in RCA: 743] [Impact Index Per Article: 57.2] [Indexed: 01/04/2023] Open
Abstract
IntAct is an open-source, open data molecular interaction database populated by data either curated from the literature or obtained from direct data depositions. Two levels of curation are now available within the database, with both IMEx-level annotation and less detailed MIMIx-compatible entries currently supported. As of September 2011, IntAct contains approximately 275 000 curated binary interaction evidences from over 5000 publications. The IntAct website has been improved to enhance the search process and in particular the graphical display of the results. New data download formats are also available, which will facilitate the inclusion of IntAct's data in the Semantic Web. IntAct is an active contributor to the IMEx consortium (http://www.imexconsortium.org). IntAct source code and data are freely available at http://www.ebi.ac.uk/intact.
Affiliation(s)
- Samuel Kerrien
- EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus Hinxton, Cambridge CB10 1SD, UK
92
Mabey Gilsenan J, Cooley J, Bowyer P. CADRE: the Central Aspergillus Data REpository 2012. Nucleic Acids Res 2011; 40:D660-6. [PMID: 22080563 PMCID: PMC3245145 DOI: 10.1093/nar/gkr971] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Indexed: 11/25/2022] Open
Abstract
The Central Aspergillus Data REpository (CADRE; http://www.cadre-genomes.org.uk) is a public resource for genomic data extracted from species of Aspergillus. It provides an array of online tools for searching and visualising features of this significant fungal genus. CADRE arose from a need within the medical community to understand the human pathogen Aspergillus fumigatus. Due to the paucity of Aspergillus genomic resources 10 years ago, the long-term goal of this project was to collate and maintain Aspergillus genomes as they became available. Since our first release in 2004, the resource has expanded to encompass annotated sequence for eight other Aspergilli and provides much needed support to the international Aspergillus research community. Recent developments, however, in sequencing technology are creating a vast amount of genomic data and, as a result, we shortly expect a tidal wave of Aspergillus data. In preparation for this, we have upgraded the database and software suite. This not only enables better management of more complex data sets, but also improves annotation by providing access to genome comparison data and the integration of high-throughput data.
Affiliation(s)
- Jane Mabey Gilsenan
- School of Translational Medicine, University of Manchester, Manchester M23 9LT, UK.
93
Stajich JE, Harris T, Brunk BP, Brestelli J, Fischer S, Harb OS, Kissinger JC, Li W, Nayak V, Pinney DF, Stoeckert CJ, Roos DS. FungiDB: an integrated functional genomics database for fungi. Nucleic Acids Res 2011; 40:D675-81. [PMID: 22064857 PMCID: PMC3245123 DOI: 10.1093/nar/gkr918] [Citation(s) in RCA: 245] [Impact Index Per Article: 18.8] [Indexed: 01/08/2023] Open
Abstract
FungiDB (http://FungiDB.org) is a functional genomic resource for pan-fungal genomes that was developed in partnership with the Eukaryotic Pathogen Bioinformatic resource center (http://EuPathDB.org). FungiDB uses the same infrastructure and user interface as EuPathDB, which allows sophisticated and integrated searches to be performed using an intuitive graphical system. The current release of FungiDB contains genome sequence and annotation from 18 species spanning several fungal classes, including the Ascomycota classes Eurotiomycetes, Sordariomycetes and Saccharomycetes, the Basidiomycota orders Pucciniomycetes and Tremellomycetes, and the basal 'Zygomycete' lineage Mucormycotina. Additionally, FungiDB contains cell cycle microarray data, hyphal growth RNA-sequence data and yeast two-hybrid interaction data. The underlying genomic sequence and annotation, combined with functional data, additional data from the FungiDB standard analysis pipeline and the ability to leverage orthology, provide a powerful resource for in silico experimentation.
Affiliation(s)
- Jason E Stajich
- Department of Plant Pathology & Microbiology, University of California, Riverside, CA 92521, USA.
94
Xie C, Mao X, Huang J, Ding Y, Wu J, Dong S, Kong L, Gao G, Li CY, Wei L. KOBAS 2.0: a web server for annotation and identification of enriched pathways and diseases. Nucleic Acids Res 2011; 39:W316-22. [PMID: 21715386 PMCID: PMC3125809 DOI: 10.1093/nar/gkr483] [Citation(s) in RCA: 3226] [Impact Index Per Article: 248.2] [Indexed: 12/11/2022] Open
Abstract
High-throughput experimental technologies often identify dozens to hundreds of genes related to, or changed in, a biological or pathological process. From these genes one wants to identify biological pathways that may be involved and diseases that may be implicated. Here, we report a web server, KOBAS 2.0, which annotates an input set of genes with putative pathways and disease relationships based on mapping to genes with known annotations. It allows for both ID mapping and cross-species sequence similarity mapping. It then performs statistical tests to identify statistically significantly enriched pathways and diseases. KOBAS 2.0 incorporates knowledge across 1327 species from 5 pathway databases (KEGG PATHWAY, PID, BioCyc, Reactome and Panther) and 5 human disease databases (OMIM, KEGG DISEASE, FunDO, GAD and NHGRI GWAS Catalog). KOBAS 2.0 can be accessed at http://kobas.cbi.pku.edu.cn.
Affiliation(s)
- Chen Xie
- Center for Bioinformatics, State Key Laboratory of Protein and Plant Gene Research, College of Life Sciences, Peking University, Beijing, China
95
Bateman A, Agrawal S, Birney E, Bruford EA, Bujnicki JM, Cochrane G, Cole JR, Dinger ME, Enright AJ, Gardner PP, Gautheret D, Griffiths-Jones S, Harrow J, Herrero J, Holmes IH, Huang HD, Kelly KA, Kersey P, Kozomara A, Lowe TM, Marz M, Moxon S, Pruitt KD, Samuelsson T, Stadler PF, Vilella AJ, Vogel JH, Williams KP, Wright MW, Zwieb C. RNAcentral: A vision for an international database of RNA sequences. RNA (New York, N.Y.) 2011; 17:1941-6. [PMID: 21940779 PMCID: PMC3198587 DOI: 10.1261/rna.2750811] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.3] [Indexed: 05/07/2023]
Abstract
During the last decade there has been a great increase in the number of noncoding RNA genes identified, including new classes such as microRNAs and piRNAs. There is also a large growth in the amount of experimental characterization of these RNA components. Despite this growth in information, it is still difficult for researchers to access RNA data, because key data resources for noncoding RNAs have not yet been created. The most pressing omission is the lack of a comprehensive RNA sequence database, much like UniProt, which provides a comprehensive set of protein knowledge. In this article we propose the creation of a new open public resource that we term RNAcentral, which will contain a comprehensive collection of RNA sequences and fill an important gap in the provision of biomedical databases. We envision RNA researchers from all over the world joining a federated RNAcentral network, contributing specialized knowledge and databases. RNAcentral would centralize key data that are currently held across a variety of databases, allowing researchers instant access to a single, unified resource. This resource would facilitate the next generation of RNA research and help drive further discoveries, including those that improve food production and human and animal health. We encourage additional RNA database resources and research groups to join this effort. We aim to obtain international network funding to further this endeavor.
Affiliation(s)
- Alex Bateman
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, United Kingdom
- Corresponding author.
- Shipra Agrawal
- Institute of Bioinformatics and Applied Biotechnology (IBAB), Bangalore 560 100, India
- BioCOS Life Sciences Private Limited, Bangalore 560 100, India
- Ewan Birney
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, United Kingdom
- Elspeth A. Bruford
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, United Kingdom
- Janusz M. Bujnicki
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, Trojdena 4, 02-109 Warsaw, Poland
- Laboratory of Bioinformatics, Institute of Molecular Biology and Biotechnology, Faculty of Biology, Umultowska 89, 61-614 Poznan, Poland
- Guy Cochrane
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, United Kingdom
- James R. Cole
- Microbial Ecology Center, Michigan State University, East Lansing, Michigan 48824-1319, USA
- Marcel E. Dinger
- Institute for Molecular Bioscience, The University of Queensland, St Lucia QLD 4072, Australia
- Anton J. Enright
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, United Kingdom
- Paul P. Gardner
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, United Kingdom
- Daniel Gautheret
- Institut de Génétique et Microbiologie–UMR CNRS 8621, Université Paris-Sud–Bâtiment 400, 91405 Orsay Cedex, France
- Sam Griffiths-Jones
- Faculty of Life Sciences, University of Manchester, Michael Smith Building, Manchester, M13 9PT, United Kingdom
- Jen Harrow
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, United Kingdom
- Javier Herrero
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, United Kingdom
- Ian H. Holmes
- Department of Bioengineering, University of California, Berkeley, California 94720-1762, USA
- Hsien-Da Huang
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, HsinChu, 30050, Taiwan
- Krystyna A. Kelly
- Department of Plant Sciences, University of Cambridge, Cambridge CB2 3EA, United Kingdom
- Paul Kersey
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, United Kingdom
- Ana Kozomara
- Faculty of Life Sciences, University of Manchester, Michael Smith Building, Manchester, M13 9PT, United Kingdom
- Todd M. Lowe
- Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
- Manja Marz
- RNA Bioinformatics Group, Institute of Pharmaceutical Chemistry, Marbacher Weg 6, 35037 Marburg, Germany
- Simon Moxon
- University of East Anglia, Norwich, NR4 7TJ, United Kingdom
- Kim D. Pruitt
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland 20894, USA
- Tore Samuelsson
- Department of Medical Biochemistry, University of Goteborg, Medicinareg. 9A, S-405 30 Goteborg, Sweden
- Peter F. Stadler
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, 04009 Leipzig, Germany
- Albert J. Vilella
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, United Kingdom
- Jan-Hinnerk Vogel
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, United Kingdom
- Kelly P. Williams
- Sandia National Laboratories, MS 9291, Livermore, California 94551-0969, USA
- Mathew W. Wright
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, United Kingdom
- Christian Zwieb
- Department of Biochemistry, University of Texas Health Science Center at San Antonio, San Antonio, Texas 78229-3901, USA
96
Wood V, Harris MA, McDowall MD, Rutherford K, Vaughan BW, Staines DM, Aslett M, Lock A, Bähler J, Kersey PJ, Oliver SG. PomBase: a comprehensive online resource for fission yeast. Nucleic Acids Res 2011; 40:D695-9. [PMID: 22039153 PMCID: PMC3245111 DOI: 10.1093/nar/gkr853] [Citation(s) in RCA: 240] [Impact Index Per Article: 18.5] [Indexed: 01/31/2023] Open
Abstract
PomBase (www.pombase.org) is a new model organism database established to provide access to comprehensive, accurate, and up-to-date molecular data and biological information for the fission yeast Schizosaccharomyces pombe to effectively support both exploratory and hypothesis-driven research. PomBase encompasses annotation of genomic sequence and features, comprehensive manual literature curation and genome-wide data sets, and supports sophisticated user-defined queries. The implementation of PomBase integrates a Chado relational database that houses manually curated data with Ensembl software that supports sequence-based annotation and web access. PomBase will provide user-friendly tools to promote curation by experts within the fission yeast community. This will make a key contribution to shaping its content and ensuring its comprehensiveness and long-term relevance.
Affiliation(s)
- Valerie Wood
- Cambridge Systems Biology Centre, Department of Biochemistry, University of Cambridge, Sanger Building, 80 Tennis Court Road, Cambridge CB2 1GA, UK.
97
Ndegwa N, Côté RG, Ovelleiro D, D'Eustachio P, Hermjakob H, Vizcaíno JA, Croft D. Critical amino acid residues in proteins: a BioMart integration of Reactome protein annotations with PRIDE mass spectrometry data and COSMIC somatic mutations. Database: The Journal of Biological Databases and Curation 2011; 2011:bar047. [PMID: 22025670 PMCID: PMC3199918 DOI: 10.1093/database/bar047] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/21/2022]
Abstract
The reversible phosphorylation of serine, threonine and tyrosine hydroxyl groups is an especially prominent form of post-translational modification (PTM) of proteins. It plays critical roles in the regulation of diverse processes, and mutations that directly or indirectly affect these phosphorylation events have been associated with many cancers and other pathologies. Here, we describe the development of a new BioMart tool that gathers data from three different biological resources to provide the user with an integrated view of phosphorylation events associated with a human protein of interest, the complexes of which the protein (modified or not) is a part, the reactions in which the protein and its complexes participate and the somatic mutations that might be expected to perturb those functions. The three resources used are the Reactome, PRIDE and COSMIC databases. The Reactome knowledgebase contains annotations of phosphorylated human proteins linked to the reactions in which they are phosphorylated and dephosphorylated, to the complexes of which they are parts and to the reactions in which the phosphorylated proteins participate as substrates, catalysts and regulators. The PRIDE database holds extensive mass spectrometry data from which protein phosphorylation patterns can be inferred, and the COSMIC database holds records of somatic mutations found in human cancer cells. This tool supports both flexible, user-specified queries and standard (‘canned’) queries to retrieve frequently used combinations of data for user-specified proteins and reactions. We demonstrate using the Wnt signaling pathway and the human c-SRC protein how the tool can be used to place somatic mutation data into a functional perspective by changing critical residues involved in pathway modulation, and where available, check for mass spectrometry evidence in PRIDE supporting identification of the critical residue. Database URL: http://www.reactome.org/cgi-bin/mart
Affiliation(s)
- Nelson Ndegwa
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
98
Wang Y, Li X, Hu H. Transcriptional regulation of co-expressed microRNA target genes. Genomics 2011; 98:445-52. [PMID: 22002038 DOI: 10.1016/j.ygeno.2011.09.004] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Received: 05/12/2011] [Revised: 08/12/2011] [Accepted: 09/24/2011] [Indexed: 01/26/2023]
Abstract
MicroRNAs play pivotal roles in gene regulation. Despite various research efforts on microRNAs, how microRNA target genes are transcriptionally regulated and how the transcriptional regulation of microRNA target genes relates to that of the microRNA genes are not well studied. By investigating the transcriptional regulation of microRNA target genes, we found that different groups of target genes of the same microRNA are co-expressed under different conditions, and these groups rarely overlap with each other for the majority of microRNAs. We also discovered that co-expressed microRNA target genes are often co-regulated, and different groups of target genes of the same microRNA are often regulated differently. In addition, we observed that transcription factors regulating a microRNA gene often regulate its target genes. Our study sheds light on the regulation of microRNA target genes, which will facilitate the prediction of microRNA target genes and the understanding of the transcriptional regulation of microRNA genes.
Affiliation(s)
- Ying Wang
- Department of Electrical Engineering and Computer Science, University of Central Florida, Orlando, FL 32816, USA
99
Martinez M. Plant protein-coding gene families: emerging bioinformatics approaches. Trends in Plant Science 2011; 16:558-567. [PMID: 21757395 DOI: 10.1016/j.tplants.2011.06.003] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 02/11/2011] [Revised: 06/09/2011] [Accepted: 06/10/2011] [Indexed: 05/31/2023]
Abstract
Protein-coding gene families are sets of similar genes with a shared evolutionary origin and, generally, with similar biological functions. In plants, the size and role of gene families has been only partially addressed. However, suitable bioinformatics tools are being developed to cluster the enormous number of sequences currently available in databases. Specifically, comparative genomic databases promise to become powerful tools for gene family annotation in plant clades. In this review, I evaluate the data retrieved from various gene family databases, the ease with which they can be extracted and how useful the extracted information is.
Affiliation(s)
- Manuel Martinez
- Centro de Biotecnología y Genómica de Plantas (UPM-INIA), Campus Montegancedo, Universidad Politécnica de Madrid. Autovía M40 (Km 38), 28223-Pozuelo de Alarcón, Madrid, Spain.
100
Keays MC, Barker D, Wicker-Thomas C, Ritchie MG. Signatures of selection and sex-specific expression variation of a novel duplicate during the evolution of the Drosophila desaturase gene family. Mol Ecol 2011; 20:3617-30. [PMID: 21801259 DOI: 10.1111/j.1365-294x.2011.05208.x] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 12/29/2022]
Abstract
The tempo and mode of evolution of loci with a large effect on adaptation and reproductive isolation will influence the rate of evolutionary divergence and speciation. Desaturase loci are involved in key biochemical changes in long-chain fatty acids. In insects, these have been shown to influence adaptation to starvation or desiccation resistance and in some cases act as important pheromones. The desaturase gene family of Drosophila is known to have evolved by gene duplication and diversification, and at least one locus shows rapid evolution of sex-specific expression variation. Here, we examine the evolution of the gene family in species representing the Drosophila phylogeny. We find that the family includes more loci than have been previously described. Most are represented as single-copy loci, but we also find additional examples of duplications in loci which influence pheromone blends. Most loci show patterns of variation associated with purifying selection, but there are strong signatures of diversifying selection in new duplicates. In the case of a new duplicate of desat1 in the obscura group species, we show that strong selection on the coding sequence is associated with the evolution of sex-specific expression variation. It seems likely that both sexual selection and ecological adaptation have influenced the evolution of this gene family in Drosophila.
Affiliation(s)
- Maria C Keays
- Centre for Evolution, Genes and Genomics, School of Biology, University of St. Andrews, St. Andrews, Fife, UK