651
Brylinski M. Is the growth rate of Protein Data Bank sufficient to solve the protein structure prediction problem using template-based modeling? Bio-Algorithms and Med-Systems 2015. [DOI: 10.1515/bams-2014-0024] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/15/2022]
Abstract
The Protein Data Bank (PDB) undergoes an exponential expansion in terms of the number of macromolecular structures deposited every year. A pivotal question is how this rapid growth of structural information improves the quality of three-dimensional models constructed by contemporary bioinformatics approaches. To address this problem, we performed a retrospective analysis of the structural coverage of a representative set of proteins using remote homology detected by COMPASS and HHpred. We show that the number of proteins whose structures can be confidently predicted increased during a 9-year period between 2005 and 2014 on account of the PDB growth alone. Nevertheless, this encouraging trend slowed down noticeably around the year 2008 and has yielded insignificant improvements ever since. At the current pace, it is unlikely that the protein structure prediction problem will be solved in the near future using existing template-based modeling techniques. Therefore, further advances in experimental structure determination, qualitatively better approaches in fold recognition, and more accurate template-free structure prediction methods are desperately needed.
652
Chapman JA, Mascher M, Buluç A, Barry K, Georganas E, Session A, Strnadova V, Jenkins J, Sehgal S, Oliker L, Schmutz J, Yelick KA, Scholz U, Waugh R, Poland JA, Muehlbauer GJ, Stein N, Rokhsar DS. A whole-genome shotgun approach for assembling and anchoring the hexaploid bread wheat genome. Genome Biol 2015; 16:26. [PMID: 25637298 PMCID: PMC4373400 DOI: 10.1186/s13059-015-0582-8] [Citation(s) in RCA: 164] [Impact Index Per Article: 18.2] [Received: 09/12/2014] [Accepted: 01/06/2015] [Indexed: 11/10/2022] Open
Abstract
Polyploid species have long been thought to be recalcitrant to whole-genome assembly. By combining high-throughput sequencing, recent developments in parallel computing, and genetic mapping, we derive, de novo, a sequence assembly representing 9.1 Gbp of the highly repetitive 16 Gbp genome of hexaploid wheat, Triticum aestivum, and assign 7.1 Gb of this assembly to chromosomal locations. The genome representation and accuracy of our assembly is comparable or even exceeds that of a chromosome-by-chromosome shotgun assembly. Our assembly and mapping strategy uses only short read sequencing technology and is applicable to any species where it is possible to construct a mapping population.
Affiliation(s)
- Jarrod A Chapman
- Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA.
- Martin Mascher
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Stadt Seeland, Germany.
- Aydın Buluç
- Computational Research Division and National Energy Research Supercomputing Center (NERSC), Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA.
- Kerrie Barry
- Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA.
- Evangelos Georganas
- Computational Research Division and National Energy Research Supercomputing Center (NERSC), Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; Department of Electrical Engineering and Computer Science, Computer Science Division, University of California, Berkeley, CA, 94720, USA.
- Adam Session
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA.
- Veronika Strnadova
- Department of Computer Science, University of California, Santa Barbara, CA, 93106, USA.
- Jerry Jenkins
- Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA; HudsonAlpha Institute of Biotechnology, Huntsville, AL, 35806, USA.
- Sunish Sehgal
- Department of Plant Pathology, Kansas State University, Manhattan, KS, 65506, USA; Present address: Department of Plant Science, South Dakota State University, Brookings, SD, 57007, USA.
- Leonid Oliker
- Computational Research Division and National Energy Research Supercomputing Center (NERSC), Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA.
- Jeremy Schmutz
- Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA; HudsonAlpha Institute of Biotechnology, Huntsville, AL, 35806, USA.
- Katherine A Yelick
- Computational Research Division and National Energy Research Supercomputing Center (NERSC), Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; Department of Electrical Engineering and Computer Science, Computer Science Division, University of California, Berkeley, CA, 94720, USA.
- Uwe Scholz
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Stadt Seeland, Germany.
- Robbie Waugh
- Division of Plant Sciences, University of Dundee & The James Hutton Institute, Invergowrie, Dundee, DD2 5DA, UK.
- Jesse A Poland
- Department of Plant Pathology, Kansas State University, Manhattan, KS, 65506, USA.
- Gary J Muehlbauer
- Departments of Agronomy and Plant Genetics, and Plant Biology, University of Minnesota, St Paul, MN, 55108, USA.
- Nils Stein
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Stadt Seeland, Germany.
- Daniel S Rokhsar
- Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA; Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA.
653
Luu W, Hart-Smith G, Sharpe LJ, Brown AJ. The terminal enzymes of cholesterol synthesis, DHCR24 and DHCR7, interact physically and functionally. J Lipid Res 2015; 56:888-97. [PMID: 25637936 DOI: 10.1194/jlr.m056986] [Citation(s) in RCA: 59] [Impact Index Per Article: 6.6] [Indexed: 01/04/2023] Open
Abstract
Cholesterol is essential to human health, and its levels are tightly regulated by a balance of synthesis, uptake, and efflux. Cholesterol synthesis requires the actions of more than twenty enzymes to reach the final product, through two alternate pathways. Here we describe a physical and functional interaction between the two terminal enzymes. 24-Dehydrocholesterol reductase (DHCR24) and 7-dehydrocholesterol reductase (DHCR7) coimmunoprecipitate, and when the DHCR24 gene is knocked down by siRNA, DHCR7 activity is also ablated. Conversely, overexpression of DHCR24 enhances DHCR7 activity, but only when a functional form of DHCR24 is used. DHCR7 is important for both cholesterol and vitamin D synthesis, and we have identified a novel layer of regulation, whereby its activity is controlled by DHCR24. This suggests the existence of a cholesterol "metabolon", where enzymes from the same metabolic pathway interact with each other to provide a substrate channeling benefit. We predict that other enzymes in cholesterol synthesis may similarly interact, and this should be explored in future studies.
Affiliation(s)
- Winnie Luu
- School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, NSW 2052, Australia
- Gene Hart-Smith
- School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, NSW 2052, Australia
- Laura J Sharpe
- School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, NSW 2052, Australia
- Andrew J Brown
- School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, NSW 2052, Australia
654
Webb TE, Hughes A, Smalley DS, Spriggs KA. An internal ribosome entry site in the 5' untranslated region of epidermal growth factor receptor allows hypoxic expression. Oncogenesis 2015; 4:e134. [PMID: 25622307 PMCID: PMC4275558 DOI: 10.1038/oncsis.2014.43] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 05/02/2014] [Revised: 09/29/2014] [Accepted: 10/15/2014] [Indexed: 12/25/2022] Open
Abstract
The expression of epidermal growth factor receptor (EGFR/ERBB1/HER1) is implicated in the progress of numerous cancers, a feature that has been exploited in the development of EGFR antibodies and EGFR tyrosine kinase inhibitors as anti-cancer drugs. However, EGFR also has important normal cellular functions, leading to serious side effects when EGFR is inhibited. One damaging characteristic of many oncogenes is the ability to be expressed in the hypoxic conditions associated with the tumour interior. It has previously been demonstrated that expression of EGFR is maintained in hypoxic conditions via an unknown mechanism of translational control, despite global translation rates generally being attenuated under hypoxic conditions. In this report, we demonstrate that the human EGFR 5′ untranslated region (UTR) sequence can initiate the expression of a downstream open reading frame via an internal ribosome entry site (IRES). We show that this effect is not due to either cryptic promoter activity or splicing events. We have investigated the requirement of the EGFR IRES for eukaryotic initiation factor 4A (eIF4A), which is an RNA helicase responsible for processing RNA secondary structure as part of translation initiation. Treatment with hippuristanol (a potent inhibitor of eIF4A) caused a decrease in EGFR 5′ UTR-driven reporter activity and also a reduction in EGFR protein level. Importantly, we show that expression of a reporter gene under the control of the EGFR IRES is maintained under hypoxic conditions despite a fall in global translation rates.
Affiliation(s)
- T E Webb
- School of Pharmacy, University of Nottingham, Nottingham, UK
- A Hughes
- School of Pharmacy, University of Nottingham, Nottingham, UK
- D S Smalley
- School of Pharmacy, University of Nottingham, Nottingham, UK
- K A Spriggs
- School of Pharmacy, University of Nottingham, Nottingham, UK
655
Andreev DE, O'Connor PBF, Fahey C, Kenny EM, Terenin IM, Dmitriev SE, Cormican P, Morris DW, Shatsky IN, Baranov PV. Translation of 5' leaders is pervasive in genes resistant to eIF2 repression. eLife 2015; 4:e03971. [PMID: 25621764 PMCID: PMC4383229 DOI: 10.7554/elife.03971] [Citation(s) in RCA: 234] [Impact Index Per Article: 26.0] [Received: 07/12/2014] [Accepted: 01/22/2015] [Indexed: 12/18/2022] Open
Abstract
Eukaryotic cells rapidly reduce protein synthesis in response to various stress conditions. This can be achieved by the phosphorylation-mediated inactivation of a key translation initiation factor, eukaryotic initiation factor 2 (eIF2). However, the persistent translation of certain mRNAs is required for deployment of an adequate stress response. We carried out ribosome profiling of cultured human cells under conditions of severe stress induced with sodium arsenite. Although this led to a 5.4-fold general translational repression, the protein coding open reading frames (ORFs) of certain individual mRNAs exhibited resistance to the inhibition. Nearly all resistant transcripts possess at least one efficiently translated upstream open reading frame (uORF) that represses translation of the main coding ORF under normal conditions. Site-specific mutagenesis of two identified stress resistant mRNAs (PPP1R15B and IFRD1) demonstrated that a single uORF is sufficient for eIF2-mediated translation control in both cases. Phylogenetic analysis suggests that at least two regulatory uORFs (namely, in SLC35A4 and MIEF1) encode functional protein products. DOI: http://dx.doi.org/10.7554/eLife.03971.001

Proteins carry out essential tasks for living cells and genes contain the instructions to make proteins within their DNA. These instructions are copied to make a molecule of mRNA, and a molecular machine known as a ribosome then reads and translates the mRNA to build the protein.

The first step in the translation process is called ‘initiation’ and requires a protein called eIF2 to work together with the ribosome. This step involves identifying an instruction called the start codon that marks the beginning of the mRNA's coding sequence. The section of an mRNA molecule before the start codon is not normally translated by the ribosome and is hence called the 5′ untranslated region.

Building proteins requires energy and resources, and so it is carefully regulated. If a cell is stressed, such as by being exposed to harmful chemicals, it makes fewer proteins in order to conserve its resources. This down-regulation of protein production is achieved in part by the cell chemically modifying its eIF2 proteins to make them less able to initiate translation. However, stressed cells still continue to make more of certain proteins that help them to combat stress. The mRNA molecules for some of these proteins contain at least one other start codon in the 5′ untranslated region. The sequence that would be translated from such a start codon is known as an upstream open reading frame (or uORF for short), and this feature is thought to help certain proteins to still be expressed despite low levels of active eIF2.

Andreev, O'Connor et al. have now analysed which mRNAs are translated in human cells that have been treated with a chemical that induces stress and makes the eIF2 protein less able to initiate translation. To do so, a technique called ribosome profiling was used to identify all of the mRNA molecules bound to ribosomes shortly after treatment with this chemical.

Overall translation of most mRNAs in stressed cells was reduced to a quarter of the normal level. However, Andreev, O'Connor et al. observed that the translation of a few mRNAs continued almost as normal, or even increased, after the chemical treatment. Notably, most of these mRNAs encoded regulatory proteins, which are not required in large amounts. With one exception, all of these resistant mRNAs contained uORFs. In unstressed cells, these uORFs were efficiently translated, while the same mRNA's coding sequences were translated less efficiently. Andreev, O'Connor et al. suggest that these two features could be used to identify mRNAs that are still translated into working proteins when cells are stressed. Further work is now needed to explore the mechanisms by which translation of these uORFs allows mRNAs to resist the stress. DOI: http://dx.doi.org/10.7554/eLife.03971.002
Affiliation(s)
- Dmitry E Andreev
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, Russia
- Ciara Fahey
- Department of Psychiatry and Institute of Molecular Medicine, Trinity College Dublin, Dublin, Ireland
- Elaine M Kenny
- Department of Psychiatry and Institute of Molecular Medicine, Trinity College Dublin, Dublin, Ireland
- Ilya M Terenin
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, Russia
- Sergey E Dmitriev
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, Russia
- Paul Cormican
- Department of Psychiatry and Institute of Molecular Medicine, Trinity College Dublin, Dublin, Ireland
- Derek W Morris
- Department of Psychiatry and Institute of Molecular Medicine, Trinity College Dublin, Dublin, Ireland
- Ivan N Shatsky
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, Russia
- Pavel V Baranov
- School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland
656
Le Pera L, Mazzapioda M, Tramontano A. 3USS: a web server for detecting alternative 3'UTRs from RNA-seq experiments. Bioinformatics 2015; 31:1845-7. [PMID: 25617413 PMCID: PMC4443675 DOI: 10.1093/bioinformatics/btv035] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 09/08/2014] [Accepted: 01/15/2015] [Indexed: 12/04/2022]
Abstract
Summary: Protein-coding genes with multiple alternative polyadenylation sites can generate mRNA 3′UTR sequences of different lengths, thereby causing the loss or gain of regulatory elements, which can affect stability, localization and translation efficiency. 3USS is a web-server developed with the aim of giving experimentalists the possibility to automatically identify alternative 3′UTRs (shorter or longer with respect to a reference transcriptome), an option that is not available in standard RNA-seq data analysis procedures. The tool reports as putative novel the 3′UTRs not annotated in available databases. Furthermore, if data from two related samples are uploaded, common and specific alternative 3′UTRs are identified and reported by the server.
Availability and implementation: 3USS is freely available at http://www.biocomputing.it/3uss_server
Contact: anna.tramontano@uniroma1.it
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Loredana Le Pera
- Center for Life Nano Science@Sapienza, Istituto Italiano di Tecnologia, Rome, Italy, Department of Physics, Sapienza University, Rome, Italy and Istituto Pasteur - Fondazione Cenci Bolognetti, Sapienza University, Rome, Italy
- Mariagiovanna Mazzapioda
- Center for Life Nano Science@Sapienza, Istituto Italiano di Tecnologia, Rome, Italy, Department of Physics, Sapienza University, Rome, Italy and Istituto Pasteur - Fondazione Cenci Bolognetti, Sapienza University, Rome, Italy
- Anna Tramontano
- Center for Life Nano Science@Sapienza, Istituto Italiano di Tecnologia, Rome, Italy, Department of Physics, Sapienza University, Rome, Italy and Istituto Pasteur - Fondazione Cenci Bolognetti, Sapienza University, Rome, Italy
657
Chiang Z, Vastermark A, Punta M, Coggill PC, Mistry J, Finn RD, Saier MH. The complexity, challenges and benefits of comparing two transporter classification systems in TCDB and Pfam. Brief Bioinform 2015; 16:865-72. [PMID: 25614388 PMCID: PMC4570203 DOI: 10.1093/bib/bbu053] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 11/17/2014] [Indexed: 01/04/2023] Open
Abstract
Transport systems comprise roughly 10% of all proteins in a cell, playing critical roles in many processes. Improving and expanding their classification is an important goal that can affect studies ranging from comparative genomics to potential drug target searches. It is not surprising that different classification systems for transport proteins have arisen, be it within a specialized database, focused on this functional class of proteins, or as part of a broader classification system for all proteins. Two such databases are the Transporter Classification Database (TCDB) and the Protein family (Pfam) database. As part of a long-term endeavor to improve consistency between the two classification systems, we have compared transporter annotations in the two databases to understand the rationale for differences and to improve both systems. Differences sometimes reflect the fact that one database has a particular transporter family while the other does not. Differing family definitions and hierarchical organizations were reconciled, resulting in recognition of 69 Pfam ‘Domains of Unknown Function’, which proved to be transport protein families to be renamed using TCDB annotations. Of over 400 potential new Pfam families identified from TCDB, 10% have already been added to Pfam, and TCDB has created 60 new entries based on Pfam data. This work, for the first time, reveals the benefits of comprehensive database comparisons and explains the differences between Pfam and TCDB.
658
Wheeler NJ, Agbedanu PN, Kimber MJ, Ribeiro P, Day TA, Zamanian M. Functional analysis of Girardia tigrina transcriptome seeds pipeline for anthelmintic target discovery. Parasit Vectors 2015; 8:34. [PMID: 25600302 PMCID: PMC4304616 DOI: 10.1186/s13071-014-0622-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 10/13/2014] [Accepted: 12/23/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Neglected diseases caused by helminth infections impose a massive hindrance to progress in the developing world. While basic research on parasitic flatworms (platyhelminths) continues to expand, researchers have yet to broadly adopt a free-living model to complement the study of these important parasites.
METHODS: We report the high-coverage sequencing (RNA-Seq) and assembly of the transcriptome of the planarian Girardia tigrina across a set of dynamic conditions. The assembly was annotated and extensive orthology analysis was used to seed a pipeline for the rational prioritization and validation of putative anthelmintic targets. A small number of targets conserved between parasitic and free-living flatworms were comparatively interrogated.
RESULTS: 240 million paired-end reads were assembled de novo to produce a strictly filtered predicted proteome consisting of over 22,000 proteins. Gene Ontology annotations were extended to 16,467 proteins. 2,693 sequences were identified in orthology groups spanning flukes, tapeworms and planaria, with 441 highlighted as belonging to druggable protein families. Chemical inhibitors were used on three targets in pharmacological screens using both planaria and schistosomula, revealing distinct motility phenotypes that were shown to correlate with planarian RNAi phenotypes.
CONCLUSIONS: This work provides the first comprehensive and annotated sequence resource for the model planarian G. tigrina, alongside a prioritized list of candidate drug targets conserved among parasitic and free-living flatworms. As proof of principle, we show that a simple RNAi and pharmacology pipeline in the more convenient planarian model system can inform parasite biology and serve as an efficient screening tool for the identification of lucrative anthelmintic targets.
Affiliation(s)
- Nicolas J Wheeler
- Department of Biomedical Sciences, Iowa State University, Ames, IA, 50010, USA.
- Prince N Agbedanu
- Department of Biomedical Sciences, Iowa State University, Ames, IA, 50010, USA.
- Michael J Kimber
- Department of Biomedical Sciences, Iowa State University, Ames, IA, 50010, USA.
- Paula Ribeiro
- Institute of Parasitology, McGill University, Ste. Anne de Bellevue, QC, H9X 3V9, Canada.
- Tim A Day
- Department of Biomedical Sciences, Iowa State University, Ames, IA, 50010, USA.
- Mostafa Zamanian
- Department of Biomedical Sciences, Iowa State University, Ames, IA, 50010, USA; Institute of Parasitology, McGill University, Ste. Anne de Bellevue, QC, H9X 3V9, Canada.
659
The landscape of long noncoding RNAs in the human transcriptome. Nat Genet 2015; 47:199-208. [PMID: 25599403 DOI: 10.1038/ng.3192] [Citation(s) in RCA: 2033] [Impact Index Per Article: 225.9] [Received: 06/20/2014] [Accepted: 12/18/2014] [Indexed: 12/13/2022]
Abstract
Long noncoding RNAs (lncRNAs) are emerging as important regulators of tissue physiology and disease processes including cancer. To delineate genome-wide lncRNA expression, we curated 7,256 RNA sequencing (RNA-seq) libraries from tumors, normal tissues and cell lines comprising over 43 Tb of sequence from 25 independent studies. We applied ab initio assembly methodology to this data set, yielding a consensus human transcriptome of 91,013 expressed genes. Over 68% (58,648) of genes were classified as lncRNAs, of which 79% were previously unannotated. About 1% (597) of the lncRNAs harbored ultraconserved elements, and 7% (3,900) overlapped disease-associated SNPs. To prioritize lineage-specific, disease-associated lncRNA expression, we employed non-parametric differential expression testing and nominated 7,942 lineage- or cancer-associated lncRNA genes. The lncRNA landscape characterized here may shed light on normal biology and cancer pathogenesis and may be valuable for future biomarker development.
660
Hutchins JRA. What's that gene (or protein)? Online resources for exploring functions of genes, transcripts, and proteins. Mol Biol Cell 2015; 25:1187-201. [PMID: 24723265 PMCID: PMC3982986 DOI: 10.1091/mbc.e13-10-0602] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 12/19/2022] Open
Abstract
The genomic era has enabled research projects that use approaches including genome-scale screens, microarray analysis, next-generation sequencing, and mass spectrometry-based proteomics to discover genes and proteins involved in biological processes. Such methods generate data sets of gene, transcript, or protein hits that researchers wish to explore to understand their properties and functions and thus their possible roles in biological systems of interest. Recent years have seen a profusion of Internet-based resources to aid this process. This review takes the viewpoint of the curious biologist wishing to explore the properties of protein-coding genes and their products, identified using genome-based technologies. Ten key questions are asked about each hit, addressing functions, phenotypes, expression, evolutionary conservation, disease association, protein structure, interactors, posttranslational modifications, and inhibitors. Answers are provided by presenting the latest publicly available resources, together with methods for hit-specific and data set-wide information retrieval, suited to any genome-based analytical technique and experimental species. The utility of these resources is demonstrated for 20 factors regulating cell proliferation. Results obtained using some of these are discussed in more depth using the p53 tumor suppressor as an example. This flexible and universally applicable approach for characterizing experimental hits helps researchers to maximize the potential of their projects for biological discovery.
Affiliation(s)
- James R A Hutchins
- Institute of Human Genetics, Centre National de la Recherche Scientifique (CNRS), 34396 Montpellier, France
661
Menezes-Souza D, Mendes TADO, Gomes MDS, Bartholomeu DC, Fujiwara RT. Improving serodiagnosis of human and canine leishmaniasis with recombinant Leishmania braziliensis cathepsin l-like protein and a synthetic peptide containing its linear B-cell epitope. PLoS Negl Trop Dis 2015; 9:e3426. [PMID: 25569432 PMCID: PMC4287388 DOI: 10.1371/journal.pntd.0003426] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 07/14/2014] [Accepted: 11/17/2014] [Indexed: 12/17/2022] Open
Abstract
Background: The early and correct diagnosis of human leishmaniasis is essential for disease treatment. Another important step in the control of visceral leishmaniasis is the identification of infected dogs, which are the main domestic reservoir of L. infantum. Recombinant proteins and synthetic peptides based on Leishmania genes have emerged as valuable targets for serodiagnosis due to their increased sensitivity, specificity and potential for standardization. Cathepsin L-like genes are surface antigens that are secreted by amastigotes and have little similarity to host proteins, factors that enable this protein as a good target for serodiagnosis of the leishmaniasis.
Methodology/Principal Findings: We mapped a linear B-cell epitope within the Cathepsin L-like protein from L. braziliensis. A synthetic peptide containing the epitope and the recombinant protein was evaluated for serodiagnosis of human tegumentary and visceral leishmaniasis, as well as canine visceral leishmaniasis.
Conclusions/Significance: The recombinant protein performed best for human tegumentary and canine visceral leishmaniasis, with 96.30% and 89.33% accuracy, respectively. The synthetic peptide was the best to discriminate human visceral leishmaniasis, with 97.14% specificity, 94.55% sensitivity and 96.00% accuracy. Comparison with T. cruzi-infected humans and dogs suggests that the identified epitope is specific to Leishmania parasites, which minimizes the likelihood of cross-reactions.

Leishmaniasis is one of the major diseases of importance in public health and its precise diagnosis may represent one of the most relevant challenges for the control and possible eradication of the disease. In this context, recombinant proteins and synthetic peptides based on Leishmania genes have emerged as valuable targets for serodiagnosis due to their increased sensitivity, specificity and potential for standardization. Cathepsin L-like (CatL) genes are more abundant in stationary promastigotes and amastigotes, and have less than 40% identity with human proteins and more than 60% identity with other Leishmania species. We mapped a linear B-cell epitope in the CatL protein sequence and compared its performance with the recombinant protein and current serology methodologies for the diagnosis of human tegumentary and visceral leishmaniasis as well as of canine visceral leishmaniasis (CVL). Both the recombinant protein and synthetic peptide showed higher specificity and sensitivity than crude preparations commonly used for other antigens, and thus, they are valuable targets to compose an antigen panel that could significantly improve leishmaniasis diagnosis.
Affiliation(s)
- Daniel Menezes-Souza
- Departamento de Parasitologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Matheus de Souza Gomes
- Instituto de Genética e Bioquímica, Universidade Federal de Uberlândia, Patos de Minas, Brazil
- Ricardo Toshio Fujiwara
- Departamento de Parasitologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
662
Park D, Jung JW, Choi BS, Jayakodi M, Lee J, Lim J, Yu Y, Choi YS, Lee ML, Park Y, Choi IY, Yang TJ, Edwards OR, Nah G, Kwon HW. Uncovering the novel characteristics of Asian honey bee, Apis cerana, by whole genome sequencing. BMC Genomics 2015; 16:1. [PMID: 25553907 PMCID: PMC4326529 DOI: 10.1186/1471-2164-16-1] [Citation(s) in RCA: 451] [Impact Index Per Article: 50.1] [Received: 08/01/2014] [Accepted: 12/02/2014] [Indexed: 12/03/2022] Open
Abstract
Background The honey bee is an important model system for increasing understanding of molecular and neural mechanisms underlying social behaviors relevant to the agricultural industry and basic science. The western honey bee, Apis mellifera, has served as a model species, and its genome sequence has been published. In contrast, the genome of the Asian honey bee, Apis cerana, has not yet been sequenced. A. cerana has been raised in Asian countries for thousands of years and has brought considerable economic benefits to the apicultural industry. A. cerana has divergent biological traits compared to A. mellifera and it has played a key role in maintaining biodiversity in eastern and southern Asia. Here we report the first whole genome sequence of A. cerana. Results Using de novo assembly methods, we produced a 238 Mbp draft of the A. cerana genome and generated 10,651 genes. A. cerana-specific genes were analyzed to better understand the novel characteristics of this honey bee species. Seventy-two percent of the A. cerana-specific genes had more than one GO term, and 1,696 enzymes were categorized into 125 pathways. Genes involved in chemoreception and immunity were carefully identified and compared to those from other sequenced insect models. These included 10 gustatory receptors, 119 odorant receptors, 10 ionotropic receptors, and 160 immune-related genes. Conclusions This first report of the whole genome sequence of A. cerana provides resources for comparative sociogenomics, especially in the field of social insect communication. These important tools will contribute to a better understanding of the complex behaviors and natural biology of the Asian honey bee and help to anticipate its future evolutionary trajectory. Electronic supplementary material The online version of this article (doi:10.1186/1471-2164-16-1) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Gyoungju Nah
- Biomodulation Major, Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, College of Agriculture and Life Sciences, Seoul National University, Seoul 151-921, Republic of Korea.
|
664
|
Muthamilarasan M, Prasad M. Advances in Setaria genomics for genetic improvement of cereals and bioenergy grasses. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2015; 128:1-14. [PMID: 25239219 DOI: 10.1007/s00122-014-2399-3] [Citation(s) in RCA: 95] [Impact Index Per Article: 10.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/18/2014] [Accepted: 09/11/2014] [Indexed: 05/18/2023]
Abstract
Recent advances in Setaria genomics appear promising for genetic improvement of cereals and biofuel crops towards providing multiple securities to the steadily increasing global population. The prominent attributes of foxtail millet (Setaria italica, cultivated) and green foxtail (S. viridis, wild), including small genome size, short life-cycle, in-breeding nature, genetic close-relatedness to several cereals, millets and bioenergy grasses, and potential abiotic stress tolerance, have accentuated these two Setaria species as novel model systems for studying C4 photosynthesis, stress biology and biofuel traits. Considering this, studies have been performed on structural and functional genomics of these plants to develop genetic and genomic resources, and to delineate the physiology and molecular biology of stress tolerance, for the improvement of millets, cereals and bioenergy grasses. The release of the foxtail millet genome sequence has provided a new dimension to Setaria genomics, resulting in large-scale development of genetic and genomic tools, construction of informative databases, and genome-wide association and functional genomic studies. In this context, this review discusses the advancements made in Setaria genomics, which have generated considerable knowledge that could be used for the improvement of millets, cereals and biofuel crops. Further, this review also shows the nutritional potential of foxtail millet in providing health benefits to the global population and provides preliminary information on introgressing the nutritional properties in graminaceous species through molecular breeding and transgene-based approaches.
Affiliation(s)
- Mehanathan Muthamilarasan
- National Institute of Plant Genome Research (NIPGR), Aruna Asaf Ali Marg, JNU Campus, New Delhi, 110 067, India
|
665
|
Lin YH, Bundschuh R. RNA structure generates natural cooperativity between single-stranded RNA binding proteins targeting 5' and 3'UTRs. Nucleic Acids Res 2014; 43:1160-9. [PMID: 25550422 PMCID: PMC4333377 DOI: 10.1093/nar/gku1320] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022] Open
Abstract
In post-transcriptional regulation, an mRNA molecule is bound by many proteins and/or miRNAs to modulate its function. To enable combinatorial gene regulation, these binding partners of an RNA must communicate with each other, exhibiting cooperativity. Even in the absence of direct physical interactions between the binding partners, such cooperativity can be mediated through RNA secondary structures, since they affect the accessibility of the binding sites. Here we propose a quantitative measure of this structure-mediated cooperativity that can be numerically calculated for an arbitrary RNA sequence. Focusing on an RNA with two binding sites, we derive a characteristic difference of free energy differences, i.e. ΔΔG, as a measure of the effect of the occupancy of one binding site on the binding strength of another. We apply this measure to a large number of human and Caenorhabditis elegans mRNAs, and find that structure-mediated cooperativity is a generic feature. Interestingly, this cooperativity not only affects binding sites in close proximity along the sequence but also configurations in which one binding site is located in the 5′UTR and the other is located in the 3′UTR of the mRNA. Furthermore, we find that this end-to-end cooperativity is determined by the UTR sequences while the sequences of the coding regions are irrelevant.
Affiliation(s)
- Yi-Hsuan Lin
- Department of Physics, The Ohio State University, 191W Woodruff Avenue, Columbus, OH 43210-1107, USA
- Ralf Bundschuh
- Department of Physics, The Ohio State University, 191W Woodruff Avenue, Columbus, OH 43210-1107, USA; Department of Chemistry & Biochemistry, The Ohio State University, 100W 18th Avenue, Columbus, OH 43210-1340, USA; Division of Hematology, Department of Internal Medicine, The Ohio State University, 320W 10th Avenue, Columbus, OH 43210, USA; Center for RNA Biology, The Ohio State University, 484W 12th Avenue, Columbus, OH 43210-1292, USA
|
666
|
Arai D, Hayakawa K, Ohgane J, Hirosawa M, Nakao Y, Tanaka S, Shiota K. An epigenetic regulatory element of the Nodal gene in the mouse and human genomes. Mech Dev 2014; 136:143-54. [PMID: 25528267 DOI: 10.1016/j.mod.2014.12.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2014] [Revised: 12/12/2014] [Accepted: 12/15/2014] [Indexed: 01/28/2023]
Abstract
Nodal signaling plays critical roles during embryonic development. The Nodal gene is not expressed in adult tissues but is frequently activated in cancer cells, contributing to progression toward malignancy. Although several regulatory elements of the Nodal gene have been identified, the epigenetic mechanisms by which Nodal expression is regulated over the long term remain unclear. We found a region exhibiting dynamic changes in DNA methylation at approximately -3.0 kb to -0.4 kb upstream from the transcriptional start site (TSS) that we termed the epigenetic regulatory element (ERE). The ERE was unmethylated in mouse embryonic stem cells (mESCs) but became increasingly methylated in differentiated cells and tissues, concomitant with the downregulation of Nodal mRNA expression. In vitro reporter assays identified an Oct3/4 binding motif within the ERE, indicating that the ERE is responsible for the activation of Nodal in mESCs. Furthermore, the ERE was a target of differentiation-associated Polycomb silencing, and the chromatin condensed when mESCs differentiated to embryoid bodies (EBs). Pharmacological inhibition of PRC2 led to the reactivation of Nodal expression in EBs and mouse embryonic fibroblasts (MEFs). The ERE was also targeted by PRC2 in normal human cells. In NODAL-expressing human cancer cells, accumulation of EZH2 and trimethylation of H3K27 at the ERE were diminished. In conclusion, Nodal is epigenetically controlled through the ERE in the mouse embryo and human cells.
Affiliation(s)
- Daisuke Arai
- Laboratory of Cellular Biochemistry, Department of Animal Resource Sciences/Veterinary Medical Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-8657, Japan; Laboratory of Chemical Biology, Department of Chemistry and Biochemistry, School of Advanced Science and Engineering, Waseda University, 3-4-1 Ohkubo, Shinjuku-ku, Tokyo 169-8555, Japan
- Koji Hayakawa
- Laboratory of Cellular Biochemistry, Department of Animal Resource Sciences/Veterinary Medical Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-8657, Japan
- Jun Ohgane
- Laboratory of Genomic Function Engineering, Department of Life Sciences, School of Agriculture, Meiji University, 1-1-1 Higashi-mita, Tama-ku, Kawasaki 214-8571, Japan
- Mitsuko Hirosawa
- Laboratory of Cellular Biochemistry, Department of Animal Resource Sciences/Veterinary Medical Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-8657, Japan
- Yoichi Nakao
- Laboratory of Chemical Biology, Department of Chemistry and Biochemistry, School of Advanced Science and Engineering, Waseda University, 3-4-1 Ohkubo, Shinjuku-ku, Tokyo 169-8555, Japan
- Satoshi Tanaka
- Laboratory of Cellular Biochemistry, Department of Animal Resource Sciences/Veterinary Medical Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-8657, Japan
- Kunio Shiota
- Laboratory of Cellular Biochemistry, Department of Animal Resource Sciences/Veterinary Medical Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-8657, Japan.
|
667
|
Wang M, Zhang P, Shu Y, Yuan F, Zhang Y, Zhou Y, Jiang M, Zhu Y, Hu L, Kong X, Zhang Z. Alternative splicing at GYNNGY 5' splice sites: more noise, less regulation. Nucleic Acids Res 2014; 42:13969-80. [PMID: 25428370 PMCID: PMC4267661 DOI: 10.1093/nar/gku1253] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2014] [Revised: 10/29/2014] [Accepted: 11/12/2014] [Indexed: 12/28/2022] Open
Abstract
Numerous eukaryotic genes are alternatively spliced. Recently, deep transcriptome sequencing has sharply raised the estimated proportion of alternatively spliced genes; over 95% of human multi-exon genes are alternatively spliced. One fundamental question is: are all these alternative splicing (AS) events functional? To look into this issue, we studied the most common form of alternative 5' splice sites, GYNNGYs (Y = C/T), where both GYs can function as splice sites. Global analyses suggest that splicing noise (due to stochasticity of the splicing process) can cause AS at GYNNGYs, as evidenced by higher AS frequency in non-coding than in coding regions, in non-conserved than in conserved genes and in lowly expressed than in highly expressed genes. However, ∼20% of AS GYNNGYs in humans and ∼3% in mice exhibit tissue-dependent regulation. Consistent with being functional, regulated GYNNGYs are more conserved than unregulated ones. Regulated GYNNGYs also have distinctive sequence features which may confer regulation. In particular, each regulated GYNNGY comprises two splice sites that resemble each other more closely than those of unregulated GYNNGYs, and has a more conserved downstream flanking intron. Intriguingly, most regulated GYNNGYs may tune gene expression through coupling with nonsense-mediated mRNA decay, rather than encode different proteins. In summary, AS at GYNNGY 5' splice sites is primarily splicing noise, and secondarily a way of regulation.
Affiliation(s)
- Meng Wang
- State Key Laboratory of Medical Genomics, Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences and Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, People's Republic of China; Graduate School of the Chinese Academy of Sciences, Beijing, People's Republic of China
- Peiwei Zhang
- State Key Laboratory of Medical Genomics, Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences and Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, People's Republic of China; Graduate School of the Chinese Academy of Sciences, Beijing, People's Republic of China
- Yang Shu
- State Key Laboratory of Medical Genomics, Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences and Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, People's Republic of China; Graduate School of the Chinese Academy of Sciences, Beijing, People's Republic of China
- Fei Yuan
- State Key Laboratory of Medical Genomics, Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences and Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, People's Republic of China; Graduate School of the Chinese Academy of Sciences, Beijing, People's Republic of China
- Yuchao Zhang
- State Key Laboratory of Medical Genomics, Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences and Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, People's Republic of China; Graduate School of the Chinese Academy of Sciences, Beijing, People's Republic of China
- You Zhou
- State Key Laboratory of Medical Genomics, Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences and Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, People's Republic of China; Graduate School of the Chinese Academy of Sciences, Beijing, People's Republic of China
- Min Jiang
- State Key Laboratory of Medical Genomics, Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences and Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, People's Republic of China; Graduate School of the Chinese Academy of Sciences, Beijing, People's Republic of China
- Yufei Zhu
- State Key Laboratory of Medical Genomics, Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences and Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, People's Republic of China; Graduate School of the Chinese Academy of Sciences, Beijing, People's Republic of China
- Landian Hu
- State Key Laboratory of Medical Genomics, Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences and Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, People's Republic of China; Graduate School of the Chinese Academy of Sciences, Beijing, People's Republic of China
- Xiangyin Kong
- State Key Laboratory of Medical Genomics, Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences and Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, People's Republic of China; Graduate School of the Chinese Academy of Sciences, Beijing, People's Republic of China
- Zhenguo Zhang
- Institute of Molecular Evolutionary Genetics and Department of Biology, Pennsylvania State University, University Park, PA 16802, USA
|
668
|
Hornbeck PV, Zhang B, Murray B, Kornhauser JM, Latham V, Skrzypek E. PhosphoSitePlus, 2014: mutations, PTMs and recalibrations. Nucleic Acids Res 2014; 43:D512-20. [PMID: 25514926 PMCID: PMC4383998 DOI: 10.1093/nar/gku1267] [Citation(s) in RCA: 2153] [Impact Index Per Article: 215.3] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/17/2022] Open
Abstract
PhosphoSitePlus® (PSP, http://www.phosphosite.org/), a knowledgebase dedicated to mammalian post-translational modifications (PTMs), contains over 330 000 non-redundant PTMs, including phospho, acetyl, ubiquityl and methyl groups. Over 95% of the sites are from mass spectrometry (MS) experiments. In order to improve data reliability, early MS data have been reanalyzed, applying a common standard of analysis across over 1 000 000 spectra. Site assignments with P > 0.05 were filtered out. Two new downloads are available from PSP. The ‘Regulatory sites’ dataset includes curated information about modification sites that regulate downstream cellular processes, molecular functions and protein-protein interactions. The ‘PTMVar’ dataset, an intersect of missense mutations and PTMs from PSP, identifies over 25 000 PTMVars (PTMs Impacted by Variants) that can rewire signaling pathways. The PTMVar data include missense mutations from UniProtKB, TCGA and other sources that cause over 2000 diseases or syndromes (MIM) and polymorphisms, or are associated with hundreds of cancers. PTMVars include 18 548 phosphorylation sites, 3412 ubiquitylation sites, 2316 acetylation sites, 685 methylation sites and 245 succinylation sites.
Affiliation(s)
- Bin Zhang
- Cell Signaling Technology, 3 Trask Lane, Danvers, MA 01923, USA
- Beth Murray
- Cell Signaling Technology, 3 Trask Lane, Danvers, MA 01923, USA
- Vaughan Latham
- Cell Signaling Technology, 3 Trask Lane, Danvers, MA 01923, USA
|
669
|
Jian X, Boerwinkle E, Liu X. In silico prediction of splice-altering single nucleotide variants in the human genome. Nucleic Acids Res 2014; 42:13534-44. [PMID: 25416802 PMCID: PMC4267638 DOI: 10.1093/nar/gku1206] [Citation(s) in RCA: 350] [Impact Index Per Article: 35.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/27/2014] [Revised: 10/12/2014] [Accepted: 11/04/2014] [Indexed: 01/17/2023] Open
Abstract
In silico tools have been developed to predict variants that may have an impact on pre-mRNA splicing. The major limitation of the application of these tools to basic research and clinical practice is the difficulty in interpreting the output. Most tools only predict potential splice sites given a DNA sequence without measuring splicing signal changes caused by a variant. Another limitation is the lack of large-scale evaluation studies of these tools. We compared eight in silico tools on 2959 single nucleotide variants within splicing consensus regions (scSNVs) using receiver operating characteristic analysis. The Position Weight Matrix model and MaxEntScan outperformed other methods. Two ensemble learning methods, adaptive boosting and random forests, were used to construct models that take advantage of individual methods. Both models further improved prediction, with outputs of directly interpretable prediction scores. We applied our ensemble scores to scSNVs from the Catalogue of Somatic Mutations in Cancer database. Analysis showed that predicted splice-altering scSNVs are enriched in recurrent scSNVs and known cancer genes. We pre-computed our ensemble scores for all potential scSNVs across the human genome, providing a whole genome level resource for identifying splice-altering scSNVs discovered from large-scale sequencing studies.
Affiliation(s)
- Xueqiu Jian
- Division of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Center for Human Genetics, The Brown Foundation Institute of Molecular Medicine for the Prevention of Human Diseases, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Eric Boerwinkle
- Division of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Center for Human Genetics, The Brown Foundation Institute of Molecular Medicine for the Prevention of Human Diseases, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Xiaoming Liu
- Division of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
|
670
|
Bahrami-Samani E, Penalva LOF, Smith AD, Uren PJ. Leveraging cross-link modification events in CLIP-seq for motif discovery. Nucleic Acids Res 2014; 43:95-103. [PMID: 25505146 PMCID: PMC4288180 DOI: 10.1093/nar/gku1288] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022] Open
Abstract
High-throughput protein-RNA interaction data generated by CLIP-seq has provided an unprecedented depth of access to the activities of RNA-binding proteins (RBPs), the key players in co- and post-transcriptional regulation of gene expression. Motif discovery forms part of the necessary follow-up data analysis for CLIP-seq, both to refine the exact locations of RBP binding sites, and to characterize them. The specific properties of RBP binding sites, and the CLIP-seq methods, provide additional information not usually present in the classic motif discovery problem: the binding site structure, and cross-linking induced events in reads. We show that CLIP-seq data contains clear secondary structure signals, as well as technology- and RBP-specific cross-link signals. We introduce Zagros, a motif discovery algorithm specifically designed to leverage this information and explore its impact on the quality of recovered motifs. Our results indicate that using both secondary structure and cross-link modifications can greatly improve motif discovery on CLIP-seq data. Further, the motifs we recover provide insight into the balance between sequence- and structure-specificity struck by RBP binding.
Affiliation(s)
- Emad Bahrami-Samani
- Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Luiz O F Penalva
- Children's Cancer Research Institute and Department of Cellular and Structural Biology, University of Texas Health Science Center, San Antonio, TX 78229, USA
- Andrew D Smith
- Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Philip J Uren
- Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
|
671
|
Dritsou V, Deligianni E, Dialynas E, Allen J, Poulakakis N, Louis C, Lawson D, Topalis P. Non-coding RNA gene families in the genomes of anopheline mosquitoes. BMC Genomics 2014; 15:1038. [PMID: 25432596 PMCID: PMC4300560 DOI: 10.1186/1471-2164-15-1038] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2014] [Accepted: 11/19/2014] [Indexed: 12/12/2022] Open
Abstract
Background Only a small fraction of the mosquito species of the genus Anopheles are able to transmit malaria, one of the biggest killer diseases of poverty, which is mostly prevalent in the tropics. This diversity has genetic, yet unknown, causes. In a further attempt to contribute to the elucidation of these variances, the international “Anopheles Genomes Cluster Consortium” project (a.k.a. “16 Anopheles genomes project”) was established, aiming at a comprehensive genomic analysis of several anopheline species, most of which are malaria vectors. In the frame of the international consortium carrying out this project our team studied the genes encoding families of non-coding RNAs (ncRNAs), concentrating on four classes: microRNA (miRNA), ribosomal RNA (rRNA), small nuclear RNA (snRNA), and in particular small nucleolar RNA (snoRNA) and, finally, transfer RNA (tRNA). Results Our analysis was carried out using, exclusively, computational approaches, and evaluating both the primary NGS reads as well as the respective genome assemblies produced by the consortium and stored in VectorBase; moreover, the results of RNAseq surveys in cases in which these were available and meaningful were also accessed in order to obtain supplementary data, as were “pre-genomic era” sequence data stored in nucleic acid databases. The investigation included the identification and analysis, in most species studied, of ncRNA genes belonging to several families, as well as the analysis of the evolutionary relations of some of those genes in cross-comparisons to other members of the genus Anopheles. Conclusions Our study led to the identification of members of these gene families in the majority of twenty different anopheline taxa. A set of tools for the study of the evolution and molecular biology of important disease vectors has, thus, been obtained. 
Electronic supplementary material The online version of this article (doi:10.1186/1471-2164-15-1038) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Pantelis Topalis
- Institute of Molecular Biology and Biotechnology, FORTH, Heraklion, Greece.
|
672
|
Rosenbloom KR, Armstrong J, Barber GP, Casper J, Clawson H, Diekhans M, Dreszer TR, Fujita PA, Guruvadoo L, Haeussler M, Harte RA, Heitner S, Hickey G, Hinrichs AS, Hubley R, Karolchik D, Learned K, Lee BT, Li CH, Miga KH, Nguyen N, Paten B, Raney BJ, Smit AFA, Speir ML, Zweig AS, Haussler D, Kuhn RM, Kent WJ. The UCSC Genome Browser database: 2015 update. Nucleic Acids Res 2014; 43:D670-81. [PMID: 25428374 PMCID: PMC4383971 DOI: 10.1093/nar/gku1177] [Citation(s) in RCA: 699] [Impact Index Per Article: 69.9] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022] Open
Abstract
Launched in 2001 to showcase the draft human genome assembly, the UCSC Genome Browser database (http://genome.ucsc.edu) and associated tools continue to grow, providing a comprehensive resource of genome assemblies and annotations to scientists and students worldwide. Highlights of the past year include the release of a browser for the first new human genome reference assembly in 4 years in December 2013 (GRCh38, UCSC hg38), a watershed comparative genomics annotation (100-species multiple alignment and conservation) and a novel distribution mechanism for the browser (GBiB: Genome Browser in a Box). We created browsers for new species (Chinese hamster, elephant shark, minke whale), 'mined the web' for DNA sequences and expanded the browser display with stacked color graphs and region highlighting. As our user community increasingly adopts the UCSC track hub and assembly hub representations for sharing large-scale genomic annotation data sets and genome sequencing projects, our menu of public data hubs has tripled.
Affiliation(s)
- Kate R Rosenbloom
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Joel Armstrong
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Galt P Barber
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Jonathan Casper
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Hiram Clawson
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Mark Diekhans
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Timothy R Dreszer
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Pauline A Fujita
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Luvina Guruvadoo
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | - Maximilian Haeussler
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | - Rachel A Harte
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | - Steve Heitner
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | - Glenn Hickey
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | - Angie S Hinrichs
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | - Robert Hubley
- Institute for Systems Biology, Seattle, WA 98109, USA
| | - Donna Karolchik
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | - Katrina Learned
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | - Brian T Lee
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | - Chin H Li
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | - Karen H Miga
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | - Ngan Nguyen
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | - Benedict Paten
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | - Brian J Raney
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | | | - Matthew L Speir
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | - Ann S Zweig
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | - David Haussler
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA Howard Hughes Medical Institute, UCSC, Santa Cruz, CA 95064, USA
| | - Robert M Kuhn
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | - W James Kent
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| |
Collapse
|
673
|
Oates ME, Stahlhacke J, Vavoulis DV, Smithers B, Rackham OJL, Sardar AJ, Zaucha J, Thurlby N, Fang H, Gough J. The SUPERFAMILY 1.75 database in 2014: a doubling of data. Nucleic Acids Res 2014; 43:D227-33. [PMID: 25414345 PMCID: PMC4383889 DOI: 10.1093/nar/gku1041] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.6] [Indexed: 01/01/2023] Open
Abstract
We present updates to the SUPERFAMILY 1.75 (http://supfam.org) online resource and protein sequence collection. The hidden Markov model library that provides sequence homology to SCOP structural domains remains unchanged at version 1.75. In the last 4 years SUPERFAMILY has more than doubled its holding of curated complete proteomes over all cellular life, from 1400 proteomes reported previously in 2010 up to 3258 at present. Outside of the main sequence collection, SUPERFAMILY continues to provide domain annotation for sequences provided by other resources such as: UniProt, Ensembl, PDB, much of JGI Phytozome and selected subcollections of NCBI RefSeq. Despite this growth in data volume, SUPERFAMILY now provides users with an expanded and daily updated phylogenetic tree of life (sTOL). This tree is built with genomic-scale domain annotation data as before, but constantly updated when new species are introduced to the sequence library. Our Gene Ontology and other functional and phenotypic annotations previously reported have stood up to critical assessment by the function prediction community. We have now introduced these data in an integrated manner online at the level of an individual sequence, and—in the case of whole genomes—with enrichment analysis against a taxonomically defined background.
Affiliation(s)
- Matt E Oates
  - Computer Science, University of Bristol, Bristol, BS8 1UB, UK
- Ben Smithers
  - Computer Science, University of Bristol, Bristol, BS8 1UB, UK
- Owen J L Rackham
  - Computer Science, University of Bristol, Bristol, BS8 1UB, UK; Medical Research Council Clinical Sciences Centre, Faculty of Medicine, Imperial College London, Hammersmith Hospital, London, UK
- Adam J Sardar
  - Computer Science, University of Bristol, Bristol, BS8 1UB, UK; e-Therapeutics plc, 17 Blenheim Office Park, Long Hanborough, Oxfordshire, OX29 8LN, UK
- Jan Zaucha
  - Computer Science, University of Bristol, Bristol, BS8 1UB, UK; Bristol Centre for Complexity Sciences, University of Bristol, Bristol, UK
- Natalie Thurlby
  - Computer Science, University of Bristol, Bristol, BS8 1UB, UK; Bristol Centre for Complexity Sciences, University of Bristol, Bristol, UK
- Hai Fang
  - Computer Science, University of Bristol, Bristol, BS8 1UB, UK
- Julian Gough
  - Computer Science, University of Bristol, Bristol, BS8 1UB, UK
|
674
|
Krug K, Popic S, Carpy A, Taumer C, Macek B. Construction and assessment of individualized proteogenomic databases for large-scale analysis of nonsynonymous single nucleotide variants. Proteomics 2014; 14:2699-708. [DOI: 10.1002/pmic.201400219] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 05/16/2014] [Revised: 08/02/2014] [Accepted: 09/19/2014] [Indexed: 01/08/2023]
Affiliation(s)
- Karsten Krug
  - Proteome Center Tuebingen; University of Tuebingen; Germany
- Sasa Popic
  - Proteome Center Tuebingen; University of Tuebingen; Germany
- Boris Macek
  - Proteome Center Tuebingen; University of Tuebingen; Germany
|
675
|
Montague E, Janko I, Stanberry L, Lee E, Choiniere J, Anderson N, Stewart E, Broomall W, Higdon R, Kolker N, Kolker E. Beyond protein expression, MOPED goes multi-omics. Nucleic Acids Res 2014; 43:D1145-51. [PMID: 25404128 PMCID: PMC4383969 DOI: 10.1093/nar/gku1175] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 12/27/2022] Open
Abstract
MOPED (Multi-Omics Profiling Expression Database; http://moped.proteinspire.org) has transitioned from solely a protein expression database to a multi-omics resource for human and model organisms. Through a web-based interface, MOPED presents consistently processed data for gene, protein and pathway expression. To improve data quality, consistency and use, MOPED includes metadata detailing experimental design and analysis methods. The multi-omics data are integrated through direct links between genes and proteins and further connected to pathways and experiments. MOPED now contains over 5 million records, information for approximately 75 000 genes and 50 000 proteins from four organisms (human, mouse, worm, yeast). These records correspond to 670 unique combinations of experiment, condition, localization and tissue. MOPED includes the following new features: pathway expression, Pathway Details pages, experimental metadata checklists, experiment summary statistics and more advanced searching tools. Advanced searching enables querying for genes, proteins, experiments, pathways and keywords of interest. The system is enhanced with visualizations for comparing across different data types. In the future MOPED will expand the number of organisms, increase integration with pathways and provide connections to disease.
Affiliation(s)
- Elizabeth Montague
  - Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, WA, USA 98101; High-Throughput Analysis Core, Seattle Children's Research Institute, Seattle, WA, USA 98101; CDO Analytics, Seattle Children's, Seattle, WA, USA 98101; Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, WA, USA 98101
- Imre Janko
  - High-Throughput Analysis Core, Seattle Children's Research Institute, Seattle, WA, USA 98101; CDO Analytics, Seattle Children's, Seattle, WA, USA 98101; Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, WA, USA 98101
- Larissa Stanberry
  - Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, WA, USA 98101; High-Throughput Analysis Core, Seattle Children's Research Institute, Seattle, WA, USA 98101; CDO Analytics, Seattle Children's, Seattle, WA, USA 98101; Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, WA, USA 98101
- Elaine Lee
  - High-Throughput Analysis Core, Seattle Children's Research Institute, Seattle, WA, USA 98101; CDO Analytics, Seattle Children's, Seattle, WA, USA 98101; Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, WA, USA 98101
- John Choiniere
  - Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, WA, USA 98101; High-Throughput Analysis Core, Seattle Children's Research Institute, Seattle, WA, USA 98101; Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, WA, USA 98101
- Nathaniel Anderson
  - Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, WA, USA 98101; High-Throughput Analysis Core, Seattle Children's Research Institute, Seattle, WA, USA 98101; Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, WA, USA 98101
- Elizabeth Stewart
  - Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, WA, USA 98101; Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, WA, USA 98101
- William Broomall
  - High-Throughput Analysis Core, Seattle Children's Research Institute, Seattle, WA, USA 98101; CDO Analytics, Seattle Children's, Seattle, WA, USA 98101; Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, WA, USA 98101
- Roger Higdon
  - Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, WA, USA 98101; High-Throughput Analysis Core, Seattle Children's Research Institute, Seattle, WA, USA 98101; CDO Analytics, Seattle Children's, Seattle, WA, USA 98101; Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, WA, USA 98101
- Natali Kolker
  - High-Throughput Analysis Core, Seattle Children's Research Institute, Seattle, WA, USA 98101; CDO Analytics, Seattle Children's, Seattle, WA, USA 98101; Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, WA, USA 98101
- Eugene Kolker
  - Bioinformatics and High-Throughput Analysis Laboratory, Center for Developmental Therapeutics, Seattle Children's Research Institute, Seattle, WA, USA 98101; High-Throughput Analysis Core, Seattle Children's Research Institute, Seattle, WA, USA 98101; CDO Analytics, Seattle Children's, Seattle, WA, USA 98101; Data-Enabled Life Sciences Alliance (DELSA Global), Seattle, WA, USA 98101; Departments of Biomedical Informatics and Medical Education and Pediatrics, University of Washington, Seattle, WA, USA 98109; Department of Chemistry and Chemical Biology, College of Science, Northeastern University, Boston, MA 02115
|
676
|
Huang PJ, Lee CC, Tan BCM, Yeh YM, Julie Chu L, Chen TW, Chang KP, Lee CY, Gan RC, Liu H, Tang P. CMPD: cancer mutant proteome database. Nucleic Acids Res 2014; 43:D849-55. [PMID: 25398898 PMCID: PMC4383976 DOI: 10.1093/nar/gku1182] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 01/06/2023] Open
Abstract
Whole-exome sequencing, which centres on the protein coding regions of disease/cancer associated genes, represents the most cost-effective method to date for deciphering the association between genetic alterations and diseases. Large-scale whole exome/genome sequencing projects have been launched by various institutions, such as NCI, Broad Institute and TCGA, to provide a comprehensive catalogue of coding variants in diverse tissue samples and cell lines. Further functional and clinical interrogation of these sequence variations must rely on extensive cross-platform integration of sequencing information and a proteome database that explicitly and comprehensively archives the corresponding mutated peptide sequences. While such a data resource is critical for the mass spectrometry-based proteomic analysis of exomic variants, no database is currently available for the collection of mutant protein sequences that correspond to recent large-scale genomic data. To address this issue and serve as a bridge to integrate genomic and proteomic datasets, CMPD (http://cgbc.cgu.edu.tw/cmpd) collected over 2 million genetic alterations, which not only facilitates the confirmation and examination of potential cancer biomarkers but also provides an invaluable resource for translational medicine research and opportunities to identify mutated proteins encoded by mutated genes.
Affiliation(s)
- Po-Jung Huang
  - Bioinformatics Core Laboratory, Chang Gung University, Taoyuan 333, Taiwan; Molecular Medicine Research Center, Chang Gung University, Taoyuan 333, Taiwan
- Chi-Ching Lee
  - Bioinformatics Core Laboratory, Chang Gung University, Taoyuan 333, Taiwan
- Yuan-Ming Yeh
  - Bioinformatics Division, Tri-I Biotech, Inc., Taipei 221, Taiwan
- Lichieh Julie Chu
  - Molecular Medicine Research Center, Chang Gung University, Taoyuan 333, Taiwan
- Ting-Wen Chen
  - Bioinformatics Core Laboratory, Chang Gung University, Taoyuan 333, Taiwan
- Kai-Ping Chang
  - Department of Otolaryngology, Head and Neck Surgery, Chang Gung Memorial Hospital, Lin-Kou, Taoyuan 333, Taiwan
- Cheng-Yang Lee
  - Bioinformatics Core Laboratory, Chang Gung University, Taoyuan 333, Taiwan
- Ruei-Chi Gan
  - Bioinformatics Core Laboratory, Chang Gung University, Taoyuan 333, Taiwan
- Hsuan Liu
  - Department of Molecular and Cellular Biology, Chang Gung University, Taoyuan 333, Taiwan
- Petrus Tang
  - Bioinformatics Core Laboratory, Chang Gung University, Taoyuan 333, Taiwan
|
677
|
dos Santos G, Schroeder AJ, Goodman JL, Strelets VB, Crosby MA, Thurmond J, Emmert DB, Gelbart WM. FlyBase: introduction of the Drosophila melanogaster Release 6 reference genome assembly and large-scale migration of genome annotations. Nucleic Acids Res 2014; 43:D690-7. [PMID: 25398896 PMCID: PMC4383921 DOI: 10.1093/nar/gku1099] [Citation(s) in RCA: 303] [Impact Index Per Article: 30.3] [Indexed: 01/09/2023] Open
Abstract
Release 6, the latest reference genome assembly of the fruit fly Drosophila melanogaster, was released by the Berkeley Drosophila Genome Project in 2014; it replaces their previous Release 5 genome assembly, which had been the reference genome assembly for over 7 years. With the enormous amount of information now attached to the D. melanogaster genome in public repositories and individual laboratories, the replacement of the previous assembly by the new one is a major event requiring careful migration of annotations and genome-anchored data to the new, improved assembly. In this report, we describe the attributes of the new Release 6 reference genome assembly, the migration of FlyBase genome annotations to this new assembly, how genome features on this new assembly can be viewed in FlyBase (http://flybase.org) and how users can convert coordinates for their own data to the corresponding Release 6 coordinates.
Affiliation(s)
- Gilberto dos Santos
  - The Biological Laboratories, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA
- Andrew J Schroeder
  - The Biological Laboratories, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA
- Joshua L Goodman
  - Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Victor B Strelets
  - Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Madeline A Crosby
  - The Biological Laboratories, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA
- Jim Thurmond
  - Department of Biology, Indiana University, Bloomington, IN 47405, USA
- David B Emmert
  - The Biological Laboratories, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA
- William M Gelbart
  - The Biological Laboratories, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA
|
678
|
Kroll JE, de Souza SJ, de Souza GA. Identification of rare alternative splicing events in MS/MS data reveals a significant fraction of alternative translation initiation sites. PeerJ 2014; 2:e673. [PMID: 25405079 PMCID: PMC4232841 DOI: 10.7717/peerj.673] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 10/01/2014] [Accepted: 10/30/2014] [Indexed: 01/08/2023] Open
Abstract
Integration of transcriptome data is a crucial step for the identification of rare protein variants in mass-spectrometry (MS) data with important consequences for all branches of biotechnology research. Here, we used Splooce, a database of splicing variants recently developed by us, to search MS data derived from a variety of human tumor cell lines. More than 800 new protein variants were identified whose corresponding MS spectra were specific to protein entries from Splooce. Although the types of splicing variants (exon skipping, alternative splice sites and intron retention) were found at the same frequency as in the transcriptome, we observed a large variety of modifications at the protein level induced by alternative splicing events. Surprisingly, we found that 40% of all protein modifications induced by alternative splicing led to the use of alternative translation initiation sites. Other modifications include frameshifts in the open reading frame and inclusion or deletion of peptide sequences. To make the dataset generated here available to the community in a more effective form, the Splooce portal (http://www.bioinformatics-brazil.org/splooce) was modified to report the alternative splicing events supported by MS data.
Affiliation(s)
- José E Kroll
  - Institute of Bioinformatics and Biotechnology, Natal, Brazil; Brain Institute, UFRN, Natal, Brazil
- Gustavo A de Souza
  - Department of Immunology and Centre for Immune Regulation, Oslo University Hospital HF Rikshospitalet, University of Oslo, Oslo, Norway
|
679
|
Okamura Y, Aoki Y, Obayashi T, Tadaka S, Ito S, Narise T, Kinoshita K. COXPRESdb in 2015: coexpression database for animal species by DNA-microarray and RNAseq-based expression data with multiple quality assessment systems. Nucleic Acids Res 2014; 43:D82-6. [PMID: 25392420 PMCID: PMC4383961 DOI: 10.1093/nar/gku1163] [Citation(s) in RCA: 126] [Impact Index Per Article: 12.6] [Indexed: 01/14/2023] Open
Abstract
The COXPRESdb (http://coxpresdb.jp) provides gene coexpression relationships for animal species. Here, we report updates to the database, focusing on two points. First, we added RNAseq-based gene coexpression data for three species (human, mouse and fly), and substantially increased the number of microarray experiments to nine species. The larger body of expression data from multiple platforms could enhance the reliability of coexpression data. Second, we refined the data assessment procedures, both for each coexpressed gene list and for the overall performance of a platform. The assessment of each coexpressed gene list now uses more reasonable P-values derived from a platform-specific null distribution. These developments greatly reduced pseudo-predictions for directly associated genes, thus improving the reliability of coexpression data for designing new experiments and discussing experimental results.
Affiliation(s)
- Yasunobu Okamura
  - Graduate School of Information Sciences, Tohoku University, 6-3-09, Aramaki-Aza-Aoba, Aoba-ku, Sendai 980-8679, Japan
- Yuichi Aoki
  - Graduate School of Information Sciences, Tohoku University, 6-3-09, Aramaki-Aza-Aoba, Aoba-ku, Sendai 980-8679, Japan
- Takeshi Obayashi
  - Graduate School of Information Sciences, Tohoku University, 6-3-09, Aramaki-Aza-Aoba, Aoba-ku, Sendai 980-8679, Japan
- Shu Tadaka
  - Graduate School of Information Sciences, Tohoku University, 6-3-09, Aramaki-Aza-Aoba, Aoba-ku, Sendai 980-8679, Japan
- Satoshi Ito
  - Graduate School of Information Sciences, Tohoku University, 6-3-09, Aramaki-Aza-Aoba, Aoba-ku, Sendai 980-8679, Japan
- Takafumi Narise
  - Graduate School of Information Sciences, Tohoku University, 6-3-09, Aramaki-Aza-Aoba, Aoba-ku, Sendai 980-8679, Japan
- Kengo Kinoshita
  - Graduate School of Information Sciences, Tohoku University, 6-3-09, Aramaki-Aza-Aoba, Aoba-ku, Sendai 980-8679, Japan; Institute of Development, Aging, and Cancer, Tohoku University, Sendai 980-8575, Japan; Tohoku Medical Megabank Organization, Tohoku University, Sendai 980-8573, Japan
|
680
|
Li P, Liu Y, Wang H, He Y, Wang X, He Y, Lv F, Chen H, Pang X, Liu M, Shi T, Yi Z. PubAngioGen: a database and knowledge for angiogenesis and related diseases. Nucleic Acids Res 2014; 43:D963-7. [PMID: 25392416 PMCID: PMC4383947 DOI: 10.1093/nar/gku1139] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 12/26/2022] Open
Abstract
Angiogenesis is the process of generating new blood vessels from existing ones, and it is involved in many diseases including cancers, cardiovascular diseases and diabetes mellitus. Recently, great efforts have been made to explore the mechanisms of angiogenesis in various diseases, and many angiogenic factors have been discovered as therapeutic targets in anti- or pro-angiogenic drug development. However, the resulting information is scattered and has not been systematically summarized. To integrate these results and facilitate research in the community, we conducted manual text-mining of the published literature and built a database named PubAngioGen (http://www.megabionet.org/aspd/). Our online application displays a comprehensive network for exploring the connection between angiogenesis and diseases at multiple levels, including protein–protein interaction, drug-target, disease-gene and signaling pathways among various cells and animal models recorded through text-mining. To enlarge the scope of the PubAngioGen application, our database also links to other common resources including the STRING, DrugBank and OMIM databases, which will facilitate understanding of the underlying molecular mechanisms of angiogenesis and drug development in clinical therapy.
Affiliation(s)
- Peng Li
  - The center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, 500 Dongchuan Road, Shanghai 200241, China
- Yongrui Liu
  - The center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, 500 Dongchuan Road, Shanghai 200241, China
- Huan Wang
  - The center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, 500 Dongchuan Road, Shanghai 200241, China
- Yuan He
  - The center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, 500 Dongchuan Road, Shanghai 200241, China
- Xue Wang
  - The center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, 500 Dongchuan Road, Shanghai 200241, China
- Yundong He
  - The center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, 500 Dongchuan Road, Shanghai 200241, China
- Fang Lv
  - The center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, 500 Dongchuan Road, Shanghai 200241, China
- Huaqing Chen
  - The center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, 500 Dongchuan Road, Shanghai 200241, China
- Xiufeng Pang
  - The center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, 500 Dongchuan Road, Shanghai 200241, China
- Mingyao Liu
  - The center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, 500 Dongchuan Road, Shanghai 200241, China; Center for Cancer and Stem Cell Biology, Institute of Biosciences and Technology, Texas A&M University Health Science Center, Houston, TX 77030, USA
- Tieliu Shi
  - The center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, 500 Dongchuan Road, Shanghai 200241, China
- Zhengfang Yi
  - The center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, 500 Dongchuan Road, Shanghai 200241, China
|
681
|
Peng X, Thierry-Mieg J, Thierry-Mieg D, Nishida A, Pipes L, Bozinoski M, Thomas MJ, Kelly S, Weiss JM, Raveendran M, Muzny D, Gibbs RA, Rogers J, Schroth GP, Katze MG, Mason CE. Tissue-specific transcriptome sequencing analysis expands the non-human primate reference transcriptome resource (NHPRTR). Nucleic Acids Res 2014; 43:D737-42. [PMID: 25392405 PMCID: PMC4383927 DOI: 10.1093/nar/gku1110] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.5] [Indexed: 01/06/2023] Open
Abstract
The non-human primate reference transcriptome resource (NHPRTR, available online at http://nhprtr.org/) aims to generate comprehensive RNA-seq data from a wide variety of non-human primates (NHPs), from lemurs to hominids. In the 2012 Phase I of the NHPRTR project, 19 billion fragments or 3.8 terabases of transcriptome sequences were collected from pools of ∼20 tissues in 15 species and subspecies. Here we describe a major expansion of NHPRTR by adding 10.1 billion fragments of tissue-specific RNA-seq data. For this effort, we selected 11 of the original 15 NHP species and subspecies and constructed total RNA libraries for the same ∼15 tissues in each. The sequence quality is such that 88% of the reads align to human reference sequences, allowing us to compute the full list of expression abundance across all tissues for each species, using the reads mapped to human genes. This update also includes improved transcript annotations derived from RNA-seq data for rhesus and cynomolgus macaques, two of the most commonly used NHP models and additional RNA-seq data compiled from related projects. Together, these comprehensive reference transcriptomes from multiple primates serve as a valuable community resource for genome annotation, gene dynamics and comparative functional analysis.
Affiliation(s)
- Xinxia Peng
  - Department of Microbiology, University of Washington, Seattle, WA 98109, USA; Washington National Primate Research Center, Seattle, WA 98109, USA
- Jean Thierry-Mieg
  - National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
- Danielle Thierry-Mieg
  - National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
- Andrew Nishida
  - Department of Microbiology, University of Washington, Seattle, WA 98109, USA; Washington National Primate Research Center, Seattle, WA 98109, USA
- Lenore Pipes
  - Department of Physiology and Biophysics, Weill Cornell Medical College, New York, NY 10065, USA; Institute for Computational Biology (ICB), Weill Cornell Medical College, New York, NY 10065, USA
- Marjan Bozinoski
  - Department of Physiology and Biophysics, Weill Cornell Medical College, New York, NY 10065, USA; Institute for Computational Biology (ICB), Weill Cornell Medical College, New York, NY 10065, USA
- Matthew J Thomas
  - Department of Microbiology, University of Washington, Seattle, WA 98109, USA; Washington National Primate Research Center, Seattle, WA 98109, USA
- Sara Kelly
  - Department of Microbiology, University of Washington, Seattle, WA 98109, USA; Washington National Primate Research Center, Seattle, WA 98109, USA
- Jeffrey M Weiss
  - Department of Microbiology, University of Washington, Seattle, WA 98109, USA; Washington National Primate Research Center, Seattle, WA 98109, USA
- Donna Muzny
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Richard A Gibbs
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Jeffrey Rogers
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Michael G Katze
  - Department of Microbiology, University of Washington, Seattle, WA 98109, USA; Washington National Primate Research Center, Seattle, WA 98109, USA
- Christopher E Mason
  - Department of Physiology and Biophysics, Weill Cornell Medical College, New York, NY 10065, USA; Institute for Computational Biology (ICB), Weill Cornell Medical College, New York, NY 10065, USA; Feil Family Brain and Mind Research Institute (BMRI), Weill Cornell Medical College, New York, NY 10065, USA
|
682
|
Bioinformatic analysis reveals genome size reduction and the emergence of tyrosine phosphorylation site in the movement protein of New World bipartite begomoviruses. PLoS One 2014; 9:e111957. [PMID: 25383632 PMCID: PMC4226511 DOI: 10.1371/journal.pone.0111957] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 08/18/2014] [Accepted: 10/09/2014] [Indexed: 11/19/2022] Open
Abstract
Begomovirus (genus Begomovirus, family Geminiviridae) infection is devastating to a wide variety of agricultural crops including tomato, squash, and cassava. Thus, understanding the replication and adaptation of begomoviruses has important translational value in alleviating substantial economic loss, particularly in developing countries. The bipartite genomes of begomoviruses prevalent in the New World and their counterparts in the Old World share a high degree of genome homology except for a partially overlapping reading frame encoding the pre-coat protein (PCP, or AV2). PCP contributes to the essential functions of intercellular movement and suppression of host RNA silencing, but it is only present in the Old World viruses. In this study, we analyzed a set of non-redundant bipartite begomovirus genomes originating from the Old World (N = 28) and the New World (N = 65). Our bioinformatic analysis suggests that ∼120 nucleotides were deleted from PCP's proximal promoter region, which may have contributed to its loss in the New World viruses. Consequently, genomes of the New World viruses are smaller than their Old World counterparts, possibly compensating for the loss of the intercellular movement functions of PCP. Additionally, we detected substantial purifying selection on a portion of the New World DNA-B movement protein (MP, or BC1). Further analysis of the New World MP gene revealed the emergence of a putative tyrosine phosphorylation site, which likely explains the increased purifying selection in that region. These findings provide important information about the strategies adopted by bipartite begomoviruses in adapting to new environments and suggest future in planta experiments.
|
683
|
Dreos R, Ambrosini G, Périer RC, Bucher P. The Eukaryotic Promoter Database: expansion of EPDnew and new promoter analysis tools. Nucleic Acids Res 2014; 43:D92-6. [PMID: 25378343 PMCID: PMC4383928 DOI: 10.1093/nar/gku1111] [Citation(s) in RCA: 207] [Impact Index Per Article: 20.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open
Abstract
We present an update of EPDnew (http://epd.vital-it.ch), a recently introduced part of the Eukaryotic Promoter Database (EPD) that was described in more detail in a previous NAR Database Issue. EPD is an old database of experimentally characterized eukaryotic POL II promoters, which are conceptually defined as transcription initiation sites or regions. EPDnew is a collection of automatically compiled, organism-specific promoter lists complementing the old corpus of manually compiled promoter entries of EPD. This new part is exclusively derived from next generation sequencing data from high-throughput promoter mapping experiments. We report on the recent growth of EPDnew, its extension to additional model organisms and its improved integration with other bioinformatics resources developed by our group, in particular the Signal Search Analysis and ChIP-Seq web servers.
Affiliation(s)
- René Dreos
- Swiss Institute of Bioinformatics (SIB), CH-1015 Lausanne, Switzerland
| | - Giovanna Ambrosini
- Swiss Institute of Bioinformatics (SIB), CH-1015 Lausanne, Switzerland; Swiss Institute for Experimental Cancer Research (ISREC), School of Life Sciences, Swiss Federal Institute of Technology (EPFL), CH-1015 Lausanne, Switzerland
| | - Rouayda Cavin Périer
- Swiss Institute for Experimental Cancer Research (ISREC), School of Life Sciences, Swiss Federal Institute of Technology (EPFL), CH-1015 Lausanne, Switzerland
| | - Philipp Bucher
- Swiss Institute of Bioinformatics (SIB), CH-1015 Lausanne, Switzerland; Swiss Institute for Experimental Cancer Research (ISREC), School of Life Sciences, Swiss Federal Institute of Technology (EPFL), CH-1015 Lausanne, Switzerland
| |
|
684
|
Volders PJ, Verheggen K, Menschaert G, Vandepoele K, Martens L, Vandesompele J, Mestdagh P. An update on LNCipedia: a database for annotated human lncRNA sequences. Nucleic Acids Res 2014; 43:D174-80. [PMID: 25378313 PMCID: PMC4383901 DOI: 10.1093/nar/gku1060] [Citation(s) in RCA: 212] [Impact Index Per Article: 21.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/17/2023] Open
Abstract
The human genome is pervasively transcribed, producing thousands of non-coding RNA transcripts. The majority of these transcripts are long non-coding RNAs (lncRNAs), and novel lncRNA genes are being identified at a rapid pace. To streamline these efforts, we created LNCipedia, an online repository of lncRNA transcripts and annotation. Here, we present LNCipedia 3.0 (http://www.lncipedia.org), the latest version of the publicly available human lncRNA database. Compared to the previous version of LNCipedia, the database grew over five times in size, gaining over 90,000 new lncRNA transcripts. Assessment of the protein-coding potential of LNCipedia entries is improved with state-of-the-art methods that include large-scale reprocessing of publicly available proteomics data. As a result, a high-confidence set of lncRNA transcripts with low coding potential is defined and made available for download. In addition, a tool to assess lncRNA gene conservation between human, mouse and zebrafish has been implemented.
Affiliation(s)
| | - Kenneth Verheggen
- Department of Medical Protein Research, VIB, Ghent 9000, Belgium; Department of Biochemistry, Ghent University, Ghent 9000, Belgium
| | - Gerben Menschaert
- Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University, Ghent 9000, Belgium
| | - Klaas Vandepoele
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Gent 9000, Belgium; Department of Plant Systems Biology, VIB, Ghent 9000, Belgium
| | - Lennart Martens
- Department of Medical Protein Research, VIB, Ghent 9000, Belgium; Department of Biochemistry, Ghent University, Ghent 9000, Belgium
| | - Jo Vandesompele
- Center for Medical Genetics, Ghent University, Ghent 9000, Belgium
| | - Pieter Mestdagh
- Center for Medical Genetics, Ghent University, Ghent 9000, Belgium
| |
|
685
|
Cato L, Neeb A, Brown M, Cato ACB. Control of steroid receptor dynamics and function by genomic actions of the cochaperones p23 and Bag-1L. NUCLEAR RECEPTOR SIGNALING 2014; 12:e005. [PMID: 25422595 PMCID: PMC4242288 DOI: 10.1621/nrs.12005] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/02/2014] [Accepted: 09/20/2014] [Indexed: 01/23/2023]
Abstract
Molecular chaperones encompass a group of unrelated proteins that facilitate the correct assembly and disassembly of other macromolecular structures, which they themselves do not remain a part of. They associate with a large and diverse set of coregulators termed cochaperones that regulate their function and specificity. Amongst others, chaperones and cochaperones regulate the activity of several signaling molecules including steroid receptors, which upon ligand binding interact with discrete nucleotide sequences within the nucleus to control the expression of diverse physiological and developmental genes. Molecular chaperones and cochaperones are typically known to provide the correct conformation for ligand binding by the steroid receptors. While this contribution is widely accepted, recent studies have reported that they further modulate steroid receptor action outside ligand binding. They are thought to contribute to receptor turnover, transport of the receptor to different subcellular localizations, recycling of the receptor on chromatin and even stabilization of the DNA-binding properties of the receptor. In addition to these combined effects with molecular chaperones, cochaperones are reported to have additional functions that are independent of molecular chaperones. Some of these functions also impact on steroid receptor action. Two well-studied examples are the cochaperones p23 and Bag-1L, which have been identified as modulators of steroid receptor activity in nuclei. Understanding details of their regulatory action will provide new therapeutic opportunities of controlling steroid receptor action independent of the widespread effects of molecular chaperones.
Affiliation(s)
- Laura Cato
- Division of Molecular and Cellular Oncology, Department of Medical Oncology and Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02215, USA (LC, MB) and Institute of Toxicology and Genetics, Karlsruhe Institute of Technology, 76344 Eggenstein-Leopoldshafen, Germany (AN, ACBC)
| | - Antje Neeb
- Division of Molecular and Cellular Oncology, Department of Medical Oncology and Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02215, USA (LC, MB) and Institute of Toxicology and Genetics, Karlsruhe Institute of Technology, 76344 Eggenstein-Leopoldshafen, Germany (AN, ACBC)
| | - Myles Brown
- Division of Molecular and Cellular Oncology, Department of Medical Oncology and Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02215, USA (LC, MB) and Institute of Toxicology and Genetics, Karlsruhe Institute of Technology, 76344 Eggenstein-Leopoldshafen, Germany (AN, ACBC)
| | - Andrew C B Cato
- Division of Molecular and Cellular Oncology, Department of Medical Oncology and Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02215, USA (LC, MB) and Institute of Toxicology and Genetics, Karlsruhe Institute of Technology, 76344 Eggenstein-Leopoldshafen, Germany (AN, ACBC)
| |
|
686
|
Nagai Y, Takahashi Y, Imanishi T. VaDE: a manually curated database of reproducible associations between various traits and human genomic polymorphisms. Nucleic Acids Res 2014; 43:D868-72. [PMID: 25361969 PMCID: PMC4383886 DOI: 10.1093/nar/gku1037] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022] Open
Abstract
Genome-wide association studies (GWASs) have identified numerous single nucleotide polymorphisms (SNPs) associated with the development of common diseases. However, it is clear that genetic risk factors of common diseases are heterogeneous among human populations. Therefore, we developed a database of genomic polymorphisms that are reproducibly associated with disease susceptibilities, drug responses and other traits for each human population: 'VarySysDB Disease Edition' (VaDE; http://bmi-tokai.jp/VaDE/). SNP-trait association data were obtained from the National Human Genome Research Institute GWAS (NHGRI GWAS) catalog and RAvariome, and we added detailed information on sample populations by curating original papers. In addition, we collected and curated original papers, and registered the detailed information of SNP-trait associations in VaDE. Then, we evaluated reproducibility of associations in each population by counting the number of significantly associated studies. VaDE provides literature-based SNP-trait association data and functional genomic region annotation for SNP functional research. SNP functional annotation data included experimental data of the ENCODE project, H-InvDB transcripts and the 1000 Genomes Project. A user-friendly web interface was developed to assist quick search, easy download and fast swapping among viewers. We believe that our database will contribute to the future establishment of personalized medicine and increase our understanding of genetic factors underlying diseases.
Affiliation(s)
- Yoko Nagai
- Department of Molecular Life Science, Tokai University School of Medicine, Isehara, Kanagawa 259-1193, Japan
| | - Yasuko Takahashi
- Department of Molecular Life Science, Tokai University School of Medicine, Isehara, Kanagawa 259-1193, Japan
| | - Tadashi Imanishi
- Department of Molecular Life Science, Tokai University School of Medicine, Isehara, Kanagawa 259-1193, Japan; Data Management and Integration Team, Molecular Profiling Research Center for Drug Discovery, National Institute of Advanced Industrial Science and Technology, Koto-ku, Tokyo 135-0064, Japan
| |
|
687
|
Yang X, Li M, Liu Q, Zhang Y, Qian J, Wan X, Wang A, Zhang H, Zhu C, Lu X, Mao Y, Sang X, Zhao H, Zhao Y, Zhang X. Dr.VIS v2.0: an updated database of human disease-related viral integration sites in the era of high-throughput deep sequencing. Nucleic Acids Res 2014; 43:D887-92. [PMID: 25355513 PMCID: PMC4383912 DOI: 10.1093/nar/gku1074] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022] Open
Abstract
Dr.VIS is a database of human disease-related viral integration sites (VIS). The number of VIS has grown rapidly since Dr.VIS was first released in 2011, and there is growing recognition of the important role that viral integration plays in the development of malignancies. The updated database version, Dr.VIS v2.0 (http://www.bioinfo.org/drvis or bminfor.tongji.edu.cn/drvis_v2), represents 25 diseases, covers 3340 integration sites of eight oncogenic viruses in human chromosomes and provides more accurate information about VIS from high-throughput deep sequencing results obtained mainly after 2012. Data of VISes for three newly identified oncogenic viruses for 14 related diseases have been added to this 2015 update, which has a 5-fold increase of VISes compared to Dr.VIS v1.0. Dr.VIS v2.0 has 2244 precise integration sites, 867 integration regions and 551 junction sequences. A total of 2295 integration sites are located near 1730 involved genes. Of the VISes, 1153 are detected in the exons or introns of genes, with 294 located up to 5 kb and a further 112 located up to 10 kb away. As viral integration may alter chromosome stability and gene expression levels, characterizing VISes will contribute toward the discovery of novel oncogenes, tumor suppressor genes and tumor-associated pathways.
Affiliation(s)
- Xiaobo Yang
- Department of Liver Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, China
| | - Ming Li
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
| | - Qi Liu
- School of Life Sciences and Technology, Tongji University, Shanghai, China
| | - Yabing Zhang
- Otolaryngology Head and Neck Surgery Department, Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, China
| | - Junyan Qian
- Department of Liver Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, China
| | - Xueshuai Wan
- Department of Liver Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, China
| | - Anqiang Wang
- Department of Liver Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, China
| | - Haohai Zhang
- Department of Liver Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, China
| | - Chengpei Zhu
- Department of Liver Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, China
| | - Xin Lu
- Department of Liver Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, China
| | - Yilei Mao
- Department of Liver Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, China
| | - Xinting Sang
- Department of Liver Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, China
| | - Haitao Zhao
- Department of Liver Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, China
| | - Yi Zhao
- Department of Liver Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS & PUMC), Beijing, China; Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China
| | - Xiaoyan Zhang
- School of Life Sciences and Technology, Tongji University, Shanghai, China
| |
|
688
|
Brown GR, Hem V, Katz KS, Ovetsky M, Wallin C, Ermolaeva O, Tolstoy I, Tatusova T, Pruitt KD, Maglott DR, Murphy TD. Gene: a gene-centered information resource at NCBI. Nucleic Acids Res 2014; 43:D36-42. [PMID: 25355515 DOI: 10.1093/nar/gku1055] [Citation(s) in RCA: 431] [Impact Index Per Article: 43.1] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/20/2023] Open
Abstract
The National Center for Biotechnology Information's (NCBI) Gene database (www.ncbi.nlm.nih.gov/gene) integrates gene-specific information from multiple data sources. NCBI Reference Sequence (RefSeq) genomes for viruses, prokaryotes and eukaryotes are the primary foundation for Gene records in that they form the critical association between sequence and a tracked gene upon which additional functional and descriptive content is anchored. Additional content is integrated based on the genomic location and RefSeq transcript and protein sequence data. The content of a Gene record represents the integration of curation and automated processing from RefSeq, collaborating model organism databases, consortia such as Gene Ontology, and other databases within NCBI. Records in Gene are assigned unique, tracked integers as identifiers. The content (citations, nomenclature, genomic location, gene products and their attributes, phenotypes, sequences, interactions, variation details, maps, expression, homologs, protein domains and external databases) is available via interactive browsing through NCBI's Entrez system, via NCBI's Entrez programming utilities (E-Utilities and Entrez Direct) and for bulk transfer by FTP.
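The programmatic access route mentioned in the abstract can be illustrated with a minimal sketch that only builds an E-utilities ESummary request URL for Gene records and performs no network call. The helper name `gene_esummary_url` is hypothetical, and the two Gene IDs (7157 for TP53, 672 for BRCA1) are illustrative examples, not values taken from the abstract.

```python
from urllib.parse import urlencode

# Public base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def gene_esummary_url(gene_ids, retmode="json"):
    """Build an ESummary URL for one or more integer Gene identifiers.

    Hypothetical helper for illustration; it does not contact NCBI.
    """
    # Gene records are keyed by tracked integers, so coerce and validate here.
    ids = ",".join(str(int(g)) for g in gene_ids)
    # safe="," keeps the comma-separated ID list readable in the query string.
    query = urlencode({"db": "gene", "id": ids, "retmode": retmode}, safe=",")
    return f"{EUTILS_BASE}/esummary.fcgi?{query}"


if __name__ == "__main__":
    print(gene_esummary_url([7157, 672]))
```

The resulting URL could be passed to any HTTP client; for bulk retrieval the abstract points to FTP instead.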
Affiliation(s)
- Garth R Brown
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892-6510, USA
| | - Vichet Hem
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892-6510, USA
| | - Kenneth S Katz
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892-6510, USA
| | - Michael Ovetsky
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892-6510, USA
| | - Craig Wallin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892-6510, USA
| | - Olga Ermolaeva
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892-6510, USA
| | - Igor Tolstoy
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892-6510, USA
| | - Tatiana Tatusova
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892-6510, USA
| | - Kim D Pruitt
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892-6510, USA
| | - Donna R Maglott
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892-6510, USA
| | - Terence D Murphy
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892-6510, USA
| |
|
689
|
Petrov AI, Kay SJE, Gibson R, Kulesha E, Staines D, Bruford EA, Wright MW, Burge S, Finn RD, Kersey PJ, Cochrane G, Bateman A, Griffiths-Jones S, Harrow J, Chan PP, Lowe TM, Zwieb CW, Wower J, Williams KP, Hudson CM, Gutell R, Clark MB, Dinger M, Quek XC, Bujnicki JM, Chua NH, Liu J, Wang H, Skogerbø G, Zhao Y, Chen R, Zhu W, Cole JR, Chai B, Huang HD, Huang HY, Cherry JM, Hatzigeorgiou A, Pruitt KD. RNAcentral: an international database of ncRNA sequences. Nucleic Acids Res 2014; 43:D123-9. [PMID: 25352543 PMCID: PMC4384043 DOI: 10.1093/nar/gku991] [Citation(s) in RCA: 86] [Impact Index Per Article: 8.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/23/2023] Open
Abstract
The field of non-coding RNA biology has been hampered by the lack of availability of a comprehensive, up-to-date collection of accessioned RNA sequences. Here we present the first release of RNAcentral, a database that collates and integrates information from an international consortium of established RNA sequence databases. The initial release contains over 8.1 million sequences, including representatives of all major functional classes. A web portal (http://rnacentral.org) provides free access to data, search functionality, cross-references, source code and an integrated genome browser for selected species.
|
690
|
Omer WH, Narita A, Hosomichi K, Mitsunaga S, Hayashi Y, Yamashita A, Krasniqi A, Iwasaki Y, Kimura M, Inoue I. Genome-wide linkage and exome analyses identify variants of HMCN1 for splenic epidermoid cyst. BMC MEDICAL GENETICS 2014; 15:115. [PMID: 25338956 PMCID: PMC4258954 DOI: 10.1186/s12881-014-0115-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/09/2014] [Accepted: 10/03/2014] [Indexed: 12/30/2022]
Abstract
BACKGROUND Splenic epidermoid cyst is a benign tumor-like lesion affecting the spleen and sometimes occurs in familial form. Establishing the causality of such rare diseases remains challenging; recently, however, with the emergence of exome re-sequencing, the genetics of many diseases have been unveiled. In the present study, we performed a combinatorial approach of genome-wide parametric linkage and exome analyses for a moderate-sized Japanese family with frequent occurrence of splenic epidermoid cyst to identify the genetic causality of the disease. METHODS Twelve individuals from the family were subjected to SNP typing, and exome re-sequencing was done for 8 family members and 4 unrelated patients from Kosovo. Linkage was estimated using multi-point parametric linkage analysis assuming a dominant mode of inheritance. All of the candidate variants from exome analysis were confirmed by direct sequencing. RESULTS The parametric linkage analysis suggested two loci on 1q and 14q with a maximal LOD score of 2.5. Exome-generated variants were prioritized based on impact on the protein-coding sequence, novelty or rareness in public databases, and position within the linkage loci. This approach identified three variants: variants of HMCN1 and CNTN2 on 1q and a variant of DDHD1 on 14q. The variant of HMCN1 (p.R5205H) showed the best co-segregation in the family after validation with Sanger sequencing. Additionally, rare missense variants (p.A4704V, p.T5004I, and p.H5244Q) were detected in three unrelated Kosovo patients. The identified variants of HMCN1 are in conserved domains, particularly the two variants in the calcium-binding epidermal growth factor domain. CONCLUSIONS The present study, by combining linkage and exome analyses, identified HMCN1 as a genetic cause of splenic epidermoid cyst. Understanding the biology of the disease is a key step toward developing innovative approaches of intervention.
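For readers unfamiliar with the LOD score reported in the results, the following is the standard textbook definition, given here as an aid and not as the authors' own derivation. The symbols $L$ (pedigree likelihood) and $\theta$ (recombination fraction) are assumptions introduced for illustration; in a multi-point analysis the score is evaluated along map positions, but the odds interpretation is the same.

```latex
\mathrm{LOD}(\theta) = \log_{10}\frac{L(\theta)}{L(\theta = \tfrac{1}{2})},
\qquad
\mathrm{LOD} = 2.5 \;\Longrightarrow\; \frac{L(\theta)}{L(\tfrac{1}{2})} = 10^{2.5} \approx 316 .
```

A maximal score of 2.5 therefore corresponds to odds of roughly 316:1 in favor of linkage, which is below the conventional threshold of 3 (1000:1) and fits the abstract's wording that the analysis "suggested" the two loci.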
Affiliation(s)
| | - Ituro Inoue
- Division of Human Genetics, National Institute of Genetics, The Graduate University for Advanced Studies (SOKENDAI), Yata 1111, Mishima 411-8540, Shizuoka, Japan.
| |
|
691
|
Abstract
Identifying sequence variants that play a mechanistic role in human disease and other phenotypes is a fundamental goal in human genetics and will be important in translating the results of variation studies. Experimental validation to confirm that a variant causes the biochemical changes responsible for a given disease or phenotype is considered the gold standard, but this cannot currently be applied to the 3 million or so variants expected in an individual genome. This has prompted the development of a wide variety of computational approaches that use several different sources of information to identify functional variation. Here, we review and assess the limitations of computational techniques for categorizing variants according to functional classes, prioritizing variants for experimental follow-up and generating hypotheses about the possible molecular mechanisms to inform downstream experiments. We discuss the main current bioinformatics approaches to identifying functional variation, including widely used algorithms for coding variation such as SIFT and PolyPhen and also novel techniques for interpreting variation across the genome.
Affiliation(s)
- Graham RS Ritchie
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, UK
- Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, UK
- Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
| |
|
692
|
Du X, Gertz EM, Wojtowicz D, Zhabinskaya D, Levens D, Benham CJ, Schäffer AA, Przytycka TM. Potential non-B DNA regions in the human genome are associated with higher rates of nucleotide mutation and expression variation. Nucleic Acids Res 2014; 42:12367-79. [PMID: 25336616 PMCID: PMC4227770 DOI: 10.1093/nar/gku921] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023] Open
Abstract
While individual non-B DNA structures have been shown to impact gene expression, their broad regulatory role remains elusive. We utilized genomic variants and expression quantitative trait loci (eQTL) data to analyze genome-wide variation propensities of potential non-B DNA regions and their relation to gene expression. Independent of genomic location, these regions were enriched in nucleotide variants. Our results are consistent with previously observed mutagenic properties of these regions and counter a previous study concluding that G-quadruplex regions have a reduced frequency of variants. While such mutagenicity might undermine functionality of these elements, we identified in potential non-B DNA regions a signature of negative selection. Yet, we found a depletion of eQTL-associated variants in potential non-B DNA regions, opposite to what might be expected from their proposed regulatory role. However, we also observed that genes downstream of potential non-B DNA regions showed higher expression variation between individuals. This coupling between mutagenicity and tolerance for expression variability of downstream genes may be a result of evolutionary adaptation, which allows reconciling mutagenicity of non-B DNA structures with their location in functionally important regions and their potential regulatory role.
Affiliation(s)
- Xiangjun Du
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - E Michael Gertz
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Damian Wojtowicz
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Dina Zhabinskaya
- Laboratory of Pathology, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - David Levens
- UC Davis Genome Center, University of California Davis, Davis, CA 95616, USA
| | - Craig J Benham
- Laboratory of Pathology, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Alejandro A Schäffer
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Teresa M Przytycka
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| |
|
693
|
Garson K, Vanderhyden BC. Epithelial ovarian cancer stem cells: underlying complexity of a simple paradigm. Reproduction 2014; 149:R59-70. [PMID: 25301968 DOI: 10.1530/rep-14-0234] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Abstract
The lack of significant progress in the treatment of epithelial ovarian cancer (EOC) underscores the need to gain a better understanding of the processes that lead to chemoresistance and recurrence. The cancer stem cell (CSC) hypothesis offers an attractive explanation of how a subpopulation of cells within a patient's tumour might remain refractory to treatment and subsequently form the basis of recurrent chemoresistant disease. This review examines the literature defining somatic stem cells of the ovary and fallopian tube, two tissues that give rise to EOC. In addition, it reviews the considerable research that has identified subpopulations of EOC cells, based on marker expression (CD133, CD44, CD117, CD24, epithelial cell adhesion molecule, LY6A, ALDH1 and side population (SP)), which are enriched for tumour initiating cells (TICs). While many studies identified either CD133 or CD44 as markers useful for enriching for TICs, there is little consensus. This suggests that EOC cells may have a phenotypic plasticity that may preclude the identification of universal markers defining a CSC. The assay that forms the basis of quantifying TICs is the xenograft assay. Considerable controversy surrounds the xenograft assay and it is essential that some of the potential limitations be examined in this review. Highlighting such limitations or weaknesses is required to properly evaluate data and broaden our interpretation of potential mechanisms that might be contributing to the pathogenesis of ovarian cancer.
Affiliation(s)
- Kenneth Garson
- Ottawa Hospital Research Institute, Centre for Cancer Therapeutics, Ottawa, Ontario, Canada K1H 8L6; Department of Cellular and Molecular Medicine, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada K1H 8M5
| | - Barbara C Vanderhyden
- Ottawa Hospital Research Institute, Centre for Cancer Therapeutics, Ottawa, Ontario, Canada K1H 8L6; Department of Cellular and Molecular Medicine, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada K1H 8M5
| |
Collapse
|
694
|
Nguyen H, Maier J, Huang H, Perrone V, Simmerling C. Folding simulations for proteins with diverse topologies are accessible in days with a physics-based force field and implicit solvent. J Am Chem Soc 2014; 136:13959-62. [PMID: 25255057 PMCID: PMC4195377 DOI: 10.1021/ja5032776] [Citation(s) in RCA: 178] [Impact Index Per Article: 17.8] [Indexed: 11/28/2022]
Abstract
The millisecond time scale needed for molecular dynamics simulations to approach the quantitative study of protein folding is not yet routine. One approach to extend the simulation time scale is to perform long simulations on specialized and expensive supercomputers such as Anton. Ideally, however, folding simulations would be more economical while retaining reasonable accuracy, and provide feedback on structure, stability and function rapidly enough if partnered directly with experiment. Approaches to this problem typically involve varied compromises between accuracy, precision, and cost; the goal here is to address whether simple implicit solvent models have become sufficiently accurate for their weaknesses to be offset by their ability to rapidly provide much more precise conformational data as compared to explicit solvent. We demonstrate that our recently developed physics-based model performs well on this challenge, enabling accurate all-atom simulated folding for 16 of 17 proteins with a variety of sizes, secondary structure, and topologies. The simulations were carried out using the Amber software on inexpensive GPUs, providing ∼1 μs/day per GPU, and >2.5 ms data presented here. We also show that native conformations are preferred over misfolded structures for 14 of the 17 proteins. For the other 3, misfolded structures are thermodynamically preferred, suggesting opportunities for further improvement.
Affiliation(s)
- Hai Nguyen
- Department of Chemistry, Laufer Center for Physical and Quantitative Biology, and Graduate Program in Biochemistry and Structural Biology, Stony Brook University, Stony Brook, New York 11794-5252, United States
|
695
|
Ruiz-Orera J, Messeguer X, Subirana JA, Alba MM. Long non-coding RNAs as a source of new peptides. eLife 2014; 3:e03523. [PMID: 25233276 PMCID: PMC4359382 DOI: 10.7554/elife.03523] [Citation(s) in RCA: 380] [Impact Index Per Article: 38.0] [Received: 05/30/2014] [Accepted: 08/11/2014] [Indexed: 12/11/2022] Open
Abstract
Deep transcriptome sequencing has revealed the existence of many transcripts that lack long or conserved open reading frames (ORFs) and which have been termed long non-coding RNAs (lncRNAs). The vast majority of lncRNAs are lineage-specific and do not yet have a known function. In this study, we test the hypothesis that they may act as a repository for the synthesis of new peptides. We find that a large fraction of the lncRNAs expressed in cells from six different species is associated with ribosomes. The patterns of ribosome protection are consistent with the translation of short peptides. lncRNAs show coding potential and sequence constraints similar to those of evolutionarily young protein-coding sequences, indicating that they play an important role in de novo protein evolution. DOI: http://dx.doi.org/10.7554/eLife.03523.001
Despite the terms being largely interchangeable in modern language, ‘DNA’ and ‘gene’ do not mean the same thing. A gene is made of DNA and contains the instructions to make a protein, and it is the protein that performs the function of the gene. However, cells in the body also contain DNA that does not form genes. Far from being ‘junk’ DNA with no biological purpose, this DNA has a variety of roles, including affecting how other genes are used. To produce a protein, the DNA sequence of a gene is transcribed into an intermediate molecule called RNA, which is then translated to produce a protein. So-called long non-coding RNA (lncRNA) molecules are also transcribed from DNA, but whether these are translated to make proteins has been a subject of much debate. Indeed, the function of the vast majority of lncRNA molecules is unknown. Ruiz-Orera et al. analyzed RNA sequences collected from earlier experiments on six different species (humans, mice, fish, flies, yeast, and a plant) and found nearly 2500 as yet unstudied lncRNAs in addition to those previously identified. Many of the lncRNAs that Ruiz-Orera et al. investigated could be found lodged inside the cellular machinery used to translate RNA into proteins. Furthermore, these lncRNA molecules are oriented in the machinery as if they are primed and ready for translation, suggesting that many lncRNAs do produce proteins. However, it is unclear how many of these proteins have a useful function. Very few lncRNAs were found in more than one species, suggesting that they have evolved recently. The properties of lncRNA molecules also show many similarities with the properties of ‘young’ (recently evolved) genes that are known to produce proteins. The combined findings of Ruiz-Orera et al. therefore suggest that lncRNAs are important for developing new proteins. The emergence of proteins with new functions has been an important driving force in evolution, and this work provides important clues into the first steps of this process. DOI: http://dx.doi.org/10.7554/eLife.03523.002
Affiliation(s)
- Jorge Ruiz-Orera
- Evolutionary Genomics Group, Research Programme on Biomedical Informatics, Hospital del Mar Research Institute, Universitat Pompeu Fabra, Barcelona, Spain
- Xavier Messeguer
- Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya, Barcelona, Spain
- Juan Antonio Subirana
- Evolutionary Genomics Group, Research Programme on Biomedical Informatics, Hospital del Mar Research Institute, Universitat Pompeu Fabra, Barcelona, Spain
- M Mar Alba
- Evolutionary Genomics Group, Research Programme on Biomedical Informatics, Hospital del Mar Research Institute, Universitat Pompeu Fabra, Barcelona, Spain
|
696
|
Patnaik SK, Helmberg W, Blumenfeld OO. BGMUT Database of Allelic Variants of Genes Encoding Human Blood Group Antigens. Transfus Med Hemother 2014; 41:346-51. [PMID: 25538536 DOI: 10.1159/000366108] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 04/01/2014] [Accepted: 05/19/2014] [Indexed: 12/30/2022]
Abstract
The Blood group antigen Gene MUTation (BGMUT) database documents variations in genes of human blood group systems. In March 2014, the database, accessible at www.ncbi.nlm.nih.gov/gv/mhc/xslcgi.cgi?cmd=bgmut, listed 1,545 alleles of 44 genes of 34 blood group systems. Besides allelic information, the BGMUT resource also presents comprehensive and current information on blood group systems. This review describes the database and notes its utility for the transfusion medicine and human genetics communities.
Affiliation(s)
- Santosh Kumar Patnaik
- Department of Thoracic Surgery, Roswell Park Cancer Institute, Elm and Carlton Streets, Buffalo, NY, USA
- Wolfgang Helmberg
- Department of Blood Group Serology and Transfusion Medicine, Medical University of Graz, Graz, Austria
- Olga O Blumenfeld
- Department of Biochemistry, Albert Einstein College of Medicine, Bronx, NY, USA
|
697
|
Johnson S, Trost B, Long JR, Pittet V, Kusalik A. A better sequence-read simulator program for metagenomics. BMC Bioinformatics 2014; 15 Suppl 9:S14. [PMID: 25253095 PMCID: PMC4168713 DOI: 10.1186/1471-2105-15-s9-s14] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Indexed: 11/10/2022] Open
Abstract
Background: There are many programs available for generating simulated whole-genome shotgun sequence reads. The data generated by many of these programs follow predefined models, which limits their use to the authors' original intentions. For example, many models assume that read lengths follow a uniform or normal distribution. Other programs generate models from actual sequencing data, but are limited to reads from single-genome studies. To our knowledge, there are no programs that allow a user to generate simulated data following non-parametric read-length distributions and quality profiles based on empirically-derived information from metagenomics sequencing data.
Results: We present BEAR (Better Emulation for Artificial Reads), a program that uses a machine-learning approach to generate reads with lengths and quality values that closely match empirically-derived distributions. BEAR can emulate reads from various sequencing platforms, including Illumina, 454, and Ion Torrent. BEAR requires minimal user input, as it automatically determines appropriate parameter settings from user-supplied data. BEAR also uses a unique method for deriving run-specific error rates, and extracts useful statistics from the metagenomic data itself, such as quality-error models. Many existing simulators are specific to a particular sequencing technology; however, BEAR is not restricted in this way. Because of its flexibility, BEAR is particularly useful for emulating the behaviour of technologies like Ion Torrent, for which no dedicated sequencing simulators are currently available. BEAR is also the first metagenomic sequencing simulator program that automates the process of generating abundances, which can be an arduous task.
Conclusions: BEAR is useful for evaluating data processing tools in genomics. It has many advantages over existing comparable software, such as generating more realistic reads and being independent of sequencing technology, and has features particularly useful for metagenomics work.
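The idea of a non-parametric read-length distribution mentioned in this abstract can be illustrated with a small sketch. This is not BEAR's algorithm or interface; the function names, the toy genome, and the observed lengths below are all hypothetical, and the sketch only shows what it means to draw read lengths from observed data rather than from a uniform or normal model.

```python
import random
from collections import Counter


def build_empirical_sampler(observed_lengths, seed=None):
    """Return a function that draws read lengths from the observed data.

    Hypothetical helper: the lengths are resampled in proportion to how
    often they were observed, so no parametric model is assumed.
    """
    counts = Counter(observed_lengths)
    lengths = sorted(counts)
    weights = [counts[n] for n in lengths]
    rng = random.Random(seed)

    def sample(k):
        return rng.choices(lengths, weights=weights, k=k)

    return sample


def simulate_reads(genome, sample_lengths, n_reads, seed=None):
    """Cut error-free reads of empirically sampled lengths from a genome."""
    rng = random.Random(seed)
    reads = []
    for length in sample_lengths(n_reads):
        length = min(length, len(genome))  # a read cannot exceed the genome
        start = rng.randint(0, len(genome) - length)
        reads.append(genome[start:start + length])
    return reads


# Toy inputs, invented for illustration only.
observed = [98, 100, 100, 100, 151, 151, 250]
sampler = build_empirical_sampler(observed, seed=1)
genome = "ACGT" * 100
reads = simulate_reads(genome, sampler, n_reads=5, seed=2)
```

A realistic simulator would also model quality values, sequencing errors, and genome abundances, which this sketch leaves out.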
|
698
|
Fine mapping of eight psoriasis susceptibility loci. Eur J Hum Genet 2014; 23:844-53. [PMID: 25182136 DOI: 10.1038/ejhg.2014.172] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 10/03/2013] [Revised: 06/03/2014] [Accepted: 06/06/2014] [Indexed: 01/04/2023] Open
Abstract
Previous studies have identified 41 independent genome-wide significant psoriasis susceptibility loci. After our first psoriasis genome-wide association study, we designed a custom genotyping array to fine-map eight genome-wide significant susceptibility loci known at that time (IL23R, IL13, IL12B, TNIP1, MHC, TNFAIP3, IL23A and RNF114) enabling genotyping of 2269 single-nucleotide polymorphisms (SNPs) in the eight loci for 2699 psoriasis cases and 2107 unaffected controls of European ancestry. We imputed these data using the latest 1000 Genome reference haplotypes, which included both indels and SNPs, to increase the marker density of the eight loci to 49 239 genetic variants. Using stepwise conditional association analysis, we identified nine independent signals distributed across six of the eight loci. In the major histocompatibility complex (MHC) region, we detected three independent signals at rs114255771 (P = 2.94 × 10^-74), rs6924962 (P = 3.21 × 10^-19) and rs892666 (P = 1.11 × 10^-10). Near IL12B we detected two independent signals at rs62377586 (P = 7.42 × 10^-16) and rs918518 (P = 3.22 × 10^-11). Only one signal was observed in each of the TNIP1 (rs17728338; P = 4.15 × 10^-13), IL13 (rs1295685; P = 1.65 × 10^-7), IL23A (rs61937678; P = 1.82 × 10^-7) and TNFAIP3 (rs642627; P = 5.90 × 10^-7) regions. We also imputed variants for eight HLA genes and found that SNP rs114255771 yielded a more significant association than any HLA allele or amino-acid residue. Further analysis revealed that the HLA-C*06-B*57 haplotype tagged by this SNP had a significantly higher odds ratio than other HLA-C*06-bearing haplotypes. The results demonstrate allelic heterogeneity at IL12B and identify a high-risk MHC class I haplotype, consistent with the existence of multiple psoriasis effectors in the MHC.
|
699
|
Gollin SM. Cytogenetic alterations and their molecular genetic correlates in head and neck squamous cell carcinoma: a next generation window to the biology of disease. Genes Chromosomes Cancer 2014; 53:972-90. [PMID: 25183546 DOI: 10.1002/gcc.22214] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Received: 08/10/2014] [Accepted: 08/15/2014] [Indexed: 01/14/2023] Open
Abstract
Cytogenetic alterations underlie the development of head and neck squamous cell carcinoma (HNSCC), whether tobacco and alcohol use, betel nut chewing, snuff or human papillomavirus (HPV) causes the disease. Many of the molecular genetic aberrations in HNSCC result from these cytogenetic alterations. This review presents a brief introduction to the epidemiology of HNSCC, and discusses the role of HPV in the disease, cytogenetic alterations and their frequencies in HNSCC, their molecular genetic and The Cancer Genome Atlas (TCGA) correlates, prognostic implications, and possible therapeutic considerations. The most frequent cytogenetic alterations in HNSCC are gains of 5p14-15, 8q11-12, and 20q12-13, gains or amplifications of 3q26, 7p11, 8q24, and 11q13, and losses of 3p, 4q35, 5q12, 8p23, 9p21-24, 11q14-23, 13q12-14, 18q23, and 21q22. To understand their effects on tumor cell biology and response to therapy, the cytogenetic findings in HNSCC are increasingly being examined in the context of the biochemical pathways they disrupt. The goal is to minimize morbidity and mortality from HNSCC using cytogenetic abnormalities to identify valuable diagnostic biomarkers for HNSCC, prognostic biomarkers of tumor behavior, recurrence risk, and outcome, and predictive biomarkers of therapeutic response to identify the most efficacious treatment for each individual patient's tumor, all based on a detailed understanding of the next generation biology of HNSCC.
Affiliation(s)
- Susanne M Gollin
- Department of Human Genetics, University of Pittsburgh Graduate School of Public Health, Pittsburgh, PA; Departments of Otolaryngology and Pathology, University of Pittsburgh School of Medicine, Pittsburgh, PA; University of Pittsburgh Cancer Institute, Pittsburgh, PA
|
700
|
Demeure K, Duriez E, Domon B, Niclou SP. PeptideManager: a peptide selection tool for targeted proteomic studies involving mixed samples from different species. Front Genet 2014; 5:305. [PMID: 25228907 PMCID: PMC4151198 DOI: 10.3389/fgene.2014.00305] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 06/18/2014] [Accepted: 08/16/2014] [Indexed: 02/02/2023] Open
Abstract
The search for clinically useful protein biomarkers using advanced mass spectrometry approaches represents a major focus in cancer research. However, the direct analysis of human samples may be challenging due to limited availability, the absence of appropriate control samples, or the large background variability observed in patient material. As an alternative approach, human tumors orthotopically implanted into a different species (xenografts) are clinically relevant models that have proven their utility in pre-clinical research. Patient-derived xenografts for glioblastoma have been extensively characterized in our laboratory and have been shown to retain the characteristics of the parental tumor at the phenotypic and genetic level. Such models were also found to adequately mimic the behavior and treatment response of human tumors. The reproducibility of such xenograft models and the possibility of identifying their host background and performing tumor-host interaction studies are major advantages over the direct analysis of human samples. At the proteome level, the analysis of xenograft samples is challenged by the presence of proteins from two different species which, depending on tumor size, type or location, often appear at variable ratios. Any proteomics approach aimed at quantifying proteins within such samples must consider the identification of species-specific peptides in order to avoid biases introduced by the host proteome. Here, we present an in-house methodology and tool developed to select peptides used as surrogates for protein candidates from a defined proteome (e.g., human) in a host proteome background (e.g., mouse, rat) suited for a mass spectrometry analysis. The tools presented here are applicable to any species-specific proteome, provided a protein database is available. By linking the information from both proteomes, PeptideManager significantly facilitates and expedites the selection of peptides used as surrogates to analyze proteins of interest.
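The selection of species-specific peptides described in this abstract can be sketched as follows. This is not PeptideManager's implementation; the function names and toy sequences are hypothetical, and the sketch assumes a simplified trypsin rule (cleavage after K or R unless the next residue is P) with no missed cleavages.

```python
import re


def tryptic_peptides(sequence, min_len=6):
    """In silico digest: cleave after K or R, except when followed by P.

    Simplified assumption: no missed cleavages, and peptides shorter
    than min_len are discarded.
    """
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return {p for p in pieces if len(p) >= min_len}


def species_specific_peptides(target, host, min_len=6):
    """Keep, per target protein, the peptides absent from every host protein."""
    host_peptides = set()
    for seq in host.values():
        host_peptides |= tryptic_peptides(seq, min_len)
    return {
        name: sorted(tryptic_peptides(seq, min_len) - host_peptides)
        for name, seq in target.items()
    }


# Toy sequences invented for illustration; they differ by two residues.
human = {"PROT_H": "MASTEELKQWERTYAAGRLLNDPEK"}
mouse = {"PROT_M": "MASTEELKQWERTYSSGRLLNDPEK"}
selected = species_specific_peptides(human, mouse)
print(selected)  # {'PROT_H': ['TYAAGR']}
```

A practical tool would also need to check uniqueness within the target proteome and account for isobaric residues such as leucine and isoleucine, which are omitted here.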
Affiliation(s)
- Kevin Demeure
- NorLux Neuro-Oncology Laboratory, Department of Oncology, Centre de Recherche Public de la Santé Luxembourg, Luxembourg
- Elodie Duriez
- LCP, Luxembourg Clinical Proteomics Center, Centre de Recherche Public de la Santé Strassen, Luxembourg
- Bruno Domon
- LCP, Luxembourg Clinical Proteomics Center, Centre de Recherche Public de la Santé Strassen, Luxembourg
- Simone P Niclou
- NorLux Neuro-Oncology Laboratory, Department of Oncology, Centre de Recherche Public de la Santé Luxembourg, Luxembourg
|