1
|
Dong E, Du H, Gardner L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect Dis 2020; 20:533-534. [PMID: 32087114 PMCID: PMC7159018 DOI: 10.1016/s1473-3099(20)30120-1] [Citation(s) in RCA: 6113] [Impact Index Per Article: 1222.6] [Received: 02/11/2020] [Accepted: 02/13/2020] [Indexed: 02/07/2023]
|
Letter |
5 |
6113 |
2
|
The UniProt Consortium
Bateman Alex, Martin Maria Jesus, O’Donovan Claire, Magrane Michele, Alpi Emanuele, Antunes Ricardo, Bely Benoit, Bingley Mark, Bonilla Carlos, Britto Ramona, Bursteinas Borisas, Bye-A-Jee Hema, Cowley Andrew, Silva Alan Da, Giorgi Maurizio De, Dogan Tunca, Fazzini Francesco, Castro Leyla Garcia, Figueira Luis, Garmiri Penelope, Georghiou George, Gonzalez Daniel, Hatton-Ellis Emma, Li Weizhong, Liu Wudong, Lopez Rodrigo, Luo Jie, Lussi Yvonne, MacDougall Alistair, Nightingale Andrew, Palka Barbara, Pichler Klemens, Poggioli Diego, Pundir Sangya, Pureza Luis, Qi Guoying, Renaux Alexandre, Rosanoff Steven, Saidi Rabie, Sawford Tony, Shypitsyna Aleksandra, Speretta Elena, Turner Edward, Tyagi Nidhi, Volynkin Vladimir, Wardell Tony, Warner Kate, Watkins Xavier, Zaru Rossana, Zellner Hermann, Xenarios Ioannis, Bougueleret Lydie, Bridge Alan, Poux Sylvain, Redaschi Nicole, Aimo Lucila, Argoud-Puy Ghislaine, Auchincloss Andrea, Axelsen Kristian, Bansal Parit, Baratin Delphine, Blatter Marie-Claude, Boeckmann Brigitte, Bolleman Jerven, Boutet Emmanuel, Breuza Lionel, Casal-Casas Cristina, de Castro Edouard, Coudert Elisabeth, Cuche Beatrice, Doche Mikael, Dornevil Dolnide, Duvaud Severine, Estreicher Anne, Famiglietti Livia, Feuermann Marc, Gasteiger Elisabeth, Gehant Sebastien, Gerritsen Vivienne, Gos Arnaud, Gruaz-Gumowski Nadine, Hinz Ursula, Hulo Chantal, Jungo Florence, Keller Guillaume, Lara Vicente, Lemercier Philippe, Lieberherr Damien, Lombardot Thierry, Martin Xavier, Masson Patrick, Morgat Anne, Neto Teresa, Nouspikel Nevila, Paesano Salvo, Pedruzzi Ivo, Pilbout Sandrine, Pozzato Monica, Pruess Manuela, Rivoire Catherine, Roechert Bernd, Schneider Michel, Sigrist Christian, Sonesson Karin, Staehli Sylvie, Stutz Andre, Sundaram Shyamala, Tognolli Michael, Verbregue Laure, Veuthey Anne-Lise, Wu Cathy H, Arighi Cecilia N, Arminski Leslie, Chen Chuming, Chen Yongxing, Garavelli John S, Huang Hongzhan, Laiho Kati, McGarvey Peter, Natale Darren A, Ross Karen, Vinayaka C R, Wang Qinghua, Wang Yuqi, Yeh Lai-Su, Zhang Jian. UniProt: the universal protein knowledgebase. Nucleic Acids Res 2016; 45:D158-D169. [PMID: 27899622 PMCID: PMC5210571 DOI: 10.1093/nar/gkw1099] [Citation(s) in RCA: 3337] [Impact Index Per Article: 370.8] [Received: 09/19/2016] [Revised: 10/25/2016] [Accepted: 11/05/2016] [Indexed: 02/06/2023]
Abstract
The UniProt knowledgebase is a large resource of protein sequences and associated detailed annotation. The database contains over 60 million sequences, of which over half a million sequences have been curated by experts who critically review experimental and predicted data for each protein. The remainder are automatically annotated based on rule systems that rely on the expert curated knowledge. Since our last update in 2014, we have more than doubled the number of reference proteomes to 5631, giving a greater coverage of taxonomic diversity. We implemented a pipeline to remove redundant highly similar proteomes that were causing excessive redundancy in UniProt. The initial run of this pipeline reduced the number of sequences in UniProt by 47 million. For our users interested in the accessory proteomes, we have made available sets of pan proteome sequences that cover the diversity of sequences for each species that is found in its strains and sub-strains. To help interpretation of genomic variants, we provide tracks of detailed protein information for the major genome browsers. We provide a SPARQL endpoint that allows complex queries of the more than 22 billion triples of data in UniProt (http://sparql.uniprot.org/). UniProt resources can be accessed via the website at http://www.uniprot.org/.
|
Research Support, Non-U.S. Gov't |
9 |
3337 |
3
|
Zerbino DR, Achuthan P, Akanni W, Amode M, Barrell D, Bhai J, Billis K, Cummins C, Gall A, Girón CG, Gil L, Gordon L, Haggerty L, Haskell E, Hourlier T, Izuogu OG, Janacek SH, Juettemann T, To JK, Laird MR, Lavidas I, Liu Z, Loveland JE, Maurel T, McLaren W, Moore B, Mudge J, Murphy DN, Newman V, Nuhn M, Ogeh D, Ong CK, Parker A, Patricio M, Riat HS, Schuilenburg H, Sheppard D, Sparrow H, Taylor K, Thormann A, Vullo A, Walts B, Zadissa A, Frankish A, Hunt SE, Kostadima M, Langridge N, Martin FJ, Muffato M, Perry E, Ruffier M, Staines DM, Trevanion SJ, Aken BL, Cunningham F, Yates A, Flicek P. Ensembl 2018. Nucleic Acids Res 2018; 46:D754-D761. [PMID: 29155950 PMCID: PMC5753206 DOI: 10.1093/nar/gkx1098] [Citation(s) in RCA: 1992] [Impact Index Per Article: 284.6] [Received: 09/22/2017] [Revised: 10/17/2017] [Accepted: 10/21/2017] [Indexed: 01/29/2023]
Abstract
The Ensembl project has been aggregating, processing, integrating and redistributing genomic datasets since the initial releases of the draft human genome, with the aim of accelerating genomics research through rapid open distribution of public data. Large amounts of raw data are thus transformed into knowledge, which is made available via a multitude of channels, in particular our browser (http://www.ensembl.org). Over time, we have expanded in multiple directions. First, our resources describe multiple fields of genomics, in particular gene annotation, comparative genomics, genetics and epigenomics. Second, we cover a growing number of genome assemblies; Ensembl Release 90 contains exactly 100. Third, our databases feed simultaneously into an array of services designed around different use cases, ranging from quick browsing to genome-wide bioinformatic analysis. We present here the latest developments of the Ensembl project, with a focus on managing an increasing number of assemblies, supporting efforts in genome interpretation and improving our browser.
|
Research Support, N.I.H., Extramural |
7 |
1992 |
4
|
Shen W, Le S, Li Y, Hu F. SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation. PLoS One 2016; 11:e0163962. [PMID: 27706213 PMCID: PMC5051824 DOI: 10.1371/journal.pone.0163962] [Citation(s) in RCA: 1640] [Impact Index Per Article: 182.2] [Received: 05/23/2016] [Accepted: 09/16/2016] [Indexed: 11/23/2022]
Abstract
FASTA and FASTQ are basic and ubiquitous formats for storing nucleotide and protein sequences. Common manipulations of FASTA/Q files include converting, searching, filtering, deduplication, splitting, shuffling, and sampling. Existing tools only implement some of these manipulations, and not particularly efficiently, and some are only available for certain operating systems. Furthermore, the complicated installation process of required packages and running environments can render these programs less user friendly. This paper describes a cross-platform ultrafast comprehensive toolkit for FASTA/Q processing. SeqKit provides executable binary files for all major operating systems, including Windows, Linux, and Mac OS X, and can be directly used without any dependencies or pre-configurations. SeqKit demonstrates competitive performance in execution time and memory usage compared to similar tools. The efficiency and usability of SeqKit enable researchers to rapidly accomplish common FASTA/Q file manipulations. SeqKit is open source and available on GitHub at https://github.com/shenwei356/seqkit.
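To make two of the manipulations named in this abstract concrete (deduplication and filtering), here is a minimal Python sketch. It is an illustration only and does not reflect SeqKit's implementation or command-line interface; the function names `parse_fasta`, `dedup_by_sequence`, and `filter_min_length`, as well as the toy records, are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without the leading '>', sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


def dedup_by_sequence(records: Iterable[Record]) -> Iterator[Record]:
    """Keep the first record for each distinct sequence (case-insensitive)."""
    seen = set()
    for header, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            yield header, seq


def filter_min_length(records: Iterable[Record], min_len: int) -> Iterator[Record]:
    """Keep records whose sequence has at least min_len residues."""
    for header, seq in records:
        if len(seq) >= min_len:
            yield header, seq


# Toy input (hypothetical): 'b' duplicates 'a' ignoring case, 'c' is too short.
example = """>a
ACGT
ACGT
>b
acgtacgt
>c
AC
""".splitlines()

kept = list(filter_min_length(dedup_by_sequence(parse_fasta(example)), 4))
```

Because each step is a generator, the records stream through the pipeline one at a time, so memory use depends on the set of distinct sequences rather than on the file size.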
|
Journal Article |
9 |
1640 |
5
|
Nilsson RH, Larsson KH, Taylor AF, Bengtsson-Palme J, Jeppesen TS, Schigel D, Kennedy P, Picard K, Glöckner FO, Tedersoo L, Saar I, Kõljalg U, Abarenkov K. The UNITE database for molecular identification of fungi: handling dark taxa and parallel taxonomic classifications. Nucleic Acids Res 2019; 47:D259-D264. [PMID: 30371820 PMCID: PMC6324048 DOI: 10.1093/nar/gky1022] [Citation(s) in RCA: 1582] [Impact Index Per Article: 263.7] [Received: 09/14/2018] [Revised: 10/11/2018] [Accepted: 10/12/2018] [Indexed: 12/12/2022]
Abstract
UNITE (https://unite.ut.ee/) is a web-based database and sequence management environment for the molecular identification of fungi. It targets the formal fungal barcode, the nuclear ribosomal internal transcribed spacer (ITS) region, and offers all ∼1 000 000 public fungal ITS sequences for reference. These are clustered into ∼459 000 species hypotheses and assigned digital object identifiers (DOIs) to promote unambiguous reference across studies. In-house and web-based third-party sequence curation and annotation have resulted in more than 275 000 improvements to the data over the past 15 years. UNITE serves as a data provider for a range of metabarcoding software pipelines and regularly exchanges data with all major fungal sequence databases and other community resources. Recent improvements include redesigned handling of unclassifiable species hypotheses, integration with the taxonomic backbone of the Global Biodiversity Information Facility, and support for an unlimited number of parallel taxonomic classification systems.
|
research-article |
6 |
1582 |
6
|
The Gene Ontology Consortium. Expansion of the Gene Ontology knowledgebase and resources. Nucleic Acids Res 2016; 45:D331-D338. [PMID: 27899567 PMCID: PMC5210579 DOI: 10.1093/nar/gkw1108] [Citation(s) in RCA: 1374] [Impact Index Per Article: 152.7] [Received: 10/25/2016] [Accepted: 11/16/2016] [Indexed: 12/11/2022]
Abstract
The Gene Ontology (GO) is a comprehensive resource of computable knowledge regarding the functions of genes and gene products. As such, it is extensively used by the biomedical research community for the analysis of -omics and related data. Our continued focus is on improving the quality and utility of the GO resources, and we welcome and encourage input from researchers in all areas of biology. In this update, we summarize the current contents of the GO knowledgebase, and present several new features and improvements that have been made to the ontology, the annotations and the tools. Among the highlights are 1) developments that facilitate access to, and application of, the GO knowledgebase, and 2) extensions to the resource as well as increasing support for descriptions of causal models of biological systems and network biology. To learn more, visit http://geneontology.org/.
|
Research Support, N.I.H., Extramural |
9 |
1374 |
7
|
Fornes O, Castro-Mondragon JA, Khan A, van der Lee R, Zhang X, Richmond PA, Modi BP, Correard S, Gheorghe M, Baranašić D, Santana-Garcia W, Tan G, Chèneby J, Ballester B, Parcy F, Sandelin A, Lenhard B, Wasserman WW, Mathelier A. JASPAR 2020: update of the open-access database of transcription factor binding profiles. Nucleic Acids Res 2020; 48:D87-D92. [PMID: 31701148 PMCID: PMC7145627 DOI: 10.1093/nar/gkz1001] [Citation(s) in RCA: 857] [Impact Index Per Article: 171.4] [Received: 09/15/2019] [Revised: 10/15/2019] [Accepted: 10/16/2019] [Indexed: 02/07/2023]
Abstract
JASPAR (http://jaspar.genereg.net) is an open-access database of curated, non-redundant transcription factor (TF)-binding profiles stored as position frequency matrices (PFMs) for TFs across multiple species in six taxonomic groups. In this 8th release of JASPAR, the CORE collection has been expanded with 245 new PFMs (169 for vertebrates, 42 for plants, 17 for nematodes, 10 for insects, and 7 for fungi), and 156 PFMs were updated (125 for vertebrates, 28 for plants and 3 for insects). These new profiles represent an 18% expansion compared to the previous release. JASPAR 2020 comes with a novel collection of unvalidated TF-binding profiles for which our curators did not find orthogonal supporting evidence in the literature. This collection has a dedicated web form to engage the community in the curation of unvalidated TF-binding profiles. Moreover, we created a Q&A forum to ease the communication between the user community and JASPAR curators. Finally, we updated the genomic tracks, inference tool, and TF-binding profile similarity clusters. All the data is available through the JASPAR website, its associated RESTful API, and through the JASPAR2020 R/Bioconductor package.
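For readers unfamiliar with position frequency matrices, the following Python sketch shows one common way a PFM can be turned into a log-odds position weight matrix and used to score a candidate site. It is a generic illustration, not JASPAR's own code or API; the pseudocount of 1, the uniform background of 0.25, the toy matrix, and the function names `pfm_to_pwm` and `score` are all assumptions made for the example.

```python
import math
from typing import Dict, List

Matrix = Dict[str, List[float]]  # base -> one value per motif position


def pfm_to_pwm(pfm: Matrix, pseudocount: float = 1.0, background: float = 0.25) -> Matrix:
    """Convert per-position base counts into log2 odds against a uniform background."""
    bases = sorted(pfm)
    length = len(next(iter(pfm.values())))
    pwm: Matrix = {b: [] for b in bases}
    for i in range(length):
        # Column total including the pseudocount added to every base.
        total = sum(pfm[b][i] for b in bases) + pseudocount * len(bases)
        for b in bases:
            prob = (pfm[b][i] + pseudocount) / total
            pwm[b].append(math.log2(prob / background))
    return pwm


def score(pwm: Matrix, site: str) -> float:
    """Sum the log-odds of the observed base at each position of the site."""
    return sum(pwm[base][i] for i, base in enumerate(site))


# Toy three-position PFM (hypothetical counts from 8 aligned sites).
toy_pfm = {"A": [8, 0, 0], "C": [0, 8, 0], "G": [0, 0, 8], "T": [0, 0, 0]}
pwm = pfm_to_pwm(toy_pfm)
best = score(pwm, "ACG")
worst = score(pwm, "TTT")
```

The pseudocount keeps bases with zero observed counts from producing a log of zero; other pseudocount and background choices are equally defensible and will shift the scores.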
|
research-article |
5 |
857 |
8
|
Yates AD, Achuthan P, Akanni W, Allen J, Allen J, Alvarez-Jarreta J, Amode MR, Armean IM, Azov AG, Bennett R, Bhai J, Billis K, Boddu S, Marugán JC, Cummins C, Davidson C, Dodiya K, Fatima R, Gall A, Giron CG, Gil L, Grego T, Haggerty L, Haskell E, Hourlier T, Izuogu OG, Janacek SH, Juettemann T, Kay M, Lavidas I, Le T, Lemos D, Martinez JG, Maurel T, McDowall M, McMahon A, Mohanan S, Moore B, Nuhn M, Oheh DN, Parker A, Parton A, Patricio M, Sakthivel MP, Abdul Salam AI, Schmitt BM, Schuilenburg H, Sheppard D, Sycheva M, Szuba M, Taylor K, Thormann A, Threadgold G, Vullo A, Walts B, Winterbottom A, Zadissa A, Chakiachvili M, Flint B, Frankish A, Hunt SE, IIsley G, Kostadima M, Langridge N, Loveland JE, Martin FJ, Morales J, Mudge JM, Muffato M, Perry E, Ruffier M, Trevanion SJ, Cunningham F, Howe KL, Zerbino DR, Flicek P. Ensembl 2020. Nucleic Acids Res 2020; 48:D682-D688. [PMID: 31691826 PMCID: PMC7145704 DOI: 10.1093/nar/gkz966] [Citation(s) in RCA: 767] [Impact Index Per Article: 153.4] [Received: 09/23/2019] [Revised: 10/09/2019] [Accepted: 10/10/2019] [Indexed: 12/11/2022]
Abstract
Ensembl (https://www.ensembl.org) is a system for generating and distributing genome annotation such as genes, variation, regulation and comparative genomics across the vertebrate subphylum and key model organisms. The Ensembl annotation pipeline is capable of integrating experimental and reference data from multiple providers into a single integrated resource. Here, we present 94 newly annotated and re-annotated genomes, bringing the total number of genomes offered by Ensembl to 227. This represents the single largest expansion of the resource since its inception. We also detail our continued efforts to improve human annotation, developments in our epigenome analysis and display, a new tool for imputing causal genes from genome-wide association studies and visualisation of variation within a 3D protein model. Finally, we present information on our new website. Both software and data are made available without restriction via our website, online tools platform and programmatic interfaces (available under an Apache 2.0 license), with data updates made available four times a year.
|
Research Support, N.I.H., Extramural |
5 |
767 |
9
|
Anwyl-Irvine AL, Massonnié J, Flitton A, Kirkham N, Evershed JK. Gorilla in our midst: An online behavioral experiment builder. Behav Res Methods 2020; 52:388-407. [PMID: 31016684 PMCID: PMC7005094 DOI: 10.3758/s13428-019-01237-x] [Citation(s) in RCA: 718] [Impact Index Per Article: 143.6] [Indexed: 12/25/2022]
Abstract
Behavioral researchers are increasingly conducting their studies online, to gain access to large and diverse samples that would be difficult to get in a laboratory environment. However, there are technical access barriers to building experiments online, and web browsers can present problems for consistent timing, an important issue with reaction-time-sensitive measures. For example, to ensure accuracy and test-retest reliability in presentation and response recording, experimenters need a working knowledge of programming languages such as JavaScript. We review some of the previous and current tools for online behavioral research, as well as how well they address the issues of usability and timing. We then present the Gorilla Experiment Builder (gorilla.sc), a fully tooled experiment authoring and deployment platform, designed to resolve many timing issues and make reliable online experimentation open and accessible to a wider range of technical abilities. To demonstrate the platform's aptitude for accessible, reliable, and scalable research, we administered a task with a range of participant groups (primary school children and adults), settings (without supervision, at home, and under supervision, in both schools and public engagement events), equipment (participant's own computer, computer supplied by the researcher), and connection types (personal internet connection, mobile phone 3G/4G). We used a simplified flanker task taken from the attentional network task (Rueda, Posner, & Rothbart, 2004). We replicated the "conflict network" effect in all these populations, demonstrating the platform's capability to run reaction-time-sensitive experiments. Unresolved limitations of running experiments online are then discussed, along with potential solutions and some future features of the platform.
|
research-article |
5 |
718 |
10
|
Tang D, Chen M, Huang X, Zhang G, Zeng L, Zhang G, Wu S, Wang Y. SRplot: A free online platform for data visualization and graphing. PLoS One 2023; 18:e0294236. [PMID: 37943830 PMCID: PMC10635526 DOI: 10.1371/journal.pone.0294236] [Citation(s) in RCA: 689] [Impact Index Per Article: 344.5] [Received: 05/08/2023] [Accepted: 10/27/2023] [Indexed: 11/12/2023]
Abstract
Graphics are widely used to summarize complex data in scientific publications. Although there are many tools available for drawing graphics, their use is limited by programming skills, costs, and platform specificities. Here, we present a freely accessible, easy-to-use web server named SRplot that integrates more than a hundred commonly used data visualization and graphing functions. It can be run easily in all Web browsers and there are no strong requirements on the computing power of users' machines. With a user-friendly graphical interface, users can simply paste the contents of the input file into the text box according to the defined file format. Modification operations can be easily performed, and graphs can be generated in real time. The resulting graphs can be easily downloaded in bitmap (PNG or TIFF) or vector (PDF or SVG) format in publication quality. The website is updated promptly and continuously. Functions in SRplot have been improved, optimized and updated based on feedback and suggestions from users. Graphs prepared with SRplot have been featured in more than five hundred peer-reviewed publications. The SRplot web server is now freely available at http://www.bioinformatics.com.cn/SRplot.
|
research-article |
2 |
689 |
11
|
Shen L, Shao N, Liu X, Nestler E. ngs.plot: Quick mining and visualization of next-generation sequencing data by integrating genomic databases. BMC Genomics 2014; 15:284. [PMID: 24735413 PMCID: PMC4028082 DOI: 10.1186/1471-2164-15-284] [Citation(s) in RCA: 681] [Impact Index Per Article: 61.9] [Received: 02/04/2014] [Accepted: 04/04/2014] [Indexed: 12/01/2022]
Abstract
BACKGROUND: Understanding the relationship between the millions of functional DNA elements and their protein regulators, and how they work in conjunction to manifest diverse phenotypes, is key to advancing our understanding of the mammalian genome. Next-generation sequencing technology is now used widely to probe these protein-DNA interactions and to profile gene expression at a genome-wide scale. As the cost of DNA sequencing continues to fall, the interpretation of the ever increasing amount of data generated represents a considerable challenge. RESULTS: We have developed ngs.plot, a standalone program to visualize enrichment patterns of DNA-interacting proteins at functionally important regions based on next-generation sequencing data. We demonstrate that ngs.plot is not only efficient but also scalable. We use a few examples to demonstrate that ngs.plot is easy to use and yet very powerful to generate figures that are publication ready. CONCLUSIONS: We conclude that ngs.plot is a useful tool to help fill the gap between massive datasets and genomic information in this era of big sequencing data.
|
Research Support, N.I.H., Extramural |
11 |
681 |
12
|
Zhou Z, Alikhan NF, Mohamed K, Fan Y, Achtman M. The EnteroBase user's guide, with case studies on Salmonella transmissions, Yersinia pestis phylogeny, and Escherichia core genomic diversity. Genome Res 2020; 30:138-152. [PMID: 31809257 PMCID: PMC6961584 DOI: 10.1101/gr.251678.119] [Citation(s) in RCA: 605] [Impact Index Per Article: 121.0] [Received: 04/20/2019] [Accepted: 12/03/2019] [Indexed: 01/08/2023]
Abstract
EnteroBase is an integrated software environment that supports the identification of global population structures within several bacterial genera that include pathogens. Here, we provide an overview of how EnteroBase works, what it can do, and its future prospects. EnteroBase has currently assembled more than 300,000 genomes from Illumina short reads from Salmonella, Escherichia, Yersinia, Clostridioides, Helicobacter, Vibrio, and Moraxella and genotyped those assemblies by core genome multilocus sequence typing (cgMLST). Hierarchical clustering of cgMLST sequence types allows mapping a new bacterial strain to predefined population structures at multiple levels of resolution within a few hours after uploading its short reads. Case Study 1 illustrates this process for local transmissions of Salmonella enterica serovar Agama between neighboring social groups of badgers and humans. EnteroBase also supports single nucleotide polymorphism (SNP) calls from both genomic assemblies and after extraction from metagenomic sequences, as illustrated by Case Study 2 which summarizes the microevolution of Yersinia pestis over the last 5000 years of pandemic plague. EnteroBase can also provide a global overview of the genomic diversity within an entire genus, as illustrated by Case Study 3, which presents a novel, global overview of the population structure of all of the species, subspecies, and clades within Escherichia.
|
research-article |
5 |
605 |
13
|
Lánczky A, Nagy Á, Bottai G, Munkácsy G, Szabó A, Santarpia L, Győrffy B. miRpower: a web-tool to validate survival-associated miRNAs utilizing expression data from 2178 breast cancer patients. Breast Cancer Res Treat 2016; 160:439-446. [PMID: 27744485 DOI: 10.1007/s10549-016-4013-7] [Citation(s) in RCA: 588] [Impact Index Per Article: 65.3] [Received: 07/08/2016] [Accepted: 10/08/2016] [Indexed: 02/06/2023]
Abstract
PURPOSE: The proper validation of prognostic biomarkers is an important clinical issue in breast cancer research. MicroRNAs (miRNAs) have emerged as a new class of promising breast cancer biomarkers. In the present work, we developed an integrated online bioinformatic tool to validate the prognostic relevance of miRNAs in breast cancer. METHODS: A database was set up by searching the GEO, EGA, TCGA, and PubMed repositories to identify datasets with published miRNA expression and clinical data. Kaplan-Meier survival analysis was performed to validate the prognostic value of a set of 41 previously published survival-associated miRNAs. RESULTS: Altogether, 2178 samples from four independent datasets were integrated into the system, including the expression of 1052 distinct human miRNAs. In addition, the web-tool allows for the selection of patients, which can be filtered by receptor status, lymph node involvement, histological grade, and treatments. The complete analysis tool can be accessed online at www.kmplot.com/mirpower. We used this tool to analyze a large number of deregulated miRNAs associated with breast cancer features and outcome, and confirmed the prognostic value of 26 miRNAs. A significant correlation in three out of four datasets was validated only for miR-29c and miR-101. CONCLUSIONS: In summary, we established an integrated platform capable of mining all available miRNA data to perform a survival analysis for the identification and validation of prognostic miRNA markers in breast cancer.
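The Kaplan-Meier analysis mentioned in the methods can be illustrated with a short Python sketch of the product-limit estimator. This is a textbook formulation, not the code behind the miRpower tool; the function name `kaplan_meier` and the toy follow-up data are hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each time where an event occurred.

    events[i] is 1 if the event (e.g. death or relapse) was observed at times[i]
    and 0 if the subject was censored at that time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after time t.
        n_at_risk -= removed
    return curve


# Toy cohort (hypothetical): follow-up in months, 1 = event, 0 = censored.
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
```

Subjects censored at the same time as an event are counted as still at risk for that event, which is the usual convention. Comparing two such curves (for example high versus low expression of a miRNA) would additionally need a test such as the log-rank test, which is not shown here.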
|
Research Support, Non-U.S. Gov't |
9 |
588 |
14
|
Haeussler M, Zweig AS, Tyner C, Speir ML, Rosenbloom KR, Raney BJ, Lee CM, Lee BT, Hinrichs AS, Gonzalez JN, Gibson D, Diekhans M, Clawson H, Casper J, Barber GP, Haussler D, Kuhn RM, Kent W. The UCSC Genome Browser database: 2019 update. Nucleic Acids Res 2019; 47:D853-D858. [PMID: 30407534 PMCID: PMC6323953 DOI: 10.1093/nar/gky1095] [Citation(s) in RCA: 546] [Impact Index Per Article: 91.0] [Received: 09/14/2018] [Revised: 10/17/2018] [Accepted: 10/19/2018] [Indexed: 01/17/2023]
Abstract
The UCSC Genome Browser (https://genome.ucsc.edu) is a graphical viewer for exploring genome annotations. For almost two decades, the Browser has provided visualization tools for genetics and molecular biology and continues to add new data and features. This year, we added a new tool that lets users interactively arrange existing graphing tracks into new groups. Other software additions include new formats for chromosome interactions, a ChIP-Seq peak display for track hubs and improved support for HGVS. On the annotation side, we have added gnomAD, TCGA expression, RefSeq Functional elements, GTEx eQTLs, CRISPR Guides, SNPpedia and created a 30-way primate alignment on the human genome. Nine assemblies now have RefSeq-mapped gene models.
|
Research Support, N.I.H., Extramural |
6 |
546 |
15
|
Karczewski KJ, Weisburd B, Thomas B, Solomonson M, Ruderfer DM, Kavanagh D, Hamamsy T, Lek M, Samocha KE, Cummings BB, Birnbaum D, Daly MJ, MacArthur DG. The ExAC browser: displaying reference data information from over 60 000 exomes. Nucleic Acids Res 2017; 45:D840-D845. [PMID: 27899611 PMCID: PMC5210650 DOI: 10.1093/nar/gkw971] [Citation(s) in RCA: 538] [Impact Index Per Article: 67.3] [Received: 08/31/2016] [Revised: 10/09/2016] [Accepted: 10/11/2016] [Indexed: 11/30/2022]
Abstract
Worldwide, hundreds of thousands of humans have had their genomes or exomes sequenced, and access to the resulting data sets can provide valuable information for variant interpretation and understanding gene function. Here, we present a lightweight, flexible browser framework to display large population datasets of genetic variation. We demonstrate its use for exome sequence data from 60 706 individuals in the Exome Aggregation Consortium (ExAC). The ExAC browser provides gene- and transcript-centric displays of variation, a critical view for clinical applications. Additionally, we provide a variant display, which includes population frequency and functional annotation data as well as short read support for the called variant. This browser is open-source, freely available at http://exac.broadinstitute.org, and has already been used extensively by clinical laboratories worldwide.
|
Research Support, N.I.H., Extramural |
8 |
538 |
16
|
Boutet E, Lieberherr D, Tognolli M, Schneider M, Bansal P, Bridge AJ, Poux S, Bougueleret L, Xenarios I. UniProtKB/Swiss-Prot, the Manually Annotated Section of the UniProt KnowledgeBase: How to Use the Entry View. Methods Mol Biol 2016; 1374:23-54. [PMID: 26519399 DOI: 10.1007/978-1-4939-3167-5_2] [Citation(s) in RCA: 515] [Impact Index Per Article: 57.2] [Indexed: 05/08/2023]
Abstract
The Universal Protein Resource (UniProt, http://www.uniprot.org) consortium is an initiative of the SIB Swiss Institute of Bioinformatics (SIB), the European Bioinformatics Institute (EBI) and the Protein Information Resource (PIR) to provide the scientific community with a central resource for protein sequences and functional information. The UniProt consortium maintains the UniProt KnowledgeBase (UniProtKB), updated every 4 weeks, and several supplementary databases including the UniProt Reference Clusters (UniRef) and the UniProt Archive (UniParc). The Swiss-Prot section of the UniProt KnowledgeBase (UniProtKB/Swiss-Prot) contains publicly available expertly manually annotated protein sequences obtained from a broad spectrum of organisms. Plant protein entries are produced in the frame of the Plant Proteome Annotation Program (PPAP), with an emphasis on characterized proteins of Arabidopsis thaliana and Oryza sativa. High level annotations provided by UniProtKB/Swiss-Prot are widely used to predict annotation of newly available proteins through automatic pipelines. The purpose of this chapter is to present a guided tour of a UniProtKB/Swiss-Prot entry. We will also present some of the tools and databases that are linked to each entry.
|
Research Support, N.I.H., Extramural |
9 |
515 |
17
|
Quang D, Xie X. DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences. Nucleic Acids Res 2016; 44:e107. [PMID: 27084946 PMCID: PMC4914104 DOI: 10.1093/nar/gkw226] [Citation(s) in RCA: 454] [Impact Index Per Article: 50.4] [Received: 12/19/2015] [Revised: 02/27/2016] [Accepted: 03/22/2016] [Indexed: 01/19/2023]
Abstract
Modeling the properties and functions of DNA sequences is an important, but challenging task in the broad field of genomics. This task is particularly difficult for non-coding DNA, the vast majority of which is still poorly understood in terms of function. A powerful predictive model for the function of non-coding DNA can have enormous benefit for both basic science and translational research because over 98% of the human genome is non-coding and 93% of disease-associated variants lie in these regions. To address this need, we propose DanQ, a novel hybrid convolutional and bi-directional long short-term memory recurrent neural network framework for predicting non-coding function de novo from sequence. In the DanQ model, the convolution layer captures regulatory motifs, while the recurrent layer captures long-term dependencies between the motifs in order to learn a regulatory 'grammar' to improve predictions. DanQ improves considerably upon other models across several metrics. For some regulatory markers, DanQ can achieve over a 50% relative improvement in the area under the precision-recall curve metric compared to related models. We have made the source code available at the github repository http://github.com/uci-cbcl/DanQ.
|
research-article |
9 |
454 |
18
|
Gramates LS, Marygold SJ, Santos GD, Urbano JM, Antonazzo G, Matthews BB, Rey AJ, Tabone CJ, Crosby MA, Emmert DB, Falls K, Goodman JL, Hu Y, Ponting L, Schroeder AJ, Strelets VB, Thurmond J, Zhou P. FlyBase at 25: looking to the future. Nucleic Acids Res 2017; 45:D663-D671. [PMID: 27799470 PMCID: PMC5210523 DOI: 10.1093/nar/gkw1016] [Citation(s) in RCA: 407] [Impact Index Per Article: 50.9] [Received: 09/30/2016] [Revised: 10/14/2016] [Accepted: 10/18/2016] [Indexed: 01/12/2023]
Abstract
Since 1992, FlyBase (flybase.org) has been an essential online resource for the Drosophila research community. Concentrating on the most extensively studied species, Drosophila melanogaster, FlyBase includes information on genes (molecular and genetic), transgenic constructs, phenotypes, genetic and physical interactions, and reagents such as stocks and cDNAs. Access to data is provided through a number of tools, reports, and bulk-data downloads. Looking to the future, FlyBase is expanding its focus to serve a broader scientific community. In this update, we describe new features, datasets, reagent collections, and data presentations that address this goal, including enhanced orthology data, Human Disease Model Reports, protein domain search and visualization, concise gene summaries, a portal for external resources, video tutorials and the FlyBase Community Advisory Group.
|
Research Support, N.I.H., Extramural |
8 |
407 |
19
|
Wang Y, Zhang S, Li F, Zhou Y, Zhang Y, Wang Z, Zhang R, Zhu J, Ren Y, Tan Y, Qin C, Li Y, Li X, Chen Y, Zhu F. Therapeutic target database 2020: enriched resource for facilitating research and early development of targeted therapeutics. Nucleic Acids Res 2020; 48:D1031-D1041. [PMID: 31691823 PMCID: PMC7145558 DOI: 10.1093/nar/gkz981] [Citation(s) in RCA: 402] [Impact Index Per Article: 80.4] [Received: 08/26/2019] [Revised: 10/10/2019] [Accepted: 10/12/2019] [Indexed: 12/12/2022]
Abstract
Knowledge of therapeutic targets and early drug candidates is useful for improved drug discovery. In particular, information about target regulators and the patented therapeutic agents facilitates research regarding druggability, systems pharmacology, new trends, molecular landscapes, and the development of drug discovery tools. To complement other databases, we constructed the Therapeutic Target Database (TTD) with expanded information about (i) target-regulating microRNAs and transcription factors, (ii) target-interacting proteins, and (iii) patented agents and their targets (structures and experimental activity values if available), which can be conveniently retrieved and is further enriched with regulatory mechanisms or biochemical classes. We also updated the TTD with the recently released International Classification of Diseases ICD-11 codes and additional sets of successful, clinical trial, and literature-reported targets that emerged since the last update. TTD is accessible at http://bidd.nus.edu.sg/group/ttd/ttd.asp. In case of possible web connectivity issues, two mirror sites of TTD are also constructed (http://db.idrblab.org/ttd/ and http://db.idrblab.net/ttd/).
|
research-article |
5 |
402 |
20
|
Volders PJ, Anckaert J, Verheggen K, Nuytens J, Martens L, Mestdagh P, Vandesompele J. LNCipedia 5: towards a reference set of human long non-coding RNAs. Nucleic Acids Res 2020; 47:D135-D139. [PMID: 30371849 PMCID: PMC6323963 DOI: 10.1093/nar/gky1031] [Citation(s) in RCA: 371] [Impact Index Per Article: 74.2] [Received: 09/03/2018] [Accepted: 10/17/2018] [Indexed: 12/20/2022]
Abstract
While long non-coding RNA (lncRNA) research in the past has primarily focused on the discovery of novel genes, today it has shifted towards functional annotation of this large class of genes. With thousands of lncRNA studies published every year, the current challenge lies in keeping track of which lncRNAs are functionally described. This is further complicated by the fact that lncRNA nomenclature is not straightforward and lncRNA annotation is scattered across different resources, each with its own quality metrics and definition of a lncRNA. To overcome this issue, large-scale curation and annotation are needed. Here, we present the fifth release of the human lncRNA database LNCipedia (https://lncipedia.org). The most notable improvements include manual literature curation of 2482 lncRNA articles and the use of official gene symbols when available. In addition, an improved filtering pipeline results in a higher-quality reference lncRNA gene set.
|
Research Support, Non-U.S. Gov't |
5 |
371 |
21
|
Wang Y, Song F, Zhang B, Zhang L, Xu J, Kuang D, Li D, Choudhary MNK, Li Y, Hu M, Hardison R, Wang T, Yue F. The 3D Genome Browser: a web-based browser for visualizing 3D genome organization and long-range chromatin interactions. Genome Biol 2018; 19:151. [PMID: 30286773 PMCID: PMC6172833 DOI: 10.1186/s13059-018-1519-9] [Citation(s) in RCA: 360] [Impact Index Per Article: 51.4] [Received: 12/23/2017] [Accepted: 08/29/2018] [Indexed: 12/20/2022]
Abstract
Here, we introduce the 3D Genome Browser, http://3dgenome.org, which allows users to conveniently explore both their own and over 300 publicly available chromatin interaction datasets of different types. We design a new binary data format for Hi-C data that reduces the file size by at least an order of magnitude and allows users to visualize chromatin interactions over millions of base pairs within seconds. Our browser provides multiple methods linking distal cis-regulatory elements with their potential target genes. Users can seamlessly integrate thousands of other omics datasets to gain a comprehensive view of both the regulatory landscape and 3D genome structure.
|
Research Support, N.I.H., Extramural |
7 |
360 |
22
|
Deutsch EW, Bandeira N, Sharma V, Perez-Riverol Y, Carver JJ, Kundu DJ, García-Seisdedos D, Jarnuczak AF, Hewapathirana S, Pullman BS, Wertz J, Sun Z, Kawano S, Okuda S, Watanabe Y, Hermjakob H, MacLean B, MacCoss MJ, Zhu Y, Ishihama Y, Vizcaíno JA. The ProteomeXchange consortium in 2020: enabling 'big data' approaches in proteomics. Nucleic Acids Res 2020; 48:D1145-D1152. [PMID: 31686107 PMCID: PMC7145525 DOI: 10.1093/nar/gkz984] [Citation(s) in RCA: 356] [Impact Index Per Article: 71.2] [Received: 09/14/2019] [Revised: 10/11/2019] [Accepted: 10/14/2019] [Indexed: 11/24/2022]
Abstract
The ProteomeXchange (PX) consortium of proteomics resources (http://www.proteomexchange.org) has standardized data submission and dissemination of mass spectrometry proteomics data worldwide since 2012. In this paper, we describe the main developments since the previous update manuscript was published in Nucleic Acids Research in 2017. Since then, in addition to the four existing PX members at the time (PRIDE, PeptideAtlas including the PASSEL resource, MassIVE and jPOST), two new resources have joined PX: iProX (China) and Panorama Public (USA). We first describe the updated submission guidelines, now expanded to include six members. Next, with current data submission statistics, we demonstrate that the proteomics field is now actively embracing public open data policies. At the end of June 2019, more than 14 100 datasets had been submitted to PX resources since 2012, and of those, more than 9 500 in just the last three years. In parallel, an unprecedented increase in data re-use activities in the field, including 'big data' approaches, is enabling novel research and new data resources. Finally, we outline some of our future plans for the coming years.
|
Research Support, N.I.H., Extramural |
5 |
356 |
23
|
Diesh C, Stevens GJ, Xie P, De Jesus Martinez T, Hershberg EA, Leung A, Guo E, Dider S, Zhang J, Bridge C, Hogue G, Duncan A, Morgan M, Flores T, Bimber BN, Haw R, Cain S, Buels RM, Stein LD, Holmes IH. JBrowse 2: a modular genome browser with views of synteny and structural variation. Genome Biol 2023; 24:74. [PMID: 37069644 PMCID: PMC10108523 DOI: 10.1186/s13059-023-02914-z] [Citation(s) in RCA: 341] [Impact Index Per Article: 170.5] [Received: 08/24/2022] [Accepted: 03/20/2023] [Indexed: 04/19/2023]
Abstract
We present JBrowse 2, a general-purpose genome annotation browser offering enhanced visualization of complex structural variation and evolutionary relationships. It retains core features of JBrowse while adding new views for synteny, dotplots, breakpoints, gene fusions, and whole-genome overviews. It allows users to share sessions, open multiple genomes, and navigate between views. It can be embedded in a web page, used as a standalone application, or run from Jupyter notebooks or R sessions. These improvements are enabled by a ground-up redesign using modern web technology. We describe application functionality, use cases, performance benchmarks, and implementation notes for web administrators and developers.
|
Research Support, N.I.H., Extramural |
2 |
341 |
24
|
Robinson J, Barker DJ, Georgiou X, Cooper MA, Flicek P, Marsh SGE. IPD-IMGT/HLA Database. Nucleic Acids Res 2020; 48:D948-D955. [PMID: 31667505 PMCID: PMC7145640 DOI: 10.1093/nar/gkz950] [Citation(s) in RCA: 334] [Impact Index Per Article: 66.8] [Received: 09/13/2019] [Revised: 10/03/2019] [Accepted: 10/29/2019] [Indexed: 11/14/2022]
Abstract
The IPD-IMGT/HLA Database, http://www.ebi.ac.uk/ipd/imgt/hla/, currently contains over 25 000 allele sequences for 45 genes, which are located within the Major Histocompatibility Complex (MHC) of the human genome. This region is the most polymorphic region of the human genome, and the levels of polymorphism seen exceed those of most other genes. Some of the genes have several thousand variants and are now termed hyperpolymorphic, rather than simply polymorphic. The IPD-IMGT/HLA Database has provided a stable, highly accessible, user-friendly repository for this information, giving the scientific and medical community access to the many variant sequences of this gene system, which are critical for the successful outcome of transplantation. The number of currently known variants, and the dramatic increase in the number of new variants being identified, have necessitated a dedicated resource with custom tools for curation and publication. The challenge for the database is to continue to provide a highly curated database of sequence variants, while supporting the increased number of submissions and complexity of sequences. In order to do this, traditional methods of accessing and presenting data will be challenged, and new methods will need to be utilized to keep pace with new discoveries.
|
research-article |
5 |
334 |
25
|
Alanis-Lobato G, Andrade-Navarro MA, Schaefer MH. HIPPIE v2.0: enhancing meaningfulness and reliability of protein-protein interaction networks. Nucleic Acids Res 2016; 45:D408-D414. [PMID: 27794551 PMCID: PMC5210659 DOI: 10.1093/nar/gkw985] [Citation(s) in RCA: 322] [Impact Index Per Article: 35.8] [Received: 08/11/2016] [Revised: 09/28/2016] [Accepted: 10/14/2016] [Indexed: 01/01/2023]
Abstract
The increasing number of experimentally detected interactions between proteins makes it difficult for researchers to extract the interactions relevant for specific biological processes or diseases. This makes it necessary to accompany the large-scale detection of protein–protein interactions (PPIs) with strategies and tools to generate meaningful PPI subnetworks. To this end, we generated the Human Integrated Protein–Protein Interaction rEference or HIPPIE (http://cbdm.uni-mainz.de/hippie/). HIPPIE is a one-stop resource for the generation and interpretation of PPI networks relevant to a specific research question. We provide means to generate highly reliable, context-specific PPI networks and to make sense of them. We have just released the second major update of HIPPIE, implementing various new features. HIPPIE has grown substantially in recent years and now contains more than 270 000 confidence-scored and annotated PPIs. We integrated different types of experimental information for the confidence scoring and the construction of context-specific networks. We implemented basic graph algorithms that highlight important proteins and interactions. HIPPIE's graphical interface implements several ways for wet lab and computational scientists alike to access the PPI data.
|
Research Support, Non-U.S. Gov't |
9 |
322 |