1
|
Bao W, Kojima KK, Kohany O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob DNA 2015; 6:11. [PMID: 26045719 PMCID: PMC4455052 DOI: 10.1186/s13100-015-0041-9] [Citation(s) in RCA: 1853] [Impact Index Per Article: 185.3] [Received: 02/24/2015] [Accepted: 04/17/2015] [Indexed: 02/08/2023] Open
Abstract
Repbase Update (RU) is a database of representative repeat sequences in eukaryotic genomes. Since its first development as a database of human repetitive sequences in 1992, RU has been serving as a well-curated reference database fundamental for almost all eukaryotic genome sequence analyses. Here, we introduce recent updates of RU, focusing on technical issues concerning the submission and updating of Repbase entries, and give short examples of using RU data. RU sincerely invites a broader submission of repeat sequences from the research community.
|
Journal Article |
10 |
1853 |
2
|
Jolley KA, Bray JE, Maiden MCJ. Open-access bacterial population genomics: BIGSdb software, the PubMLST.org website and their applications. Wellcome Open Res 2018; 3:124. [PMID: 30345391 PMCID: PMC6192448 DOI: 10.12688/wellcomeopenres.14826.1] [Citation(s) in RCA: 1815] [Impact Index Per Article: 259.3] [Accepted: 09/18/2018] [Indexed: 12/29/2022] Open
Abstract
The PubMLST.org website hosts a collection of open-access, curated databases that integrate population sequence data with provenance and phenotype information for over 100 different microbial species and genera. Although the PubMLST website was conceived as part of the development of the first multi-locus sequence typing (MLST) scheme in 1998, the software it uses, the Bacterial Isolate Genome Sequence database (BIGSdb, published in 2010), enables PubMLST to include all levels of sequence data, from single gene sequences up to and including complete, finished genomes. Here we describe developments in the BIGSdb software made from publication to June 2018 and show how the platform realises microbial population genomics for a wide range of applications. The system is based on the gene-by-gene analysis of microbial genomes, with each deposited sequence annotated and curated to identify the genes present and systematically catalogue their variation. Originally intended as a means of characterising isolates with typing schemes, the synthesis of sequences and records of genetic variation with provenance and phenotype data permits highly scalable (whole genome sequence data for tens of thousands of isolates) means of addressing a wide range of functional questions, including: the prediction of antimicrobial resistance; likely cross-reactivity with vaccine antigens; and the functional activities of different variants that lead to key phenotypes. There are no limitations to the number of sequences, genetic loci, allelic variants or schemes (combinations of loci) that can be included, enabling each database to represent an expanding catalogue of the genetic variation of the population in question.
In addition to providing web-accessible analyses and links to third-party analysis and visualisation tools, the BIGSdb software includes a RESTful application programming interface (API) that enables access to all the underlying data for third-party applications and data analysis pipelines.
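As a rough illustration of how a third-party pipeline might address such a RESTful API, the sketch below assembles a resource URL and defines a helper that would fetch and decode a JSON response. The base URL, the `db` path segment, and the database name `example_isolates` are assumptions made for illustration and are not taken from the abstract; the BIGSdb API documentation should be consulted for the actual routes.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL for the public REST interface; verify against the BIGSdb API docs.
BASE_URL = "https://rest.pubmlst.org"


def build_url(*parts, base=BASE_URL):
    """Join path segments onto the base URL, percent-encoding each segment."""
    return "/".join([base.rstrip("/")] + [quote(str(p), safe="") for p in parts])


def fetch_json(url, timeout=30):
    """GET a URL and decode the JSON body (performs a network request)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Hypothetical database name, used only to show how a resource path is assembled.
url = build_url("db", "example_isolates")
```

Keeping URL construction separate from the network call lets a pipeline log or test the requested paths without contacting the server; `fetch_json(url)` would then be called only where live data is actually needed.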
|
Journal Article |
7 |
1815 |
3
|
Abstract
The Gene Expression Omnibus (GEO) database is an international public repository that archives and freely distributes high-throughput gene expression and other functional genomics data sets. Created in 2000 as a worldwide resource for gene expression studies, GEO has evolved with rapidly changing technologies and now accepts high-throughput data for many other data applications, including those that examine genome methylation, chromatin structure, and genome-protein interactions. GEO supports community-derived reporting standards that specify provision of several critical study elements including raw data, processed data, and descriptive metadata. The database not only provides access to data for tens of thousands of studies, but also offers various Web-based tools and strategies that enable users to locate data relevant to their specific interests, as well as to visualize and analyze the data. This chapter includes detailed descriptions of methods to query and download GEO data and use the analysis and visualization tools. The GEO homepage is at http://www.ncbi.nlm.nih.gov/geo/.
|
Research Support, N.I.H., Intramural |
9 |
1345 |
4
|
ClassyFire: automated chemical classification with a comprehensive, computable taxonomy. J Cheminform 2016; 8:61. [PMID: 27867422 PMCID: PMC5096306 DOI: 10.1186/s13321-016-0174-y] [Citation(s) in RCA: 754] [Impact Index Per Article: 83.8] [Received: 06/30/2016] [Accepted: 10/18/2016] [Indexed: 12/03/2022] Open
Abstract
Background Scientists have long been driven by the desire to describe, organize, classify, and compare objects using taxonomies and/or ontologies. In contrast to biology, geology, and many other scientific disciplines, the world of chemistry still lacks a standardized chemical ontology or taxonomy. Several attempts at chemical classification have been made; but they have mostly been limited to either manual, or semi-automated proof-of-principle applications. This is regrettable as comprehensive chemical classification and description tools could not only improve our understanding of chemistry but also improve the linkage between chemistry and many other fields. For instance, the chemical classification of a compound could help predict its metabolic fate in humans, its druggability or potential hazards associated with it, among others. However, the sheer number (tens of millions of compounds) and complexity of chemical structures is such that any manual classification effort would prove to be near impossible. Results We have developed a comprehensive, flexible, and computable, purely structure-based chemical taxonomy (ChemOnt), along with a computer program (ClassyFire) that uses only chemical structures and structural features to automatically assign all known chemical compounds to a taxonomy consisting of >4800 different categories. This new chemical taxonomy consists of up to 11 different levels (Kingdom, SuperClass, Class, SubClass, etc.) with each of the categories defined by unambiguous, computable structural rules. Furthermore each category is named using a consensus-based nomenclature and described (in English) based on the characteristic common structural properties of the compounds it contains. The ClassyFire webserver is freely accessible at http://classyfire.wishartlab.com/. Moreover, a Ruby API version is available at https://bitbucket.org/wishartlab/classyfire_api, which provides programmatic access to the ClassyFire server and database. 
ClassyFire has been used to annotate over 77 million compounds and has already been integrated into other software packages to automatically generate textual descriptions for, and/or infer biological properties of over 100,000 compounds. Additional examples and applications are provided in this paper. Conclusion ClassyFire, in combination with ChemOnt (ClassyFire’s comprehensive chemical taxonomy), now allows chemists and cheminformaticians to perform large-scale, rapid and automated chemical classification. Moreover, a freely accessible API allows easy access to more than 77 million “ClassyFire” classified compounds. The results can be used to help annotate well studied, as well as lesser-known compounds. In addition, these chemical classifications can be used as input for data integration, and many other cheminformatics-related tasks. Electronic supplementary material The online version of this article (doi:10.1186/s13321-016-0174-y) contains supplementary material, which is available to authorized users.
|
Journal Article |
9 |
754 |
5
|
Zhao Z, Zhang KN, Wang Q, Li G, Zeng F, Zhang Y, Wu F, Chai R, Wang Z, Zhang C, Zhang W, Bao Z, Jiang T. Chinese Glioma Genome Atlas (CGGA): A Comprehensive Resource with Functional Genomic Data from Chinese Glioma Patients. GENOMICS PROTEOMICS & BIOINFORMATICS 2021; 19:1-12. [PMID: 33662628 PMCID: PMC8498921 DOI: 10.1016/j.gpb.2020.10.005] [Citation(s) in RCA: 539] [Impact Index Per Article: 134.8] [Received: 07/01/2019] [Revised: 10/01/2020] [Accepted: 12/26/2020] [Indexed: 11/16/2022]
Abstract
Gliomas are the most common and malignant intracranial tumors in adults. Recent studies have revealed the significance of functional genomics for glioma pathophysiological studies and treatments. However, access to comprehensive genomic data and analytical platforms is often limited. Here, we developed the Chinese Glioma Genome Atlas (CGGA), a user-friendly data portal for the storage and interactive exploration of cross-omics data, including nearly 2000 primary and recurrent glioma samples from Chinese cohorts. Currently, open access is provided to whole-exome sequencing data (286 samples), mRNA sequencing (1018 samples) and microarray data (301 samples), DNA methylation microarray data (159 samples), and microRNA microarray data (198 samples), and to detailed clinical information (age, gender, chemoradiotherapy status, WHO grade, histological type, critical molecular pathological information, and survival data). In addition, we have developed several tools for users to analyze the mutation profiles, mRNA/microRNA expression, and DNA methylation profiles, and to perform survival and gene correlation analyses of specific glioma subtypes. This database removes the barriers for researchers, providing rapid and convenient access to high-quality functional genomic data resources for biological studies and clinical applications. CGGA is available at http://www.cgga.org.cn.
|
Journal Article |
4 |
539 |
6
|
Naas T, Oueslati S, Bonnin RA, Dabos ML, Zavala A, Dortet L, Retailleau P, Iorga BI. Beta-lactamase database (BLDB) - structure and function. J Enzyme Inhib Med Chem 2017; 32:917-919. [PMID: 28719998 PMCID: PMC6445328 DOI: 10.1080/14756366.2017.1344235] [Citation(s) in RCA: 386] [Impact Index Per Article: 48.3] [Received: 05/02/2017] [Revised: 06/08/2017] [Accepted: 06/11/2017] [Indexed: 11/25/2022] Open
Abstract
Beta-Lactamase Database (BLDB) is a comprehensive, manually curated public resource providing up-to-date structural and functional information focused on this superfamily of enzymes with a great impact on antibiotic resistance. All the enzymes reported and characterised in the literature are presented according to the class (A, B, C and D), family and subfamily to which they belong. All three-dimensional structures of β-lactamases present in the Protein Data Bank are also shown. The characterisation of representative mutants and hydrolytic profiles (kinetics) completes the picture, and altogether these four elements constitute the essential foundation for a better understanding of the structure-function relationship within this enzyme family. BLDB can be queried using different protein- and nucleotide-based BLAST searches, a key feature of particular importance for the surveillance of the evolution of antibiotic resistance. BLDB is available online at http://bldb.eu without any registration and supports all modern browsers.
|
research-article |
8 |
386 |
7
|
Shang Q, Yang Z, Jia R, Ge S. The novel roles of circRNAs in human cancer. Mol Cancer 2019; 18:6. [PMID: 30626395 PMCID: PMC6325800 DOI: 10.1186/s12943-018-0934-6] [Citation(s) in RCA: 378] [Impact Index Per Article: 63.0] [Received: 10/09/2018] [Accepted: 12/27/2018] [Indexed: 01/16/2023] Open
Abstract
Covalently closed single-stranded circular RNAs (circRNAs) consist of introns or exons and are widely present in eukaryotic cells. CircRNAs generally have low expression levels and relatively stable structures compared with messenger RNAs (mRNAs), most of which are located in the cytoplasm and often act in cell type and tissue-specific manners, indicating that they may serve as novel biomarkers. In recent years, circRNAs have gradually become a hotspot in the field of RNA and cancer research, but the functions of most circRNAs have not yet been discovered. Known circRNAs can affect the biogenesis of cancers in diverse ways, such as functioning as microRNA (miRNA) sponges, combining with RNA-binding proteins (RBPs), acting as transcription factors, and being translated into proteins. In this review, we summarize the characteristics and types of circRNAs, introduce the biogenesis of circRNAs, discuss the emerging functions and databases on circRNAs and present the current challenges of circRNA studies.
|
Review |
6 |
378 |
8
|
Liu X, Li C, Mou C, Dong Y, Tu Y. dbNSFP v4: a comprehensive database of transcript-specific functional predictions and annotations for human nonsynonymous and splice-site SNVs. Genome Med 2020; 12:103. [PMID: 33261662 PMCID: PMC7709417 DOI: 10.1186/s13073-020-00803-9] [Citation(s) in RCA: 346] [Impact Index Per Article: 69.2] [Received: 09/17/2020] [Accepted: 11/09/2020] [Indexed: 12/12/2022] Open
Abstract
Whole exome sequencing has been increasingly used in human disease studies. Prioritization based on appropriate functional annotations has been used as an indispensable step to select candidate variants. Here we present the latest updates to dbNSFP (version 4.1), a database designed to facilitate this step by providing deleteriousness prediction and functional annotation for all potential nonsynonymous and splice-site SNVs (a total of 84,013,093) in the human genome. The current version compiled 36 deleteriousness prediction scores, including 12 transcript-specific scores, and other variant and gene-level functional annotations. The database is available at http://database.liulab.science/dbNSFP with a downloadable version and a web-service.
|
Research Support, N.I.H., Extramural |
5 |
346 |
9
|
Lin LY, Warren-Gash C, Smeeth L, Chen PC. Data resource profile: the National Health Insurance Research Database (NHIRD). Epidemiol Health 2018; 40:e2018062. [PMID: 30727703 PMCID: PMC6367203 DOI: 10.4178/epih.e2018062] [Citation(s) in RCA: 313] [Impact Index Per Article: 44.7] [Received: 11/15/2018] [Accepted: 12/27/2018] [Indexed: 12/11/2022] Open
Abstract
Electronic health records (EHRs) can provide researchers with extraordinary opportunities for population-based research. The National Health Insurance system of Taiwan was established in 1995 and covers more than 99.6% of the Taiwanese population; this system’s claims data are released as the National Health Insurance Research Database (NHIRD). All data from primary outpatient departments and inpatient hospital care settings after 2000 are included in this database. After a change and update in 2016, the NHIRD is maintained and regulated by the Data Science Centre of the Ministry of Health and Welfare of Taiwan. Datasets for approved research are released in three forms: sampling datasets comprising 2 million subjects, disease-specific databases, and full population datasets. These datasets are de-identified and contain basic demographic information, disease diagnoses, prescriptions, operations, and investigations. Data can be linked to government surveys or other research datasets. While only a small number of validation studies with small sample sizes have been undertaken, they have generally reported positive predictive values of over 70% for various diagnoses. Currently, patients cannot opt out of inclusion in the database, although this requirement is under review. In conclusion, the NHIRD is a large, powerful data source for biomedical research.
|
Review |
7 |
313 |
10
|
Zhang Q, Liu W, Zhang HM, Xie GY, Miao YR, Xia M, Guo AY. hTFtarget: A Comprehensive Database for Regulations of Human Transcription Factors and Their Targets. GENOMICS PROTEOMICS & BIOINFORMATICS 2020; 18:120-128. [PMID: 32858223 PMCID: PMC7647694 DOI: 10.1016/j.gpb.2019.09.006] [Citation(s) in RCA: 277] [Impact Index Per Article: 55.4] [Received: 04/23/2019] [Revised: 08/17/2019] [Accepted: 10/23/2019] [Indexed: 01/07/2023]
Abstract
Transcription factors (TFs) as key regulators play crucial roles in biological processes. The identification of TF–target regulatory relationships is a key step for revealing functions of TFs and their regulations on gene expression. The accumulated data of chromatin immunoprecipitation sequencing (ChIP-seq) provide great opportunities to discover the TF–target regulations across different conditions. In this study, we constructed a database named hTFtarget, which integrated huge human TF target resources (7190 ChIP-seq samples of 659 TFs and high-confidence binding sites of 699 TFs) and epigenetic modification information to predict accurate TF–target regulations. hTFtarget offers the following functions for users to explore TF–target regulations: (1) browse or search general targets of a query TF across datasets; (2) browse TF–target regulations for a query TF in a specific dataset or tissue; (3) search potential TFs for a given target gene or non-coding RNA; (4) investigate co-association between TFs in cell lines; (5) explore potential co-regulations for given target genes or TFs; (6) predict candidate TF binding sites on given DNA sequences; (7) visualize ChIP-seq peaks for different TFs and conditions in a genome browser. hTFtarget provides a comprehensive, reliable and user-friendly resource for exploring human TF–target regulations, which will be very useful for a wide range of users in the TF and gene expression regulation community. hTFtarget is available at http://bioinfo.life.hust.edu.cn/hTFtarget.
|
Research Support, Non-U.S. Gov't |
5 |
277 |
11
|
Primpke S, Wirth M, Lorenz C, Gerdts G. Reference database design for the automated analysis of microplastic samples based on Fourier transform infrared (FTIR) spectroscopy. Anal Bioanal Chem 2018; 410:5131-5141. [PMID: 29978249 PMCID: PMC6113679 DOI: 10.1007/s00216-018-1156-x] [Citation(s) in RCA: 267] [Impact Index Per Article: 38.1] [Received: 02/22/2018] [Revised: 04/20/2018] [Accepted: 05/18/2018] [Indexed: 11/30/2022]
Abstract
The identification of microplastics becomes increasingly challenging with decreasing particle size and increasing sample heterogeneity. The analysis of microplastic samples by Fourier transform infrared (FTIR) spectroscopy is a versatile, bias-free tool to succeed at this task. In this study, we provide an adaptable reference database, which can be applied to single-particle identification as well as methods like chemical imaging based on FTIR microscopy. The large datasets generated by chemical imaging can be further investigated by automated analysis, which does, however, require a carefully designed database. The novel database design is based on the hierarchical cluster analysis of reference spectra in the spectral range from 3600 to 1250 cm⁻¹. The hereby generated database entries were optimized for the automated analysis software with defined reference datasets. The design was further tested for its customizability with additional entries. The final reference database was extensively tested on reference datasets and environmental samples. Data quality by means of correct particle identification and depiction significantly increased compared to that of previous databases, proving the applicability of the concept and highlighting the importance of this work. Our novel database provides a reference point for data comparison with future and previous microplastic studies that are based on different databases.
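The idea of grouping reference spectra by hierarchical cluster analysis can be sketched in a few lines. The example below is a minimal illustration only: the distance measure (1 minus the Pearson correlation), the average-linkage rule, the merge threshold, and the toy absorbance values are all assumptions chosen for clarity, since the abstract does not state the settings the authors used.

```python
from math import sqrt


def pearson_distance(a, b):
    """Return 1 - Pearson correlation between two equally sampled spectra."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / sqrt(var_a * var_b)


def cluster_spectra(spectra, threshold):
    """Average-linkage agglomerative clustering that stops merging at `threshold`."""
    clusters = [[name] for name in spectra]

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        dists = [pearson_distance(spectra[a], spectra[b]) for a in c1 for b in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        d, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical absorbance values at five shared wavenumbers (toy data, not real spectra).
reference_spectra = {
    "polymer_A_1": [0.10, 0.80, 0.30, 0.05, 0.60],
    "polymer_A_2": [0.12, 0.78, 0.33, 0.06, 0.58],
    "polymer_B_1": [0.70, 0.10, 0.05, 0.90, 0.20],
    "polymer_B_2": [0.68, 0.12, 0.07, 0.88, 0.22],
}
groups = cluster_spectra(reference_spectra, threshold=0.2)
```

With these toy inputs the two near-identical pairs merge and the dissimilar polymers stay apart, so each resulting group could serve as one database entry. A real reference database would use full-resolution spectra and a validated threshold.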
|
Journal Article |
7 |
267 |
12
|
Sorokina M, Merseburger P, Rajan K, Yirik MA, Steinbeck C. COCONUT online: Collection of Open Natural Products database. J Cheminform 2021; 13:2. [PMID: 33423696 PMCID: PMC7798278 DOI: 10.1186/s13321-020-00478-9] [Citation(s) in RCA: 237] [Impact Index Per Article: 59.3] [Received: 09/09/2020] [Accepted: 11/23/2020] [Indexed: 12/20/2022] Open
Abstract
Natural products (NPs) are small molecules produced by living organisms with potential applications in pharmacology and other industries, as many of them are bioactive. This potential has raised great interest in NP research around the world and in different application fields; as a result, generalist and thematic NP databases have multiplied over the years. However, there is, at this moment, no online resource regrouping all known NPs in just one place, which would greatly simplify NP research and allow computational screening and other in silico applications. In this manuscript we present the online version of the COlleCtion of Open Natural prodUcTs (COCONUT): an aggregated dataset of elucidated and predicted NPs collected from open sources and a web interface to browse, search and easily and quickly download NPs. COCONUT web is freely available at https://coconut.naturalproducts.net.
|
research-article |
4 |
237 |
13
|
Zhu T, Liang C, Meng Z, Sun G, Meng Z, Guo S, Zhang R. CottonFGD: an integrated functional genomics database for cotton. BMC PLANT BIOLOGY 2017; 17:101. [PMID: 28595571 PMCID: PMC5465443 DOI: 10.1186/s12870-017-1039-x] [Citation(s) in RCA: 211] [Impact Index Per Article: 26.4] [Received: 01/16/2017] [Accepted: 05/18/2017] [Indexed: 05/18/2023]
Abstract
BACKGROUND Cotton (Gossypium spp.) is the most important fiber and oil crop in the world. With the emergence of huge -omics data sets, it is essential to have an integrated functional genomics database that allows worldwide users to quickly and easily fetch and visualize genomic information. Currently available cotton-related databases have weaknesses in integrating multiple kinds of -omics data from multiple Gossypium species. Therefore, it is necessary to establish an integrated functional genomics database for cotton. DESCRIPTION We developed CottonFGD (Cotton Functional Genomic Database, https://cottonfgd.org), an integrated database that includes genomic sequences, gene structural and functional annotations, genetic marker data, transcriptome data, and population genome resequencing data for all four of the sequenced Gossypium species. It consists of three interconnected modules: search, profile, and analysis. These modules enable both single-gene review and batch analysis with multiple kinds of -omics data and multiple species. CottonFGD also includes additional pages for data statistics, bulk data download, and a detailed user manual. CONCLUSION Equipped with specialized functional modules and modernized visualization tools, and populated with multiple kinds of -omics data, CottonFGD provides a quick and easy-to-use data analysis platform for cotton researchers worldwide.
|
research-article |
8 |
211 |
14
|
Purps J, Siegert S, Willuweit S, Nagy M, Alves C, Salazar R, Angustia SMT, Santos LH, Anslinger K, Bayer B, Ayub Q, Wei W, Xue Y, Tyler-Smith C, Bafalluy MB, Martínez-Jarreta B, Egyed B, Balitzki B, Tschumi S, Ballard D, Court DS, Barrantes X, Bäßler G, Wiest T, Berger B, Niederstätter H, Parson W, Davis C, Budowle B, Burri H, Borer U, Koller C, Carvalho EF, Domingues PM, Chamoun WT, Coble MD, Hill CR, Corach D, Caputo M, D'Amato ME, Davison S, Decorte R, Larmuseau MHD, Ottoni C, Rickards O, Lu D, Jiang C, Dobosz T, Jonkisz A, Frank WE, Furac I, Gehrig C, Castella V, Grskovic B, Haas C, Wobst J, Hadzic G, Drobnic K, Honda K, Hou Y, Zhou D, Li Y, Hu S, Chen S, Immel UD, Lessig R, Jakovski Z, Ilievska T, Klann AE, García CC, de Knijff P, Kraaijenbrink T, Kondili A, Miniati P, Vouropoulou M, Kovacevic L, Marjanovic D, Lindner I, Mansour I, Al-Azem M, Andari AE, Marino M, Furfuro S, Locarno L, Martín P, Luque GM, Alonso A, Miranda LS, Moreira H, Mizuno N, Iwashima Y, Neto RSM, Nogueira TLS, Silva R, Nastainczyk-Wulf M, Edelmann J, Kohl M, Nie S, Wang X, Cheng B, Núñez C, Pancorbo MMD, Olofsson JK, Morling N, Onofri V, Tagliabracci A, Pamjav H, Volgyi A, Barany G, Pawlowski R, Maciejewska A, Pelotti S, Pepinski W, Abreu-Glowacka M, Phillips C, Cárdenas J, Rey-Gonzalez D, Salas A, Brisighelli F, Capelli C, Toscanini U, Piccinini A, Piglionica M, Baldassarra SL, Ploski R, Konarzewska M, Jastrzebska E, Robino C, Sajantila A, Palo JU, Guevara E, Salvador J, Ungria MCD, Rodriguez JJR, Schmidt U, Schlauderer N, Saukko P, Schneider PM, Sirker M, Shin KJ, Oh YN, Skitsa I, Ampati A, Smith TG, Calvit LSD, Stenzl V, Capal T, Tillmar A, Nilsson H, Turrina S, De Leo D, Verzeletti A, Cortellini V, Wetton JH, Gwynne GM, Jobling MA, Whittle MR, Sumita DR, Wolańska-Nowak P, Yong RYY, Krawczak M, Nothnagel M, Roewer L. A global analysis of Y-chromosomal haplotype diversity for 23 STR loci. Forensic Sci Int Genet 2014; 12:12-23. 
[PMID: 24854874 PMCID: PMC4127773 DOI: 10.1016/j.fsigen.2014.04.008] [Citation(s) in RCA: 204] [Impact Index Per Article: 18.5] [Received: 03/28/2014] [Accepted: 04/19/2014] [Indexed: 02/05/2023]
Abstract
In a worldwide collaborative effort, 19,630 Y-chromosomes were sampled from 129 different populations in 51 countries. These chromosomes were typed for 23 short-tandem repeat (STR) loci (DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS385ab, DYS437, DYS438, DYS439, DYS448, DYS456, DYS458, DYS635, GATAH4, DYS481, DYS533, DYS549, DYS570, DYS576, and DYS643) using the PowerPlex Y23 System (PPY23, Promega Corporation, Madison, WI). Locus-specific allelic spectra of these markers were determined and a consistently high level of allelic diversity was observed. A considerable number of null, duplicate and off-ladder alleles were revealed. Standard single-locus and haplotype-based parameters were calculated and compared between subsets of Y-STR markers established for forensic casework. The PPY23 marker set provides substantially stronger discriminatory power than other available kits but at the same time reveals the same general patterns of population structure as other marker sets. A strong correlation was observed between the number of Y-STRs included in a marker set and some of the forensic parameters under study. Interestingly, a weak but consistent trend toward smaller genetic distances resulting from larger numbers of markers became apparent.
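Two haplotype-based parameters commonly reported in forensic Y-STR work are haplotype diversity (Nei's unbiased estimator, n/(n-1) × (1 − Σp²)) and discrimination capacity (distinct haplotypes divided by sample size). The sketch below computes both for a toy sample. The abstract does not list which parameters the authors calculated, so the choice of these two, and the three-locus toy haplotypes, are assumptions for illustration.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


def discrimination_capacity(haplotypes):
    """Fraction of distinct haplotypes among all sampled haplotypes."""
    return len(set(haplotypes)) / len(haplotypes)


# Toy sample: each haplotype is a tuple of repeat counts at three hypothetical loci.
sample = [
    (14, 13, 29),
    (14, 13, 29),
    (15, 13, 30),
    (16, 12, 28),
    (15, 14, 31),
]
hd = haplotype_diversity(sample)      # 0.9 for this toy sample
dc = discrimination_capacity(sample)  # 0.8 for this toy sample
```

Adding loci to a marker set can only split haplotypes further, never merge them, which is why both values tend to rise with the number of Y-STRs typed.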
|
research-article |
11 |
204 |
15
|
Lian Q, Wang S, Zhang G, Wang D, Luo G, Tang J, Chen L, Gu J. HCCDB: A Database of Hepatocellular Carcinoma Expression Atlas. GENOMICS PROTEOMICS & BIOINFORMATICS 2018; 16:269-275. [PMID: 30266410 PMCID: PMC6205074 DOI: 10.1016/j.gpb.2018.07.003] [Citation(s) in RCA: 202] [Impact Index Per Article: 28.9] [Received: 03/31/2018] [Revised: 07/09/2018] [Accepted: 07/16/2018] [Indexed: 02/07/2023]
Abstract
Hepatocellular carcinoma (HCC) is highly heterogeneous in nature and has been one of the most common cancer types worldwide. To ensure repeatability of identified gene expression patterns and comprehensively annotate the transcriptomes of HCC, we carefully curated 15 public HCC expression datasets that cover around 4000 clinical samples and developed the database HCCDB to serve as a one-stop online resource for exploring HCC gene expression with user-friendly interfaces. The global differential gene expression landscape of HCC was established by analyzing the consistently differentially expressed genes across multiple datasets. Moreover, a 4D metric was proposed to fully characterize the expression pattern of each gene by integrating data from The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx). To facilitate a comprehensive understanding of gene expression patterns in HCC, HCCDB also provides links to third-party databases on drugs, proteomics, and literature, and graphically displays the results from computational analyses, including differential expression analysis, tissue-specific and tumor-specific expression analysis, survival analysis, and co-expression analysis. HCCDB is freely accessible at http://lifeome.net/database/hccdb.
|
Research Support, Non-U.S. Gov't |
7 |
202 |
16
|
Dong R, Ma XK, Li GW, Yang L. CIRCpedia v2: An Updated Database for Comprehensive Circular RNA Annotation and Expression Comparison. GENOMICS PROTEOMICS & BIOINFORMATICS 2018; 16:226-233. [PMID: 30172046 PMCID: PMC6203687 DOI: 10.1016/j.gpb.2018.08.001] [Citation(s) in RCA: 195] [Impact Index Per Article: 27.9] [Received: 05/19/2018] [Revised: 07/17/2018] [Accepted: 08/10/2018] [Indexed: 12/17/2022]
Abstract
Circular RNAs (circRNAs) from back-splicing of exon(s) have recently been found to be broadly expressed in eukaryotes, in tissue- and species-specific manners. Although the functions of most circRNAs remain elusive, some circRNAs have been shown to function in gene expression regulation and potentially relate to diseases. Due to their stability, circRNAs can also be used as biomarkers for diagnosis. Profiling circRNAs by integrating their expression among different samples thus provides a molecular basis for further functional study of circRNAs and their potential application in the clinic. Here, we report CIRCpedia v2, an updated database for comprehensive circRNA annotation from over 180 RNA-seq datasets across six different species. This atlas allows users to search, browse, and download circRNAs with expression features in various cell types/tissues, including disease samples. In addition, the updated database incorporates conservation analysis of circRNAs between humans and mice. Finally, the web interface also contains computational tools to compare circRNA expression among samples. CIRCpedia v2 is accessible at http://www.picb.ac.cn/rnomics/circpedia.
|
Research Support, Non-U.S. Gov't |
7 |
195 |
17
|
Mennes M, Biswal B, Castellanos FX, Milham MP. Making data sharing work: the FCP/INDI experience. Neuroimage 2013; 82:683-91. [PMID: 23123682 PMCID: PMC3959872 DOI: 10.1016/j.neuroimage.2012.10.064] [Citation(s) in RCA: 172] [Impact Index Per Article: 14.3] [Received: 08/10/2012] [Accepted: 10/22/2012] [Indexed: 11/26/2022] Open
Abstract
Over a decade ago, the fMRI Data Center (fMRIDC) pioneered open-access data sharing in the task-based functional neuroimaging community. Well ahead of its time, the fMRIDC effort encountered logistical, sociocultural and funding barriers that impeded the field-wise instantiation of open-access data sharing. In 2009, ambitions for open-access data sharing were revived in the resting state functional MRI community in the form of two grassroots initiatives: the 1000 Functional Connectomes Project (FCP) and its successor, the International Neuroimaging Datasharing Initiative (INDI). Beyond providing open access to thousands of clinical and non-clinical imaging datasets, the FCP and INDI have demonstrated the feasibility of large-scale data aggregation for hypothesis generation and testing. Yet, the success of the FCP and INDI should not be confused with widespread embracement of open-access data sharing. Reminiscent of the challenges faced by fMRIDC, key controversies persist and include participant privacy, the role of informatics, and the logistical and cultural challenges of establishing an open science ethos. We discuss the FCP and INDI in the context of these challenges, highlighting the promise of current initiatives and suggesting solutions for possible pitfalls.
|
Research Support, N.I.H., Extramural |
12 |
172 |
18
|
Jones MR, Pinto E, Torres MA, Dörr F, Mazur-Marzec H, Szubert K, Tartaglione L, Dell'Aversano C, Miles CO, Beach DG, McCarron P, Sivonen K, Fewer DP, Jokela J, Janssen EML. CyanoMetDB, a comprehensive public database of secondary metabolites from cyanobacteria. WATER RESEARCH 2021; 196:117017. [PMID: 33765498 DOI: 10.1016/j.watres.2021.117017] [Citation(s) in RCA: 155] [Impact Index Per Article: 38.8] [Received: 09/18/2020] [Revised: 02/26/2021] [Accepted: 03/06/2021] [Indexed: 05/06/2023]
Abstract
Harmful cyanobacterial blooms, which frequently contain toxic secondary metabolites, are reported in aquatic environments around the world. More than two thousand cyanobacterial secondary metabolites have been reported from diverse sources over the past fifty years. A comprehensive, publicly accessible database detailing these secondary metabolites would facilitate research into their occurrence, functions and toxicological risks. To address this need we created CyanoMetDB, a highly curated, flat-file, openly accessible database of cyanobacterial secondary metabolites collated from 850 peer-reviewed articles published between 1967 and 2020. CyanoMetDB contains 2010 cyanobacterial metabolites and 99 structurally related compounds. This has nearly doubled the number of entries with complete literature metadata and structural composition information compared to previously available open access databases. The dataset includes microcystins, cyanopeptolins, other depsipeptides, anabaenopeptins, microginins, aeruginosins, cyclamides, cryptophycins, saxitoxins, spumigins, microviridins, and anatoxins among other metabolite classes. A comprehensive database dedicated to cyanobacterial secondary metabolites facilitates: (1) the detection and dereplication of known cyanobacterial toxins and secondary metabolites; (2) the identification of novel natural products from cyanobacteria; (3) research on biosynthesis of cyanobacterial secondary metabolites, including substructure searches; and (4) the investigation of their abundance, persistence, and toxicity in natural environments.
|
|
4 |
155 |
19
|
Marco-Ramell A, Palau-Rodriguez M, Alay A, Tulipani S, Urpi-Sarda M, Sanchez-Pla A, Andres-Lacueva C. Evaluation and comparison of bioinformatic tools for the enrichment analysis of metabolomics data. BMC Bioinformatics 2018; 19:1. [PMID: 29291722 PMCID: PMC5749025 DOI: 10.1186/s12859-017-2006-0] [Citation(s) in RCA: 148] [Impact Index Per Article: 21.1] [Received: 08/23/2017] [Accepted: 12/18/2017] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND: Bioinformatic tools for the enrichment of 'omics' datasets facilitate interpretation and understanding of data. To date few are suitable for metabolomics datasets. The main objective of this work is to give a critical overview, for the first time, of the performance of these tools. To that aim, datasets from metabolomic repositories were selected and enriched data were created. Both types of data were analysed with these tools and outputs were thoroughly examined. RESULTS: An exploratory multivariate analysis of the most used tools for the enrichment of metabolite sets, based on a non-metric multidimensional scaling (NMDS) of Jaccard's distances, was performed and mirrored their diversity. Codes (identifiers) of the metabolites of the datasets were searched in different metabolite databases (HMDB, KEGG, PubChem, ChEBI, BioCyc/HumanCyc, LipidMAPS, ChemSpider, METLIN and Recon2). The databases that presented the most identifiers of the metabolites of the dataset were PubChem, followed by METLIN and ChEBI. However, these databases had duplicated entries and might present false positives. The performance of over-representation analysis (ORA) tools, including BioCyc/HumanCyc, ConsensusPathDB, IMPaLA, MBRole, MetaboAnalyst, Metabox, MetExplore, MPEA, PathVisio and Reactome and the mapping tool KEGGREST, was examined. Results were mostly consistent among tools and between real and enriched data despite the variability of the tools. Nevertheless, a few controversial results such as differences in the total number of metabolites were also found. Disease-based enrichment analyses were also assessed, but they were not found to be accurate, probably due to the fact that metabolite disease sets are not up-to-date and the difficulty of predicting diseases from a list of metabolites.
CONCLUSIONS: We have extensively reviewed the state-of-the-art of the available range of tools for metabolomic datasets, the completeness of metabolite databases, the performance of ORA methods and disease-based analyses. Despite the variability of the tools, they provided consistent results independent of their analytic approach. However, more work on the completeness of metabolite and pathway databases is required, which strongly affects the accuracy of enrichment analyses. Improvements will be translated into more accurate and global insights of the metabolome.
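The Jaccard distance used as input to the NMDS comparison can be illustrated with a minimal Python sketch. The set contents and the names `tool_a` and `tool_b` are invented for illustration and are not taken from the study; the abstract does not say exactly which tool attributes were compared.

```python
def jaccard_distance(a, b):
    """Jaccard distance: 1 - |A ∩ B| / |A ∪ B|.

    Two empty sets are treated as identical (distance 0.0); this
    convention is an assumption, not something the abstract specifies.
    """
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Hypothetical result sets reported by two enrichment tools.
tool_a = {"glycolysis", "TCA cycle", "urea cycle"}
tool_b = {"glycolysis", "TCA cycle", "purine metabolism"}

# Two shared items out of four distinct items overall.
distance = jaccard_distance(tool_a, tool_b)
```

A matrix of such pairwise distances is the kind of input an NMDS routine would take; the ordination step itself is not shown here.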
|
Comparative Study |
7 |
148 |
20
|
Obayashi T, Aoki Y, Tadaka S, Kagaya Y, Kinoshita K. ATTED-II in 2018: A Plant Coexpression Database Based on Investigation of the Statistical Property of the Mutual Rank Index. PLANT & CELL PHYSIOLOGY 2018; 59:e3. [PMID: 29216398 PMCID: PMC5914358 DOI: 10.1093/pcp/pcx191] [Citation(s) in RCA: 144] [Impact Index Per Article: 20.6] [Received: 09/12/2017] [Accepted: 11/25/2017] [Indexed: 05/17/2023]
Abstract
ATTED-II (http://atted.jp) is a coexpression database for plant species to aid in the discovery of relationships of unknown genes within a species. As an advanced coexpression analysis method, multispecies comparisons have the potential to detect alterations in gene relationships within an evolutionary context. However, determining the validity of comparative coexpression studies is difficult without quantitative assessments of the quality of coexpression data. ATTED-II (version 9) provides 16 coexpression platforms for nine plant species, including seven species supported by both microarray- and RNA sequencing (RNAseq)-based coexpression data. Two independent sources of coexpression data enable the assessment of the reproducibility of coexpression. The latest coexpression data for Arabidopsis (Ath-m.c7-1 and Ath-r.c3-0) showed the highest reproducibility (Jaccard coefficient = 0.13) among previous coexpression data in ATTED-II. We also investigated the statistical basis of the mutual rank (MR) index as a coexpression measure by bootstrap sampling of experimental units. We found that the error distribution of the logit-transformed MR index showed normality with equal variances for each coexpression platform. Because the MR error was strongly correlated with the number of samples for the coexpression data, typical confidence intervals for the MR index can be estimated for any coexpression platform. These new, high-quality coexpression data can be analyzed with any tool in ATTED-II and combined with external resources to obtain insight into plant biology.
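The abstract does not define the mutual rank (MR) index. As a minimal sketch, the Python snippet below assumes the definition commonly given in earlier ATTED-II descriptions: the geometric mean of the rank of gene B in gene A's coexpression list and the rank of gene A in gene B's list. The function name and the example ranks are hypothetical, and the logit transformation mentioned in the abstract is not reproduced because its exact form is not stated there.

```python
import math


def mutual_rank(rank_a_to_b, rank_b_to_a):
    """Geometric mean of the two directional coexpression ranks.

    Assumed definition (not given in the abstract): ranks are 1-based,
    so a perfectly reciprocal top hit has MR = 1.
    """
    if rank_a_to_b < 1 or rank_b_to_a < 1:
        raise ValueError("ranks must be 1-based positive numbers")
    return math.sqrt(rank_a_to_b * rank_b_to_a)


# Hypothetical pair: B is A's 4th-best partner, A is B's 9th-best partner.
mr = mutual_rank(4, 9)
```

A lower MR indicates a stronger reciprocal coexpression relationship under this definition.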
|
research-article |
7 |
144 |
21
|
Kindu M, Schneider T, Teketay D, Knoke T. Changes of ecosystem service values in response to land use/land cover dynamics in Munessa-Shashemene landscape of the Ethiopian highlands. THE SCIENCE OF THE TOTAL ENVIRONMENT 2016; 547:137-147. [PMID: 26780139 DOI: 10.1016/j.scitotenv.2015.12.127] [Citation(s) in RCA: 144] [Impact Index Per Article: 16.0] [Received: 06/05/2015] [Revised: 12/16/2015] [Accepted: 12/24/2015] [Indexed: 05/17/2023]
Abstract
Land use/land cover (LULC) dynamics alter ecosystem services values (ESVs), yet quantitative evaluations of changes in ESVs are seldom attempted. Using the Munessa-Shashemene landscape of the Ethiopian highlands as an example, we estimated changes in ESVs in response to LULC dynamics over the past four decades (1973-2012). Estimation and change analyses of ESVs were conducted, mainly, by employing GIS using LULC datasets for the years 1973, 1986, 2000 and 2012 with their corresponding global value coefficients developed earlier and our own modified conservative value coefficients for the studied landscape. The results between periods revealed a decrease of total ESVs from US$ 130.5 million in 1973, to US$ 118.5, 114.8 and 111.1 million in 1986, 2000 and 2012, respectively. While using global value coefficients, the total ESVs declined from US$ 164.6 million in 1973, to US$ 135.8, 127.2 and 118.7 million in 1986, 2000 and 2012, respectively. The results from the analyses of changes in the four decades revealed a total loss of ESVs ranging from US$ 19.3 million when using our own modified value coefficients to US$ 45.9 million when employing global value coefficients. Changes have also occurred in values of individual ecosystem service functions, such as erosion control, nutrient cycling, climate regulation and water treatment, which were among the highest contributors of the total ESVs. However, the value of the food production service function consistently increased during the study periods, although not drastically. All in all, these figures must be considered minimum estimates of ESV changes due to uncertainties in the value coefficients used in this study. We conclude that the decline of ESVs reflected the effects of ecological degradation in the studied landscape and suggest further studies to explore future options and formulate intervention strategies.
|
|
9 |
144 |
22
|
Buerba RA, Fu MC, Gruskay JA, Long WD, Grauer JN. Obese Class III patients at significantly greater risk of multiple complications after lumbar surgery: an analysis of 10,387 patients in the ACS NSQIP database. Spine J 2014; 14:2008-18. [PMID: 24316118 DOI: 10.1016/j.spinee.2013.11.047] [Citation(s) in RCA: 130] [Impact Index Per Article: 11.8] [Received: 06/05/2013] [Revised: 10/27/2013] [Accepted: 11/26/2013] [Indexed: 02/07/2023]
Abstract
BACKGROUND CONTEXT: Prior studies on the impact of obesity on spine surgery outcomes have focused mostly on lumbar fusions, do not examine lumbar discectomies or decompressions, and have shown mixed results regarding complications. Differences in sample sizes and body mass index (BMI) thresholds for the definition of the obese versus comparison cohorts could account for the inconsistencies in the literature. PURPOSE: The purpose of the study was to analyze whether different degrees of obesity influence the complication rates in patients undergoing lumbar spine surgery. STUDY DESIGN/SETTING: This was a retrospective cohort analysis of prospectively collected data using the American College of Surgeons National Surgical Quality Improvement Program (ACS NSQIP) database from 2005 to 2010. PATIENT SAMPLE: Patients in the de-identified, risk-adjusted, and multi-institutional ACS NSQIP database undergoing lumbar anterior fusion, posterior fusion, transforaminal lumbar interbody fusion/posterior lumbar interbody fusion (TLIF/PLIF), discectomy, or decompression were included. OUTCOME MEASURES: Primary outcome measures were 30-day postsurgical complications, including pulmonary embolism and deep vein thrombosis, death, system-specific complications (wound, pulmonary, urinary, central nervous system, and cardiac), septic complications, and having one or more complications overall. Secondary outcomes were time spent in the operating room, blood transfusions, length of stay, and reoperation within 30 days. METHODS: Patients undergoing lumbar anterior fusion, posterior fusion, TLIF/PLIF, discectomy, or decompression in the ACS NSQIP, 2005 to 2010, were categorized into four BMI groups: nonobese (18.5-29.9 kg/m²), Obese I (30-34.9 kg/m²), Obese II (35-39.9 kg/m²), and Obese III (greater than or equal to 40 kg/m²). Obese I to III patients were compared with patients in the nonobese category using chi-square test and analysis of variance. Multivariate linear/logistic regression models were used to adjust for preoperative risk factors.
RESULTS: Data were available for 10,387 patients undergoing lumbar surgery. Of these, 4.5% underwent anterior fusion, 17.9% posterior fusion, 6.3% TLIF/PLIF, 40.7% discectomy, and 30.5% decompression. Among all patients, 25.6% were in the Obese I group, 11.5% Obese II, and 6.9% Obese III. On multivariate analysis, Obese I and III had a significantly increased risk of urinary complications, and Obese II and III patients had a significantly increased risk of wound complications. Only Obese III patients, however, had a statistically increased risk of having increased time spent in the operating room, an extended length of stay, pulmonary complications, and having one or more complications (all p<.05). CONCLUSIONS: Patients with high BMI appear to have higher complication rates after lumbar surgery than patients who are nonobese. However, the complication rates seem to increase substantially for Obese III patients. These patients have longer times spent in the operating room, extended hospital stays, and an increased risk for wound, urinary, and pulmonary complications and for having one or more complications overall. Surgeons should be aware of the increased risk of multiple complications for patients with BMI greater than or equal to 40 kg/m².
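The four BMI groups described in the methods can be expressed as a small classifier. This is a minimal Python sketch, not the authors' code: the function name `bmi_group` is hypothetical, and treating the published ranges as half-open intervals (so that a BMI of 29.95 counts as nonobese) is an assumption, since the abstract lists bounds only to one decimal place.

```python
def bmi_group(bmi):
    """Map a BMI in kg/m^2 to the study's four categories.

    Returns None for values below 18.5, which fall outside the
    nonobese comparison range given in the abstract.
    """
    if bmi < 18.5:
        return None          # below the study's nonobese range
    if bmi < 30:
        return "nonobese"    # 18.5-29.9
    if bmi < 35:
        return "Obese I"     # 30-34.9
    if bmi < 40:
        return "Obese II"    # 35-39.9
    return "Obese III"       # greater than or equal to 40


# Hypothetical example values, one per category.
groups = [bmi_group(v) for v in (24.0, 32.5, 37.0, 41.2)]
```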
|
|
11 |
130 |
23
|
Nierychlo M, Andersen KS, Xu Y, Green N, Jiang C, Albertsen M, Dueholm MS, Nielsen PH. MiDAS 3: An ecosystem-specific reference database, taxonomy and knowledge platform for activated sludge and anaerobic digesters reveals species-level microbiome composition of activated sludge. WATER RESEARCH 2020; 182:115955. [PMID: 32777640 DOI: 10.1016/j.watres.2020.115955] [Citation(s) in RCA: 124] [Impact Index Per Article: 24.8] [Received: 02/04/2020] [Revised: 05/13/2020] [Accepted: 05/16/2020] [Indexed: 05/13/2023]
Abstract
The function of the microbiomes in wastewater treatment systems and anaerobic digesters is dictated by the physiological activity of their members and complex interactions between them. Since functional traits are often conserved at low taxonomic ranks (genus, species, strain), high resolution taxonomic classification is crucial to understand the role of microbes in any ecosystem. Here we present MiDAS 3, a comprehensive 16S rRNA gene reference database based on full-length 16S rRNA gene amplicon sequence variants (FL-ASVs) derived from activated sludge and anaerobic digester systems in Denmark. The new database proposes unique provisional names for all unclassified microorganisms down to species level, providing a new and much-needed tool for microbiome research. The MiDAS 3 database was used to analyze the microbiome in 20 Danish wastewater treatment plants with nutrient removal, sampled over 13 years. The 50 most abundant species belonged to 42 genera, including 14 genera with provisional 'midas' names. Of those, 20 have no known function in the system, which highlights the need for more efforts towards elucidating the role of important members of wastewater treatment ecosystems. The new MiDAS 3 database also forms the backbone of the MiDAS Field Guide - an online resource linking the identity of microorganisms in wastewater treatment systems to available data related to their functional importance. The new field guide contains a complete list of genera (>1800) and species (>4200) found in activated sludge and anaerobic digesters in Denmark, but is also relevant to wastewater systems across the world. The identity of the microbes is linked to functional information, where available, and the website provides the possibility to BLAST new sequences against the MiDAS 3 database. The MiDAS Field Guide is a collaborative platform acting as an online knowledge repository, facilitating understanding of wastewater treatment ecosystem function.
|
|
5 |
124 |
24
|
Kushner T, Serper M, Kaplan DE. Delta hepatitis within the Veterans Affairs medical system in the United States: Prevalence, risk factors, and outcomes. J Hepatol 2015; 63:586-92. [PMID: 25962883 PMCID: PMC4574953 DOI: 10.1016/j.jhep.2015.04.025] [Citation(s) in RCA: 124] [Impact Index Per Article: 12.4] [Received: 01/28/2015] [Revised: 04/24/2015] [Accepted: 04/27/2015] [Indexed: 02/07/2023]
Abstract
BACKGROUND & AIMS: Low hepatitis delta prevalence estimates in the United States are likely biased due to low testing rates. The objectives of this study were to quantify the prevalence of testing and identify factors associated with hepatitis D positive status among chronic hepatitis B patients in the Veterans Health Administration. METHODS: We performed a nationwide retrospective study of all veterans who tested positive for HBsAg from October 1999 to December 2013. Hepatitis D antibody testing results were used to stratify patients into three groups: HDV-positive, HDV-negative, and HDV-not tested. Demographics, comorbidities, additional laboratory data and clinical outcomes were compared across these groups of patients using standard statistical approaches. RESULTS: Among 25,603 patients with a positive hepatitis B surface antigen, 2175 (8.5%) were tested for HDV; 73 (3.4%) patients tested positive. Receiving HDV testing was associated with receipt of testing for HBV, HIV, and HCV. Predictors of positive HDV results included substance abuse and cirrhosis. Fitting a predefined high-risk profile (abnormal ALT with suppressed HBV DNA titers) was strongly associated with testing positive for HDV (OR 3.2, 95%CI 1.4-7.5). Most (59%) HDV-positive patients were HCV co-infected. HDV-positive subjects had higher risks of all-cause mortality. Incidence rates of HCC were 2.9-fold higher in HDV-positive relative to HDV-negative individuals (p=0.002). In adjusted analyses, HDV was independently associated with HCC (OR 2.1, 95%CI 1.1-3.9). CONCLUSIONS: Testing rates for hepatitis delta in chronic hepatitis B patients in the United States are inappropriately low. Approaches to increase testing for HDV particularly in high-risk subsets should be explored.
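The two percentages in the results use different denominators: the testing rate is relative to all HBsAg-positive patients, while the positivity rate is relative to those tested. The short Python check below recomputes both from the counts given in the abstract; the variable names are illustrative only.

```python
# Counts as reported in the abstract.
hbsag_positive = 25603   # patients with a positive hepatitis B surface antigen
hdv_tested = 2175        # of those, patients tested for HDV
hdv_positive = 73        # of those tested, patients positive for HDV

# Testing rate: tested / all HBsAg-positive patients.
testing_rate_pct = 100 * hdv_tested / hbsag_positive

# Positivity: positive / tested (not positive / all HBsAg-positive).
positivity_pct = 100 * hdv_positive / hdv_tested
```

Rounded to one decimal place, these reproduce the reported 8.5% and 3.4%.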
|
research-article |
10 |
124 |
25
|
Xin J, Mark A, Afrasiabi C, Tsueng G, Juchler M, Gopal N, Stupp GS, Putman TE, Ainscough BJ, Griffith OL, Torkamani A, Whetzel PL, Mungall CJ, Mooney SD, Su AI, Wu C. High-performance web services for querying gene and variant annotation. Genome Biol 2016; 17:91. [PMID: 27154141 PMCID: PMC4858870 DOI: 10.1186/s13059-016-0953-9] [Citation(s) in RCA: 122] [Impact Index Per Article: 13.6] [Received: 01/07/2016] [Accepted: 04/14/2016] [Indexed: 01/18/2023] Open
Abstract
Efficient tools for data management and integration are essential for many aspects of high-throughput biology. In particular, annotations of genes and human genetic variants are commonly used but highly fragmented across many resources. Here, we describe MyGene.info and MyVariant.info, high-performance web services for querying gene and variant annotation information. These web services are currently accessed more than three million times per month. They also demonstrate a generalizable cloud-based model for organizing and querying biological annotation information. MyGene.info and MyVariant.info are provided as high-performance web services, accessible at http://mygene.info and http://myvariant.info. Both are offered free of charge to the research community.
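As a rough illustration of what querying such a web service looks like, the Python sketch below builds a gene query URL. The abstract gives only the site addresses; the `/v3/query` path and the `q`, `species` and `fields` parameters are assumptions based on the service's public REST interface and may differ between versions. The helper name `build_gene_query` is hypothetical. No network request is made.

```python
from urllib.parse import urlencode

# Assumed endpoint; the abstract itself lists only http://mygene.info.
BASE_URL = "https://mygene.info/v3/query"


def build_gene_query(term, species="human",
                     fields=("symbol", "name", "entrezgene")):
    """Return a query URL for a gene search term.

    Parameter names (q, species, fields) are assumptions and are not
    described in the abstract.
    """
    params = {"q": term, "species": species, "fields": ",".join(fields)}
    return f"{BASE_URL}?{urlencode(params)}"


# Example: look up a gene by its symbol.
url = build_gene_query("symbol:CDK2")
```

The resulting URL could then be fetched with any HTTP client and the response parsed as JSON; that step is omitted so the example stays independent of network access.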
|
Research Support, N.I.H., Extramural |
9 |
122 |