1
Ullah S, Rahman W, Ullah F, Ahmad G, Ijaz M, Gao T. DBHR: a collection of databases relevant to human research. Future Sci OA 2022; 8:FSO780. [PMID: 35251694 PMCID: PMC8890137 DOI: 10.2144/fsoa-2021-0101] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/20/2021] [Accepted: 01/05/2022] [Indexed: 11/23/2022]
Abstract
BACKGROUND: The completion of the Human Genome Project provides a basis for the systematic study of the human genome, from evolutionary history to disease-specific medicine. With the explosive growth of biological data, a growing number of biological databases are being established to support human-related research.
OBJECTIVE: The main objective of our study is to store, organize and share data in a structured and searchable manner. In short, we have planned the future development of new features in the database research area.
MATERIALS & METHODS: In total, we collected and integrated 680 human databases from published scientific work. Multiple options are presented for accessing the data, and the original link and a short description are presented for each database.
RESULTS & DISCUSSION: We have provided the latest collection of human research databases on a single platform with six categories: DNA database, RNA database, protein database, expression database, pathway database and disease database.
CONCLUSION: Taken together, our database will be useful for further human research and will be updated over time. The database has been implemented in PHP, HTML, CSS and MySQL and is freely available at https://habdsk.org/database.php.
Affiliation(s)
- Tianshun Gao
- Research Center, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen, Guangzhou, China
2
Affiliation(s)
- Mohamed Helmy
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Gary D. Bader
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada
3
Duck G, Kovacevic A, Robertson DL, Stevens R, Nenadic G. Ambiguity and variability of database and software names in bioinformatics. J Biomed Semantics 2015; 6:29. [PMID: 26131352 PMCID: PMC4485340 DOI: 10.1186/s13326-015-0026-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 07/08/2013] [Accepted: 06/05/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND: There are numerous options available to achieve various tasks in bioinformatics, but until recently, there were no tools that could systematically identify mentions of databases and tools within the literature. In this paper we explore the variability and ambiguity of database and software name mentions and compare dictionary and machine learning approaches to their identification.
RESULTS: Through the development and analysis of a corpus of 60 full-text documents manually annotated at the mention level, we report high variability and ambiguity in database and software mentions. On a test set of 25 full-text documents, a baseline dictionary look-up achieved an F-score of 46%, highlighting not only variability and ambiguity but also the extensive number of new resources introduced. A machine learning approach achieved an F-score of 63% (with precision of 74%) and 70% (with precision of 83%) for strict and lenient matching respectively. We characterise the issues with various mention types and propose potential ways of capturing additional database and software mentions in the literature.
CONCLUSIONS: Our analyses show that identification of mentions of databases and tools is a challenging task that cannot be achieved by relying on current manually-curated resource repositories. Although machine learning shows improvement and promise (primarily in precision), more contextual information needs to be taken into account to achieve a good degree of accuracy.
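The abstract above reports F-scores together with precision but does not state recall. As a rough illustration only, the sketch below assumes the reported figure is the balanced F-score (F1, the harmonic mean of precision and recall), which the abstract does not confirm, and solves for the recall that the rounded figures would imply. The function names are hypothetical and not taken from the paper.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def implied_recall(f: float, precision: float) -> float:
    """Solve F = 2PR / (P + R) for R, given F and P."""
    denom = 2 * precision - f
    if denom <= 0:
        raise ValueError("no valid recall for this F/precision pair")
    return f * precision / denom


# Figures from the abstract; ASSUMPTION: they are balanced F-scores.
strict = implied_recall(0.63, 0.74)   # strict matching
lenient = implied_recall(0.70, 0.83)  # lenient matching
print(f"strict recall  ~ {strict:.2f}")
print(f"lenient recall ~ {lenient:.2f}")
```

Because the published percentages are rounded, the derived recall values are approximate. Under this assumption they come out noticeably lower than precision, which agrees with the abstract's remark that the improvement is primarily in precision.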
Affiliation(s)
- Geraint Duck
- School of Computer Science, The University of Manchester, Oxford Road, Manchester, M13 9PL UK
- David L. Robertson
- Computational and Evolutionary Biology, Faculty of Life Sciences, The University of Manchester, Oxford Road, Manchester, M13 9PT UK
- Robert Stevens
- School of Computer Science, The University of Manchester, Oxford Road, Manchester, M13 9PL UK
- Goran Nenadic
- School of Computer Science, The University of Manchester, Oxford Road, Manchester, M13 9PL UK
- Manchester Institute of Biotechnology, The University of Manchester, 131 Princess Street, Manchester, M1 7DN UK
4
Duck G, Nenadic G, Brass A, Robertson DL, Stevens R. Extracting patterns of database and software usage from the bioinformatics literature. Bioinformatics 2015; 30:i601-8. [PMID: 25161253 PMCID: PMC4147923 DOI: 10.1093/bioinformatics/btu471] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 11/12/2022]
Abstract
MOTIVATION: As a natural consequence of being a computer-based discipline, bioinformatics has a strong focus on database and software development, but the volume and variety of resources are growing at unprecedented rates. An audit of database and software usage patterns could help provide an overview of developments in bioinformatics and community common practice, and comparing the links between resources through time could demonstrate both the persistence of existing software and the emergence of new tools.
RESULTS: We study the connections between bioinformatics resources and construct networks of database and software usage patterns, based on resource co-occurrence, that correspond to snapshots of common practice in the bioinformatics community. We apply our approach to pairings of phylogenetics software reported in the literature and argue that these could provide a stepping stone into the identification of scientific best practice.
AVAILABILITY AND IMPLEMENTATION: The extracted resource data, the scripts used for network generation and the resulting networks are available at http://bionerds.sourceforge.net/networks/.
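The co-occurrence networks described above can be pictured with a minimal sketch. This is not the authors' script (theirs is at the URL given in the abstract); it only assumes that each document has already been reduced to a list of resource mentions, and it weights an edge by the number of documents in which both resources appear. The function name and the sample mention lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(documents):
    """Count how many documents mention each unordered pair of resources."""
    edges = Counter()
    for mentions in documents:
        # De-duplicate within a document and sort so (a, b) == (b, a).
        for pair in combinations(sorted(set(mentions)), 2):
            edges[pair] += 1
    return edges


# Hypothetical per-document mention lists (illustrative only).
docs = [
    ["BLAST", "GenBank", "R"],
    ["BLAST", "GenBank"],
    ["R", "Gene Ontology", "BLAST"],
]
edges = cooccurrence_edges(docs)
for (a, b), weight in edges.most_common(3):
    print(a, b, weight)
```

Counting each pair once per document, rather than once per mention, keeps a single long paper from dominating the edge weights. Building one such edge list per publication year would give the kind of time-sliced snapshots the abstract refers to.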
Affiliation(s)
- Geraint Duck
- School of Computer Science, Manchester Institute of Biotechnology and Computational and Evolutionary Biology, Faculty of Life Sciences, The University of Manchester, Manchester M13 9PL, UK
- Goran Nenadic
- School of Computer Science, Manchester Institute of Biotechnology and Computational and Evolutionary Biology, Faculty of Life Sciences, The University of Manchester, Manchester M13 9PL, UK
- Andy Brass
- School of Computer Science, Manchester Institute of Biotechnology and Computational and Evolutionary Biology, Faculty of Life Sciences, The University of Manchester, Manchester M13 9PL, UK
- David L Robertson
- School of Computer Science, Manchester Institute of Biotechnology and Computational and Evolutionary Biology, Faculty of Life Sciences, The University of Manchester, Manchester M13 9PL, UK
- Robert Stevens
- School of Computer Science, Manchester Institute of Biotechnology and Computational and Evolutionary Biology, Faculty of Life Sciences, The University of Manchester, Manchester M13 9PL, UK
5
Yu Q, Ding Y, Song M, Song S, Liu J, Zhang B. Tracing database usage: Detecting main paths in database link networks. J Informetr 2015. [DOI: 10.1016/j.joi.2014.10.002] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 12/17/2022]
6
Duck G, Nenadic G, Brass A, Robertson DL, Stevens R. bioNerDS: exploring bioinformatics' database and software use through literature mining. BMC Bioinformatics 2013; 14:194. [PMID: 23768135 PMCID: PMC3693927 DOI: 10.1186/1471-2105-14-194] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 10/15/2012] [Accepted: 06/11/2013] [Indexed: 11/10/2022]
Abstract
Background: Biology-focused databases and software define bioinformatics and their use is central to computational biology. In such a complex and dynamic field, it is of interest to understand what resources are available, which are used, how much they are used, and for what they are used. While scholarly literature surveys can provide some insights, large-scale computer-based approaches to identify mentions of bioinformatics databases and software from primary literature would automate systematic cataloguing, facilitate the monitoring of usage, and provide the foundations for the recovery of computational methods for analysing biological data, with the long-term aim of identifying best/common practice in different areas of biology.
Results: We have developed bioNerDS, a named entity recogniser for the recovery of bioinformatics databases and software from primary literature. We identify such entities with an F-measure ranging from 63% to 91% at the mention level and 63-78% at the document level, depending on corpus. Not attaining a higher F-measure is mostly due to high ambiguity in resource naming, which is compounded by the on-going introduction of new resources. To demonstrate the software, we applied bioNerDS to full-text articles from BMC Bioinformatics and Genome Biology. General mention patterns reflect the remit of these journals, highlighting BMC Bioinformatics’s emphasis on new tools and Genome Biology’s greater emphasis on data analysis. The data also illustrate some shifts in resource usage: for example, the past decade has seen R and the Gene Ontology join BLAST and GenBank as the main components in bioinformatics processing.
Conclusions: We demonstrate the feasibility of automatically identifying resource names on a large scale from the scientific literature and show that the generated data can be used for exploration of bioinformatics database and software usage. For example, our results help to investigate the rate of change in resource usage and corroborate the suspicion that a vast majority of resources are created, but rarely (if ever) used thereafter. bioNerDS is available at http://bionerds.sourceforge.net/.
Affiliation(s)
- Geraint Duck
- School of Computer Science, The University of Manchester, Manchester, UK
7
Abstract
Understanding regulation of gene transcription is central to molecular biology as well as being of great interest in medicine. The molecular syntax of the concerted transcriptional activation/repression of gene networks in mammal cells, which shape the physiological response to the molecular signals, is often unknown or not completely understood. Combining genome-wide experiments with in silico approaches opens the way to a more systematic comprehension of the molecular mechanisms of transcription regulation. Diverse bioinformatics tools have been developed to help unravel these mechanisms, by handling and processing data at different stages: from data collection and storage to the identification of molecular targets and from the detection of DNA motif signatures in the regulatory sequences of functionally related genes to the identification of relevant regulatory networks. Moreover, the large amount of genome-wide scale data recently produced has attracted professionals from diverse backgrounds to this cutting-edge realm of molecular biology. This mini-review is intended as an orientation for multidisciplinary professionals, introducing a streamlined workflow in gene transcription regulation with emphasis on sequence analysis. It provides an outlook on tools and methods, selected from a host of bioinformatics resources available today. It has been designed for the benefit of students, investigators, and professionals who seek a coherent yet quick introduction to in silico approaches to analyzing regulation of gene transcription in the post-genomic era.
Affiliation(s)
- Gioia Altobelli
- Department of Endocrinology, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK.
8
Brazas MD, Yim D, Yeung W, Ouellette BFF. A decade of Web Server updates at the Bioinformatics Links Directory: 2003-2012. Nucleic Acids Res 2012; 40:W3-W12. [PMID: 22700703 PMCID: PMC3394264 DOI: 10.1093/nar/gks632] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 01/24/2023]
Abstract
The 2012 Bioinformatics Links Directory update marks the 10th special Web Server issue from Nucleic Acids Research. Beginning with content from their 2003 publication, the Bioinformatics Links Directory in collaboration with Nucleic Acids Research has compiled and published a comprehensive list of freely accessible, online tools, databases and resource materials for the bioinformatics and life science research communities. The past decade has exhibited significant growth and change in the types of tools, databases and resources being put forth, reflecting both technology changes and the nature of research over that time. With the addition of 90 web server tools and 12 updates from the July 2012 Web Server issue of Nucleic Acids Research, the Bioinformatics Links Directory at http://bioinformatics.ca/links_directory/ now contains an impressive 134 resources, 455 databases and 1205 web server tools, mirroring the continued activity and efforts of our field.
Affiliation(s)
- Michelle D Brazas
- Ontario Institute for Cancer Research, 101 College St., Suite 800, Toronto, Ontario, Canada M5G 0A3
9
Primig M. The bioinformatics tool box for reproductive biology. Biochim Biophys Acta Mol Basis Dis 2012; 1822:1880-95. [PMID: 22687534 DOI: 10.1016/j.bbadis.2012.05.018] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/24/2011] [Revised: 05/04/2012] [Accepted: 05/28/2012] [Indexed: 10/28/2022]
Abstract
Genetics and molecular biology have been instrumental for a better understanding of heritable defects causing human infertility over the past decades. More recently, the field of reproductive biology has harnessed genome biological approaches to gain insight into molecular processes underlying normal and pathological gametogenesis and gamete function. We are currently witnessing yet another quantum leap in our ability to monitor the flow of information from the genome via the transcriptome to the proteome: tiling arrays that cover both strands of a given target genome and RNA-Seq, a method based on ultra-high throughput DNA sequencing, enable us to study noncoding and protein-coding transcripts with unprecedented precision and depth at a reasonable cost. These technologies have spawned a thriving discipline within the bioinformatics field that employs information technology for managing and interpreting biological high-throughput data. This review outlines database projects and online analysis tools useful for life scientists in general and discusses in detail selected projects that have specifically been developed for researchers and clinicians in the field of reproductive biology. This article is part of a Special Issue entitled: Molecular Genetics of Human Reproductive Failure.
Affiliation(s)
- Michael Primig
- Inserm UMR1085-Irset, Université de Rennes 1, Rennes, France.
10
Ram S, Laxman Rao N. iBIRA – integrated bioinformatics information resource access. Reference Services Review 2012. [DOI: 10.1108/00907321211228354] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/17/2022]
Abstract
Purpose: Bioinformatics is an emerging discipline in which interdisciplinary research holds great promise for the advancement of research and development in many complex areas. The research output generates a huge amount of data and information. Because of the heterogeneous nature of bioinformatics resources, difficulty in accessing pertinent information is the biggest challenge for the bioinformatics community. The integration of bioinformatics resources in a comprehensive manner is advocated by the bioinformatics user community as well as by information scientists serving this community. Some efforts have already been made to integrate bioinformatics resources by discrete bioinformatics communities, but these are based on the requirements of their own areas. This paper aims to discuss the design and development of a tool for the integration of various heterogeneous bioinformatics information resources available over the internet.
Design/methodology/approach: The authors have developed a tool with the acronym “iBIRA” (Integrated Bioinformatics Information Resource Access) that associates the bioinformatics community with the bioinformatics “resourceome” (the term suggested for the “full set of bioinformatics resources” by Cannata et al.). Available over the internet, iBIRA (www.ibiranet.in) integrates bioinformatics resources in such a way that it is possible to locate, connect and communicate different categories of resources in a cohesive manner. A software engineering and database‐driven approach was used for the integration and organization of bioinformatics resources. Hypertext Preprocessor (PHP), a server‐side dynamic web programming language, and MySQL, a database management system, have been used. Dublin Core Metadata Standards have been used for the design of metadata for bioinformatics resources.
Findings: The term “resource” in the area of bioinformatics covers various entities such as journals, molecular biology databases, online annotation tools, patents, published documents (articles, books, etc.), protocols, software tools, and web servers. It has been found that bioinformatics resources are heterogeneous in nature and available over the internet in different forms and formats. The fact that bioinformatics resources are scattered over the internet makes resource discovery difficult for the bioinformatics community, and there is a need for a system that reorganizes these resources. The integration of all the resources of bioinformatics on a single platform (called “iBIRA”) provides significant “value added” to the bioinformatics community and those serving this population.
Originality/value: The iBIRA tool is a meta‐server developed to provide an information service about the availability of various bioinformatics resources to the bioinformatics community. This will provide a value‐added benefit to the population by helping them to locate relevant resources for their education, research and training.
11
Schultheiss SJ, Münch MC, Andreeva GD, Rätsch G. Persistence and availability of Web services in computational biology. PLoS One 2011; 6:e24914. [PMID: 21966383 PMCID: PMC3178567 DOI: 10.1371/journal.pone.0024914] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 05/17/2011] [Accepted: 08/22/2011] [Indexed: 11/19/2022]
Abstract
We have conducted a study on the long-term availability of bioinformatics Web services: an observation of 927 Web services published in the annual Nucleic Acids Research Web Server Issues between 2003 and 2009. We found that 72% of Web sites are still available at the published addresses, only 9% of services are completely unavailable. Older addresses often redirect to new pages. We checked the functionality of all available services: for 33%, we could not test functionality because there was no example data or a related problem; 13% were truly no longer working as expected; we could positively confirm functionality only for 45% of all services. Additionally, we conducted a survey among 872 Web Server Issue corresponding authors; 274 replied. 78% of all respondents indicate their services have been developed solely by students and researchers without a permanent position. Consequently, these services are in danger of falling into disrepair after the original developers move to another institution, and indeed, for 24% of services, there is no plan for maintenance, according to the respondents. We introduce a Web service quality scoring system that correlates with the number of citations: services with a high score are cited 1.8 times more often than low-scoring services. We have identified key characteristics that are predictive of a service's survival, providing reviewers, editors, and Web service developers with the means to assess or improve Web services. A Web service conforming to these criteria receives more citations and provides more reliable service for its users. The most effective way of ensuring continued access to a service is a persistent Web address, offered either by the publishing journal, or created on the authors' own initiative, for example at http://bioweb.me. The community would benefit the most from a policy requiring any source code needed to reproduce results to be deposited in a public repository.
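The percentages in the abstract above can be cross-checked with a few lines of arithmetic. The sketch below assumes that every percentage is expressed relative to all 927 services; the abstract says this explicitly only for the 45% figure, so the absolute counts are approximations rather than values reported by the authors.

```python
# Figures as reported in the abstract (percent of services).
TOTAL_SERVICES = 927
functionality = {
    "could not be tested": 33,
    "not working as expected": 13,
    "confirmed functional": 45,
}
completely_unavailable = 9

# The three functionality outcomes plus the unavailable share.
checked = sum(functionality.values())
print(checked, checked + completely_unavailable)

# Approximate absolute counts; ASSUMES each percentage is of all 927 services.
approx_counts = {
    label: round(TOTAL_SERVICES * pct / 100)
    for label, pct in functionality.items()
}
print(approx_counts)

# Survey response rate: 274 replies from 872 corresponding authors.
response_rate = 274 / 872
print(f"{response_rate:.1%}")
```

The three functionality outcomes and the completely unavailable share add up to 100%, which is consistent with reading all four figures as fractions of the full set of services.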
Affiliation(s)
- Sebastian J Schultheiss
- Machine Learning in Biology Research Group, Friedrich Miescher Laboratory of the Max Planck Society, Tübingen, Germany.