1
Qi Y, Li J, Zhang M. Enabling CMF estimation in data-constrained scenarios: A semantic-encoding knowledge mining model. Accid Anal Prev 2024; 205:107662. [PMID: 38897141 DOI: 10.1016/j.aap.2024.107662] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Revised: 05/20/2024] [Accepted: 05/28/2024] [Indexed: 06/21/2024]
Abstract
Availability of more accurate Crash Modification Factors (CMFs) is crucial for evaluating the effectiveness of various road safety treatments and prioritizing infrastructure investment accordingly. While customized study for each countermeasure scenario is desired, the conventional CMF estimation approaches rely heavily on the availability of crash data at specific sites. This dependency may hinder the development of CMFs when it is impractical to collect data for recent implementations. Additionally, the transferability of CMF knowledge faces challenges, as the intrinsic similarities between different safety countermeasure scenarios are not fully explored. Aiming to fill these gaps, this study introduces a novel knowledge-mining framework for CMF prediction. This framework delves into the connections of existing countermeasure scenarios and reduces the reliance of CMF estimation on crash data availability and manual data collection. Specifically, it draws inspiration from human comprehension processes and introduces advanced Natural Language Processing (NLP) techniques to extract intricate variations and patterns from existing CMF knowledge. It effectively encodes unstructured countermeasure scenarios into machine-readable representations and models the complex relationships between scenarios and CMF values. This new data-driven framework provides a cost-effective and adaptable solution that complements the case-specific approaches for CMF estimation, which is particularly beneficial when availability of crash data imposes constraints. Experimental validation using real-world CMF Clearinghouse data demonstrates the effectiveness of this new approach, which shows significant accuracy improvements compared to the baseline methods. This approach provides insights into new possibilities of harnessing accumulated transportation knowledge in various applications.
Affiliation(s)
- Yanlin Qi
  - Institute of Transportation Studies, University of California, Davis, CA 95616, USA
- Jia Li
  - Department of Civil and Environmental Engineering, Washington State University, WA 99164, USA
- Michael Zhang
  - Institute of Transportation Studies, University of California, Davis, CA 95616, USA
2
Kokoli M, Karatzas E, Baltoumas FA, Schneider R, Pafilis E, Paragkamian S, Doncheva NT, Jensen L, Pavlopoulos G. Arena3D web: interactive 3D visualization of multilayered networks supporting multiple directional information channels, clustering analysis and application integration. NAR Genom Bioinform 2023; 5:lqad053. [PMID: 37260509 PMCID: PMC10227371 DOI: 10.1093/nargab/lqad053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2023] [Revised: 04/25/2023] [Accepted: 05/18/2023] [Indexed: 06/02/2023] Open
Abstract
Arena3Dweb is an interactive web tool that visualizes multi-layered networks in 3D space. In this update, Arena3Dweb supports directed networks as well as up to nine different types of connections between pairs of nodes with the use of Bézier curves. It comes with different color schemes (light/gray/dark mode), custom channel coloring, four node clustering algorithms which one can run on-the-fly, visualization in VR mode and predefined layer layouts (zig-zag, star and cube). This update also includes enhanced navigation controls (mouse orbit controls, layer dragging and layer/node selection), while its newly developed API allows integration with external applications as well as saving and loading of sessions in JSON format. Finally, a dedicated Cytoscape app has been developed, through which users can automatically send their 2D networks from Cytoscape to Arena3Dweb for 3D multi-layer visualization. Arena3Dweb is accessible at http://arena3d.pavlopouloslab.info or http://arena3d.org.
Affiliation(s)
- Fotis A Baltoumas
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari 16672, Greece
- Reinhard Schneider
  - University of Luxembourg, Luxembourg Centre for Systems Biomedicine, Bioinformatics Core, Esch-sur-Alzette, Luxembourg
- Evangelos Pafilis
  - Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes, Heraklion 71003, Greece
- Savvas Paragkamian
  - Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes, Heraklion 71003, Greece
  - Department of Biology, University of Crete, Voutes University Campus, P.O. Box 2208, 70013 Heraklion, Crete, Greece
- Nadezhda T Doncheva
  - Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen N DK-2200, Denmark
- Lars Juhl Jensen
  - Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen N DK-2200, Denmark
3
Antuamwine BB, Bosnjakovic R, Hofmann-Vega F, Wang X, Theodosiou T, Iliopoulos I, Brandau S. N1 versus N2 and PMN-MDSC: A critical appraisal of current concepts on tumor-associated neutrophils and new directions for human oncology. Immunol Rev 2022; 314:250-279. [PMID: 36504274 DOI: 10.1111/imr.13176] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Indexed: 12/14/2022]
Abstract
Research on tumor-associated neutrophils (TAN) currently surges because of the well-documented strong clinical relevance of tumor-infiltrating neutrophils. This relevance is illustrated by strong correlations between high frequencies of intratumoral neutrophils and poor outcome in the majority of human cancers. Recent high-dimensional analysis of murine neutrophils provides evidence for unexpected plasticity of neutrophils in murine models of cancer and other inflammatory non-malignant diseases. New analysis tools enable deeper insight into the process of neutrophil differentiation and maturation. These technological and scientific developments led to the description of an ever-increasing number of distinct transcriptional states and associated phenotypes in murine models of disease and more recently also in humans. At present, functional validation of these different transcriptional states and potential phenotypes in cancer is lacking. Current functional concepts on neutrophils in cancer rely mainly on the myeloid-derived suppressor cell (MDSC) concept and the dichotomous and simple N1-N2 paradigm. In this manuscript, we review the historic development of those concepts, critically evaluate these concepts against the background of our own work and provide suggestions for a refinement of current concepts in order to facilitate the transition of TAN research from experimental insight to clinical translation.
Affiliation(s)
- Benedict Boateng Antuamwine
  - Experimental and Translational Research, Department of Otorhinolaryngology, University Hospital Essen, Essen, Germany
- Rebeka Bosnjakovic
  - Experimental and Translational Research, Department of Otorhinolaryngology, University Hospital Essen, Essen, Germany
- Francisca Hofmann-Vega
  - Experimental and Translational Research, Department of Otorhinolaryngology, University Hospital Essen, Essen, Germany
- Xi Wang
  - Experimental and Translational Research, Department of Otorhinolaryngology, University Hospital Essen, Essen, Germany
- Theodosios Theodosiou
  - Department of Basic Sciences, School of Medicine, University of Crete, Heraklion, Greece
- Ioannis Iliopoulos
  - Department of Basic Sciences, School of Medicine, University of Crete, Heraklion, Greece
- Sven Brandau
  - Experimental and Translational Research, Department of Otorhinolaryngology, University Hospital Essen, Essen, Germany
  - German Cancer Consortium, Partner Site Essen-Düsseldorf, Essen, Germany
4
Prediction and Ranking of Biomarkers Using multiple UniReD. Int J Mol Sci 2022; 23:ijms231911112. [PMID: 36232413 PMCID: PMC9569535 DOI: 10.3390/ijms231911112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2022] [Revised: 09/06/2022] [Accepted: 09/17/2022] [Indexed: 11/23/2022] Open
Abstract
Protein–protein interactions (PPIs) are of key importance for understanding how cells and organisms function. Thus, in recent decades, many approaches have been developed for the identification and discovery of such interactions. These approaches addressed the problem of PPI identification either by an experimental point of view or by a computational one. Here, we present an updated version of UniReD, a computational prediction tool which takes advantage of biomedical literature aiming to extract documented, already published protein associations and predict undocumented ones. The usefulness of this computational tool has been previously evaluated by experimentally validating predicted interactions and by benchmarking it against public databases of experimentally validated PPIs. In its updated form, UniReD allows the user to provide a list of proteins of known implication in, e.g., a particular disease, as well as another list of proteins that are potentially associated with the proteins of the first list. UniReD then automatically analyzes both lists and ranks the proteins of the second list by their association with the proteins of the first list, thus serving as a potential biomarker discovery/validation tool.
5
Darling: A Web Application for Detecting Disease-Related Biomedical Entity Associations with Literature Mining. Biomolecules 2022; 12:biom12040520. [PMID: 35454109 PMCID: PMC9028073 DOI: 10.3390/biom12040520] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/01/2022] [Revised: 03/24/2022] [Accepted: 03/28/2022] [Indexed: 12/15/2022] Open
Abstract
Finding, exploring and filtering frequent sentence-based associations between a disease and a biomedical entity, co-mentioned in disease-related PubMed literature, is a challenge, as the volume of publications increases. Darling is a web application, which utilizes Name Entity Recognition to identify human-related biomedical terms in PubMed articles, mentioned in OMIM, DisGeNET and Human Phenotype Ontology (HPO) disease records, and generates an interactive biomedical entity association network. Nodes in this network represent genes, proteins, chemicals, functions, tissues, diseases, environments and phenotypes. Users can search by identifiers, terms/entities or free text and explore the relevant abstracts in an annotated format.
6
Tissue-Specific Methylation Biosignatures for Monitoring Diseases: An In Silico Approach. Int J Mol Sci 2022; 23:ijms23062959. [PMID: 35328380 PMCID: PMC8952417 DOI: 10.3390/ijms23062959] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/09/2022] [Revised: 03/01/2022] [Accepted: 03/03/2022] [Indexed: 02/06/2023] Open
Abstract
Tissue-specific gene methylation events are key to the pathogenesis of several diseases and can be utilized for diagnosis and monitoring. Here, we established an in silico pipeline to analyze high-throughput methylome datasets to identify specific methylation fingerprints in three pathological entities of major burden, i.e., breast cancer (BrCa), osteoarthritis (OA) and diabetes mellitus (DM). Differential methylation analysis was conducted to compare tissues/cells related to the pathology and different types of healthy tissues, revealing Differentially Methylated Genes (DMGs). Highly performing and low feature number biosignatures were built with automated machine learning, including: (1) a five-gene biosignature discriminating BrCa tissue from healthy tissues (AUC 0.987 and precision 0.987), (2) three equivalent OA cartilage-specific biosignatures containing four genes each (AUC 0.978 and precision 0.986) and (3) a four-gene pancreatic β-cell-specific biosignature (AUC 0.984 and precision 0.995). Next, the BrCa biosignature was validated using an independent ccfDNA dataset showing an AUC and precision of 1.000, verifying the biosignature’s applicability in liquid biopsy. Functional and protein interaction prediction analysis revealed that most DMGs identified are involved in pathways known to be related to the studied diseases or pointed to new ones. Overall, our data-driven approach contributes to the maximum exploitation of high-throughput methylome readings, helping to establish specific disease profiles to be applied in clinical practice and to understand human pathology.
7
Baltoumas FA, Zafeiropoulou S, Karatzas E, Paragkamian S, Thanati F, Iliopoulos I, Eliopoulos AG, Schneider R, Jensen LJ, Pafilis E, Pavlopoulos GA. OnTheFly 2.0: a text-mining web application for automated biomedical entity recognition, document annotation, network and functional enrichment analysis. NAR Genom Bioinform 2021; 3:lqab090. [PMID: 34632381 PMCID: PMC8494211 DOI: 10.1093/nargab/lqab090] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/14/2021] [Revised: 09/09/2021] [Accepted: 09/20/2021] [Indexed: 02/06/2023] Open
Abstract
Extracting and processing information from documents is of great importance as lots of experimental results and findings are stored in local files. Therefore, extracting and analyzing biomedical terms from such files in an automated way is absolutely necessary. In this article, we present OnTheFly2.0, a web application for extracting biomedical entities from individual files such as plain texts, office documents, PDF files or images. OnTheFly2.0 can generate informative summaries in popup windows containing knowledge related to the identified terms along with links to various databases. It uses the EXTRACT tagging service to perform named entity recognition (NER) for genes/proteins, chemical compounds, organisms, tissues, environments, diseases, phenotypes and gene ontology terms. Multiple files can be analyzed, whereas identified terms such as proteins or genes can be explored through functional enrichment analysis or be associated with diseases and PubMed entries. Finally, protein-protein and protein-chemical networks can be generated with the use of STRING and STITCH services. To demonstrate its capacity for knowledge discovery, we interrogated published meta-analyses of clinical biomarkers of severe COVID-19 and uncovered inflammatory and senescence pathways that impact disease pathogenesis. OnTheFly2.0 currently supports 197 species and is available at http://bib.fleming.gr:3838/OnTheFly/ and http://onthefly.pavlopouloslab.info.
Affiliation(s)
- Fotis A Baltoumas
  - Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center "Alexander Fleming", Vari 16672, Greece
- Sofia Zafeiropoulou
  - Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center "Alexander Fleming", Vari 16672, Greece
- Evangelos Karatzas
  - Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center "Alexander Fleming", Vari 16672, Greece
- Savvas Paragkamian
  - Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes P.O. Box 2214, 71003 Heraklion, Crete, Greece
- Foteini Thanati
  - Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center "Alexander Fleming", Vari 16672, Greece
- Ioannis Iliopoulos
  - Department of Basic Sciences, School of Medicine, University of Crete, Heraklion 71003, Crete, Greece
- Aristides G Eliopoulos
  - Department of Biology, School of Medicine, National and Kapodistrian University of Athens, Athens, 70013, Greece
- Reinhard Schneider
  - University of Luxembourg, Luxembourg Centre for Systems Biomedicine, Bioinformatics Core, Esch-sur-Alzette, L-4365, Luxembourg
- Lars Juhl Jensen
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, 2200, Denmark
- Evangelos Pafilis
  - Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes P.O. Box 2214, 71003 Heraklion, Crete, Greece
- Georgios A Pavlopoulos
  - Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center "Alexander Fleming", Vari 16672, Greece
8
Sanyal DK, Bhowmick PK, Das PP. A review of author name disambiguation techniques for the PubMed bibliographic database. J Inf Sci 2019. [DOI: 10.1177/0165551519888605] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 11/16/2022]
Abstract
Author names in bibliographic databases often suffer from ambiguity owing to the same author appearing under different names and multiple authors possessing similar names. It creates difficulty in associating a scholarly work with the person who wrote it, thereby introducing inaccuracy in credit attribution, bibliometric analysis, search-by-author in a digital library and expert discovery. A plethora of techniques for disambiguation of author names has been proposed in the literature. In this article, we focus on the research efforts targeted to disambiguate author names specifically in the PubMed bibliographic database. We believe this concentrated review will be useful to the research community because it discusses techniques applied to a very large real database that is actively used worldwide. We make a comprehensive survey of the existing author name disambiguation (AND) approaches that have been applied to the PubMed database: we organise the approaches into a taxonomy; describe the major characteristics of each approach including its performance, strengths, and limitations; and perform a comparative analysis of them. We also identify the datasets from PubMed that are publicly available for researchers to evaluate AND algorithms. Finally, we outline a few directions for future work.
Affiliation(s)
- Partha Pratim Das
  - Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, India
9
Ouzounis CA. Developing computational biology at meridian 23° E, and a little eastwards. J Biol Res (Thessalon) 2018; 25:18. [PMID: 30460210 PMCID: PMC6237004 DOI: 10.1186/s40709-018-0091-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/30/2018] [Accepted: 11/09/2018] [Indexed: 11/23/2022]
Abstract
Modern biology is experiencing a deep transformation by the expansion of molecular-level measurements at all scales, using omics technologies. A key element in this transformation is the field of bioinformatics, that has—in the meanwhile—permeated pretty much all of biological and biomedical research and is now emerging as a key inter-disciplinary area that connects the natural sciences, chemical and electrical engineering, science education and science policy, on a number of science and technology fronts. The strong tradition of open access for large volumes of raw data, collections of complex results and high-quality algorithm implementations in bioinformatics makes the field a unique, special case of open science. We report on our recent research activities, the development of training initiatives in the wider region during the past years, and the lessons learned regarding our efforts away from major epicenters, within the general context of open science.
Affiliation(s)
- Christos A Ouzounis
  - Biological Computation & Process Laboratory (BCPL), Chemical Process & Energy Resources Institute (CPERI), Centre for Research & Technology Hellas (CERTH), PO Box 361, 57001 Thessaloníki, Greece
10
Allot A, Chennen K, Nevers Y, Poidevin L, Kress A, Ripp R, Thompson JD, Poch O, Lecompte O. MyGeneFriends: A Social Network Linking Genes, Genetic Diseases, and Researchers. J Med Internet Res 2017. [PMID: 28623182 PMCID: PMC5493784 DOI: 10.2196/jmir.6676] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 01/08/2023] Open
Abstract
Background: The constant and massive increase of biological data offers unprecedented opportunities to decipher the function and evolution of genes and their roles in human diseases. However, the multiplicity of sources and flow of data mean that efficient access to useful information and knowledge production has become a major challenge. This challenge can be addressed by taking inspiration from Web 2.0 and particularly social networks, which are at the forefront of big data exploration and human-data interaction. Objective: MyGeneFriends is a Web platform inspired by social networks, devoted to genetic disease analysis, and organized around three types of proactive agents: genes, humans, and genetic diseases. The aim of this study was to improve exploration and exploitation of biological, postgenomic era big data. Methods: MyGeneFriends leverages conventions popularized by top social networks (Facebook, LinkedIn, etc), such as networks of friends, profile pages, friendship recommendations, affinity scores, news feeds, content recommendation, and data visualization. Results: MyGeneFriends provides simple and intuitive interactions with data through evaluation and visualization of connections (friendships) between genes, humans, and diseases. The platform suggests new friends and publications and allows agents to follow the activity of their friends. It dynamically personalizes information depending on the user’s specific interests and provides an efficient way to share information with collaborators. Furthermore, the user’s behavior itself generates new information that constitutes an added value integrated in the network, which can be used to discover new connections between biological agents. Conclusions: We have developed MyGeneFriends, a Web platform leveraging conventions from popular social networks to redefine the relationship between humans and biological big data and improve human processing of biomedical data. MyGeneFriends is available at lbgi.fr/mygenefriends.
Affiliation(s)
- Alexis Allot
  - ICUBE UMR 7357, Complex Systems and Translational Bioinformatics, Université de Strasbourg - CNRS - FMTS, Strasbourg, France
- Kirsley Chennen
  - ICUBE UMR 7357, Complex Systems and Translational Bioinformatics, Université de Strasbourg - CNRS - FMTS, Strasbourg, France
- Yannis Nevers
  - ICUBE UMR 7357, Complex Systems and Translational Bioinformatics, Université de Strasbourg - CNRS - FMTS, Strasbourg, France
- Laetitia Poidevin
  - ICUBE UMR 7357, Complex Systems and Translational Bioinformatics, Université de Strasbourg - CNRS - FMTS, Strasbourg, France
- Arnaud Kress
  - ICUBE UMR 7357, Complex Systems and Translational Bioinformatics, Université de Strasbourg - CNRS - FMTS, Strasbourg, France
- Raymond Ripp
  - ICUBE UMR 7357, Complex Systems and Translational Bioinformatics, Université de Strasbourg - CNRS - FMTS, Strasbourg, France
- Julie Dawn Thompson
  - ICUBE UMR 7357, Complex Systems and Translational Bioinformatics, Université de Strasbourg - CNRS - FMTS, Strasbourg, France
- Olivier Poch
  - ICUBE UMR 7357, Complex Systems and Translational Bioinformatics, Université de Strasbourg - CNRS - FMTS, Strasbourg, France
- Odile Lecompte
  - ICUBE UMR 7357, Complex Systems and Translational Bioinformatics, Université de Strasbourg - CNRS - FMTS, Strasbourg, France
11
Mukherjee S, Stamatis D, Bertsch J, Ovchinnikova G, Verezemska O, Isbandi M, Thomas AD, Ali R, Sharma K, Kyrpides NC, Reddy TBK. Genomes OnLine Database (GOLD) v.6: data updates and feature enhancements. Nucleic Acids Res 2017; 45:D446-D456. [PMID: 27794040 PMCID: PMC5210664 DOI: 10.1093/nar/gkw992] [Citation(s) in RCA: 135] [Impact Index Per Article: 19.3] [Received: 09/20/2016] [Revised: 10/11/2016] [Accepted: 10/19/2016] [Indexed: 01/28/2023] Open
Abstract
The Genomes Online Database (GOLD) (https://gold.jgi.doe.gov) is a manually curated data management system that catalogs sequencing projects with associated metadata from around the world. In the current version of GOLD (v.6), all projects are organized based on a four level classification system in the form of a Study, Organism (for isolates) or Biosample (for environmental samples), Sequencing Project and Analysis Project. Currently, GOLD provides information for 26 117 Studies, 239 100 Organisms, 15 887 Biosamples, 97 212 Sequencing Projects and 78 579 Analysis Projects. These are integrated with over 312 metadata fields from which 58 are controlled vocabularies with 2067 terms. The web interface facilitates submission of a diverse range of Sequencing Projects (such as isolate genome, single-cell genome, metagenome, metatranscriptome) and complex Analysis Projects (such as genome from metagenome, or combined assembly from multiple Sequencing Projects). GOLD provides a seamless interface with the Integrated Microbial Genomes (IMG) system and supports and promotes the Genomic Standards Consortium (GSC) Minimum Information standards. This paper describes the data updates and additional features added during the last two years.
Affiliation(s)
- Supratim Mukherjee
  - Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, 94598 CA, USA
- Dimitri Stamatis
  - Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, 94598 CA, USA
- Jon Bertsch
  - Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, 94598 CA, USA
- Galina Ovchinnikova
  - Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, 94598 CA, USA
- Olena Verezemska
  - Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, 94598 CA, USA
- Michelle Isbandi
  - Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, 94598 CA, USA
- Alex D Thomas
  - Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, 94598 CA, USA
- Rida Ali
  - Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, 94598 CA, USA
- Kaushal Sharma
  - Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, 94598 CA, USA
- Nikos C Kyrpides
  - Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, 94598 CA, USA
  - Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia
- T B K Reddy
  - Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, 94598 CA, USA
12
Papanikolaou N, Pavlopoulos GA, Theodosiou T, Vizirianakis IS, Iliopoulos I. DrugQuest - a text mining workflow for drug association discovery. BMC Bioinformatics 2016; 17 Suppl 5:182. [PMID: 27295093 PMCID: PMC4905607 DOI: 10.1186/s12859-016-1041-6] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Indexed: 11/22/2022] Open
Abstract
Background: Text mining and data integration methods are gaining ground in the field of health sciences due to the exponential growth of bio-medical literature and information stored in biological databases. While such methods mostly try to extract bioentity associations from PubMed, very few of them are dedicated in mining other types of repositories such as chemical databases. Results: Herein, we apply a text mining approach on the DrugBank database in order to explore drug associations based on the DrugBank “Description”, “Indication”, “Pharmacodynamics” and “Mechanism of Action” text fields. We apply Name Entity Recognition (NER) techniques on these fields to identify chemicals, proteins, genes, pathways, diseases, and we utilize the TextQuest algorithm to find additional biologically significant words. Using a plethora of similarity and partitional clustering techniques, we group the DrugBank records based on their common terms and investigate possible scenarios why these records are clustered together. Different views such as clustered chemicals based on their textual information, tag clouds consisting of Significant Terms along with the terms that were used for clustering are delivered to the user through a user-friendly web interface. Conclusions: DrugQuest is a text mining tool for knowledge discovery: it is designed to cluster DrugBank records based on text attributes in order to find new associations between drugs. The service is freely available at http://bioinformatics.med.uoc.gr/drugquest.
Affiliation(s)
- Nikolas Papanikolaou
  - Division of Basic Sciences, University of Crete, Medical School, Gouves, 71003, Heraklion, Crete, Greece
- Georgios A Pavlopoulos
  - Division of Basic Sciences, University of Crete, Medical School, Gouves, 71003, Heraklion, Crete, Greece
- Theodosios Theodosiou
  - Division of Basic Sciences, University of Crete, Medical School, Gouves, 71003, Heraklion, Crete, Greece
- Ioannis S Vizirianakis
  - School of Pharmacy, Laboratory of Pharmacology, Aristotle University of Thessaloniki, University Campus, 54124, Thessaloniki, Greece
- Ioannis Iliopoulos
  - Division of Basic Sciences, University of Crete, Medical School, Gouves, 71003, Heraklion, Crete, Greece
13
Gupta R, Mantri SS. Biomolecular Relationships Discovered from Biological Labyrinth and Lost in Ocean of Literature: Community Efforts Can Rescue Until Automated Artificial Intelligence Takes Over. Front Genet 2016; 7:46. [PMID: 27066067 PMCID: PMC4814459 DOI: 10.3389/fgene.2016.00046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2016] [Accepted: 03/15/2016] [Indexed: 11/30/2022] Open