1
Hammad R, Barhoush M, Abed-alguni BH. A Semantic-Based Approach for Managing Healthcare Big Data: A Survey. Journal of Healthcare Engineering 2020; 2020:8865808. [PMID: 33489061 PMCID: PMC7787845 DOI: 10.1155/2020/8865808] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 09/29/2020] [Revised: 11/02/2020] [Accepted: 11/09/2020] [Indexed: 12/20/2022]
Abstract
Healthcare information systems can reduce treatment costs, predict outbreaks of epidemics, help avoid preventable diseases, and improve quality of life. Recently, large volumes of heterogeneous healthcare data have been produced from sources such as patients' clinical records, lab results, and wearable devices, making it hard for conventional data processing to handle and manage this amount of data. Faced with the challenges of managing healthcare big data, such as volume, velocity, and variety, healthcare information systems need new methods and techniques for managing and processing such data to extract useful information and knowledge. In the last few years, a large number of organizations and companies have shown enthusiasm for using semantic web technologies with healthcare big data to convert data into knowledge and intelligence. In this paper, we review the state of the art on the semantic web for the healthcare industry. Based on our literature review, we discuss how different techniques, standards, and points of view created by the semantic web community can help address the challenges related to healthcare big data.
2
Mou X, Jamil HM. Visual Life Sciences Workflow Design Using Distributed and Heterogeneous Resources. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1459-1473. [PMID: 30561349 DOI: 10.1109/tcbb.2018.2886185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Programming or querying usually presupposes some degree of technical familiarity with the syntax of a language and the peculiarity of the objects it manipulates to produce useful information. The degree of abstractions supported in a language helps lessen the depth of such familiarity needed, and aids in improving access to and usability of these resources. To help biologists concentrate more on their science questions and not on how to compute it, several successful workflow orchestration languages and systems have been proposed. Despite their popularity, significant limitations reduce their usability and limit applicability in novel applications. In this paper, we present a visual language, called VisFlow, for workflow orchestration using heterogeneous and distributed resources. We advance the idea that once resources are minimally described and abstracted, arbitrary workflows can be designed solely using query primitives supported in VisFlow. Its capabilities can be augmented by including computational artifacts in the form of library functions written in R, Python, and Java, or even in SQL and XQuery, making it a truly extensible system. We discuss its salient features and illustrate its capabilities using a substantial set of examples.
3
Jamil HM. Optimizing Phylogenetic Queries for Performance. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 15:1692-1705. [PMID: 28858810 DOI: 10.1109/tcbb.2017.2743706] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
The vast majority of phylogenetic databases do not support declarative querying, through which their contents could be flexibly and conveniently accessed, and the template-based query interfaces they support do not allow arbitrary speculative queries. They therefore also do not support query optimization that leverages unique phylogeny properties. While a small number of graph query languages such as XQuery, Cypher, and GraphQL exist for computer-savvy users, most are too general and complex to be useful for biologists, and too inefficient for querying large phylogenies. In this paper, we discuss a recently introduced visual query language, called PhyQL, that leverages phylogeny-specific properties to support essential and powerful constructs for a large class of phylogenetic queries. We develop a range of pruning aids, and propose a substantial set of query optimization strategies using these aids that are suitable for querying large phylogenies. A hybrid optimization technique that exploits a set of indices and "graphlet" partitioning is discussed. A "fail soonest" strategy is used to avoid hopeless processing and is shown to pay dividends. Possible novel optimization techniques yet to be explored are also discussed.
4
ASP Applications in Bio-informatics: A Short Tour. Künstliche Intelligenz 2018. [DOI: 10.1007/s13218-018-0551-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
5
Zhang H, Guo Y, Li Q, George TJ, Shenkman E, Modave F, Bian J. An ontology-guided semantic data integration framework to support integrative data analysis of cancer survival. BMC Med Inform Decis Mak 2018; 18:41. [PMID: 30066664 PMCID: PMC6069766 DOI: 10.1186/s12911-018-0636-4] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND Cancer is the second leading cause of death in the United States, exceeded only by heart disease. Extant cancer survival analyses have primarily focused on individual-level factors because of the limited data available from a single data source. There is a need to integrate data from different sources to study as many risk factors as possible simultaneously. Thus, we proposed an ontology-based approach to integrate heterogeneous datasets, addressing key data integration challenges. METHODS Following best practices in ontology engineering, we created the Ontology for Cancer Research Variables (OCRV), adapting existing semantic resources such as the National Cancer Institute (NCI) Thesaurus. Using the global-as-view data integration approach, we created mapping axioms to link the data elements in different sources to OCRV. Building on the Ontop platform, we implemented a data integration pipeline that uses semantic queries to query, extract, and transform data in relational databases into a pooled dataset, according to the needs of the downstream multi-level Integrative Data Analysis (IDA). RESULTS Based on our use cases in the cancer survival IDA, we created tailored ontological structures in OCRV to facilitate the data integration tasks. Specifically, we created a flexible framework addressing key integration challenges: (1) using a shared, controlled vocabulary to make data understandable to both humans and computers, (2) explicitly modeling the semantic relationships so that it is possible to compute and reason with the data, (3) linking patients to contextual and environmental factors through geographic variables, and (4) documenting the data manipulation and integration processes clearly in the ontologies.
CONCLUSIONS Using an ontology-based data integration approach not only standardizes the definitions of data variables through a common, controlled vocabulary, but also makes the semantic relationships among variables from different sources explicit and clear to all users of the same datasets. Such an approach resolves the ambiguity in variable selection, extraction, and integration processes and thus improves the reproducibility of the IDA.
Affiliation(s)
- Hansi Zhang
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Clinical and Translational Research Building Suite 3228, 2004 Mowry Road, PO Box 100219, Gainesville, FL, 32610-0219, USA
- Yi Guo
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Clinical and Translational Research Building Suite 3228, 2004 Mowry Road, PO Box 100219, Gainesville, FL, 32610-0219, USA
- Qian Li
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Clinical and Translational Research Building Suite 3228, 2004 Mowry Road, PO Box 100219, Gainesville, FL, 32610-0219, USA
- Thomas J George
- Division of Hematology and Oncology, Department of Medicine, College of Medicine, University of Florida, Gainesville, Florida, USA
- Elizabeth Shenkman
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Clinical and Translational Research Building Suite 3228, 2004 Mowry Road, PO Box 100219, Gainesville, FL, 32610-0219, USA
- François Modave
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Clinical and Translational Research Building Suite 3228, 2004 Mowry Road, PO Box 100219, Gainesville, FL, 32610-0219, USA
- Jiang Bian
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Clinical and Translational Research Building Suite 3228, 2004 Mowry Road, PO Box 100219, Gainesville, FL, 32610-0219, USA
6
Vogt L. The logical basis for coding ontologically dependent characters. Cladistics 2017; 34:438-458. [DOI: 10.1111/cla.12209] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Accepted: 05/23/2017] [Indexed: 01/26/2023] Open
Affiliation(s)
- Lars Vogt
- Institut für Evolutionsbiologie und Ökologie; Universität Bonn; An der Immenburg 1 D-53121 Bonn Germany
7
Jamil HM. A Visual Interface for Querying Heterogeneous Phylogenetic Databases. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2017; 14:131-144. [PMID: 26812733 DOI: 10.1109/tcbb.2016.2520943] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
Despite the recent growth in the number of phylogenetic databases, access to this wealth of resources remains largely driven by tools or form-based interfaces. It is our thesis that the flexibility afforded by declarative query languages may offer the opportunity to access these repositories in a better way, and to use such a language to pose truly powerful queries in unprecedented ways. In this paper, we propose a substantially enhanced closed visual query language, called PhyQL, that can be used to query phylogenetic databases represented in a canonical form. The canonical representation presented helps capture most phylogenetic tree formats in a convenient way, and is used as the storage model for our PhyloBase database, for which PhyQL serves as the query language. We have implemented a visual interface for end users to pose PhyQL queries using visual icons and drag-and-drop operations defined over them. Once a query is posed, the interface translates the visual query into a Datalog query for execution over the canonical database. Responses are returned as hyperlinks to phylogenies that can be viewed in several formats using the tree viewers supported by PhyloBase. Results cached in the PhyQL buffer allow secondary querying on the computed results, making it a truly powerful querying architecture.
8
Strejcek M, Wang Q, Ridl J, Uhlik O. Hunting Down Frame Shifts: Ecological Analysis of Diverse Functional Gene Sequences. Front Microbiol 2015; 6:1267. [PMID: 26635739 PMCID: PMC4656815 DOI: 10.3389/fmicb.2015.01267] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/22/2015] [Accepted: 10/30/2015] [Indexed: 01/19/2023] Open
Abstract
Functional gene ecological analyses using amplicon sequencing can be challenging as translated sequences are often burdened with shifted reading frames. The aim of this work was to evaluate several bioinformatics tools designed to correct errors which arise during sequencing in an effort to reduce the number of frameshifts (FS). Genes encoding for alpha subunits of biphenyl (bphA) and benzoate (benA) dioxygenases were used as model sequences. FrameBot, a FS correction tool, was able to reduce the number of detected FS to zero. However, up to 44% of sequences were discarded by FrameBot as non-specific targets. Therefore, we proposed a de novo mode of FrameBot for FS correction, which works on a similar basis as common chimera identifying platforms and is not dependent on reference sequences. By nature of FrameBot de novo design, it is crucial to provide it with data as error free as possible. We tested the ability of several publicly available correction tools to decrease the number of errors in the data sets. The combination of maximum expected error filtering and single linkage pre-clustering proved to be the most efficient read processing approach. Applying FrameBot de novo on the processed data enabled analysis of BphA sequences with minimal losses of potentially functional sequences not homologous to those previously known. This experiment also demonstrated the extensive diversity of dioxygenases in soil. A script which performs FrameBot de novo is presented in the supplementary material to the study or available at https://github.com/strejcem/FBdenovo. The tool was also implemented into FunGene Pipeline available at http://fungene.cme.msu.edu/FunGenePipeline/.
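The abstract names maximum expected error filtering as one half of the best-performing read processing approach but does not spell it out. The sketch below is a minimal illustration of that general technique, not the authors' pipeline: it assumes Phred-scaled quality scores, sums the per-base error probabilities 10^(-Q/10) for each read, and keeps only reads at or below a threshold. The function names, the threshold of 1.0, and the toy reads are all hypothetical.

```python
def expected_errors(quals):
    """Sum of per-base error probabilities for a list of Phred scores."""
    return sum(10 ** (-q / 10) for q in quals)


def filter_reads(reads, max_ee=1.0):
    """Keep (sequence, quals) pairs whose expected errors are <= max_ee.

    max_ee=1.0 is an illustrative default, not a value from the study.
    """
    return [(seq, quals) for seq, quals in reads
            if expected_errors(quals) <= max_ee]


# Toy reads invented for illustration only.
reads = [
    ("ACGT", [40, 40, 40, 40]),   # very high quality bases
    ("ACGT", [10, 10, 10, 10]),   # each base has a 10% error probability
    ("ACGTACGT", [3] * 8),        # about 50% error probability per base
]

kept = filter_reads(reads, max_ee=1.0)
for seq, quals in kept:
    print(seq, round(expected_errors(quals), 4))
```

Filtering on the summed error probability, rather than on mean quality, penalizes long reads with many mediocre bases, which matters when the downstream step is sensitive to any single error.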
Affiliation(s)
- Michal Strejcek
- Department of Biochemistry and Microbiology, Faculty of Food and Biochemical Technology, University of Chemistry and Technology, Prague, Prague, Czech Republic
- Qiong Wang
- Center for Microbial Ecology, Michigan State University, East Lansing, MI, USA
- Jakub Ridl
- Department of Genomics and Bioinformatics, Institute of Molecular Genetics, Academy of Sciences of the Czech Republic, Prague, Czech Republic
- Ondrej Uhlik
- Department of Biochemistry and Microbiology, Faculty of Food and Biochemical Technology, University of Chemistry and Technology, Prague, Prague, Czech Republic
9
ReproPhylo: An Environment for Reproducible Phylogenomics. PLoS Comput Biol 2015; 11:e1004447. [PMID: 26335558 PMCID: PMC4559436 DOI: 10.1371/journal.pcbi.1004447] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 05/27/2015] [Accepted: 07/13/2015] [Indexed: 11/19/2022] Open
Abstract
The reproducibility of experiments is key to the scientific process, and particularly necessary for accurate reporting of analyses in data-rich fields such as phylogenomics. We present ReproPhylo, a phylogenomic analysis environment developed to ensure experimental reproducibility, to facilitate the handling of large-scale data, and to assist methodological experimentation. Reproducibility, and instantaneous repeatability, is built in to the ReproPhylo system and does not require user intervention or configuration because it stores the experimental workflow as a single, serialized Python object containing explicit provenance and environment information. This ‘single file’ approach ensures the persistence of provenance across iterations of the analysis, with changes automatically managed by the version control program Git. This file, along with a Git repository, are the primary reproducibility outputs of the program. In addition, ReproPhylo produces an extensive human-readable report and generates a comprehensive experimental archive file, both of which are suitable for submission with publications. The system facilitates thorough experimental exploration of both parameters and data. ReproPhylo is a platform independent CC0 Python module and is easily installed as a Docker image or a WinPython self-sufficient package, with a Jupyter Notebook GUI, or as a slimmer version in a Galaxy distribution.
10
Robson B, Caruso TP, Balis UG. Suggestions for a web based universal exchange and inference language for medicine. Continuity of patient care with PCAST disaggregation. Comput Biol Med 2015; 56:51-66. [DOI: 10.1016/j.compbiomed.2014.10.022] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 05/23/2014] [Revised: 10/23/2014] [Accepted: 10/25/2014] [Indexed: 10/24/2022]
11
Panahiazar M, Sheth AP, Ranabahu A, Vos RA, Leebens-Mack J. Advancing data reuse in phyloinformatics using an ontology-driven Semantic Web approach. BMC Med Genomics 2013; 6 Suppl 3:S5. [PMID: 24565381 PMCID: PMC3980757 DOI: 10.1186/1755-8794-6-s3-s5] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/22/2022] Open
Abstract
Phylogenetic analyses can resolve historical relationships among genes, organisms or higher taxa. Understanding such relationships can elucidate a wide range of biological phenomena, including, for example, the importance of gene and genome duplications in the evolution of gene function, the role of adaptation as a driver of diversification, or the evolutionary consequences of biogeographic shifts. Phyloinformaticists are developing data standards, databases and communication protocols (e.g. Application Programming Interfaces, APIs) to extend the accessibility of gene trees, species trees, and the metadata necessary to interpret these trees, thus enabling researchers across the life sciences to reuse phylogenetic knowledge. Specifically, Semantic Web technologies are being developed to make phylogenetic knowledge interpretable by web agents, thereby enabling intelligently automated, high-throughput reuse of results generated by phylogenetic research. This manuscript describes an ontology-driven, semantic problem-solving environment for phylogenetic analyses and introduces artefacts that can promote phyloinformatic efforts to promote accessibility of trees and underlying metadata. PhylOnt is an extensible ontology with concepts describing tree types and tree building methodologies including estimation methods, models and programs. In addition we present the PhylAnt platform for annotating scientific articles and NeXML files with PhylOnt concepts. The novelty of this work is the annotation of NeXML files and phylogenetic related documents with PhylOnt Ontology. This approach advances data reuse in phyloinformatics.
12
Miyoshi NSB, Pinheiro DG, Silva WA, Felipe JC. Computational framework to support integration of biomolecular and clinical data within a translational approach. BMC Bioinformatics 2013; 14:180. [PMID: 23742129 PMCID: PMC3688149 DOI: 10.1186/1471-2105-14-180] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 12/11/2012] [Accepted: 05/24/2013] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Using the knowledge produced by the sciences to promote human health is the main goal of translational medicine. To make this feasible, we need computational methods to handle the large amount of information that arises from bench to bedside and to deal with its heterogeneity. One computational challenge that must be faced is integrating clinical, socio-demographic, and biological data. In this effort, ontologies play an essential role as a powerful artifact for knowledge representation. Chado is a modular ontology-oriented database model that gained popularity due to its robustness and flexibility as a generic platform to store biological data; however, it lacks support for representing clinical and socio-demographic information. RESULTS We have implemented an extension of Chado - the Clinical Module - to allow the representation of this kind of information. Our approach consists of a framework for data integration through the use of a common reference ontology. The design of this framework has four levels: data level, to store the data; semantic level, to integrate and standardize the data by the use of ontologies; application level, to manage clinical databases, ontologies, and the data integration process; and web interface level, to allow interaction between the user and the system. The Clinical Module was built based on the Entity-Attribute-Value (EAV) model. We also proposed a methodology to migrate data from legacy clinical databases to the integrative framework. A Chado instance was initialized using a relational database management system. The Clinical Module was implemented and the framework was loaded using data from a factual clinical research database. Clinical and demographic data as well as biomaterial data were obtained from patients with tumors of the head and neck.
We implemented the IPTrans tool, a complete environment for data migration, which comprises: the construction of a model to describe the legacy clinical data, based on an ontology; the Extraction, Transformation and Load (ETL) process to extract the data from the source clinical database and load it into the Clinical Module of Chado; and the development of a web tool and a Bridge Layer to adapt the web tool to Chado, as well as other applications. CONCLUSIONS Open-source computational solutions currently available for translational science do not have a model to represent biomolecular information and are not integrated with the existing bioinformatics tools. On the other hand, existing genomic data models do not represent clinical patient data. A framework was developed to support translational research by integrating biomolecular information coming from different "omics" technologies with patients' clinical and socio-demographic data. This framework should present some features: flexibility, compression and robustness. The experiments carried out on a use case demonstrated that the proposed system meets the requirements of flexibility and robustness, leading to the desired integration. The Clinical Module can be accessed at http://dcm.ffclrp.usp.br/caib/pg=iptrans.
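The abstract states that the Clinical Module is built on the Entity-Attribute-Value (EAV) model. For readers unfamiliar with that pattern, the sketch below shows a generic EAV table using Python's standard `sqlite3` module. It is only an illustration of the pattern under stated assumptions: the table name `clinical_eav`, its columns, the helper `patient_record`, and every data value are hypothetical and are not taken from Chado or from the Clinical Module's actual schema.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE clinical_eav (
        patient_id TEXT NOT NULL,   -- entity
        attribute  TEXT NOT NULL,   -- attribute name (ideally an ontology term)
        value      TEXT,            -- value stored as text
        PRIMARY KEY (patient_id, attribute)
    )
    """
)

# Invented rows for illustration only; new attributes need no schema change.
rows = [
    ("P001", "age", "57"),
    ("P001", "tumor_site", "larynx"),
    ("P002", "age", "63"),
    ("P002", "smoking_status", "former"),
]
conn.executemany("INSERT INTO clinical_eav VALUES (?, ?, ?)", rows)


def patient_record(conn, patient_id):
    """Pivot one patient's EAV rows back into an attribute -> value dict."""
    cur = conn.execute(
        "SELECT attribute, value FROM clinical_eav "
        "WHERE patient_id = ? ORDER BY attribute",
        (patient_id,),
    )
    return dict(cur.fetchall())


record = patient_record(conn, "P001")
print(record)
```

The trade-off of EAV is visible even at this scale: heterogeneous attributes can be added per patient without altering the table, at the cost of losing column-level typing and needing a pivot step to reassemble a record.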
Affiliation(s)
- Newton Shydeo Brandão Miyoshi
- Department of Computing and Mathematics, Faculty of Philosophy, Sciences and Languages of Ribeirão Preto, University of São Paulo, São Paulo, Brazil
13
Parr CS, Guralnick R, Cellinese N, Page RD. Evolutionary informatics: unifying knowledge about the diversity of life. Trends Ecol Evol 2012; 27:94-103. [PMID: 22154516 DOI: 10.1016/j.tree.2011.11.001] [Citation(s) in RCA: 87] [Impact Index Per Article: 7.3] [Received: 07/17/2011] [Revised: 10/31/2011] [Accepted: 11/01/2011] [Indexed: 01/23/2023]