Liu J, Qu Z, Yang M, Sun J, Su S, Zhang L. Jointly Integrating VCF-Based Variants and OWL-Based Biomedical Ontologies in MongoDB. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1504-1515. [PMID: 31689201 DOI: 10.1109/tcbb.2019.2951137]
Abstract
The development of next-generation sequencing (NGS) technologies has produced massive numbers of VCF (Variant Call Format) files, the standard format developed with the 1000 Genomes Project. At the same time, with the widespread use of biomedical ontologies in the biomedical community, more and more applications have accepted the Web Ontology Language (OWL) as the dominant data format for specifying biomedical ontologies, leading to rapid growth in the scale of OWL-based biomedical ontologies. In this paper, we explore an effective method for managing VCF-based genetic variants and OWL-based biological ontologies using the MongoDB database. Considering that many current applications (such as dbSNP, the database of short genetic variations) are transitioning to new designs that use JSON (JavaScript Object Notation) to support future massive data expansion and interchange, we first propose a series of rules for mapping VCF and OWL files to JSON files, and then present rule-based algorithms for transforming VCF-based genetic variants and OWL-based biological ontologies into JSON objects. On this basis, we introduce effective approaches for integrating the mapped JSON files in MongoDB. Finally, we complement this work with a set of experiments that show the performance of the proposed approaches. The source code of the proposed approaches is freely available at https://github.com/lyotvincent/AJIA.
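To make the idea of mapping a VCF record to a JSON object concrete, the following is a minimal sketch in Python. It is not the authors' rule set or algorithm: the function names (`parse_info`, `vcf_line_to_document`) and the lowercase JSON field names are hypothetical choices made only for illustration, and only the eight fixed VCF columns are handled (no header parsing, no per-sample genotype columns).

```python
import json

# The eight fixed, tab-separated columns of a VCF data line.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_info(info_field):
    """Split an INFO string such as 'NS=3;DP=14;DB' into a dict.

    Entries of the form key=value keep the value as a string; entries with
    no '=' are flags and are stored as True. (Hypothetical helper.)
    """
    if info_field == ".":
        return {}
    info = {}
    for entry in info_field.split(";"):
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def vcf_line_to_document(line):
    """Map one VCF data line to a JSON-serializable dict.

    The output field names are an assumption for this sketch, not the
    mapping rules defined in the paper.
    """
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(VCF_COLUMNS, fields[:8]))
    return {
        "chrom": record["CHROM"],
        "pos": int(record["POS"]),
        "id": None if record["ID"] == "." else record["ID"],
        "ref": record["REF"],
        "alt": record["ALT"].split(","),  # ALT may list several alleles
        "qual": None if record["QUAL"] == "." else float(record["QUAL"]),
        "filter": record["FILTER"],
        "info": parse_info(record["INFO"]),
    }


line = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;DB"
doc = vcf_line_to_document(line)
print(json.dumps(doc, indent=2))
```

A dict of this shape is the kind of object a MongoDB driver accepts as a document; with PyMongo, for instance, it could be passed to a collection's `insert_one` method, although how the paper actually organizes collections is not stated in the abstract.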