1
D’Agostino D, Liò P, Aldinucci M, Merelli I. Advantages of using graph databases to explore chromatin conformation capture experiments. BMC Bioinformatics 2021; 22:43. [PMID: 33902433 PMCID: PMC8073886 DOI: 10.1186/s12859-020-03937-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2020] [Accepted: 12/15/2020] [Indexed: 11/10/2022]
Abstract
BACKGROUND: High-throughput sequencing Chromosome Conformation Capture (Hi-C) allows the study of DNA interactions and 3D chromosome folding at the genome-wide scale. Usually, these data are represented as matrices describing the binary contacts among the different chromosome regions. On the other hand, a graph-based representation can be advantageous to describe the complex topology achieved by the DNA in the nucleus of eukaryotic cells.
METHODS: Here we discuss the use of a graph database for storing and analysing data obtained from Hi-C experiments. The main issue is the size of the produced data and, working with a graph-based representation, the consequent necessity of adequately managing a large number of edges (contacts) connecting nodes (genes), which represent the sources of information. For this, currently available graph visualisation tools and libraries fall short with Hi-C data. The use of graph databases, instead, supports both the analysis and the visualisation of the spatial pattern present in Hi-C data, in particular for comparing different experiments or for re-mapping omics data in a space-aware context efficiently. In particular, the possibility of describing graphs through statistical indicators and, even more, the capability of correlating them through statistical distributions allows highlighting similarities and differences among different Hi-C experiments, in different cell conditions or different cell types.
RESULTS: These concepts have been implemented in NeoHiC, an open-source and user-friendly web application for the progressive visualisation and analysis of Hi-C networks based on the use of the Neo4j graph database (version 3.5).
CONCLUSION: With the accumulation of more experiments, the tool will provide invaluable support to compare neighbours of genes across experiments and conditions, helping in highlighting changes in functional domains and identifying new co-organised genomic compartments.
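As a rough illustration of the graph-based representation described in this abstract, the sketch below turns a small symmetric contact matrix into an adjacency map and compares the neighbours of one gene across two experiments with a Jaccard index. It is a plain-Python sketch under stated assumptions, not the NeoHiC implementation and not a Neo4j query: the function names, the `min_count` threshold, the gene labels, and the toy matrices are all hypothetical.

```python
from collections import defaultdict


def contacts_to_graph(labels, matrix, min_count=1):
    """Build an undirected adjacency map from a symmetric contact matrix.

    An edge (contact) is kept when the count is at least `min_count`
    (a hypothetical threshold chosen for illustration).
    """
    graph = defaultdict(set)
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            if matrix[i][j] >= min_count:
                graph[labels[i]].add(labels[j])
                graph[labels[j]].add(labels[i])
    return graph


def neighbour_jaccard(graph_a, graph_b, node):
    """Jaccard similarity of a node's neighbour sets in two experiments."""
    a = graph_a.get(node, set())
    b = graph_b.get(node, set())
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy data: four hypothetical genes and two made-up contact matrices.
genes = ["geneA", "geneB", "geneC", "geneD"]
experiment_1 = [
    [0, 5, 0, 2],
    [5, 0, 3, 0],
    [0, 3, 0, 0],
    [2, 0, 0, 0],
]
experiment_2 = [
    [0, 4, 1, 0],
    [4, 0, 0, 0],
    [1, 0, 0, 6],
    [0, 0, 6, 0],
]

graph_1 = contacts_to_graph(genes, experiment_1)
graph_2 = contacts_to_graph(genes, experiment_2)

# Degree of each node, one of the simplest statistical indicators of a graph.
degrees_1 = {gene: len(graph_1[gene]) for gene in genes}
similarity_a = neighbour_jaccard(graph_1, graph_2, "geneA")
```

In this toy case `geneA` shares one of three distinct neighbours across the two experiments; a graph database would run the same kind of neighbourhood comparison at a scale where in-memory structures no longer fit.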
Affiliation(s)
- Daniele D’Agostino: Institute of Electronics, Computer and Telecommunication Engineering, National Research Council of Italy, Genoa, Italy
- Pietro Liò: Computer Laboratory, University of Cambridge, Cambridge, UK
- Marco Aldinucci: Computer Science Department, University of Turin, Turin, Italy
- Ivan Merelli: Institute for Biomedical Technologies, National Research Council of Italy, Segrate (MI), Italy
2
Deng L, Zhang Y, Chen Z, Zhao Z, Zhang K, Wu J. Regional Upstroke Tracking for Transit Time Detection to Improve the Ultrasound-Based Local PWV Estimation in Carotid Arteries. IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control 2020; 67:691-702. [PMID: 31714222 DOI: 10.1109/tuffc.2019.2951922] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 06/10/2023]
Abstract
Pulse wave velocity (PWV) is the most important index for quantifying the elasticity of an artery. The accurate estimation of the local PWV is of great relevance to the early diagnosis and effective prevention of arterial stiffness. In ultrasonic transit time-based local PWV estimation, the locations of time fiduciary point (TFP) in the upstrokes of the propagating pulse waves (PWs) are inconsistent because of the reflected waves and ultrasonic noise. In this study, a regional upstroke tracking (RUT) approach that involved identifying the most similar TFP-centered region in the upstrokes is proposed to detect the time delay for improving the local PWV estimation. Five RUT algorithms with different tracking points are assessed via simulation and clinical experiments. To quantitatively evaluate the RUT algorithms, the normalized root-mean-squared errors and standard deviations of the estimated PWVs are calculated using an ultrasound simulation model. The reproducibility of the five RUT algorithms based on 30 human subjects is also evaluated using the Bland-Altman analysis and coefficient of variation (CV). The obtained results show that the RUT algorithms with only three tracking points provide greater accuracy, precision, and reproducibility for the local PWV estimation than the TFP methods. Compared with the TFP methods, the RUT algorithms reduce the mean errors from 12.23% ± 3.10% to 7.13% ± 2.31%, as well as the CVs from 21.76% to 13.39%. In conclusion, the proposed RUT algorithms are superior to the TFP methods for local carotid PWV estimation.
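The transit-time idea in this abstract can be sketched as follows: take a region centred on a fiducial point in the upstroke of the proximal waveform, find the sample shift at which the distal waveform matches it best, and divide the distance between the two measurement sites by the resulting delay. This is a generic least-squares region match written for illustration, not any of the paper's five RUT algorithms; the function names, the synthetic waveforms, the 1 kHz sampling rate, and the 2 cm site spacing are all assumptions.

```python
def upstroke(n_samples, start, rise=10):
    """Synthetic waveform: flat at 0, linear rise over `rise` samples, flat at 1."""
    return [min(max((i - start) / rise, 0.0), 1.0) for i in range(n_samples)]


def best_shift(proximal, distal, center, half_width, max_shift):
    """Shift (in samples) minimising the squared error between the region of
    `proximal` centred at `center` and the equally sized window of `distal`."""
    lo, hi = center - half_width, center + half_width + 1
    template = proximal[lo:hi]
    best, best_err = 0, float("inf")
    for shift in range(max_shift + 1):
        window = distal[lo + shift:hi + shift]
        if len(window) < len(template):
            break  # window would run past the end of the signal
        err = sum((a - b) ** 2 for a, b in zip(template, window))
        if err < best_err:
            best, best_err = shift, err
    return best


def local_pwv(distance_m, shift_samples, sampling_hz):
    """Local PWV in m/s: distance between sites divided by the transit time."""
    if shift_samples <= 0:
        raise ValueError("transit time must be positive")
    return distance_m / (shift_samples / sampling_hz)


# Hypothetical example: the distal upstroke lags the proximal one by 4 samples.
proximal = upstroke(60, start=20)
distal = upstroke(60, start=24)
shift = best_shift(proximal, distal, center=25, half_width=5, max_shift=10)
pwv = local_pwv(distance_m=0.02, shift_samples=shift, sampling_hz=1000.0)
```

Matching a whole region rather than a single point is what makes the delay estimate less sensitive to noise at any one sample, which is the motivation the abstract gives for tracking a region instead of a single TFP.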
3
Banegas-Luna AJ, Imbernón B, Llanes Castro A, Pérez-Garrido A, Cerón-Carrasco JP, Gesing S, Merelli I, D'Agostino D, Pérez-Sánchez H. Advances in distributed computing with modern drug discovery. Expert Opin Drug Discov 2018; 14:9-22. [PMID: 30484337 DOI: 10.1080/17460441.2019.1552936] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 12/17/2022]
Abstract
Introduction: Computational chemistry dramatically accelerates the drug discovery process and high-performance computing (HPC) can be used to speed up the most expensive calculations. Supporting a local HPC infrastructure is both costly and time-consuming, and, therefore, many research groups are moving from in-house solutions to remote-distributed computing platforms.
Areas covered: The authors focus on the use of distributed technologies, solutions, and infrastructures to gain access to HPC capabilities, software tools, and datasets to run the complex simulations required in computational drug discovery (CDD).
Expert opinion: The use of computational tools can decrease the time to market of new drugs. HPC has a crucial role in handling the complex algorithms and large volumes of data required to achieve specificity and avoid undesirable side-effects. Distributed computing environments have clear advantages over in-house solutions in terms of cost and sustainability. The use of infrastructures relying on virtualization reduces set-up costs. Distributed computing resources can be difficult to access, although web-based solutions are becoming increasingly available. There is a trade-off between cost-effectiveness and accessibility in using on-demand computing resources rather than free/academic resources. Graphics processing unit computing, with its outstanding parallel computing power, is becoming increasingly important.
Affiliation(s)
- Antonio Jesús Banegas-Luna: Bioinformatics and High Performance Computing Research Group (BIO-HPC), Universidad Católica de Murcia (UCAM), Murcia, Spain
- Baldomero Imbernón: Bioinformatics and High Performance Computing Research Group (BIO-HPC), Universidad Católica de Murcia (UCAM), Murcia, Spain
- Antonio Llanes Castro: Bioinformatics and High Performance Computing Research Group (BIO-HPC), Universidad Católica de Murcia (UCAM), Murcia, Spain
- Alfonso Pérez-Garrido: Bioinformatics and High Performance Computing Research Group (BIO-HPC), Universidad Católica de Murcia (UCAM), Murcia, Spain
- José Pedro Cerón-Carrasco: Bioinformatics and High Performance Computing Research Group (BIO-HPC), Universidad Católica de Murcia (UCAM), Murcia, Spain
- Sandra Gesing: Center for Research Computing, University of Notre Dame, Notre Dame, IN, USA
- Ivan Merelli: Institute for Biomedical Technologies, National Research Council of Italy, Segrate (Milan), Italy
- Daniele D'Agostino: Institute for Applied Mathematics and Information Technologies "E. Magenes", National Research Council of Italy, Genoa, Italy
- Horacio Pérez-Sánchez: Bioinformatics and High Performance Computing Research Group (BIO-HPC), Universidad Católica de Murcia (UCAM), Murcia, Spain
4
Shivashankar N, Patil S, Bhosle A, Chandra N, Natarajan V. MS3ALIGN: an efficient molecular surface aligner using the topology of surface curvature. BMC Bioinformatics 2016; 17:26. [PMID: 26753741 PMCID: PMC4710026 DOI: 10.1186/s12859-015-0874-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 06/15/2015] [Accepted: 12/15/2015] [Indexed: 11/17/2022]
Abstract
Background: Aligning similar molecular structures is an important step in the process of bio-molecular structure and function analysis. Molecular surfaces are simple representations of molecular structure that are easily constructed from various forms of molecular data such as 3D atomic coordinates (PDB) and Electron Microscopy (EM) data.
Methods: We present a Multi-Scale Morse-Smale Molecular-Surface Alignment tool, MS3ALIGN, which aligns molecular surfaces based on significant protrusions on the molecular surface. The input is a pair of molecular surfaces represented as triangle meshes. A key advantage of MS3ALIGN is its computational efficiency, which is achieved because it processes only a few carefully chosen protrusions on the molecular surface. Furthermore, the alignments are partial in nature and therefore allow inexact surfaces to be aligned.
Results: The method is evaluated in four settings. First, we establish performance using known alignments with varying overlap and noise values. Second, we compare the method with SurfComp, an existing surface alignment method. We show that we are able to determine alignments reported by SurfComp, as well as report relevant alignments not found by SurfComp. Third, we validate the ability of MS3ALIGN to determine alignments in the case of structurally dissimilar binding sites. Fourth, we demonstrate the ability of MS3ALIGN to align iso-surfaces derived from cryo-electron microscopy scans.
Conclusions: We have presented an algorithm that aligns molecular surfaces based on the topology of surface curvature. A webserver and a standalone software implementation of the algorithm are available at http://vgl.serc.iisc.ernet.in/ms3align.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0874-8) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Nithin Shivashankar: Department of Computer Science and Automation, Indian Institute of Science, Bangalore, 560012, India
- Sonali Patil: Department of Computer Science and Automation, Indian Institute of Science, Bangalore, 560012, India
- Amrisha Bhosle: Department of Biochemistry, Indian Institute of Science, Bangalore, 560012, India
- Nagasuma Chandra: Department of Biochemistry, Indian Institute of Science, Bangalore, 560012, India
- Vijay Natarajan: Department of Computer Science and Automation, and Supercomputer Education and Research Centre, Indian Institute of Science, Bangalore, 560012, India
5
Managing, analysing, and integrating big data in medical bioinformatics: open problems and future perspectives. BioMed Research International 2014; 2014:134023. [PMID: 25254202 PMCID: PMC4165507 DOI: 10.1155/2014/134023] [Citation(s) in RCA: 95] [Impact Index Per Article: 9.5] [Received: 06/18/2014] [Accepted: 08/13/2014] [Indexed: 11/25/2022]
Abstract
The explosion of data in both biomedical research and healthcare systems demands urgent solutions. In particular, research in the omics sciences is moving from a hypothesis-driven to a data-driven approach. Healthcare, in addition, increasingly calls for tighter integration with biomedical data in order to promote personalized medicine and to provide better treatments. Efficient analysis and interpretation of Big Data opens new avenues to explore molecular biology, new questions to ask about physiological and pathological states, and new ways to answer these open issues. Such analyses lead to a better understanding of diseases and to the development of better and personalized diagnostics and therapeutics. However, such progress is directly related to the availability of new solutions to deal with this huge amount of information. New paradigms are needed to store and access data, for its annotation and integration, and finally for inferring knowledge and making it available to researchers. Bioinformatics can be viewed as the “glue” for all these processes. A clear awareness of present high performance computing (HPC) solutions in bioinformatics, Big Data analysis paradigms for computational biology, and the issues that are still open in the biomedical and healthcare fields represents the starting point to win this challenge.
6
Cloud infrastructures for in silico drug discovery: economic and practical aspects. BioMed Research International 2013; 2013:138012. [PMID: 24106693 PMCID: PMC3782806 DOI: 10.1155/2013/138012] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 03/28/2013] [Revised: 06/26/2013] [Accepted: 06/27/2013] [Indexed: 11/17/2022]
Abstract
Cloud computing opens new perspectives for small-medium biotechnology laboratories that need to perform bioinformatics analysis in a flexible and effective way. This seems particularly true for hybrid clouds that couple the scalability offered by general-purpose public clouds with the greater control and ad hoc customizations supplied by the private ones. A hybrid cloud broker, acting as an intermediary between users and public providers, can support customers in the selection of the most suitable offers, optionally adding the provisioning of dedicated services with higher levels of quality. This paper analyses some economic and practical aspects of exploiting cloud computing in a real research scenario for the in silico drug discovery in terms of requirements, costs, and computational load based on the number of expected users. In particular, our work is aimed at supporting both the researchers and the cloud broker delivering an IaaS cloud infrastructure for biotechnology laboratories exposing different levels of nonfunctional requirements.
7
von Behren MM, Volkamer A, Henzler AM, Schomburg KT, Urbaczek S, Rarey M. Fast protein binding site comparison via an index-based screening technology. J Chem Inf Model 2013; 53:411-22. [PMID: 23390978 DOI: 10.1021/ci300469h] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 01/25/2023]
Abstract
We present TrixP, a new index-based method for fast protein binding site comparison and function prediction. TrixP determines binding site similarities based on the comparison of descriptors that encode pharmacophoric and spatial features. Therefore, it adopts the efficient core components of TrixX, a structure-based virtual screening technology for large compound libraries. TrixP expands this technology by new components in order to allow a screening of protein libraries. TrixP accounts for the inherent flexibility of proteins employing a partial shape matching routine. After the identification of structures with matching pharmacophoric features and geometric shape, TrixP superimposes the binding sites and, finally, assesses their similarity according to the fit of pharmacophoric properties. TrixP is able to find analogies between closely and distantly related binding sites. Recovery rates of 81.8% for similar binding site pairs, assisted by rejecting rates of 99.5% for dissimilar pairs on a test data set containing 1331 pairs, confirm this ability. TrixP exclusively identifies members of the same protein family on top ranking positions out of a library consisting of 9802 binding sites. Furthermore, 30 predicted kinase binding sites can almost perfectly be classified into their known subfamilies.
Affiliation(s)
- Mathias M von Behren: Center for Bioinformatics, University of Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany
8
Pang B, Zhao N, Korkin D, Shyu CR. Fast protein binding site comparisons using visual words representation. Bioinformatics 2012; 28:1345-52. [PMID: 22492639 DOI: 10.1093/bioinformatics/bts138] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/12/2022]
Abstract
MOTIVATION: Finding geometrically similar protein binding sites is crucial for understanding protein functions and can provide valuable information for protein-protein docking and drug discovery. As the number of known protein-protein interaction structures has dramatically increased, a high-throughput and accurate protein binding site comparison method is essential. Traditional alignment-based methods can provide accurate correspondence between the binding sites but are computationally expensive.
RESULTS: In this article, we present a novel method for the comparisons of protein binding sites using a 'visual words' representation (PBSword). We first extract geometric features of binding site surfaces and build a vocabulary of visual words by clustering a large set of feature descriptors. We then describe a binding site surface with a high-dimensional vector that encodes the frequency of visual words, enhanced by the spatial relationships among them. Finally, we measure the similarity of binding sites by utilizing metric space operations, which provide speedy comparisons between protein binding sites. Our experimental results show that PBSword achieves a comparable classification accuracy to an alignment-based method and improves accuracy of a feature-based method by 36% on a non-redundant dataset. PBSword also exhibits a significant efficiency improvement over an alignment-based method.
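The "visual words" pipeline outlined in this abstract can be sketched in a few lines: each feature descriptor is assigned to its nearest vocabulary entry, a binding site becomes a normalised word-frequency vector, and two sites are compared with a metric on those vectors. The sketch below assumes the vocabulary has already been obtained by clustering, uses Euclidean distance as the metric, and leaves out the spatial-relationship enhancement; none of this is the PBSword code, and the function names and toy 2-D descriptors are hypothetical.

```python
import math


def nearest_word(descriptor, vocabulary):
    """Index of the vocabulary entry (cluster centre) closest to the descriptor."""
    return min(range(len(vocabulary)),
               key=lambda k: math.dist(descriptor, vocabulary[k]))


def word_histogram(descriptors, vocabulary):
    """Normalised frequency of each visual word among a site's descriptors."""
    counts = [0] * len(vocabulary)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, vocabulary)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(vocabulary)


def histogram_distance(hist_a, hist_b):
    """Euclidean distance between two word-frequency vectors (a true metric)."""
    return math.dist(hist_a, hist_b)


# Toy data: a three-word vocabulary and two made-up sets of 2-D descriptors.
vocabulary = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
site_a = [(0.1, 0.0), (0.9, 0.1), (1.1, -0.1), (0.0, 0.9)]
site_b = [(0.0, 0.1), (0.2, 0.1), (0.1, 1.0), (1.0, 0.1)]

hist_a = word_histogram(site_a, vocabulary)
hist_b = word_histogram(site_b, vocabulary)
distance = histogram_distance(hist_a, hist_b)
```

Because every site reduces to a fixed-length vector, comparing two sites costs one distance computation regardless of how many surface points they contain, which is where the speed advantage over alignment-based methods comes from.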
Affiliation(s)
- Bin Pang: Informatics Institute, University of Missouri, Columbia, MO 65211, USA