1
D’Agostino D, Liò P, Aldinucci M, Merelli I. Advantages of using graph databases to explore chromatin conformation capture experiments. BMC Bioinformatics 2021; 22:43. [PMID: 33902433 PMCID: PMC8073886 DOI: 10.1186/s12859-020-03937-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2020] [Accepted: 12/15/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: High-throughput sequencing Chromosome Conformation Capture (Hi-C) allows the study of DNA interactions and 3D chromosome folding at the genome-wide scale. Usually, these data are represented as matrices describing the binary contacts among the different chromosome regions. On the other hand, a graph-based representation can be advantageous to describe the complex topology achieved by the DNA in the nucleus of eukaryotic cells.
METHODS: Here we discuss the use of a graph database for storing and analysing data obtained from Hi-C experiments. The main issue is the size of the produced data and, when working with a graph-based representation, the consequent need to adequately manage a large number of edges (contacts) connecting nodes (genes), which represent the sources of information. For this purpose, currently available graph visualisation tools and libraries fall short with Hi-C data. The use of graph databases, instead, supports both the analysis and the visualisation of the spatial pattern present in Hi-C data, in particular for comparing different experiments or for efficiently re-mapping omics data in a space-aware context. In particular, the possibility of describing graphs through statistical indicators and, even more, the capability of correlating them through statistical distributions make it possible to highlight similarities and differences among different Hi-C experiments, in different cell conditions or different cell types.
RESULTS: These concepts have been implemented in NeoHiC, an open-source and user-friendly web application for the progressive visualisation and analysis of Hi-C networks based on the use of the Neo4j graph database (version 3.5).
CONCLUSION: With the accumulation of more experiments, the tool will provide invaluable support to compare neighbours of genes across experiments and conditions, helping in highlighting changes in functional domains and identifying new co-organised genomic compartments.
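The shift from a contact matrix to a graph described in this abstract can be illustrated with a minimal sketch. It is not NeoHiC's code and does not use Neo4j: it only assumes a small symmetric matrix of contact counts, and the names `matrix_to_edges`, `neighbours`, `min_count`, the gene labels, and the counts are all hypothetical, chosen for illustration.

```python
from collections import defaultdict


def matrix_to_edges(labels, contacts, min_count=1):
    """Turn a symmetric contact matrix into a list of weighted edges.

    labels    -- node names, one per matrix row (hypothetical gene labels here)
    contacts  -- square list of lists with contact counts
    min_count -- assumed cut-off: pairs below it are not kept as edges
    """
    edges = []
    for i in range(len(labels)):
        # Only the upper triangle is visited because contacts are undirected.
        for j in range(i + 1, len(labels)):
            if contacts[i][j] >= min_count:
                edges.append((labels[i], labels[j], contacts[i][j]))
    return edges


def neighbours(edges):
    """Build an adjacency map (node -> set of neighbouring nodes)."""
    adjacency = defaultdict(set)
    for a, b, _count in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


# Toy data, invented for this example only.
labels = ["geneA", "geneB", "geneC", "geneD"]
contacts = [
    [0, 5, 0, 2],
    [5, 0, 3, 0],
    [0, 3, 0, 0],
    [2, 0, 0, 0],
]

edges = matrix_to_edges(labels, contacts, min_count=2)
adjacency = neighbours(edges)
```

Comparing the neighbours of one gene across two experiments then reduces to a set operation on two such adjacency maps, which is the kind of query a graph database is meant to serve at much larger scale.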
Affiliation(s)
- Daniele D’Agostino
  - Institute of Electronics, Computer and Telecommunication Engineering, National Research Council of Italy, Genoa, Italy
- Pietro Liò
  - Computer Laboratory, University of Cambridge, Cambridge, UK
- Marco Aldinucci
  - Computer Science Department, University of Turin, Turin, Italy
- Ivan Merelli
  - Institute for Biomedical Technologies, National Research Council of Italy, Segrate, MI, Italy
2
Managing, analysing, and integrating big data in medical bioinformatics: open problems and future perspectives. BioMed Research International 2014; 2014:134023. [PMID: 25254202 PMCID: PMC4165507 DOI: 10.1155/2014/134023] [Citation(s) in RCA: 95] [Impact Index Per Article: 9.5] [Received: 06/18/2014] [Accepted: 08/13/2014] [Indexed: 11/25/2022]
Abstract
The explosion of data in both biomedical research and healthcare systems demands urgent solutions. In particular, research in the omics sciences is moving from a hypothesis-driven to a data-driven approach. Healthcare, in turn, increasingly calls for tighter integration with biomedical data in order to promote personalized medicine and to provide better treatments. Efficient analysis and interpretation of Big Data opens new avenues to explore molecular biology, new questions to ask about physiological and pathological states, and new ways to answer these open issues. Such analyses lead to better understanding of diseases and development of better and personalized diagnostics and therapeutics. However, such progress is directly related to the availability of new solutions to deal with this huge amount of information. New paradigms are needed to store and access data, for its annotation and integration and finally for inferring knowledge and making it available to researchers. Bioinformatics can be viewed as the “glue” for all these processes. A clear awareness of present high performance computing (HPC) solutions in bioinformatics, Big Data analysis paradigms for computational biology, and the issues that are still open in the biomedical and healthcare fields represent the starting point to win this challenge.
3
Morris L, Tsui A, Crichton C, Harris S, Maccallum PH, Howat WJ, Davies J, Brenton JD, Caldas C. A metadata-aware application for remote scoring and exchange of tissue microarray images. BMC Bioinformatics 2013; 14:147. [PMID: 23635078 PMCID: PMC3659093 DOI: 10.1186/1471-2105-14-147] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/08/2013] [Accepted: 04/23/2013] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The use of tissue microarrays (TMA) and advances in digital scanning microscopy have enabled the collection of thousands of tissue images. There is a need for software tools to annotate, query and share these data amongst researchers in different physical locations.
RESULTS: We have developed an open source web-based application for remote scoring of TMA images, which exploits the value of Microsoft Silverlight Deep Zoom to provide an intuitive interface for zooming and panning around digital images. We use and extend existing XML-based standards to ensure that the data collected can be archived and that our system is interoperable with other standards-compliant systems.
CONCLUSION: The application has been used for multi-centre scoring of TMA slides composed of tissues from several Phase III breast cancer trials and ten different studies participating in the International Breast Cancer Association Consortium (BCAC). The system has enabled researchers to simultaneously score large collections of TMA and export the standardised data to integrate with pathological and clinical outcome data, thereby facilitating biomarker discovery.
Affiliation(s)
- Lorna Morris
  - Department of Oncology, University of Cambridge and Cancer Research UK Cambridge Research Institute, Li Ka Shing Centre, Cambridge, CB2 0RE, UK
- Andrew Tsui
  - Department of Computer Science, University of Oxford, Wolfson Building, Parks Road, Oxford, OX1 3QD, UK
- Charles Crichton
  - Department of Computer Science, University of Oxford, Wolfson Building, Parks Road, Oxford, OX1 3QD, UK
- Steve Harris
  - Department of Computer Science, University of Oxford, Wolfson Building, Parks Road, Oxford, OX1 3QD, UK
- Peter H Maccallum
  - Department of Oncology, University of Cambridge and Cancer Research UK Cambridge Research Institute, Li Ka Shing Centre, Cambridge, CB2 0RE, UK
- William J Howat
  - Department of Oncology, University of Cambridge and Cancer Research UK Cambridge Research Institute, Li Ka Shing Centre, Cambridge, CB2 0RE, UK
- Jim Davies
  - Department of Computer Science, University of Oxford, Wolfson Building, Parks Road, Oxford, OX1 3QD, UK
- James D Brenton
  - Department of Oncology, University of Cambridge and Cancer Research UK Cambridge Research Institute, Li Ka Shing Centre, Cambridge, CB2 0RE, UK
  - Cambridge Experimental Cancer Medicine Centre, Li Ka Shing Centre, Cambridge, CB2 0RE, UK
  - Addenbrooke’s Hospital, Cambridge University Hospital NHS Foundation Trust and NIHR Cambridge Biomedical Research Centre, Cambridge, CB2 2QQ, UK
- Carlos Caldas
  - Department of Oncology, University of Cambridge and Cancer Research UK Cambridge Research Institute, Li Ka Shing Centre, Cambridge, CB2 0RE, UK
  - Cambridge Experimental Cancer Medicine Centre, Li Ka Shing Centre, Cambridge, CB2 0RE, UK
  - Addenbrooke’s Hospital, Cambridge University Hospital NHS Foundation Trust and NIHR Cambridge Biomedical Research Centre, Cambridge, CB2 2QQ, UK
4
Foran DJ, Yang L, Chen W, Hu J, Goodell LA, Reiss M, Wang F, Kurc T, Pan T, Sharma A, Saltz JH. ImageMiner: a software system for comparative analysis of tissue microarrays using content-based image retrieval, high-performance computing, and grid technology. J Am Med Inform Assoc 2011; 18:403-15. [PMID: 21606133 PMCID: PMC3128405 DOI: 10.1136/amiajnl-2011-000170] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.8] [Received: 02/07/2011] [Accepted: 04/09/2011] [Indexed: 11/03/2022] Open
Abstract
OBJECTIVE AND DESIGN: The design and implementation of ImageMiner, a software platform for performing comparative analysis of expression patterns in imaged microscopy specimens such as tissue microarrays (TMAs), is described. ImageMiner is a federated system of services that provides a reliable set of analytical and data management capabilities for investigative research applications in pathology. It provides a library of image processing methods, including automated registration, segmentation, feature extraction, and classification, all of which have been tailored, in these studies, to support TMA analysis. The system is designed to leverage high-performance computing machines so that investigators can rapidly analyze large ensembles of imaged TMA specimens. To support deployment in collaborative, multi-institutional projects, ImageMiner features grid-enabled, service-based components so that multiple instances of ImageMiner can be accessed remotely and federated.
RESULTS: The experimental evaluation shows that: (1) ImageMiner is able to support reliable detection and feature extraction of tumor regions within imaged tissues; (2) images and analysis results managed in ImageMiner can be searched for and retrieved on the basis of image-based features, classification information, and any correlated clinical data, including any metadata that have been generated to describe the specified tissue and TMA; and (3) the system is able to reduce computation time of analyses by exploiting computing clusters, which facilitates analysis of larger sets of tissue samples.
Affiliation(s)
- David J Foran
  - Center for Biomedical Imaging & Informatics, UMDNJ-Robert Wood Johnson Medical School, New Brunswick, New Jersey, USA
  - The Cancer Institute of New Jersey, University of Medicine and Dentistry of New Jersey, New Brunswick, New Jersey, USA
- Lin Yang
  - Center for Biomedical Imaging & Informatics, UMDNJ-Robert Wood Johnson Medical School, New Brunswick, New Jersey, USA
- Wenjin Chen
  - Center for Biomedical Imaging & Informatics, UMDNJ-Robert Wood Johnson Medical School, New Brunswick, New Jersey, USA
  - The Cancer Institute of New Jersey, University of Medicine and Dentistry of New Jersey, New Brunswick, New Jersey, USA
- Jun Hu
  - Center for Biomedical Imaging & Informatics, UMDNJ-Robert Wood Johnson Medical School, New Brunswick, New Jersey, USA
  - The Cancer Institute of New Jersey, University of Medicine and Dentistry of New Jersey, New Brunswick, New Jersey, USA
- Lauri A Goodell
  - Center for Biomedical Imaging & Informatics, UMDNJ-Robert Wood Johnson Medical School, New Brunswick, New Jersey, USA
- Michael Reiss
  - The Cancer Institute of New Jersey, University of Medicine and Dentistry of New Jersey, New Brunswick, New Jersey, USA
- Fusheng Wang
  - Center for Comprehensive Informatics, Emory University School of Medicine, Atlanta, Georgia, USA
- Tahsin Kurc
  - Center for Comprehensive Informatics, Emory University School of Medicine, Atlanta, Georgia, USA
  - Department of Biomedical Engineering, Emory University, Atlanta, Georgia, USA
- Tony Pan
  - Center for Comprehensive Informatics, Emory University School of Medicine, Atlanta, Georgia, USA
- Ashish Sharma
  - Center for Comprehensive Informatics, Emory University School of Medicine, Atlanta, Georgia, USA
  - Department of Biomedical Engineering, Emory University, Atlanta, Georgia, USA
- Joel H Saltz
  - Center for Comprehensive Informatics, Emory University School of Medicine, Atlanta, Georgia, USA
5
Viti F, Merelli I, Timmermans M, den Bakker M, Beltrame F, Riegman P, Milanesi L. Semi-automatic identification of punching areas for tissue microarray building: the tubular breast cancer pilot study. BMC Bioinformatics 2010; 11:566. [PMID: 21087464 PMCID: PMC2996409 DOI: 10.1186/1471-2105-11-566] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 08/26/2009] [Accepted: 11/18/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Tissue MicroArray technology aims to perform immunohistochemical staining on hundreds of different tissue samples simultaneously. It allows faster analysis, considerably reducing costs incurred in staining. A time consuming phase of the methodology is the selection of tissue areas within paraffin blocks: no utilities have been developed for the identification of areas to be punched from the donor block and assembled in the recipient block.
RESULTS: The presented work supports, in the specific case of a primary subtype of breast cancer (tubular breast cancer), the semi-automatic discrimination and localization between normal and pathological regions within the tissues. The diagnosis is performed by analysing specific morphological features of the sample such as the absence of a double layer of cells around the lumen and the decay of a regular glands-and-lobules structure. These features are analysed using an algorithm which performs the extraction of morphological parameters from images and compares them to experimentally validated threshold values. Results are satisfactory since in most of the cases the automatic diagnosis matches the response of the pathologists. In particular, on a total of 1296 sub-images showing normal and pathological areas of breast specimens, algorithm accuracy, sensitivity and specificity are respectively 89%, 84% and 94%.
CONCLUSIONS: The proposed work is a first attempt to demonstrate that automation in the Tissue MicroArray field is feasible and it can represent an important tool for scientists to cope with this high-throughput technique.
Affiliation(s)
- Federica Viti
  - Institute for Biomedical Technologies of the National Research Council, Segrate (Milan), Italy
- Ivan Merelli
  - Institute for Biomedical Technologies of the National Research Council, Segrate (Milan), Italy
- Mieke Timmermans
  - Department of Pathology of the Josephine Nefkens Institute, Erasmus Medical Center, Rotterdam, The Netherlands
- Michael den Bakker
  - Department of Pathology of the Josephine Nefkens Institute, Erasmus Medical Center, Rotterdam, The Netherlands
- Francesco Beltrame
  - University of Genoa, Department of Communication Computer and System Sciences, Genoa, Italy
- Peter Riegman
  - Department of Pathology of the Josephine Nefkens Institute, Erasmus Medical Center, Rotterdam, The Netherlands
- Luciano Milanesi
  - Institute for Biomedical Technologies of the National Research Council, Segrate (Milan), Italy
6
Song YS, Lee HW, Park YR, Kim DK, Sim J, Kang HP, Kim JH. TMA-TAB: a spreadsheet-based document for exchange of tissue microarray data based on the tissue microarray-object model. J Biomed Inform 2009; 43:435-41. [PMID: 19835983 DOI: 10.1016/j.jbi.2009.10.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 06/10/2009] [Revised: 10/05/2009] [Accepted: 10/07/2009] [Indexed: 10/20/2022]
Abstract
The importance of tissue microarrays (TMA) as clinical validation tools for cDNA microarray results is increasing, whereas researchers are still suffering from TMA data management issues. After we developed a comprehensive data model for TMA data storage, exchange and analysis, TMA-OM, we focused our attention on the development of a user-friendly exchange format with high expressivity in order to promote data communication of TMA results and TMA-OM supportive database applications. We developed TMA-TAB, a spreadsheet-based data format for TMA data submission to the TMA-OM supportive TMA database system. TMA-TAB was developed by simplifying, modifying and reorganizing classes, attributes and templates of TMA-OM into five entities: experiment, block, slide, core_in_block, and core_in_slide. Five tab-delimited formats (investigation design format, block description format, slide description format, core clinicohistopathological data format, and core result data format) were made, each representing the entities of experiment, block, slide, core_in_block, and core_in_slide. We implemented TMA-TAB import and export modules on Xperanto-TMA, a TMA-OM supportive database application, to facilitate data submission. Development and implementation of TMA-TAB and TMA-OM provide a strong infrastructure for powerful and user-friendly TMA data management.
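The idea of one tab-delimited table per entity can be sketched as follows. Only the five entity names are taken from the abstract. Every column name, the helper functions `write_tab_delimited` and `read_tab_delimited`, and the sample rows are hypothetical placeholders. They do not reproduce the actual TMA-TAB specification or the Xperanto-TMA import and export modules.

```python
import csv
import io

# Entity names come from the abstract; all column names are invented
# for illustration and are NOT the real TMA-TAB fields.
ENTITY_COLUMNS = {
    "experiment": ["experiment_id", "title"],
    "block": ["block_id", "experiment_id"],
    "slide": ["slide_id", "block_id"],
    "core_in_block": ["core_id", "block_id", "row", "column"],
    "core_in_slide": ["core_id", "slide_id", "score"],
}


def write_tab_delimited(entity, rows):
    """Serialise rows (a list of dicts) for one entity as tab-delimited text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=ENTITY_COLUMNS[entity],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def read_tab_delimited(text):
    """Parse tab-delimited text back into a list of dicts keyed by header."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


# Hypothetical scores for two cores on one slide.
rows = [
    {"core_id": "c1", "slide_id": "s1", "score": "2"},
    {"core_id": "c2", "slide_id": "s1", "score": "0"},
]

text = write_tab_delimited("core_in_slide", rows)
parsed = read_tab_delimited(text)
```

Keeping one flat table per entity, linked by identifier columns, is what lets such files be edited in an ordinary spreadsheet and still be loaded into a relational or object model afterwards.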
Affiliation(s)
- Young Soo Song
  - Seoul National University Biomedical Informatics (SNUBI), Seoul National University College of Medicine, Seoul, Republic of Korea
7
Cannata N, Schröder M, Marangoni R, Romano P. A Semantic Web for bioinformatics: goals, tools, systems, applications. BMC Bioinformatics 2008; 9 Suppl 4:S1. [PMID: 18460170 PMCID: PMC2367628 DOI: 10.1186/1471-2105-9-s4-s1] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Indexed: 11/29/2022] Open
Affiliation(s)
- Nicola Cannata
  - Department of Mathematics and Computer Science, University of Camerino, Camerino (MC), I-62032, Italy
- Roberto Marangoni
  - Computer Science Department, University of Pisa, Pisa, I-56127, Italy
- Paolo Romano
  - Bioinformatics, National Cancer Research Institute, Genova, I-16132, Italy