1
Abinaya E, Narang P, Bhardwaj A. FROG - Fingerprinting Genomic Variation Ontology. PLoS One 2015; 10:e0134693. [PMID: 26244889 PMCID: PMC4526677 DOI: 10.1371/journal.pone.0134693] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 05/11/2015] [Accepted: 07/13/2015] [Indexed: 11/19/2022] Open
Abstract
Genetic variations play a crucial role in differential phenotypic outcomes. Given the complexity of establishing this correlation and the enormous volume of data available today, it is imperative to design efficient, machine-readable methods to store, label, search and analyze these data. A semantic approach, FROG (“FingeRprinting Ontology of Genomic variations”), is implemented to label variation data based on its location, function and interactions. FROG has six levels to describe the variation annotation: chromosome, DNA, RNA, protein, variations and interactions. Each level is a conceptual aggregation of logically connected attributes, each of which comprises various properties of the variant. For example, at the chromosome level, one attribute is the location of the variation, which has two properties: allosomes or autosomes. Another attribute is the variation kind, which has four properties: indel, deletion, insertion and substitution. In total, 48 attributes and 278 properties capture the variation annotation across the six levels. Each property is assigned a bit score, and the combination of these properties (mostly taken from existing variation ontologies) generates a binary fingerprint. FROG is a novel method designed to label all variation data generated to date for efficient storage, search and analysis. A web-based platform is designed as a test case for users to navigate sample datasets and generate fingerprints. The platform is available at http://ab-openlab.csir.res.in/frog.
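The property-to-bit idea in this abstract can be illustrated with a minimal Python sketch. It is not the FROG implementation: the catalogue below is a hypothetical, heavily truncated stand-in containing only the two chromosome-level attributes named in the abstract, and the attribute keys, the bit ordering and the function names (`bit_positions`, `fingerprint`) are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical, truncated catalogue: attribute -> ordered list of properties.
# The abstract reports 48 attributes and 278 properties; only the two
# chromosome-level attributes it names are reproduced here.
CATALOGUE: Dict[str, List[str]] = {
    "location_of_variation": ["allosome", "autosome"],
    "variation_kind": ["indel", "deletion", "insertion", "substitution"],
}


def bit_positions(catalogue: Dict[str, List[str]]) -> Dict[Tuple[str, str], int]:
    """Assign each (attribute, property) pair one bit, in catalogue order."""
    positions: Dict[Tuple[str, str], int] = {}
    for attribute, properties in catalogue.items():
        for prop in properties:
            positions[(attribute, prop)] = len(positions)
    return positions


def fingerprint(annotation: Dict[str, str],
                catalogue: Dict[str, List[str]] = CATALOGUE) -> str:
    """Return a binary string with a 1 for every property the variant has."""
    positions = bit_positions(catalogue)
    bits = ["0"] * len(positions)
    for attribute, prop in annotation.items():
        # A KeyError here means the annotation uses an unknown property.
        bits[positions[(attribute, prop)]] = "1"
    return "".join(bits)


variant = {"location_of_variation": "autosome", "variation_kind": "substitution"}
print(fingerprint(variant))
```

Under these assumptions, two variants that share a property share a set bit at the same position, which is what would make fingerprints comparable with simple bitwise operations.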
Affiliation(s)
- E. Abinaya
- Department of Bioinformatics, SASTRA University, Thanjavur, Tamil Nadu, India
- Pankaj Narang
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India
- Anshu Bhardwaj
- Open Source Drug Discovery Unit, Council of Scientific and Industrial Research (CSIR), Anusandhan Bhawan, 2 Rafi Marg, New Delhi, 110001, India
2
Rajput NK, Singh V, Bhardwaj A. Resources, challenges and way forward in rare mitochondrial diseases research. F1000Res 2015; 4:70. [PMID: 26180633 PMCID: PMC4490798 DOI: 10.12688/f1000research.6208.2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Accepted: 08/10/2015] [Indexed: 12/19/2022] Open
Abstract
Over 300 million people are affected by about 7000 rare diseases globally. There are tremendous resource limitations and challenges in driving research and drug development for rare diseases. Hence, innovative approaches are needed to identify potential solutions. This review focuses on the resources developed over the past years for analysis of genome data towards understanding disease biology especially in the context of mitochondrial diseases, given that mitochondria are central to major cellular pathways and their dysfunction leads to a broad spectrum of diseases. Platforms for collaboration of research groups, clinicians and patients and the advantages of community collaborative efforts in addressing rare diseases are also discussed. The review also describes crowdsourcing and crowdfunding efforts in rare diseases research and how the upcoming initiatives for understanding disease biology including analyses of large number of genomes are also applicable to rare diseases.
Affiliation(s)
- Neeraj Kumar Rajput
- Open Source Drug Discovery (OSDD) Unit, Council of Scientific and Industrial Research, New Delhi, 110001, India
- Vipin Singh
- Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, 201301, India
- Anshu Bhardwaj
- Open Source Drug Discovery (OSDD) Unit, Council of Scientific and Industrial Research, New Delhi, 110001, India
3
Rajput NK, Singh V, Bhardwaj A. Resources, challenges and way forward in rare mitochondrial diseases research. F1000Res 2015; 4:70. [PMID: 26180633 DOI: 10.12688/f1000research.6208.1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Accepted: 03/13/2015] [Indexed: 12/27/2022] Open
4
Byrne M, Fokkema IF, Lancaster O, Adamusiak T, Ahonen-Bishopp A, Atlan D, Béroud C, Cornell M, Dalgleish R, Devereau A, Patrinos GP, Swertz MA, Taschner PE, Thorisson GA, Vihinen M, Brookes AJ, Muilu J. VarioML framework for comprehensive variation data representation and exchange. BMC Bioinformatics 2012; 13:254. [PMID: 23031277 PMCID: PMC3507772 DOI: 10.1186/1471-2105-13-254] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 05/14/2012] [Accepted: 09/23/2012] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Sharing of data about variation and the associated phenotypes is a critical need, yet variant information can be arbitrarily complex, making a single standard vocabulary elusive and re-formatting difficult. Complex standards have proven too time-consuming to implement.

RESULTS: The GEN2PHEN project addressed these difficulties by developing a comprehensive data model for capturing biomedical observations, Observ-OM, and building the VarioML format around it. VarioML pairs a simplified open specification for describing variants, with a toolkit for adapting the specification into one's own research workflow. Straightforward variant data can be captured, federated, and exchanged with no overhead; more complex data can be described, without loss of compatibility. The open specification enables push-button submission to gene variant databases (LSDBs) e.g., the Leiden Open Variation Database, using the Cafe Variome data publishing service, while VarioML bidirectionally transforms data between XML and web-application code formats, opening up new possibilities for open source web applications building on shared data. A Java implementation toolkit makes VarioML easily integrated into biomedical applications. VarioML is designed primarily for LSDB data submission and transfer scenarios, but can also be used as a standard variation data format for JSON and XML document databases and user interface components.

CONCLUSIONS: VarioML is a set of tools and practices improving the availability, quality, and comprehensibility of human variation information. It enables researchers, diagnostic laboratories, and clinics to share that information with ease, clarity, and without ambiguity.
Affiliation(s)
- Myles Byrne
- Institute for Molecular Medicine Finland-FIMM, University of Helsinki, Helsinki, Finland.
5
Adamusiak T, Parkinson H, Muilu J, Roos E, van der Velde KJ, Thorisson GA, Byrne M, Pang C, Gollapudi S, Ferretti V, Hillege H, Brookes AJ, Swertz MA. Observ-OM and Observ-TAB: Universal syntax solutions for the integration, search, and exchange of phenotype and genotype information. Hum Mutat 2012; 33:867-73. [DOI: 10.1002/humu.22070] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 12/21/2011] [Accepted: 02/22/2012] [Indexed: 11/12/2022]
6
Vihinen M, den Dunnen JT, Dalgleish R, Cotton RGH. Guidelines for establishing locus specific databases. Hum Mutat 2011; 33:298-305. [PMID: 22052659 DOI: 10.1002/humu.21646] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.5] [Received: 07/25/2011] [Accepted: 10/25/2011] [Indexed: 11/06/2022]
Abstract
Information about genetic variation has been collected for some 20 years into registries, known as locus specific databases (LSDBs), which nowadays often contain information in addition to the actual genetic variation. Several issues have to be taken into account when considering establishing and maintaining LSDBs and these have been discussed previously in a number of articles describing guidelines and recommendations. This information is widely scattered and, for a newcomer, it would be difficult to obtain the latest information and guidance. Here, a sequence of steps essential for establishing an LSDB is discussed together with guidelines for each step. Curators need to collect information from various sources, code it in a systematic way, and distribute it to the research and clinical communities. In doing this, ethical issues have to be taken into account. To facilitate integration of information to, for example, analyze genotype-phenotype correlations, systematic data representation using established nomenclatures, data models, and ontologies is essential. LSDB curation and maintenance comprise a number of tasks that can be managed by following logical steps. These resources are becoming ever more important and new curators are essential to ensure that we will have expertly curated databases for all disease-related genes in the near future.
Affiliation(s)
- Mauno Vihinen
- Institute of Biomedical Technology, University of Tampere, Finland.
7
Clarity and claims in variation/mutation databasing. Nat Biotechnol 2011; 29:790-2; author reply 792-4. [DOI: 10.1038/nbt.1961] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
8
Webb AJ, Thorisson GA, Brookes AJ. An informatics project and online "Knowledge Centre" supporting modern genotype-to-phenotype research. Hum Mutat 2011; 32:543-50. [PMID: 21438073 DOI: 10.1002/humu.21469] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Received: 01/20/2011] [Accepted: 01/28/2011] [Indexed: 11/06/2022]
Abstract
Explosive growth in the generation of genotype-to-phenotype (G2P) data necessitates a concerted effort to tackle the logistical and informatics challenges this presents. The GEN2PHEN Project represents one such effort, with a broad strategy of uniting disparate G2P resources into a hybrid centralized-federated network. This is achieved through a holistic strategy focussed on three overlapping areas: data input standards and pipelines through which to submit and collect data (data in); federated, independent, extendable, yet interoperable database platforms on which to store and curate widely diverse datasets (data storage); and data formats and mechanisms with which to exchange, combine, and extract data (data exchange and output). To fully leverage this data network, we have constructed the "G2P Knowledge Centre" (http://www.gen2phen.org). This central platform provides holistic searching of the G2P data domain allied with facilities for data annotation and user feedback, access to extensive G2P and informatics resources, and tools for constructing online working communities centered on the G2P domain. Through the efforts of GEN2PHEN, and through combining data with broader community-derived knowledge, the Knowledge Centre opens up exciting possibilities for organizing, integrating, sharing, and interpreting new waves of G2P data in a collaborative fashion.
Affiliation(s)
- Adam J Webb
- Department of Genetics, University of Leicester, University Road, Leicester, United Kingdom.
9
Maier D, Kalus W, Wolff M, Kalko SG, Roca J, Marin de Mas I, Turan N, Cascante M, Falciani F, Hernandez M, Villà-Freixa J, Losko S. Knowledge management for systems biology a general and visually driven framework applied to translational medicine. BMC Syst Biol 2011; 5:38. [PMID: 21375767 PMCID: PMC3060864 DOI: 10.1186/1752-0509-5-38] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Received: 06/14/2010] [Accepted: 03/05/2011] [Indexed: 12/21/2022]
Abstract
BACKGROUND: To enhance our understanding of complex biological systems like diseases we need to put all of the available data into context and use this to detect relations, patterns and rules which allow predictive hypotheses to be defined. Life science has become a data rich science with information about the behaviour of millions of entities like genes, chemical compounds, diseases, cell types and organs, which are organised in many different databases and/or spread throughout the literature. Existing knowledge such as genotype-phenotype relations or signal transduction pathways must be semantically integrated and dynamically organised into structured networks that are connected with clinical and experimental data. Different approaches to this challenge exist but so far none has proven entirely satisfactory.

RESULTS: To address this challenge we previously developed a generic knowledge management framework, BioXM™, which allows the dynamic, graphic generation of domain specific knowledge representation models based on specific objects and their relations supporting annotations and ontologies. Here we demonstrate the utility of BioXM for knowledge management in systems biology as part of the EU FP6 BioBridge project on translational approaches to chronic diseases. From clinical and experimental data, text-mining results and public databases we generate a chronic obstructive pulmonary disease (COPD) knowledge base and demonstrate its use by mining specific molecular networks together with integrated clinical and experimental data.

CONCLUSIONS: We generate the first semantically integrated COPD specific public knowledge base and find that for the integration of clinical and experimental data with pre-existing knowledge the configuration based set-up enabled by BioXM reduced implementation time and effort for the knowledge base compared to similar systems implemented as classical software development projects. The knowledge base enables the retrieval of sub-networks including protein-protein interaction, pathway, gene-disease and gene-compound data which are used for subsequent data analysis, modelling and simulation. Pre-structured queries and reports enhance usability; establishing their use in everyday clinical settings requires further simplification with a browser based interface which is currently under development.
Affiliation(s)
- Susana G Kalko
- Hospital Clinic-IDIBAPS-CIBERES, Universitat de Barcelona, Barcelona, Spain
- Josep Roca
- Hospital Clinic-IDIBAPS-CIBERES, Universitat de Barcelona, Barcelona, Spain
- Igor Marin de Mas
- Departament de Bioquimica i Biologia Molecular, Institut de Biomedicina at Universitat de Barcelona IBUB and IDIBAPS-Hospital Clinic, Barcelona, Spain
- Nil Turan
- School of Biosciences and Institute of Biomedical Research (IBR), University of Birmingham, Birmingham, UK
- Marta Cascante
- Departament de Bioquimica i Biologia Molecular, Institut de Biomedicina at Universitat de Barcelona IBUB and IDIBAPS-Hospital Clinic, Barcelona, Spain
- Francesco Falciani
- School of Biosciences and Institute of Biomedical Research (IBR), University of Birmingham, Birmingham, UK
- Miguel Hernandez
- Computational Biochemistry and Biophysics lab, Research Unit on Biomedical Informatics (GRIB) of IMIM/UPF, Parc de Recerca Biomèdica de Barcelona (PRBB); Barcelona, Spain
- Jordi Villà-Freixa
- Computational Biochemistry and Biophysics lab, Research Unit on Biomedical Informatics (GRIB) of IMIM/UPF, Parc de Recerca Biomèdica de Barcelona (PRBB); Barcelona, Spain
10
Chervitz SA, Deutsch EW, Field D, Parkinson H, Quackenbush J, Rocca-Serra P, Sansone SA, Stoeckert CJ, Taylor CF, Taylor R, Ball CA. Data standards for Omics data: the basis of data sharing and reuse. Methods Mol Biol 2011; 719:31-69. [PMID: 21370078 PMCID: PMC4152841 DOI: 10.1007/978-1-61779-027-0_2] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.8] [Indexed: 12/04/2022]
Abstract
To facilitate sharing of Omics data, many groups of scientists have been working to establish the relevant data standards. The main components of data sharing standards are experiment description standards, data exchange standards, terminology standards, and experiment execution standards. Here we provide a survey of existing and emerging standards that are intended to assist the free and open exchange of large-format data.
11
Howard HJ, Horaitis O, Cotton RGH, Vihinen M, Dalgleish R, Robinson P, Brookes AJ, Axton M, Hoffmann R, Tuffery-Giraud S. The Human Variome Project (HVP) 2009 Forum "Towards Establishing Standards". Hum Mutat 2010; 31:366-7. [PMID: 20052753 DOI: 10.1002/humu.21175] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 11/09/2022]
Abstract
The May 2009 Human Variome Project (HVP) Forum "Towards Establishing Standards" was a round table discussion attended by delegates from groups representing international efforts aimed at standardizing several aspects of the HVP: mutation nomenclature, description and annotation, clinical ontology, means to better characterize unclassified variants (UVs), and methods to capture mutations from diagnostic laboratories for broader distribution to the medical genetics research community. Methods for researchers to receive credit for their effort at mutation detection were also discussed.
Affiliation(s)
- Heather J Howard
- Genomic Disorders Research Centre, Carlton South, Victoria, Australia.
12
Genotype-phenotype databases: challenges and solutions for the post-genomic era. Nat Rev Genet 2009; 10:9-18. [PMID: 19065136 DOI: 10.1038/nrg2483] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.3] [Indexed: 01/23/2023]
Abstract
The flow of research data concerning the genetic basis of health and disease is rapidly increasing in speed and complexity. In response, many projects are seeking to ensure that there are appropriate informatics tools, systems and databases available to manage and exploit this flood of information. Previous solutions, such as central databases, journal-based publication and manually intensive data curation, are now being enhanced with new systems for federated databases, database publication, and more automated management of data flows and quality control. Along with emerging technologies that enhance connectivity and data retrieval, these advances should help to create a powerful knowledge environment for genotype-phenotype information.