1
Callaghan J, Xu CH, Xin J, Cano MA, Riutta A, Zhou E, Juneja R, Yao Y, Narayan M, Hanspers K, Agrawal A, Pico AR, Wu C, Su AI. BioThings Explorer: a query engine for a federated knowledge graph of biomedical APIs. Bioinformatics 2023; 39:7273783. [PMID: 37707514 PMCID: PMC11015316 DOI: 10.1093/bioinformatics/btad570] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/18/2023] [Revised: 08/18/2023] [Accepted: 09/12/2023] [Indexed: 09/15/2023] Open
Abstract
SUMMARY: Knowledge graphs are an increasingly common data structure for representing biomedical information. These knowledge graphs can easily represent heterogeneous types of information, and many algorithms and tools exist for querying and analyzing graphs. Biomedical knowledge graphs have been used in a variety of applications, including drug repurposing, identification of drug targets, prediction of drug side effects, and clinical decision support. Typically, knowledge graphs are constructed by centralization and integration of data from multiple disparate sources. Here, we describe BioThings Explorer, an application that can query a virtual, federated knowledge graph derived from the aggregated information in a network of biomedical web services. BioThings Explorer leverages semantically precise annotations of the inputs and outputs for each resource, and automates the chaining of web service calls to execute multi-step graph queries. Because there is no large, centralized knowledge graph to maintain, BioThings Explorer is distributed as a lightweight application that dynamically retrieves information at query time. AVAILABILITY AND IMPLEMENTATION: More information can be found at https://explorer.biothings.io and code is available at https://github.com/biothings/biothings_explorer.
Affiliation(s)
- Jackson Callaghan
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
- Colleen H Xu
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
- Jiwen Xin
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
- Marco Alvarado Cano
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
- Anders Riutta
- Data Science and Biotechnology, Gladstone Institutes, University of California, San Francisco, CA 94158, United States
- Eric Zhou
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
- Rohan Juneja
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
- Yao Yao
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
- Madhumita Narayan
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
- Kristina Hanspers
- Data Science and Biotechnology, Gladstone Institutes, University of California, San Francisco, CA 94158, United States
- Ayushi Agrawal
- Data Science and Biotechnology, Gladstone Institutes, University of California, San Francisco, CA 94158, United States
- Alexander R Pico
- Data Science and Biotechnology, Gladstone Institutes, University of California, San Francisco, CA 94158, United States
- Chunlei Wu
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
- Andrew I Su
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
2
Callaghan J, Xu CH, Xin J, Cano MA, Riutta A, Zhou E, Juneja R, Yao Y, Narayan M, Hanspers K, Agrawal A, Pico AR, Wu C, Su AI. BioThings Explorer: a query engine for a federated knowledge graph of biomedical APIs. ARXIV 2023:arXiv:2304.09344v1. [PMID: 37131885 PMCID: PMC10153288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Knowledge graphs are an increasingly common data structure for representing biomedical information. These knowledge graphs can easily represent heterogeneous types of information, and many algorithms and tools exist for querying and analyzing graphs. Biomedical knowledge graphs have been used in a variety of applications, including drug repurposing, identification of drug targets, prediction of drug side effects, and clinical decision support. Typically, knowledge graphs are constructed by centralization and integration of data from multiple disparate sources. Here, we describe BioThings Explorer, an application that can query a virtual, federated knowledge graph derived from the aggregated information in a network of biomedical web services. BioThings Explorer leverages semantically precise annotations of the inputs and outputs for each resource, and automates the chaining of web service calls to execute multi-step graph queries. Because there is no large, centralized knowledge graph to maintain, BioThings Explorer is distributed as a lightweight application that dynamically retrieves information at query time. More information can be found at https://explorer.biothings.io, and code is available at https://github.com/biothings/biothings_explorer.
Affiliation(s)
- Jackson Callaghan
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute
- Colleen H Xu
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute
- Jiwen Xin
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute
- Marco Alvarado Cano
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute
- Anders Riutta
- Data Science and Biotechnology, Gladstone Institutes, University of California, San Francisco, San Francisco, CA 94158, USA
- Eric Zhou
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute
- Rohan Juneja
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute
- Yao Yao
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute
- Madhumita Narayan
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute
- Kristina Hanspers
- Data Science and Biotechnology, Gladstone Institutes, University of California, San Francisco, San Francisco, CA 94158, USA
- Ayushi Agrawal
- Data Science and Biotechnology, Gladstone Institutes, University of California, San Francisco, San Francisco, CA 94158, USA
- Alexander R Pico
- Data Science and Biotechnology, Gladstone Institutes, University of California, San Francisco, San Francisco, CA 94158, USA
- Chunlei Wu
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute
- Andrew I Su
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute
3
Raveendran K, Freese NH, Kintali C, Tiwari S, Bole P, Dias C, Loraine AE. BioViz Connect: Web Application Linking CyVerse Cloud Resources to Genomic Visualization in the Integrated Genome Browser. FRONTIERS IN BIOINFORMATICS 2022; 2:764619. [PMID: 36304269 PMCID: PMC9580933 DOI: 10.3389/fbinf.2022.764619] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2021] [Accepted: 04/28/2022] [Indexed: 11/19/2022] Open
Abstract
Genomics researchers do better work when they can interactively explore and visualize data. Due to the vast size of experimental datasets, researchers are increasingly using powerful, cloud-based systems to process and analyze data. These remote systems, called science gateways, offer user-friendly, Web-based access to high performance computing and storage resources, but typically lack interactive visualization capability. In this paper, we present BioViz Connect, a middleware Web application that links CyVerse science gateway resources to the Integrated Genome Browser (IGB), a highly interactive native application implemented in Java that runs on the user's personal computer. Using BioViz Connect, users can 1) stream data from the CyVerse data store into IGB for visualization, 2) improve the IGB user experience for themselves and others by adding IGB specific metadata to CyVerse data files, including genome version and track appearance, and 3) run compute-intensive visual analytics functions on CyVerse infrastructure to create new datasets for visualization in IGB or other applications. To demonstrate how BioViz Connect facilitates interactive data visualization, we describe an example RNA-Seq data analysis investigating how heat and desiccation stresses affect gene expression in the model plant Arabidopsis thaliana. The RNA-Seq use case illustrates how interactive visualization with IGB can help a user identify problematic experimental samples, sanity-check results using a positive control, and create new data files for interactive visualization in IGB (or other tools) using a Docker image deployed to CyVerse via the Terrain API. Lastly, we discuss limitations of the technologies used and suggest opportunities for future work. BioViz Connect is available from https://bioviz.org.
Affiliation(s)
- Ann E. Loraine
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, United States
4
Irshad O, Ghani Khan MU. Formalization and Semantic Integration of Heterogeneous Omics Annotations for Exploratory Searches. Curr Bioinform 2021. [DOI: 10.2174/1574893615666200127122818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Aim:
To help researchers and practitioners unveil the mysterious functional aspects of the human cellular system through exploratory searches over semantically integrated, heterogeneous, and geographically dispersed omics annotations.
Background:
Improving health standards is one of the motives that continuously drives researchers and practitioners to uncover the mysterious aspects of the human cellular system. Inferring new knowledge from known facts requires a reasonably large amount of data in a well-structured, integrated, and unified form. With the advent of high-throughput and sensor technologies in particular, biological data are growing at an astronomical rate while becoming more heterogeneous and geographically dispersed. Several data integration systems have been deployed to cope with data heterogeneity and global dispersion. Systems based on semantic data integration models are more flexible and expandable than syntax-based ones but still lack aspect-based data integration, persistence, and querying. Furthermore, these systems do not fully support warehousing biological entities in the form of semantic associations as naturally possessed by the human cell.
Objective:
To develop an aspect-oriented formal data integration model that semantically integrates heterogeneous and geographically dispersed omics annotations and supports exploratory querying on the integrated data.
Method:
We propose an aspect-oriented formal data integration model that uses web semantics standards to formally specify each of its constructs. The proposed model supports aspect-oriented representation of biological entities while addressing the issues of data heterogeneity and global dispersion. It associates and warehouses biological entities in the way they relate with
Result:
To show the significance of the proposed model, we developed a data warehouse and information retrieval system based on a multi-layered, multi-modular software architecture compliant with the proposed model. Results show that our model supports gathering, associating, integrating, persisting, and querying each entity with respect to all of its possible aspects within or across the various associated omics layers.
Conclusion:
Formal specifications help address data integration issues by providing formal means for understanding omics data based on meaning instead of syntax.
Affiliation(s)
- Omer Irshad
- Department of Computer Science & Engineering, Faculty of Electrical Engineering, The University of Engineering and Technology, Lahore, Pakistan
- Muhammad Usman Ghani Khan
- Department of Computer Science & Engineering, Faculty of Electrical Engineering, The University of Engineering and Technology, Lahore, Pakistan
5
Selby P, Abbeloos R, Backlund JE, Basterrechea Salido M, Bauchet G, Benites-Alfaro OE, Birkett C, Calaminos VC, Carceller P, Cornut G, Vasques Costa B, Edwards JD, Finkers R, Yanxin Gao S, Ghaffar M, Glaser P, Guignon V, Hok P, Kilian A, König P, Lagare JEB, Lange M, Laporte MA, Larmande P, LeBauer DS, Lyon DA, Marshall DS, Matthews D, Milne I, Mistry N, Morales N, Mueller LA, Neveu P, Papoutsoglou E, Pearce B, Perez-Masias I, Pommier C, Ramírez-González RH, Rathore A, Raquel AM, Raubach S, Rife T, Robbins K, Rouard M, Sarma C, Scholz U, Sempéré G, Shaw PD, Simon R, Soldevilla N, Stephen G, Sun Q, Tovar C, Uszynski G, Verouden M. BrAPI-an application programming interface for plant breeding applications. Bioinformatics 2020; 35:4147-4155. [PMID: 30903186 PMCID: PMC6792114 DOI: 10.1093/bioinformatics/btz190] [Citation(s) in RCA: 60] [Impact Index Per Article: 12.0] [Received: 08/14/2018] [Revised: 11/23/2018] [Accepted: 03/20/2019] [Indexed: 12/04/2022] Open
Abstract
Motivation: Modern genomic breeding methods rely heavily on very large amounts of phenotyping and genotyping data, presenting new challenges in effective data management and integration. Recently, the size and complexity of datasets have increased significantly, with the result that data are often stored on multiple systems. As analyses of interest increasingly require aggregation of datasets from diverse sources, data exchange between disparate systems becomes a challenge. Results: To facilitate interoperability among breeding applications, we present the public plant Breeding Application Programming Interface (BrAPI). BrAPI is a standardized web service API specification. The development of BrAPI is a collaborative, community-based initiative involving a growing global community of over a hundred participants representing several dozen institutions and companies. Development of such a standard is recognized as critical to a number of important large breeding system initiatives as a foundational technology. The focus of the first version of the API is on providing services for connecting systems and retrieving basic breeding data including germplasm, study, observation, and marker data. A number of BrAPI-enabled applications, termed BrAPPs, have been written that take advantage of the emerging support of BrAPI by many databases. Availability and implementation: More information on BrAPI, including links to the specification, test suites, BrAPPs, and sample implementations, is available at https://brapi.org/. The BrAPI specification and the developer tools are provided as free and open source.
Affiliation(s)
- Peter Selby
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, New York, USA
- Omar E Benites-Alfaro
- International Potato Center (CIP), Lima, Peru; International Food Policy Research Institute (IFPRI), Washington DC, USA
- Viana C Calaminos
- International Rice Research Institute (IRRI), Los Baños, Laguna, The Philippines
- Pierre Carceller
- AGAP, Univ Montpellier, CIRAD, INRA, Montpellier SupAgro, Montpellier, France
- Richard Finkers
- Department of Plant Breeding, Wageningen University & Research, Wageningen, The Netherlands
- Star Yanxin Gao
- Institute of Biotechnology, Cornell University, Ithaca, New York, USA
- Mehmood Ghaffar
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany
- Philip Glaser
- Institute of Biotechnology, Cornell University, Ithaca, New York, USA
- Puthick Hok
- Diversity Arrays Technology, Bruce, Australia
- Patrick König
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany
- Matthias Lange
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany
- David S LeBauer
- College of Agricultural and Life Sciences, The University of Arizona, Tucson, AZ, USA
- David S Marshall
- Information & Computational Sciences, The James Hutton Institute, Dundee, UK; SRUC, Edinburgh, UK
- Iain Milne
- Information & Computational Sciences, The James Hutton Institute, Dundee, UK
- Pascal Neveu
- MISTEA, INRA, Montpellier SupAgro, Universite de Montpellier, Montpellier, France
- Evangelia Papoutsoglou
- Department of Plant Breeding, Wageningen University & Research, Wageningen, The Netherlands
- Cyril Pommier
- URGI, INRA, Université Paris-Saclay, Versailles, France
- Abhishek Rathore
- International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India
- Angel Manica Raquel
- International Rice Research Institute (IRRI), Los Baños, Laguna, The Philippines
- Sebastian Raubach
- Information & Computational Sciences, The James Hutton Institute, Dundee, UK
- Trevor Rife
- Department of Plant Pathology, Kansas State University, Manhattan, KS, USA
- Kelly Robbins
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, New York, USA
- Chaitanya Sarma
- International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India
- Uwe Scholz
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany
- Guilhem Sempéré
- AGAP, Univ Montpellier, CIRAD, INRA, Montpellier SupAgro, Montpellier, France; INTERTRYP, Univ Montpellier, CIRAD, IRD, Montpellier, France
- Paul D Shaw
- Information & Computational Sciences, The James Hutton Institute, Dundee, UK
- Nahuel Soldevilla
- Integrated Breeding Program (IBP), CIMMYT, Texcoco, Mexico; LeafNode Technology, Buenos Aires, Argentina
- Gordon Stephen
- Information & Computational Sciences, The James Hutton Institute, Dundee, UK
- Qi Sun
- Institute of Biotechnology, Cornell University, Ithaca, New York, USA
- Clarysabel Tovar
- Integrated Breeding Program (IBP), CIMMYT, Texcoco, Mexico; LeafNode Technology, Buenos Aires, Argentina
- Maikel Verouden
- Wageningen University & Research, Biometris, Wageningen PB, The Netherlands
6
Irshad O, Khan MUG. Integration and Querying of Heterogeneous Omics Semantic Annotations for Biomedical and Biomolecular Knowledge Discovery. Curr Bioinform 2020. [DOI: 10.2174/1574893614666190409112025] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/16/2022]
Abstract
Background: Exploring the functional aspects of biological cell systems has been a focus of research for many decades. Biologists and other researchers continue working to unveil the mysteries of these functional aspects in order to improve health standards. Gaining such understanding requires critical analysis of omics data that are growing astronomically, heterogeneous, and geographically dispersed. Currently, omics data are available in different types and formats through various data access interfaces. Applications that require offline, integrated data encounter many data heterogeneity and global dispersion issues. Objective: To serve such applications in particular, heterogeneous data must be collected, integrated, and warehoused in a loosely coupled way, so that each molecular entity can be understood computationally, either independently or in association with other entities within or across the various cellular aspects. Methods: In this paper, we propose an omics data integration schema and a corresponding data warehouse system for integrating, warehousing, and presenting heterogeneous and geographically dispersed omics entities according to cellular functional aspects. Results & Conclusion: Aspect-oriented data integration and warehousing, together with data access through graphical search, web services, and application programming interfaces, make the proposed integrated data schema and warehouse system more useful than contemporary alternatives.
Affiliation(s)
- Omer Irshad
- Department of Computer Science & Engineering, Faculty of Electrical Engineering, University of Engineering and Technology, Lahore, Pakistan
- Muhammad Usman Ghani Khan
- Department of Computer Science & Engineering, Faculty of Electrical Engineering, The University of Engineering and Technology, Lahore, Pakistan
7
Sellami S, Dkaki T, Zarour NE, Charrel PJ. MidSemI. INTERNATIONAL JOURNAL OF INFORMATION SYSTEM MODELING AND DESIGN 2019. [DOI: 10.4018/ijismd.2019040101] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/09/2022]
Abstract
The web diversification into the Web of Data and social media means that companies need to gather all the necessary data to help make the best-informed market decisions. However, data providers on the web publish data in various data models and may equip it with different search capabilities, thus requiring data integration techniques to access them. This work explores the current challenges in this area, discusses the limitations of some existing integration tools, and addresses them by proposing a semantic mediator-based approach to virtually integrate enterprise data with large-scale social and linked data. The implementation of the proposed approach is a configurable middleware application and a user-friendly keyword search interface that retrieves its input from internal enterprise data combined with various SPARQL endpoints and Web APIs. An evaluation study was conducted to compare its features with recent integration approaches. The results illustrate the added value and usability of the contributed approach.
Affiliation(s)
- Samir Sellami
- LIRE Laboratory, University of Constantine 2 - Abdelhamid Mehri, Constantine, Algeria
- Taoufiq Dkaki
- IRIT Laboratory, University of Toulouse 2 - Jean Jaurès, Toulouse, France
- Nacer Eddine Zarour
- LIRE Laboratory, University of Constantine 2 - Abdelhamid Mehri, Constantine, Algeria
8
Abstract
In this review, we take a survey of bioinformatics databases and quantitative structure-activity relationship studies reported in published literature. Databases from the most general to special cancer-related ones have been included. Most commonly used methods of structure-based analysis of molecules have been reviewed, along with some case studies where they have been used in cancer research. This article is expected to be of use for general bioinformatics researchers interested in cancer and will also provide an update to those who have been actively pursuing this field of research.
Affiliation(s)
- Adeel Malik
- Department of Biosciences, Jamia Millia Islamia University, New Delhi-110025, India
- Hemajit Singh
- Department of Biosciences, Jamia Millia Islamia University, New Delhi-110025, India
- Munazah Andrabi
- Department of Biosciences, Jamia Millia Islamia University, New Delhi-110025, India
- Syed Akhtar Husain
- Department of Biosciences, Jamia Millia Islamia University, New Delhi-110025, India
- Shandar Ahmad
- Department of Biosciences, Jamia Millia Islamia University, New Delhi-110025, India
9
A Secure-Ware System for Web Server: Ensuring Platform Interoperability, Security, Privacy, Usability and Functionality. NATIONAL ACADEMY SCIENCE LETTERS 2017. [DOI: 10.1007/s40009-017-0547-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/19/2022]
10
Haider S, Waggott D, Lalonde E, Fung C, Liu FF, Boutros PC. A bedr way of genomic interval processing. SOURCE CODE FOR BIOLOGY AND MEDICINE 2016; 11:14. [PMID: 27999613 PMCID: PMC5157088 DOI: 10.1186/s13029-016-0059-5] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 12/23/2015] [Accepted: 09/30/2016] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Next-generation sequencing is making it critical to robustly and rapidly handle genomic ranges within standard pipelines. Standard use-cases include annotating sequence ranges with gene or other genomic annotation, merging multiple experiments together and subsequently quantifying and visualizing the overlap. The most widely-used tools for these tasks work at the command-line (e.g. BEDTools) and the small number of available R packages are either slow or have semantics and features that differ from those of command-line interfaces. RESULTS: To provide a robust R-based interface to standard command-line tools for genomic coordinate manipulation, we created bedr. This open-source R package can use either BEDTools or BEDOPS as a back-end and performs data-manipulation extremely quickly, creating R data structures that can be readily interfaced with existing computational pipelines. It includes data-visualization capabilities and a number of data-access functions that interface with standard databases like UCSC and COSMIC. CONCLUSIONS: The bedr package provides an open-source solution for genomic interval data manipulation and restructuring in the R programming language, which is commonly used in bioinformatics, and should therefore be useful to bioinformaticians and genomic researchers.
Affiliation(s)
- Syed Haider
- Informatics and Biocomputing Platform, Ontario Institute for Cancer Research, Toronto, M5G 0A3 Canada
- Daryl Waggott
- Informatics and Biocomputing Platform, Ontario Institute for Cancer Research, Toronto, M5G 0A3 Canada
- Emilie Lalonde
- Informatics and Biocomputing Platform, Ontario Institute for Cancer Research, Toronto, M5G 0A3 Canada; Departments of Radiation Oncology, Pharmacology & Toxicology, and Medical Biophysics, University of Toronto, Toronto, M5G 2M9 Canada
- Clement Fung
- Informatics and Biocomputing Platform, Ontario Institute for Cancer Research, Toronto, M5G 0A3 Canada
- Fei-Fei Liu
- Departments of Radiation Oncology, Pharmacology & Toxicology, and Medical Biophysics, University of Toronto, Toronto, M5G 2M9 Canada; Ontario Cancer Institute and Campbell Family Institute for Cancer Research, Princess Margaret Hospital, University Health Network, Toronto, M5G 2M9 Canada
- Paul C Boutros
- Informatics and Biocomputing Platform, Ontario Institute for Cancer Research, Toronto, M5G 0A3 Canada; Departments of Radiation Oncology, Pharmacology & Toxicology, and Medical Biophysics, University of Toronto, Toronto, M5G 2M9 Canada
11
Bianchi V, Ceol A, Ogier AGE, de Pretis S, Galeota E, Kishore K, Bora P, Croci O, Campaner S, Amati B, Morelli MJ, Pelizzola M. Integrated Systems for NGS Data Management and Analysis: Open Issues and Available Solutions. Front Genet 2016; 7:75. [PMID: 27200084 PMCID: PMC4858535 DOI: 10.3389/fgene.2016.00075] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 02/08/2016] [Accepted: 04/18/2016] [Indexed: 02/06/2023] Open
Abstract
Next-generation sequencing (NGS) technologies have deeply changed our understanding of cellular processes by delivering an astonishing amount of data at affordable prices; nowadays, many biology laboratories have already accumulated a large number of sequenced samples. However, managing and analyzing these data poses new challenges, which may easily be underestimated by research groups devoid of IT and quantitative skills. In this perspective, we identify five issues that should be carefully addressed by research groups approaching NGS technologies. In particular, the five key issues to be considered concern: (1) adopting a laboratory management system (LIMS) and safeguarding the resulting raw data structure in downstream analyses; (2) monitoring the flow of the data and standardizing input and output directories and file names, even when multiple analysis protocols are used on the same data; (3) ensuring complete traceability of the analysis performed; (4) enabling non-experienced users to run analyses through a graphical user interface (GUI) acting as a front-end for the pipelines; (5) relying on standard metadata to annotate the datasets, and when possible using controlled vocabularies, ideally derived from biomedical ontologies. Finally, we discuss the currently available tools in the light of these issues, and we introduce HTS-flow, a new workflow management system conceived to address the concerns we raised. HTS-flow is able to retrieve information from a LIMS database, manages data analyses through a simple GUI, outputs data in standard locations and allows the complete traceability of datasets, accompanying metadata and analysis scripts.
Affiliation(s)
- Valerio Bianchi
- Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia Milano, Italy
- Arnaud Ceol
- Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia Milano, Italy
- Alessandro G E Ogier
- Department of Experimental Oncology, European Institute of Oncology Milano, Italy
- Stefano de Pretis
- Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia Milano, Italy
- Eugenia Galeota
- Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia Milano, Italy
- Kamal Kishore
- Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia Milano, Italy
- Pranami Bora
- Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia Milano, Italy
- Ottavio Croci
- Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia Milano, Italy
- Stefano Campaner
- Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia Milano, Italy
- Bruno Amati
- Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia Milano, Italy; Department of Experimental Oncology, European Institute of Oncology Milano, Italy
- Marco J Morelli
- Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia Milano, Italy
- Mattia Pelizzola
- Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia Milano, Italy
| |
12
Lapatas V, Stefanidakis M, Jimenez RC, Via A, Schneider MV. Data integration in biological research: an overview. Journal of Biological Research (Thessalonike, Greece) 2015; 22:9. [PMID: 26336651 PMCID: PMC4557916 DOI: 10.1186/s40709-015-0032-5] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Received: 04/20/2015] [Accepted: 08/10/2015] [Indexed: 11/16/2022]
Abstract
Data sharing, integration and annotation are essential to ensure the reproducibility of the analysis and interpretation of experimental findings. Often these activities are perceived as a role that bioinformaticians and computer scientists have to take with little or no input from experimental biologists. On the contrary, biological researchers, being the producers and often the end users of such data, have a major role in enabling biological data integration. The quality and usefulness of data integration depend on the existence and adoption of standards, shared formats, and mechanisms that are suitable for biological researchers to submit and annotate the data, so that the data can be easily searched, conveniently linked and consequently used for further biological analysis and discovery. Here, we provide background on what data integration is from a computational science point of view, how it has been applied to biological research, which key aspects contributed to its success, and its future directions.
Affiliation(s)
- Vasileios Lapatas
- Department of Informatics, Ionian University, 7 Tsirigoti Square, Corfu, 49100 Greece
- Michalis Stefanidakis
- Department of Informatics, Ionian University, 7 Tsirigoti Square, Corfu, 49100 Greece
- Allegra Via
- Biocomputing Group, Sapienza University, Piazzale Aldo Moro 5, Rome, 00185 Italy
13
Börnigen D, Moon YS, Rahnavard G, Waldron L, McIver L, Shafquat A, Franzosa EA, Miropolsky L, Sweeney C, Morgan XC, Garrett WS, Huttenhower C. A reproducible approach to high-throughput biological data acquisition and integration. PeerJ 2015; 3:e791. [PMID: 26157642 PMCID: PMC4493686 DOI: 10.7717/peerj.791] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 10/21/2014] [Accepted: 02/04/2015] [Indexed: 12/25/2022] Open
Abstract
Modern biological research requires rapid, complex, and reproducible integration of multiple experimental results generated both internally and externally (e.g., from public repositories). Although large systematic meta-analyses are among the most effective approaches both for clinical biomarker discovery and for computational inference of biomolecular mechanisms, identifying, acquiring, and integrating relevant experimental results from multiple sources for a given study can be time-consuming and error-prone. To enable efficient and reproducible integration of diverse experimental results, we developed a novel approach for standardized acquisition and analysis of high-throughput and heterogeneous biological data. This allowed, first, novel biomolecular network reconstruction in human prostate cancer, which correctly recovered and extended the NFκB signaling pathway. Next, we investigated host-microbiome interactions. In less than an hour of analysis time, the system retrieved data and integrated six germ-free murine intestinal gene expression datasets to identify the genes most influenced by the gut microbiota, which comprised a set of immune-response and carbohydrate metabolism processes. Finally, we constructed integrated functional interaction networks to compare connectivity of peptide secretion pathways in the model organisms Escherichia coli, Bacillus subtilis, and Pseudomonas aeruginosa.
Affiliation(s)
- Daniela Börnigen
- Biostatistics Department, Harvard School of Public Health, Boston, MA, USA; The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Yo Sup Moon
- Biostatistics Department, Harvard School of Public Health, Boston, MA, USA
- Gholamali Rahnavard
- Biostatistics Department, Harvard School of Public Health, Boston, MA, USA; The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Levi Waldron
- Biostatistics Department, Harvard School of Public Health, Boston, MA, USA; City University of New York School of Public Health, Hunter College, New York, NY, USA
- Lauren McIver
- Biostatistics Department, Harvard School of Public Health, Boston, MA, USA
- Afrah Shafquat
- Biostatistics Department, Harvard School of Public Health, Boston, MA, USA
- Eric A Franzosa
- Biostatistics Department, Harvard School of Public Health, Boston, MA, USA; The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Larissa Miropolsky
- Biostatistics Department, Harvard School of Public Health, Boston, MA, USA
- Xochitl C Morgan
- Biostatistics Department, Harvard School of Public Health, Boston, MA, USA; The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Wendy S Garrett
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Immunology and Infectious Diseases, Harvard School of Public Health, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA; Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Curtis Huttenhower
- Biostatistics Department, Harvard School of Public Health, Boston, MA, USA; The Broad Institute of MIT and Harvard, Cambridge, MA, USA
14
Combe CW, Fischer L, Rappsilber J. xiNET: cross-link network maps with residue resolution. Mol Cell Proteomics 2015; 14:1137-47. [PMID: 25648531 PMCID: PMC4390258 DOI: 10.1074/mcp.o114.042259] [Citation(s) in RCA: 215] [Impact Index Per Article: 21.5] [Received: 06/17/2014] [Indexed: 01/03/2023] Open
Abstract
xiNET is a visualization tool for exploring cross-linking/mass spectrometry results. The interactive maps of the cross-link network that it generates are a type of node-link diagram. In these maps xiNET displays: (1) residue resolution positional information including linkage sites and linked peptides; (2) all types of cross-linking reaction product; (3) ambiguous results; and (4) additional sequence information such as domains. xiNET runs in a browser and exports vector graphics which can be edited in common drawing packages to create publication quality figures. AVAILABILITY xiNET is open source, released under the Apache version 2 license. Results can be viewed by uploading data to http://crosslinkviewer.org/ or by downloading the software from http://github.com/colin-combe/crosslink-viewer and running it locally.
Affiliation(s)
- Colin W Combe
- Wellcome Trust Centre for Cell Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3BF, United Kingdom
- Lutz Fischer
- Wellcome Trust Centre for Cell Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3BF, United Kingdom
- Juri Rappsilber
- Wellcome Trust Centre for Cell Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3BF, United Kingdom; Department of Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany
15
Cutts RJ, Guerra-Assunção JA, Gadaleta E, Dayem Ullah AZ, Chelala C. BCCTBbp: the Breast Cancer Campaign Tissue Bank bioinformatics portal. Nucleic Acids Res 2014; 43:D831-6. [PMID: 25332396 PMCID: PMC4384036 DOI: 10.1093/nar/gku984] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 12/14/2022] Open
Abstract
BCCTBbp (http://bioinformatics.breastcancertissuebank.org) was initially developed as the data-mining portal of the Breast Cancer Campaign Tissue Bank (BCCTB), a vital resource of breast cancer tissue for researchers to support and promote cutting-edge research. BCCTBbp is dedicated to maximising research on patient tissues by initially storing genomics, methylomics, transcriptomics, proteomics and microRNA data that has been mined from the literature and linking to pathways and mechanisms involved in breast cancer. Currently, the portal holds 146 datasets comprising over 227 795 expression/genomic measurements from various breast tissues (e.g. normal, malignant or benign lesions), cell lines and body fluids. BCCTBbp can be used to build on breast cancer knowledge and maximise the value of existing research. By recording a large number of annotations on samples and studies, and linking to other databases, such as NCBI, Ensembl and Reactome, a wide variety of different investigations can be carried out. Additionally, BCCTBbp has a dedicated analytical layer allowing researchers to further analyse stored datasets. A future important role for BCCTBbp is to make available all data generated on BCCTB tissues thus building a valuable resource of information on the tissues in BCCTB that will save repetition of experiments and expand scientific knowledge.
Affiliation(s)
- Rosalind J Cutts
- Centre for Molecular Oncology, Barts Cancer Institute, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
- José Afonso Guerra-Assunção
- Centre for Molecular Oncology, Barts Cancer Institute, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
- Emanuela Gadaleta
- Centre for Molecular Oncology, Barts Cancer Institute, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
- Abu Z Dayem Ullah
- Centre for Molecular Oncology, Barts Cancer Institute, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
- Claude Chelala
- Centre for Molecular Oncology, Barts Cancer Institute, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
16
Mukhyala K, Masselot A. Visualization of protein sequence features using JavaScript and SVG with pViz.js. Bioinformatics 2014; 30:3408-9. [PMID: 25147360 DOI: 10.1093/bioinformatics/btu567] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 11/12/2022]
Abstract
pViz.js is a visualization library for displaying protein sequence features in a Web browser. By simply providing a sequence and the locations of its features, this lightweight, yet versatile, JavaScript library renders an interactive view of the protein features. Interactive exploration of protein sequence features over the Web is a common need in Bioinformatics. Although many Web sites have developed viewers to display these features, their implementations are usually focused on data from a specific source or use case. Some of these viewers can be adapted to fit other use cases but are not designed to be reusable. pViz makes it easy to display features as boxes aligned to a protein sequence with zooming functionality but also includes predefined renderings for secondary structure and post-translational modifications. The library is designed to further customize this view. We demonstrate such applications of pViz using two examples: a proteomic data visualization tool with an embedded viewer for displaying features on protein structure, and a tool to visualize the results of the variant_effect_predictor tool from Ensembl. AVAILABILITY AND IMPLEMENTATION pViz.js is a JavaScript library, available on GitHub at https://github.com/Genentech/pviz. This site includes examples and functional applications, installation instructions and usage documentation. A Readme file, which explains how to use pViz with examples, is available as Supplementary Material A.
Affiliation(s)
- Kiran Mukhyala
- Department of Bioinformatics and Computational Biology, Genentech Inc., South San Francisco, CA 94080, USA
- Alexandre Masselot
- Department of Bioinformatics and Computational Biology, Genentech Inc., South San Francisco, CA 94080, USA
17
Rodrigues MR, Luck M. Effective Cooperations Through Non-Monetary Exchanges: A Computational Framework. Int J Coop Inf Syst 2014. [DOI: 10.1142/s0218843014500026] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/18/2022]
Abstract
Today there is an increase in the number of cooperative initiatives in different domains to make tools and data available to global communities free of charge. Such cooperative systems are open, heterogeneous, dynamic, and lack a formal payment system. Incentivising cooperation in these scenarios is essential to maintain their effectiveness. Therefore, there is a recognised need to move away from an ad hoc approach to one in which cooperation is supported and encouraged. The agent-oriented paradigm has been advocated as a natural way to design and implement systems that are distributed and heterogeneous. However, developing an agent-oriented system for today's cooperative systems is challenging. It requires a means not only to provide non-monetary incentives for service providers, but also to consider the level of quality of cooperations, in terms of the quality of provided and received services. In this context, the key contribution of this paper is a framework for non-monetary interactions among self-interested agents, in which the motivation to cooperate and the bases for analysing cooperations come from Piaget's theory of exchange values. Our framework includes a computational model of these values, which defines how exchange values are accumulated and spent by interacting agents. We illustrate how our framework can be used by agents to analyse cooperations and to make decisions about them, and provide an empirical evaluation.
Affiliation(s)
- Maíra R. Rodrigues
- Departamento de Biologia Geral, Universidade Federal de Minas Gerais, Av. Antonio Carlos 6627, Belo Horizonte, MG CEP 31270-910, Brazil
- Michael Luck
- Department of Informatics, King's College London, Strand, London, WC2R 2LS, UK
18
Gille C, Fähling M, Weyand B, Wieland T, Gille A. Alignment-Annotator web server: rendering and annotating sequence alignments. Nucleic Acids Res 2014; 42:W3-6. [PMID: 24813445 PMCID: PMC4086088 DOI: 10.1093/nar/gku400] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.5] [Received: 02/28/2014] [Revised: 04/16/2014] [Accepted: 04/24/2014] [Indexed: 11/30/2022] Open
Abstract
Alignment-Annotator is a novel web service designed to generate interactive views of annotated nucleotide and amino acid sequence alignments (i) de novo and (ii) embedded in other software. All computations are performed on the server side. Interactivity is implemented in HTML5, a language native to web browsers. The alignment is initially displayed using default settings and can be modified with the graphical user interfaces. For example, individual sequences can be reordered or deleted using drag and drop, amino acid color code schemes can be applied and annotations can be added. Annotations can be made manually or imported from BioDAS servers, UniProt, the Catalytic Site Atlas and the PDB. Some edits take immediate effect while others require server interaction and may take a few seconds to execute. The final alignment document can be downloaded as a zip-archive containing the HTML files. Because of the use of HTML, the resulting interactive alignment can be viewed on any platform including Windows, Mac OS X, Linux, Android and iOS in any standard web browser. Importantly, neither plugins nor Java are required and therefore Alignment-Annotator represents the first interactive browser-based alignment visualization. AVAILABILITY http://www.bioinformatics.org/strap/aa/ and http://strap.charite.de/aa/.
Affiliation(s)
- Christoph Gille
- Department of Biochemistry, Charité, University Medicine Berlin, Charitéplatz 1, 10117 Berlin, Germany
- Michael Fähling
- Institute of Vegetative Physiology, Charité, University Medicine Berlin, Charitéplatz 1, 10117 Berlin, Germany
- Birgit Weyand
- Department of Plastic, Hand and Reconstructive Surgery, Hannover Medical School, Carl-Neubergstraße 1, 30625 Hannover, Germany
- Thomas Wieland
- Mannheim Medical Faculty, Institute of Experimental and Clinical Pharmacology and Toxicology, Heidelberg University, Maybachstraße 14, 68169 Mannheim, Germany
- Andreas Gille
- Mannheim Medical Faculty, Institute of Experimental and Clinical Pharmacology and Toxicology, Heidelberg University, Maybachstraße 14, 68169 Mannheim, Germany
19
Izzo M, Mortola F, Arnulfo G, Fato MM, Varesio L. A digital repository with an extensible data model for biobanking and genomic analysis management. BMC Genomics 2014; 15 Suppl 3:S3. [PMID: 25077808 PMCID: PMC4083403 DOI: 10.1186/1471-2164-15-s3-s3] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/18/2022] Open
Abstract
Motivation Molecular biology laboratories require extensive metadata to improve data collection and analysis. The heterogeneity of the collected metadata grows as research evolves into international multi-disciplinary collaborations with increasing data sharing among institutions. A single standard is not feasible, so it becomes crucial to develop digital repositories with flexible and extensible data models, as in the case of modern integrated biobank management. Results We developed a novel data model in JSON format to describe heterogeneous data in a generic biomedical science scenario. The model is built on two hierarchical entities: processes and events, roughly corresponding to research studies and analysis steps within a single study. A number of sequential events can be grouped in a process, building up a hierarchical structure to track patient and sample history. Each event can produce new data. Data is described by a set of user-defined metadata, and may have one or more associated files. We integrated the model in a web-based digital repository with a data grid storage to manage large data sets located in geographically distinct areas. We built a graphical interface that allows authorized users to define new data types dynamically, according to their requirements. Operators compose queries on metadata fields using a flexible search interface and run them on the database and on the grid. We applied the digital repository to the integrated management of samples, patients and medical history in the BIT-Gaslini biobank. The platform currently manages 1800 samples from over 900 patients. Microarray data from 150 analyses are stored on the grid storage and replicated on two physical resources for preservation. The system is equipped with data integration capabilities with other biobanks for worldwide information sharing.
Conclusions Our data model enables users to continuously define flexible, ad hoc, and loosely structured metadata, for information sharing in specific research projects and purposes. This approach can significantly improve interdisciplinary research collaboration and allows tracking of patients' clinical records, sample management information, and genomic data. The web interface allows the operators to easily manage, query, and annotate the files, without dealing with the technicalities of the data grid.
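The process/event model described in this abstract can be pictured with a small sketch. The Python below is only an illustration under stated assumptions: the class names (`Process`, `Event`, `Data`), the field names and the sample values are hypothetical and are not taken from the repository itself. The abstract states only that processes group sequential events, that events produce data, and that data carry user-defined metadata and optional files.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class Data:
    """One data item: a user-defined type, free-form metadata, optional files."""
    data_type: str                       # hypothetical user-defined type name
    metadata: Dict[str, Any]             # loosely structured, user-defined metadata
    files: List[str] = field(default_factory=list)


@dataclass
class Event:
    """One analysis step; each event can produce new data."""
    name: str
    data: List[Data] = field(default_factory=list)


@dataclass
class Process:
    """A study that groups sequential events into a history."""
    name: str
    events: List[Event] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses and lists.
        return json.dumps(asdict(self), indent=2)


def build_example() -> Process:
    """Build a made-up two-step history for one sample (illustrative values)."""
    sample = Data("sample", {"tissue": "blood", "patient_id": "P001"})
    array = Data(
        "microarray",
        {"platform": "example-chip"},
        files=["/grid/arrays/P001.cel"],  # hypothetical grid path
    )
    return Process(
        "study-A",
        [Event("sample collection", [sample]), Event("microarray analysis", [array])],
    )


if __name__ == "__main__":
    print(build_example().to_json())
```

Keeping `metadata` as a plain dictionary mirrors the idea of ad hoc, loosely structured metadata: new data types need no schema change, at the cost of validation being left to the application.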
20
Abstract
Genome-Wide Association Studies are widely used to correlate phenotypic traits with genetic variants. These studies usually compare the genetic variation between two groups to single out certain Single Nucleotide Polymorphisms (SNPs) that are linked to a phenotypic variation in one of the groups. However, it is necessary to have a large enough sample size to find statistically significant correlations. Direct-To-Consumer (DTC) genetic testing can supply additional data: DTC companies offer the analysis of a large number of SNPs for an individual at low cost without the need to consult a physician or geneticist. Over 100,000 people have already been genotyped through Direct-To-Consumer genetic testing companies. However, these data are not public for a variety of reasons and thus cannot be used in research. It seems reasonable to create a central open data repository for such data. Here we present the web platform openSNP, an open database which allows participants of Direct-To-Consumer genetic testing to publish their genetic data at no cost along with phenotypic information. Through this crowdsourced effort of collecting genetic and phenotypic information, openSNP has become a resource for a wide area of studies, including Genome-Wide Association Studies. openSNP is hosted at http://www.opensnp.org, and the code is released under the MIT license at http://github.com/gedankenstuecke/snpr.
21
Singh M, Bhartiya D, Maini J, Sharma M, Singh AR, Kadarkaraisamy S, Rana R, Sabharwal A, Nanda S, Ramachandran A, Mittal A, Kapoor S, Sehgal P, Asad Z, Kaushik K, Vellarikkal SK, Jagga D, Muthuswami M, Chauhan RK, Leonard E, Priyadarshini R, Halimani M, Malhotra S, Patowary A, Vishwakarma H, Joshi P, Bhardwaj V, Bhaumik A, Bhatt B, Jha A, Kumar A, Budakoti P, Lalwani MK, Meli R, Jalali S, Joshi K, Pal K, Dhiman H, Laddha SV, Jadhav V, Singh N, Pandey V, Sachidanandan C, Ekker SC, Klee EW, Scaria V, Sivasubbu S. The Zebrafish GenomeWiki: a crowdsourcing approach to connect the long tail for zebrafish gene annotation. Database: The Journal of Biological Databases and Curation 2014; 2014:bau011. [PMID: 24578356 PMCID: PMC3936183 DOI: 10.1093/database/bau011] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 11/14/2022]
Abstract
A large repertoire of gene-centric data has been generated in the field of zebrafish biology. Although the bulk of these data are available in the public domain, most of them are not readily accessible or are available only in nonstandard formats. One major challenge is to unify and integrate these widely scattered data sources. We tested the hypothesis that active community participation could be a viable option to address this challenge. We present here our approach to creating standards for assimilation and sharing of information and a system of open standards for database intercommunication. We have attempted to address this challenge by creating a community-centric solution for zebrafish gene annotation. The Zebrafish GenomeWiki is a 'wiki'-based resource, which aims to provide an altruistic shared environment for collective annotation of the zebrafish genes. The Zebrafish GenomeWiki has features that enable users to comment, annotate, edit and rate this gene-centric information. The credits for contributions can be tracked through a transparent microattribution system. In contrast to other wikis, the Zebrafish GenomeWiki is a 'structured wiki' or rather a 'semantic wiki'. The Zebrafish GenomeWiki implements a semantically linked data structure, which in the future would be amenable to semantic search. Database URL: http://genome.igib.res.in/twiki.
Affiliation(s)
- Meghna Singh
- CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Mall Road, Delhi 110007, India, Academy of Scientific and Innovative Research (AcSIR), Anusandhan Bhawan, Delhi 110001, India, Acharya Narendra Dev College, Delhi University, Govindpuri, Kalkaji, New Delhi 110019, India, Dr. B. R. Ambedkar Center for Biomedical Research, University of Delhi, Delhi 110007, India, Department of Genetics, University of Delhi South Campus, Benito Juarez Road, Dhaula Kuan, New Delhi 110021, India and Mayo Clinic, Rochester, MN, USA
22
Gong S, Ware JS, Walsh R, Cook SA. NECTAR: a database of codon-centric missense variant annotations. Nucleic Acids Res 2013; 42:D1013-9. [PMID: 24297257 PMCID: PMC3965063 DOI: 10.1093/nar/gkt1245] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022] Open
Abstract
NECTAR (Non-synonymous Enriched Coding muTation ARchive; http://nectarmutation.org) is a database and web application to annotate disease-related and functionally important amino acids in human proteins. A number of tools are available to facilitate the interpretation of DNA variants identified in diagnostic or research sequencing. These typically identify previous reports of DNA variation at a given genomic location, predict its effects on transcript and protein sequence and may predict downstream functional consequences. Previous reports and functional annotations are typically linked by the genomic location of the variant observed. NECTAR collates disease-causing variants and functionally important amino acid residues from a number of sources. Importantly, rather than simply linking annotations by a shared genomic location, NECTAR annotates variants of interest with details of previously reported variation affecting the same codon. This provides a much richer data set for the interpretation of a novel DNA variant. NECTAR also identifies functionally equivalent amino acid residues in evolutionarily related proteins (paralogues) and, where appropriate, transfers annotations between them. As well as accessing these data through a web interface, users can upload batches of variants in variant call format (VCF) for annotation on-the-fly. The database is freely available to download from the ftp site: ftp://ftp.nectarmutation.org.
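The difference between linking annotations by shared genomic location and linking them by shared codon can be made concrete with a short sketch. The Python below is not NECTAR's implementation: the function names, the gene label and the variant records are hypothetical, and it assumes that variant positions are given as 1-based offsets into the coding sequence (so positions 1 to 3 form codon 1).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def codon_number(cds_pos: int) -> int:
    """Map a 1-based coding-sequence position to its 1-based codon number."""
    if cds_pos < 1:
        raise ValueError("coding-sequence positions are 1-based")
    return (cds_pos - 1) // 3 + 1


def group_by_codon(
    variants: Iterable[Tuple[str, int, str]]
) -> Dict[Tuple[str, int], List[str]]:
    """Group (gene, cds_pos, label) records by (gene, codon number)."""
    groups: Dict[Tuple[str, int], List[str]] = defaultdict(list)
    for gene, cds_pos, label in variants:
        groups[(gene, codon_number(cds_pos))].append(label)
    return dict(groups)


if __name__ == "__main__":
    # Made-up records: two nucleotide positions in one codon, one in the next.
    records = [
        ("GENE1", 100, "novel variant at c.100"),
        ("GENE1", 101, "previously reported variant at c.101"),
        ("GENE1", 103, "previously reported variant at c.103"),
    ]
    for key, labels in sorted(group_by_codon(records).items()):
        print(key, labels)
```

In this toy input, positions 100 and 101 differ as nucleotide coordinates but fall in the same codon, so a codon-centric lookup connects the novel variant to the earlier report, while the variant at position 103 stays in the neighbouring codon.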
Affiliation(s)
- Sungsam Gong
- NIHR Cardiovascular Biomedical Research Unit, Royal Brompton and Harefield NHS Foundation Trust and Imperial College London, London SW3 6NP, UK, National Heart and Lung Institute, Imperial College, London SW3 6LY, UK, National Heart Centre Singapore, Singapore 168752, Singapore and Cardiovascular & Metabolic Disorders, Duke National University of Singapore, Singapore 169857, Singapore
23
Raney BJ, Dreszer TR, Barber GP, Clawson H, Fujita PA, Wang T, Nguyen N, Paten B, Zweig AS, Karolchik D, Kent WJ. Track data hubs enable visualization of user-defined genome-wide annotations on the UCSC Genome Browser. Bioinformatics 2013; 30:1003-5. [PMID: 24227676 PMCID: PMC3967101 DOI: 10.1093/bioinformatics/btt637] [Citation(s) in RCA: 332] [Impact Index Per Article: 27.7] [Indexed: 12/28/2022] Open
Abstract
SUMMARY Track data hubs provide an efficient mechanism for visualizing remotely hosted Internet-accessible collections of genome annotations. Hub datasets can be organized, configured and fully integrated into the University of California Santa Cruz (UCSC) Genome Browser and accessed through the familiar browser interface. For the first time, individuals can use the complete browser feature set to view custom datasets without the overhead of setting up and maintaining a mirror. AVAILABILITY AND IMPLEMENTATION Source code for the BigWig, BigBed and Genome Browser software is freely available for non-commercial use at http://hgdownload.cse.ucsc.edu/admin/jksrc.zip, implemented in C and supported on Linux. Binaries for the BigWig and BigBed creation and parsing utilities may be downloaded at http://hgdownload.cse.ucsc.edu/admin/exe/. Binary Alignment/Map (BAM) and Variant Call Format (VCF)/tabix utilities are available from http://samtools.sourceforge.net/ and http://vcftools.sourceforge.net/. The UCSC Genome Browser is publicly accessible at http://genome.ucsc.edu.
Affiliation(s)
- Brian J Raney
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA and Department of Genetics, Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO 63108, USA
24
Moretti S, Laurenczy B, Gharib WH, Castella B, Kuzniar A, Schabauer H, Studer RA, Valle M, Salamin N, Stockinger H, Robinson-Rechavi M. Selectome update: quality control and computational improvements to a database of positive selection. Nucleic Acids Res 2013; 42:D917-21. [PMID: 24225318 PMCID: PMC3964977 DOI: 10.1093/nar/gkt1065] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.8] [Indexed: 01/25/2023] Open
Abstract
Selectome (http://selectome.unil.ch/) is a database of positive selection, based on a branch-site likelihood test. This model estimates the number of nonsynonymous substitutions (dN) and synonymous substitutions (dS) to evaluate the variation in selective pressure (dN/dS ratio) over branches and over sites. Since the original release of Selectome, we have benchmarked and implemented a thorough quality control procedure on multiple sequence alignments, aiming to provide minimum false-positive results. We have also improved the computational efficiency of the branch-site test implementation, allowing larger data sets and more frequent updates. Release 6 of Selectome includes all gene trees from Ensembl for Primates and Glires, as well as a large set of vertebrate gene trees. A total of 6810 gene trees have some evidence of positive selection. Finally, the web interface has been improved to be more responsive and to facilitate searches and browsing.
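As a rough illustration of the quantities named in this abstract, the sketch below computes a dN/dS ratio and a likelihood-ratio statistic for comparing a null model against a model that allows positive selection. It is not Selectome's pipeline: the function names and all numeric inputs are made up, and comparing the statistic against a chi-square distribution with one degree of freedom is an assumption chosen here for simplicity.

```python
import math
from typing import Tuple


def omega(dn: float, ds: float) -> float:
    """Return the dN/dS ratio; values above 1 are suggestive of positive selection."""
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    return dn / ds


def likelihood_ratio_test(lnl_null: float, lnl_alt: float) -> Tuple[float, float]:
    """Return (statistic, p-value) for two nested models.

    The statistic is 2 * (lnL_alt - lnL_null), floored at zero.
    The p-value uses the chi-square survival function with 1 degree of
    freedom, which equals erfc(sqrt(x / 2)); this choice is an assumption.
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Made-up substitution rates and log-likelihoods for one branch.
    print("omega:", omega(0.5, 0.25))
    stat, p = likelihood_ratio_test(lnl_null=-1002.0, lnl_alt=-1000.0)
    print("LRT statistic:", stat, "p-value:", round(p, 4))
```

A database that runs such a test over many branches and gene trees would also need to control for multiple testing, which this sketch leaves out.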
Affiliation(s)
- Sébastien Moretti
- Department of Ecology and Evolution, Biophore, University of Lausanne, CH-1015 Lausanne, Switzerland, Evolutionary Bioinformatics group, SIB Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland, Vital-IT group, SIB Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland, Computational Phylogenetics group, SIB Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland, Division of Biosciences, Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK and Swiss National Supercomputing Centre (CSCS), CH-6900, Lugano, Switzerland
25
Lagerstedt I, Moore WJ, Patwardhan A, Sanz-García E, Best C, Swedlow JR, Kleywegt GJ. Web-based visualisation and analysis of 3D electron-microscopy data from EMDB and PDB. J Struct Biol 2013; 184:173-81. [PMID: 24113529 PMCID: PMC3898923 DOI: 10.1016/j.jsb.2013.09.021] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Received: 07/18/2013] [Revised: 09/24/2013] [Accepted: 09/25/2013] [Indexed: 11/25/2022]
Abstract
The Protein Data Bank in Europe (PDBe) has developed web-based tools for the visualisation and analysis of 3D electron microscopy (3DEM) structures in the Electron Microscopy Data Bank (EMDB) and Protein Data Bank (PDB). The tools include: (1) a volume viewer for 3D visualisation of maps, tomograms and models, (2) a slice viewer for inspecting 2D slices of tomographic reconstructions, and (3) visual analysis pages to facilitate analysis and validation of maps, tomograms and models. These tools were designed to help non-experts and experts alike to get some insight into the content and assess the quality of 3DEM structures in EMDB and PDB without the need to install specialised software or to download large amounts of data from these archives. The technical challenges encountered in developing these tools, as well as the more general considerations when making archived data available to the user community through a web interface, are discussed.
Affiliation(s)
- Ingvar Lagerstedt
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD, United Kingdom
- William J. Moore
- Centre for Gene Regulation and Expression, College of Life Sciences, University of Dundee, Dow Street, Dundee DD1 5EH, United Kingdom
- Ardan Patwardhan
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD, United Kingdom
- Eduardo Sanz-García
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD, United Kingdom
- Christoph Best
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD, United Kingdom
- Jason R. Swedlow
- Centre for Gene Regulation and Expression, College of Life Sciences, University of Dundee, Dow Street, Dundee DD1 5EH, United Kingdom
- Gerard J. Kleywegt
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD, United Kingdom
|
26
|
Lee E, Helt GA, Reese JT, Munoz-Torres MC, Childers CP, Buels RM, Stein L, Holmes IH, Elsik CG, Lewis SE. Web Apollo: a web-based genomic annotation editing platform. Genome Biol 2013; 14:R93. [PMID: 24000942 PMCID: PMC4053811 DOI: 10.1186/gb-2013-14-8-r93] [Citation(s) in RCA: 274] [Impact Index Per Article: 22.8] [Received: 05/10/2013] [Accepted: 08/30/2013] [Indexed: 01/11/2023] Open
Abstract
Web Apollo is the first instantaneous, collaborative genomic annotation editor available on the web. One of the natural consequences following from current advances in sequencing technology is that there are more and more researchers sequencing new genomes. These researchers require tools to describe the functional features of their newly sequenced genomes. With Web Apollo researchers can use any of the common browsers (for example, Chrome or Firefox) to jointly analyze and precisely describe the features of a genome in real time, whether they are in the same room or working from opposite sides of the world.
|
27
|
Vaudel M, Sickmann A, Martens L. Introduction to opportunities and pitfalls in functional mass spectrometry based proteomics. Biochimica et Biophysica Acta - Proteins and Proteomics 2013; 1844:12-20. [PMID: 23845992 DOI: 10.1016/j.bbapap.2013.06.019] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 12/17/2012] [Revised: 06/05/2013] [Accepted: 06/25/2013] [Indexed: 10/26/2022]
Abstract
With the advent of mass spectrometry based proteomics, the identification of thousands of proteins has become commonplace in biology nowadays. Increasingly, efforts have also been invested toward the detection and localization of posttranslational modifications. It is furthermore common practice to quantify the identified entities, a task supported by a panel of different methods. Finally, the results can also be enriched with functional knowledge gained on the proteins, detecting for instance differentially expressed gene ontology terms or biological pathways. In this study, we review the resources, methods and tools available for the researcher to achieve such a quantitative functional analysis. These include statistics for the post-processing of identification and quantification results, online resources and public repositories. With a focus on free but user-friendly software, preferably also open-source, we provide a list of tools designed to help the researcher manage the vast amount of data generated. We also indicate where such applications currently remain lacking. Moreover, we stress the eventual pitfalls of every step of such studies. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan.
Affiliation(s)
- Marc Vaudel
- Leibniz-Institut für Analytische Wissenschaften - ISAS - e.V., Dortmund, Germany; Proteomics Unit (PROBE), Department of Biomedicine, University of Bergen, Bergen, Norway.
|
28
|
Vaudel M, Sickmann A, Martens L. Current methods for global proteome identification. Expert Rev Proteomics 2013. [PMID: 23194269 DOI: 10.1586/epr.12.51] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.5] [Indexed: 12/21/2022]
Abstract
In a time frame of a few decades, protein identification went from laborious single protein identification to automated identification of entire proteomes. This shift was enabled by the emergence of peptide-centric, gel-free analyses, in particular the so-called shotgun approaches, which not only rely on extensive experiments, but also on cutting-edge data processing methods. The present review therefore provides an overview of a shotgun proteomics identification workflow, listing the state-of-the-art methods involved and software that implement these. The authors focus on freely available tools where possible. Finally, data analysis in the context of emerging across-omics studies will also be discussed briefly, where proteomics goes beyond merely delivering a list of protein accession numbers.
Affiliation(s)
- Marc Vaudel
- Leibniz-Institut für Analytische Wissenschaften-ISAS-e.V., Dortmund, Germany
|
29
|
Gremme G, Steinbiss S, Kurtz S. GenomeTools: a comprehensive software library for efficient processing of structured genome annotations. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2013; 10:645-56. [PMID: 24091398 DOI: 10.1109/tcbb.2013.68] [Citation(s) in RCA: 276] [Impact Index Per Article: 23.0] [Indexed: 05/19/2023]
Abstract
Genome annotations are often published as plain text files describing genomic features and their subcomponents by an implicit annotation graph. In this paper, we present the GenomeTools, a convenient and efficient software library and associated software tools for developing bioinformatics software intended to create, process or convert annotation graphs. The GenomeTools strictly follow the annotation graph approach, offering a unified graph-based representation. This gives the developer intuitive and immediate access to genomic features and tools for their manipulation. To process large annotation sets with low memory overhead, we have designed and implemented an efficient pull-based approach for sequential processing of annotations. This allows to handle even the largest annotation sets, such as a complete catalogue of human variations. Our object-oriented C-based software library enables a developer to conveniently implement their own functionality on annotation graphs and to integrate it into larger workflows, simultaneously accessing compressed sequence data if required. The careful C implementation of the GenomeTools does not only ensure a light-weight memory footprint while allowing full sequential as well as random access to the annotation graph, but also facilitates the creation of bindings to a variety of script programming languages (like Python and Ruby) sharing the same interface.
|
30
|
Marabini R, Macias JR, Vargas J, Quintana A, Sorzano COS, Carazo JM. On the development of three new tools for organizing and sharing information in three-dimensional electron microscopy. Acta Crystallographica Section D: Biological Crystallography 2013; 69:695-700. [DOI: 10.1107/s0907444913007038] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 05/16/2012] [Accepted: 03/13/2013] [Indexed: 11/10/2022]
|
31
|
Zhu W, Zhu Y, Yang X. Information engineering infrastructure for life sciences and its implementation in China. Science China. Life Sciences 2013; 56:220-227. [PMID: 23526387 DOI: 10.1007/s11427-013-4440-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/18/2012] [Accepted: 12/31/2012] [Indexed: 06/02/2023]
Abstract
Biological data, represented by the data from omics platforms, are accumulating exponentially. As some other data-intensive scientific disciplines such as high-energy physics, climatology, meteorology, geology, geography and environmental sciences, modern life sciences have entered the information-rich era, the era of the 4th paradigm. The creation of Chinese information engineering infrastructure for pan-omics studies (CIEIPOS) has been long overdue as part of national scientific infrastructure, in accelerating the further development of Chinese life sciences, and translating rich data into knowledge and medical applications. By gathering facts of current status of international and Chinese bioinformatics communities in collecting, managing and utilizing biological data, the essay stresses the significance and urgency to create a 'data hub' in CIEIPOS, discusses challenges and possible solutions to integrate, query and visualize these data. Another important component of CIEIPOS, which is not part of traditional biological data centers such as NCBI and EBI, is omics informatics. Mass spectroscopy platform was taken as an example to illustrate the complexity of omics informatics. Its heavy dependency on computational power is highlighted. The demand for such power in omics studies is argued as the fundamental function to meet for CIEIPOS. Implementation outlook of CIEIPOS in hardware and network is discussed.
Affiliation(s)
- Weimin Zhu
- Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & School of Basic Medicine, Peking Union Medical College, Beijing 100730, China.
|
32
|
Klepper K, Drabløs F. MotifLab: a tools and data integration workbench for motif discovery and regulatory sequence analysis. BMC Bioinformatics 2013; 14:9. [PMID: 23323883 PMCID: PMC3556059 DOI: 10.1186/1471-2105-14-9] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 11/04/2012] [Accepted: 01/10/2013] [Indexed: 12/19/2022] Open
Abstract
Background: Traditional methods for computational motif discovery often suffer from poor performance. In particular, methods that search for sequence matches to known binding motifs tend to predict many non-functional binding sites because they fail to take into consideration the biological state of the cell. In recent years, genome-wide studies have generated a lot of data that has the potential to improve our ability to identify functional motifs and binding sites, such as information about chromatin accessibility and epigenetic states in different cell types. However, it is not always trivial to make use of this data in combination with existing motif discovery tools, especially for researchers who are not skilled in bioinformatics programming.
Results: Here we present MotifLab, a general workbench for analysing regulatory sequence regions and discovering transcription factor binding sites and cis-regulatory modules. MotifLab supports comprehensive motif discovery and analysis by allowing users to integrate several popular motif discovery tools as well as different kinds of additional information, including phylogenetic conservation, epigenetic marks, DNase hypersensitive sites, ChIP-Seq data, positional binding preferences of transcription factors, transcription factor interactions and gene expression. MotifLab offers several data-processing operations that can be used to create, manipulate and analyse data objects, and complete analysis workflows can be constructed and automatically executed within MotifLab, including graphical presentation of the results.
Conclusions: We have developed MotifLab as a flexible workbench for motif analysis in a genomic context. The flexibility and effectiveness of this workbench has been demonstrated on selected test cases, in particular two previously published benchmark data sets for single motifs and modules, and a realistic example of genes responding to treatment with forskolin. MotifLab is freely available at http://www.motiflab.org.
Affiliation(s)
- Kjetil Klepper
- Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway.
|
33
|
Abstract
This article aims to introduce the nature of data integration to life scientists. Generally, the subject of data integration is not discussed outside the field of computational science and is not covered in any detail, or even neglected, when teaching/training trainees. End users (hereby defined as wet-lab trainees, clinicians, lab researchers) will mostly interact with bioinformatics resources and tools through web interfaces that mask the user from the data integration processes. However, the lack of formal training or acquaintance with even simple database concepts and terminology often results in a real obstacle to the full comprehension of the resources and tools the end users wish to access. Understanding how data integration works is fundamental to empowering trainees to see the limitations as well as the possibilities when exploring, retrieving, and analysing biological data from databases. Here we introduce a game-based learning activity for training/teaching the topic of data integration that trainers/educators can adopt and adapt for their classroom. In particular we provide an example using DAS (Distributed Annotation Systems) as a method for data integration.
Affiliation(s)
- Maria Victoria Schneider
- Outreach and Training Team, European Molecular Biology Laboratory Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.
|
34
|
Kuhring M, Renard BY. iPiG: integrating peptide spectrum matches into genome browser visualizations. PLoS One 2012; 7:e50246. [PMID: 23226516 PMCID: PMC3514238 DOI: 10.1371/journal.pone.0050246] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 07/11/2012] [Accepted: 10/22/2012] [Indexed: 11/18/2022] Open
Abstract
Proteogenomic approaches have gained increasing popularity, however it is still difficult to integrate mass spectrometry identifications with genomic data due to differing data formats. To address this difficulty, we introduce iPiG as a tool for the integration of peptide identifications from mass spectrometry experiments into existing genome browser visualizations. Thereby, the concurrent analysis of proteomic and genomic data is simplified and proteomic results can directly be compared to genomic data. iPiG is freely available from https://sourceforge.net/projects/ipig/. It is implemented in Java and can be run as a stand-alone tool with a graphical user-interface or integrated into existing workflows. Supplementary data are available at PLOS ONE online.
Affiliation(s)
- Mathias Kuhring
- Research Group Bioinformatics (NG4), Robert Koch-Institute, Berlin, Germany
- Bernhard Y. Renard
- Research Group Bioinformatics (NG4), Robert Koch-Institute, Berlin, Germany
|
35
|
Sallou O, Bretaudeau A, Roult A. Seqcrawler: biological data indexing and browsing platform. BMC Bioinformatics 2012; 13:175. [PMID: 22827839 PMCID: PMC3481441 DOI: 10.1186/1471-2105-13-175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2011] [Accepted: 06/19/2012] [Indexed: 11/10/2022] Open
|
36
|
Vizcaíno JA, Côté RG, Csordas A, Dianes JA, Fabregat A, Foster JM, Griss J, Alpi E, Birim M, Contell J, O'Kelly G, Schoenegger A, Ovelleiro D, Pérez-Riverol Y, Reisinger F, Ríos D, Wang R, Hermjakob H. The PRoteomics IDEntifications (PRIDE) database and associated tools: status in 2013. Nucleic Acids Res 2012. [PMID: 23203882 PMCID: PMC3531176 DOI: 10.1093/nar/gks1262] [Citation(s) in RCA: 1621] [Impact Index Per Article: 124.7] [Indexed: 01/09/2023] Open
Abstract
The PRoteomics IDEntifications (PRIDE, http://www.ebi.ac.uk/pride) database at the European Bioinformatics Institute is one of the most prominent data repositories of mass spectrometry (MS)-based proteomics data. Here, we summarize recent developments in the PRIDE database and related tools. First, we provide up-to-date statistics in data content, splitting the figures by groups of organisms and species, including peptide and protein identifications, and post-translational modifications. We then describe the tools that are part of the PRIDE submission pipeline, especially the recently developed PRIDE Converter 2 (new submission tool) and PRIDE Inspector (visualization and analysis tool). We also give an update about the integration of PRIDE with other MS proteomics resources in the context of the ProteomeXchange consortium. Finally, we briefly review the quality control efforts that are ongoing at present and outline our future plans.
Affiliation(s)
- Juan Antonio Vizcaíno
- EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
|
37
|
Hsu W, Speier W, Taira RK. Automated extraction of reported statistical analyses: towards a logical representation of clinical trial literature. AMIA ... Annual Symposium Proceedings. AMIA Symposium 2012; 2012:350-359. [PMID: 23304305 PMCID: PMC3540551] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2023]
Abstract
Randomized controlled trials are an important source of evidence for guiding clinical decisions when treating a patient. However, given the large number of studies and their variability in quality, determining how to summarize reported results and formalize them as part of practice guidelines continues to be a challenge. We have developed a set of information extraction and annotation tools to automate the identification of key information from papers related to the hypothesis, sample size, statistical test, confidence interval, significance level, and conclusions. We adapted the Automated Sequence Annotation Pipeline to map extracted phrases to relevant knowledge sources. We trained and tested our system on a corpus of 42 full-text articles related to chemotherapy of non-small cell lung cancer. On our test set of 7 papers, we obtained an overall precision of 86%, recall of 78%, and an F-score of 0.82 for classifying sentences. This work represents our efforts towards utilizing this information for quality assessment, meta-analysis, and modeling.
Affiliation(s)
- William Hsu
- Medical Imaging Informatics Group, Dept of Radiological Sciences, University of California, Los Angeles, CA, USA
|
38
|
Mishima H, Aerts J, Katayama T, Bonnal RJP, Yoshiura KI. The Ruby UCSC API: accessing the UCSC genome database using Ruby. BMC Bioinformatics 2012; 13:240. [PMID: 22994508 PMCID: PMC3542311 DOI: 10.1186/1471-2105-13-240] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 04/08/2012] [Accepted: 09/17/2012] [Indexed: 12/26/2022] Open
Abstract
Background: The University of California, Santa Cruz (UCSC) genome database is among the most used sources of genomic annotation in human and other organisms. The database offers an excellent web-based graphical user interface (the UCSC genome browser) and several means for programmatic queries. A simple application programming interface (API) in a scripting language aimed at the biologist was however not yet available. Here, we present the Ruby UCSC API, a library to access the UCSC genome database using Ruby.
Results: The API is designed as a BioRuby plug-in and built on the ActiveRecord 3 framework for the object-relational mapping, making writing SQL statements unnecessary. The current version of the API supports databases of all organisms in the UCSC genome database including human, mammals, vertebrates, deuterostomes, insects, nematodes, and yeast. The API uses the bin index, if available, when querying for genomic intervals. The API also supports genomic sequence queries using locally downloaded *.2bit files that are not stored in the official MySQL database. The API is implemented in pure Ruby and is therefore available in different environments and with different Ruby interpreters (including JRuby).
Conclusions: Assisted by the straightforward object-oriented design of Ruby and ActiveRecord, the Ruby UCSC API will facilitate biologists to query the UCSC genome database programmatically. The API is available through the RubyGem system. Source code and documentation are available at https://github.com/misshie/bioruby-ucsc-api/ under the Ruby license. Feedback and help is provided via the website at http://rubyucscapi.userecho.com/.
Affiliation(s)
- Hiroyuki Mishima
- Department of Human Genetics, Nagasaki University Graduate School of Biomedical Sciences, 1-12-4 Sakamoto, Nagasaki, Nagasaki, 852-8523, Japan.
|
39
|
Speier W, Ochs MF. Updating annotations with the distributed annotation system and the automated sequence annotation pipeline. Bioinformatics 2012; 28:2858-9. [PMID: 22945787 DOI: 10.1093/bioinformatics/bts530] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022]
Abstract
SUMMARY: The integration between BioDAS ProServer and Automated Sequence Annotation Pipeline (ASAP) provides an interface for querying diverse annotation sources, chaining and linking results, and standardizing the output using the Distributed Annotation System (DAS) protocol. This interface allows pipeline plans in ASAP to be integrated into any system using HTTP and also allows the information returned by ASAP to be included in the DAS registry for use in any DAS-aware system. Three example implementations have been developed: the first accesses TRANSFAC information to automatically create gene sets for the Coordinated Gene Activity in Pattern Sets (CoGAPS) algorithm; the second integrates annotations from multiple array platforms and provides unified annotations in an R environment; and the third wraps the UniProt database for integration with the SPICE DAS client.
AVAILABILITY: Source code for ASAP 2.7 and the DAS 1.6 interface is available under the GNU public license. Proserver 2.20 is free software available from SourceForge. Scripts for installation and configuration on Linux are provided at our website: http://www.rits.onc.jhmi.edu/dbb/custom/A6/
Affiliation(s)
- William Speier
- Medical Imaging Informatics Group, University of California, Los Angeles, CA, USA.
|
40
|
Wang J, Kong L, Gao G, Luo J. A brief introduction to web-based genome browsers. Brief Bioinform 2012; 14:131-43. [DOI: 10.1093/bib/bbs029] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Indexed: 11/14/2022] Open
|
41
|
Bleda M, Tarraga J, de Maria A, Salavert F, Garcia-Alonso L, Celma M, Martin A, Dopazo J, Medina I. CellBase, a comprehensive collection of RESTful web services for retrieving relevant biological information from heterogeneous sources. Nucleic Acids Res 2012; 40:W609-14. [PMID: 22693220 PMCID: PMC3394301 DOI: 10.1093/nar/gks575] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Indexed: 12/22/2022] Open
Abstract
During the past years, the advances in high-throughput technologies have produced an unprecedented growth in the number and size of repositories and databases storing relevant biological data. Today, there is more biological information than ever but, unfortunately, the current status of many of these repositories is far from being optimal. Some of the most common problems are that the information is spread out in many small databases; frequently there are different standards among repositories and some databases are no longer supported or they contain too specific and unconnected information. In addition, data size is increasingly becoming an obstacle when accessing or storing biological data. All these issues make very difficult to extract and integrate information from different sources, to analyze experiments or to access and query this information in a programmatic way. CellBase provides a solution to the growing necessity of integration by easing the access to biological data. CellBase implements a set of RESTful web services that query a centralized database containing the most relevant biological data sources. The database is hosted in our servers and is regularly updated. CellBase documentation can be found at http://docs.bioinfo.cipf.es/projects/cellbase.
Affiliation(s)
- Marta Bleda
- Department of Bioinformatics and Genomics, Centro de Investigación Príncipe Felipe (CIPF), 46012 Valencia, Spain
|
42
|
do Nascimento LC, Costa GGL, Binneck E, Pereira GAG, Carazzolle MF. A web-based bioinformatics interface applied to the GENOSOJA Project: Databases and pipelines. Genet Mol Biol 2012; 35:203-11. [PMID: 22802706 PMCID: PMC3392873 DOI: 10.1590/s1415-47572012000200002] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 12/22/2022] Open
Abstract
The Genosoja consortium is an initiative to integrate different omics research approaches carried out in Brazil. Basically, the aim of the project is to improve the plant by identifying genes involved in responses against stresses that affect domestic production, like drought stress and Asian Rust fungal disease. To do so, the project generated several types of sequence data using different methodologies, most of them sequenced by next generation sequencers. The initial stage of the project is highly dependent on bioinformatics analysis, providing suitable tools and integrated databases. In this work, we describe the main features of the Genosoja web database, including the pipelines to analyze some kinds of data (ESTs, SuperSAGE, microRNAs, subtractive cDNA libraries), as well as web interfaces to access information about soybean gene annotation and expression.
Affiliation(s)
- Leandro Costa do Nascimento
- Laboratório de Genômica e Expressão, Departamento de Genética, Evolução e Bioagentes, Instituto de Biologia, Universidade Estadual de Campinas, Campinas, SP, Brazil
- Gustavo Gilson Lacerda Costa
- Laboratório de Genômica e Expressão, Departamento de Genética, Evolução e Bioagentes, Instituto de Biologia, Universidade Estadual de Campinas, Campinas, SP, Brazil
- Eliseu Binneck
- Empresa Brasileira de Pesquisa Agropecuária, Londrina, PR, Brazil
- Gonçalo Amarante Guimarães Pereira
- Laboratório de Genômica e Expressão, Departamento de Genética, Evolução e Bioagentes, Instituto de Biologia, Universidade Estadual de Campinas, Campinas, SP, Brazil
- Marcelo Falsarella Carazzolle
- Laboratório de Genômica e Expressão, Departamento de Genética, Evolução e Bioagentes, Instituto de Biologia, Universidade Estadual de Campinas, Campinas, SP, Brazil
- Centro Nacional de Processamento de Alto Desempenho em São Paulo, Universidade Estadual de Campinas, Campinas, SP, Brazil
|
43
|
Relax with CouchDB--into the non-relational DBMS era of bioinformatics. Genomics 2012; 100:1-7. [PMID: 22609849 DOI: 10.1016/j.ygeno.2012.05.006] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Received: 03/02/2012] [Revised: 05/08/2012] [Accepted: 05/10/2012] [Indexed: 12/19/2022]
Abstract
With the proliferation of high-throughput technologies, genome-level data analysis has become common in molecular biology. Bioinformaticians are developing extensive resources to annotate and mine biological features from high-throughput data. The underlying database management systems for most bioinformatics software are based on a relational model. Modern non-relational databases offer an alternative that has flexibility, scalability, and a non-rigid design schema. Moreover, with an accelerated development pace, non-relational databases like CouchDB can be ideal tools to construct bioinformatics utilities. We describe CouchDB by presenting three new bioinformatics resources: (a) geneSmash, which collates data from bioinformatics resources and provides automated gene-centric annotations, (b) drugBase, a database of drug-target interactions with a web interface powered by geneSmash, and (c) HapMap-CN, which provides a web interface to query copy number variations from three SNP-chip HapMap datasets. In addition to the web sites, all three systems can be accessed programmatically via web services.
|
44
|
Thorvaldsdóttir H, Robinson JT, Mesirov JP. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform 2012; 14:178-92. [PMID: 22517427 PMCID: PMC3603213 DOI: 10.1093/bib/bbs017] [Citation(s) in RCA: 6221] [Impact Index Per Article: 478.5] [Indexed: 01/17/2023] Open
Abstract
Data visualization is an essential component of genomic data analysis. However, the size and diversity of the data sets produced by today’s sequencing and array-based profiling methods present major challenges to visualization tools. The Integrative Genomics Viewer (IGV) is a high-performance viewer that efficiently handles large heterogeneous data sets, while providing a smooth and intuitive user experience at all levels of genome resolution. A key characteristic of IGV is its focus on the integrative nature of genomic studies, with support for both array-based and next-generation sequencing data, and the integration of clinical and phenotypic data. Although IGV is often used to view genomic data from public sources, its primary emphasis is to support researchers who wish to visualize and explore their own data sets or those from colleagues. To that end, IGV supports flexible loading of local and remote data sets, and is optimized to provide high-performance data visualization and exploration on standard desktop systems. IGV is freely available for download from http://www.broadinstitute.org/igv, under a GNU LGPL open-source license.
|
45
|
Loveland JE, Gilbert JGR, Griffiths E, Harrow JL. Community gene annotation in practice. Database: The Journal of Biological Databases and Curation 2012; 2012:bas009. [PMID: 22434843 PMCID: PMC3308165 DOI: 10.1093/database/bas009] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Indexed: 12/04/2022]
Abstract
Manual annotation of genomic data is extremely valuable for producing an accurate reference gene set, but it is expensive compared with automatic methods and so has been limited to model organisms. Annotation tools developed at the Wellcome Trust Sanger Institute (WTSI, http://www.sanger.ac.uk/) are being used to fill that gap, as they can be used remotely and so open up viable community annotation collaborations. We introduce the ‘Blessed’ annotator and ‘Gatekeeper’ approach to Community Annotation using the Otterlace/ZMap genome annotation tool. We also describe the strategies adopted for annotation consistency, quality control and viewing of the annotation. Database URL: http://vega.sanger.ac.uk/index.html
Affiliation(s)
- Jane E Loveland
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.
|
46
|
Gulledge AA, Roberts AD, Vora H, Patel K, Loraine AE. Mining Arabidopsis thaliana RNA-seq data with Integrated Genome Browser reveals stress-induced alternative splicing of the putative splicing regulator SR45a. American Journal of Botany 2012; 99:219-31. [PMID: 22291167 DOI: 10.3732/ajb.1100355] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Indexed: 05/24/2023]
Abstract
PREMISE OF THE STUDY High-throughput sequencing of cDNA libraries prepared from diverse samples (RNA-seq) can reveal genome-wide changes in alternative splicing. Using RNA-seq data to assess splicing at the level of individual genes requires the ability to visualize read alignments alongside genomic annotations. To meet this need, we added RNA-seq visualization capability to Integrated Genome Browser (IGB), a free desktop genome visualization tool. To illustrate this capability, we present an in-depth analysis of abiotic stresses and their effects on alternative splicing of SR45a (AT1G07350), a putative splicing regulator from Arabidopsis thaliana. METHODS cDNA libraries prepared from Arabidopsis plants that were subjected to heat and dehydration stresses were sequenced on an Illumina GAIIx sequencer, yielding more than 511 million high-quality 75-base, single-end sequence reads. Reads were aligned onto the reference genome and visualized in IGB. KEY RESULTS Using IGB, we confirmed exon-skipping alternative splicing in SR45a. Exon-skipped variant AT1G07350.1 encodes full-length SR45a protein with intact RS and RNA recognition motifs, while nonskipped variant AT1G07350.2 lacks the C-terminal RS region due to a frameshift in the alternative exon. Heat and drought stresses increased both transcript abundance and the proportion of exon-skipped transcripts encoding the full-length protein. We identified new splice sites and observed frequent intron retention flanking the alternative exon. CONCLUSIONS This study underlines the importance of visual inspection of RNA-seq alignments when investigating alternatively spliced genes. We showed that heat and dehydration stresses increase overall abundance of SR45a mRNA while also increasing production of transcripts encoding the full-length SR45a protein relative to other splice variants.
Affiliation(s)
- Alyssa A Gulledge
- Department of Bioinformatics and Genomics, North Carolina Research Campus, University of North Carolina at Charlotte, 600 Laureate Way, Kannapolis, North Carolina 28081, USA
|
47
|
Kong L, Wang J, Zhao S, Gu X, Luo J, Gao G. ABrowse--a customizable next-generation genome browser framework. BMC Bioinformatics 2012; 13:2. [PMID: 22222089 PMCID: PMC3265404 DOI: 10.1186/1471-2105-13-2] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 08/10/2011] [Accepted: 01/05/2012] [Indexed: 11/14/2022] Open
Abstract
Background With the rapid growth of genome sequencing projects, genome browsers are becoming indispensable, not only as visualization systems but also as interactive platforms to support open data access and collaborative work. Thus, a customizable genome browser framework with rich functions and flexible configuration is needed to facilitate various genome research projects. Results Based on next-generation web technologies, we have developed a general-purpose genome browser framework, ABrowse, which provides an interactive browsing experience, open data access and collaborative work support. By supporting Google-map-like smooth navigation, ABrowse offers end users a highly interactive browsing experience. To facilitate further data analysis, multiple data access approaches are supported for external platforms to retrieve data from ABrowse. To promote collaborative work, an online user-space is provided for end users to create, store and share comments, annotations and landmarks. For data providers, ABrowse is highly customizable and configurable. The framework provides a set of utilities to import annotation data conveniently. To build ABrowse on existing annotation databases, data providers can specify SQL statements according to the database schema, and customized pages for detailed display of annotation entries can be easily plugged in. For developers, new drawing strategies can be integrated into ABrowse for new types of annotation data. In addition, a standard web service is provided for remote data retrieval, providing an underlying machine-oriented programming interface for open data access. Conclusions The ABrowse framework is valuable for end users, data providers and developers by providing rich user functions and flexible customization approaches. The source code is published under GNU Lesser General Public License v3.0 and is accessible at http://www.abrowse.org/. To demonstrate all the features of ABrowse, a live demo for the Arabidopsis thaliana genome has been built at http://arabidopsis.cbi.edu.cn/.
Affiliation(s)
- Lei Kong
- College of Life Sciences, State Key Laboratory of Protein and Plant Gene Research, Center for Bioinformatics, Peking University, Beijing, 100871, P.R. China
|
48
|
|
49
|
Gadaleta E, Cutts RJ, Sangaralingam A, Lemoine NR, Chelala C. An Integrated Systems Approach to the Study of Pancreatic Cancer. Systems Biology in Cancer Research and Drug Discovery 2012:83-111. [DOI: 10.1007/978-94-007-4819-4_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2025]
|
50
|
Identifying elemental genomic track types and representing them uniformly. BMC Bioinformatics 2011; 12:494. [PMID: 22208806 PMCID: PMC3315820 DOI: 10.1186/1471-2105-12-494] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Received: 05/11/2011] [Accepted: 12/30/2011] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND With the recent advances and availability of various high-throughput sequencing technologies, data on many molecular aspects, such as gene regulation, chromatin dynamics, and the three-dimensional organization of DNA, are rapidly being generated in an increasing number of laboratories. The variation in biological context, and the increasingly dispersed mode of data generation, imply a need for precise, interoperable and flexible representations of genomic features through formats that are easy to parse. A host of alternative formats are currently available and in use, complicating analysis and tool development. The issue of whether and how the multitude of formats reflects varying underlying characteristics of data has to our knowledge not previously been systematically treated. RESULTS We here identify intrinsic distinctions between genomic features, and argue that the distinctions imply that a certain variation in the representation of features as genomic tracks is warranted. Four core informational properties of tracks are discussed: gaps, lengths, values and interconnections. From this we delineate fifteen generic track types. Based on the track type distinctions, we characterize major existing representational formats and find that the track types are not adequately supported by any single format. We also find, in contrast to the XML formats, that none of the existing tabular formats are conveniently extendable to support all track types. We thus propose two unified formats for track data, an improved XML format, BioXSD 1.1, and a new tabular format, GTrack 1.0. CONCLUSIONS The defined track types are shown to capture relevant distinctions between genomic annotation tracks, resulting in varying representational needs and analysis possibilities. The proposed formats, GTrack 1.0 and BioXSD 1.1, cater to the identified track distinctions and emphasize preciseness, flexibility and parsing convenience.
|