52. Marchese Robinson RL, Cronin MTD, Richarz AN, Rallo R. An ISA-TAB-Nano based data collection framework to support data-driven modelling of nanotoxicology. Beilstein Journal of Nanotechnology 2015; 6:1978-99. [PMID: 26665069; PMCID: PMC4660926; DOI: 10.3762/bjnano.6.202]
Abstract
Analysis of trends in nanotoxicology data, and the development of data-driven models for nanotoxicity, is facilitated by reporting data in a standardised electronic format. ISA-TAB-Nano has been proposed as such a format. However, in order to build useful datasets according to this format, a variety of issues have to be addressed, including questions regarding exactly which (meta)data to report and how to report them. The current article discusses some of the challenges associated with the use of ISA-TAB-Nano and presents a set of resources designed to facilitate the manual creation of ISA-TAB-Nano datasets from the nanotoxicology literature. These resources were developed within the context of the NanoPUZZLES EU project and include data collection templates, corresponding business rules that extend the generic ISA-TAB-Nano specification, as well as Python code to facilitate parsing and integration of these datasets within other nanoinformatics resources. The use of these resources is illustrated by a "Toy Dataset" presented in the Supporting Information. The strengths and weaknesses of the resources are discussed along with possible future developments.
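The article's Python parsing utilities are project-specific, but since ISA-TAB(-Nano) files are plain tab-separated text, the basic reading step can be sketched with the standard library alone. The column labels and sample values below are hypothetical illustrations, not taken from the NanoPUZZLES resources.

```python
import csv
import io

# Hypothetical fragment of an ISA-TAB-style study table: the first row
# holds the column headers, each following row one sample record.
STUDY_TSV = (
    "Sample Name\tCharacteristics[material]\tCharacteristics[size]\n"
    "nano-1\tTiO2\t21 nm\n"
    "nano-2\tZnO\t35 nm\n"
)

def read_study_table(text):
    """Parse a tab-separated study table into a list of per-sample dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)

records = read_study_table(STUDY_TSV)
print(records[0]["Sample Name"])            # nano-1
print(records[1]["Characteristics[size]"])  # 35 nm
```

Real ISA-TAB investigations add multi-file structure and ontology annotations on top of this, which is where format-specific business rules come in.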
Affiliation(s)
- Richard L Marchese Robinson, School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, James Parsons Building, Byrom Street, Liverpool, L3 3AF, United Kingdom
- Mark T D Cronin, School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, James Parsons Building, Byrom Street, Liverpool, L3 3AF, United Kingdom
- Andrea-Nicole Richarz, School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, James Parsons Building, Byrom Street, Liverpool, L3 3AF, United Kingdom
- Robert Rallo, Departament d'Enginyeria Informatica i Matematiques, Universitat Rovira i Virgili, Av. Paisos Catalans 26, 43007 Tarragona, Catalunya, Spain
53. Golosova O, Henderson R, Vaskin Y, Gabrielian A, Grekhov G, Nagarajan V, Oler AJ, Quiñones M, Hurt D, Fursov M, Huyen Y. Unipro UGENE NGS pipelines and components for variant calling, RNA-seq and ChIP-seq data analyses. PeerJ 2014; 2:e644. [PMID: 25392756; PMCID: PMC4226638; DOI: 10.7717/peerj.644]
Abstract
The advent of Next Generation Sequencing (NGS) technologies has opened new possibilities for researchers. However, the more biology becomes a data-intensive field, the more biologists have to learn how to process and analyze NGS data with complex computational tools. Even with the availability of common pipeline specifications, it is often a time-consuming and cumbersome task for a bench scientist to install and configure the pipeline tools. We believe that a unified, desktop and biologist-friendly front end to NGS data analysis tools will substantially improve productivity in this field. Here we present NGS pipelines "Variant Calling with SAMtools", "Tuxedo Pipeline for RNA-seq Data Analysis" and "Cistrome Pipeline for ChIP-seq Data Analysis" integrated into the Unipro UGENE desktop toolkit. We describe the available UGENE infrastructure that helps researchers run these pipelines on different datasets, store and investigate the results and re-run the pipelines with the same parameters. These pipeline tools are included in the UGENE NGS package. Individual blocks of these pipelines are also available for expert users to create their own advanced workflows.
Affiliation(s)
- Olga Golosova, Unipro Center for Information Technologies, Novosibirsk, Russia
- Ross Henderson, Bioinformatics and Computational Biosciences Branch, Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, NIH, Bethesda, MD, USA
- Yuriy Vaskin, Unipro Center for Information Technologies, Novosibirsk, Russia
- Andrei Gabrielian, Bioinformatics and Computational Biosciences Branch, Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, NIH, Bethesda, MD, USA
- German Grekhov, Unipro Center for Information Technologies, Novosibirsk, Russia
- Vijayaraj Nagarajan, Bioinformatics and Computational Biosciences Branch, Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, NIH, Bethesda, MD, USA
- Andrew J Oler, Bioinformatics and Computational Biosciences Branch, Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, NIH, Bethesda, MD, USA
- Mariam Quiñones, Bioinformatics and Computational Biosciences Branch, Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, NIH, Bethesda, MD, USA
- Darrell Hurt, Bioinformatics and Computational Biosciences Branch, Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, NIH, Bethesda, MD, USA
- Mikhail Fursov, Unipro Center for Information Technologies, Novosibirsk, Russia
- Yentram Huyen, Bioinformatics and Computational Biosciences Branch, Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, NIH, Bethesda, MD, USA
54. Tsiliki G, Karacapilidis N, Christodoulou S, Tzagarakis M. Collaborative mining and interpretation of large-scale data for biomedical research insights. PLoS One 2014; 9:e108600. [PMID: 25268270; PMCID: PMC4182494; DOI: 10.1371/journal.pone.0108600]
Abstract
Biomedical research is becoming increasingly interdisciplinary and collaborative in nature. Researchers need to collaborate and make decisions efficiently and effectively by meaningfully assembling, mining and analyzing the large volumes of complex, multi-faceted data residing in different sources. In line with research showing that, despite recent advances in data mining and computational analysis, humans can easily detect patterns that computer algorithms may have difficulty finding, this paper reports on the practical use of an innovative web-based collaboration support platform in a biomedical research context. Arguing that dealing with data-intensive and cognitively complex settings is not a technical problem alone, the proposed platform adopts a hybrid approach that builds on the synergy between machine and human intelligence to facilitate the underlying sense-making and decision-making processes. User experience shows that the platform enables quicker and more informed decisions by displaying aggregated information according to users' needs, while also exploiting the associated human intelligence.
Affiliation(s)
- Georgia Tsiliki, School of Chemical Engineering, National Technical University of Athens, Athens, Greece
- Nikos Karacapilidis, University of Patras and Computer Technology Institute & Press ‘Diophantus’, Patras, Greece
- Spyros Christodoulou, University of Patras and Computer Technology Institute & Press ‘Diophantus’, Patras, Greece
- Manolis Tzagarakis, University of Patras and Computer Technology Institute & Press ‘Diophantus’, Patras, Greece
55. Hettne KM, Dharuri H, Zhao J, Wolstencroft K, Belhajjame K, Soiland-Reyes S, Mina E, Thompson M, Cruickshank D, Verdes-Montenegro L, Garrido J, de Roure D, Corcho O, Klyne G, van Schouwen R, ‘t Hoen PAC, Bechhofer S, Goble C, Roos M. Structuring research methods and data with the research object model: genomics workflows as a case study. J Biomed Semantics 2014; 5:41. [PMID: 25276335; PMCID: PMC4177597; DOI: 10.1186/2041-1480-5-41]
Abstract
BACKGROUND: One of the main challenges for biomedical research lies in the computer-assisted integrative study of large and increasingly complex combinations of data in order to understand molecular mechanisms. The preservation of the materials and methods of such computational experiments with clear annotations is essential for understanding an experiment, and this is increasingly recognized in the bioinformatics community. Our assumption is that offering means of digital, structured aggregation and annotation of the objects of an experiment will provide the necessary metadata for a scientist to understand and recreate the results of an experiment. To support this we explored a model for the semantic description of a workflow-centric Research Object (RO), where an RO is defined as a resource that aggregates other resources, e.g., datasets, software, spreadsheets, text, etc. We applied this model to a case study where we analysed human metabolite variation by workflows.
RESULTS: We present the application of the workflow-centric RO model for our bioinformatics case study. Three workflows were produced following recently defined best practices for workflow design. By modelling the experiment as an RO, we were able to automatically query the experiment and answer questions such as "which particular data was input to a particular workflow to test a particular hypothesis?" and "which particular conclusions were drawn from a particular workflow?".
CONCLUSIONS: Applying a workflow-centric RO model to aggregate and annotate the resources used in a bioinformatics experiment allowed us to retrieve the conclusions of the experiment in the context of the driving hypothesis, the executed workflows and their input data. The RO model is an extendable reference model that can be used by other systems as well.
AVAILABILITY: The Research Object is available at http://www.myexperiment.org/packs/428. The Wf4Ever Research Object Model is available at http://wf4ever.github.io/ro.
Affiliation(s)
- Kristina M Hettne, Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Harish Dharuri, Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Jun Zhao, Department of Zoology, University of Oxford, Oxford, UK
- Katherine Wolstencroft, School of Computer Science, University of Manchester, Manchester, UK; Leiden Institute of Advanced Computer Science, Leiden University, Leiden, The Netherlands
- Khalid Belhajjame, School of Computer Science, University of Manchester, Manchester, UK
- Eleni Mina, Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Mark Thompson, Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- David de Roure, Department of Zoology, University of Oxford, Oxford, UK
- Oscar Corcho, Ontology Engineering Group, Universidad Politécnica de Madrid, Madrid, Spain
- Graham Klyne, Department of Zoology, University of Oxford, Oxford, UK
- Reinout van Schouwen, Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Peter A C ‘t Hoen, Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Sean Bechhofer, School of Computer Science, University of Manchester, Manchester, UK
- Carole Goble, School of Computer Science, University of Manchester, Manchester, UK
- Marco Roos, Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
56. Beisken S, Earll M, Baxter C, Portwood D, Ament Z, Kende A, Hodgman C, Seymour G, Smith R, Fraser P, Seymour M, Salek RM, Steinbeck C. Metabolic differences in ripening of Solanum lycopersicum 'Ailsa Craig' and three monogenic mutants. Sci Data 2014; 1:140029. [PMID: 25977786; PMCID: PMC4322568; DOI: 10.1038/sdata.2014.29]
Abstract
Application of mass spectrometry enables the detection of metabolic differences between groups of related organisms. Differences in the metabolic fingerprints of wild-type Solanum lycopersicum and three monogenic mutants of tomato, ripening inhibitor (rin), non-ripening (nor) and Colourless non-ripening (Cnr), are captured with regard to ripening behaviour. A high-resolution tandem mass spectrometry system coupled to liquid chromatography produced a time series of the ripening behaviour at discrete intervals, with a focus on changes post-anthesis. Internal standards and quality controls were used to ensure system stability. The raw data of the samples and reference compounds, including study protocols, have been deposited in the open metabolomics database MetaboLights via the ISA-Tab metadata annotation tooling to enable efficient re-use of the datasets, such as in metabolomics cross-study comparisons or data fusion exercises.
Affiliation(s)
- Stephan Beisken, European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 2HA, UK
- Mark Earll, Syngenta Jealott's Hill International Research Centre, Bracknell, Berkshire RG42 6EY, UK
- Charles Baxter, Syngenta Jealott's Hill International Research Centre, Bracknell, Berkshire RG42 6EY, UK
- David Portwood, Syngenta Jealott's Hill International Research Centre, Bracknell, Berkshire RG42 6EY, UK
- Zsuzsanna Ament, Syngenta Jealott's Hill International Research Centre, Bracknell, Berkshire RG42 6EY, UK
- Aniko Kende, Syngenta Jealott's Hill International Research Centre, Bracknell, Berkshire RG42 6EY, UK
- Charlie Hodgman, Centre for Plant Integrative Biology, University of Nottingham, Loughborough, Leicestershire LE12 5RD, UK
- Graham Seymour, Centre for Plant Integrative Biology, University of Nottingham, Loughborough, Leicestershire LE12 5RD, UK
- Rebecca Smith, Centre for Plant Integrative Biology, University of Nottingham, Loughborough, Leicestershire LE12 5RD, UK
- Paul Fraser, School of Biological Sciences, Royal Holloway, University of London, Egham Hill, Egham, Surrey TW20 0EX, UK
- Mark Seymour, Syngenta Jealott's Hill International Research Centre, Bracknell, Berkshire RG42 6EY, UK
- Reza M Salek, European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 2HA, UK
- Christoph Steinbeck, European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 2HA, UK
57. Tsiliki G, Kossida S, Friesen N, Rüping S, Tzagarakis M, Karacapilidis N. A Data Mining Based Approach for Collaborative Analysis of Biomedical Data. Int J Artif Intell Tools 2014. [DOI: 10.1142/s0218213014600100]
Abstract
Biomedical research is becoming increasingly multidisciplinary and collaborative in nature. At the same time, it has recently seen a vast growth in publicly and instantly available information. As the available resources become more specialized, there is a growing need for multidisciplinary collaborations between biomedical researchers to address complex research questions. We present an application of a data mining algorithm to genomic data in a collaborative decision-making support environment, as a typical example of how multidisciplinary researchers can collaborate in analyzing and interpreting biomedical data. Through the proposed approach, researchers can easily decide which data repositories should be considered, analyze the algorithmic results, discuss the weaknesses of the patterns identified, and set up new iterations of the data mining algorithm by defining other descriptive attributes or integrating other relevant data. Evaluation results show that the proposed approach helps users set their research objectives and better understand the data and methodologies used in their research.
Affiliation(s)
- Georgia Tsiliki, Bioinformatics and Medical Informatics Team, Biomedical Research Foundation, Academy of Athens, 4 Soranou Ephessiou, 115 27, Greece
- Sophia Kossida, Bioinformatics and Medical Informatics Team, Biomedical Research Foundation, Academy of Athens, 4 Soranou Ephessiou, 115 27, Greece
- Natalja Friesen, Knowledge Discovery Group, Fraunhofer Institute IAIS, Sankt Augustin, Germany
- Stefan Rüping, Knowledge Discovery Group, Fraunhofer Institute IAIS, Sankt Augustin, Germany
- Manolis Tzagarakis, University of Patras and Computer Technology Institute & Press “Diophantus”, Rio Patras, Greece
- Nikos Karacapilidis, University of Patras and Computer Technology Institute & Press “Diophantus”, Rio Patras, Greece
58. Costa RS, Veríssimo A, Vinga S. KiMoSys: a web-based repository of experimental data for KInetic MOdels of biological SYStems. BMC Systems Biology 2014; 8:85. [PMID: 25115331; PMCID: PMC4236735; DOI: 10.1186/s12918-014-0085-3]
Abstract
BACKGROUND: The kinetic modeling of biological systems mainly comprises three steps that proceed iteratively: model building, simulation and analysis. In the first step, it is usually required to set initial metabolite concentrations and to assign kinetic rate laws, along with estimating parameter values through optimization using kinetic data when these are not known. Although the rapid development of high-throughput methods has generated much omics data, experimentalists present only a summary of the obtained results for publication, and the experimental data files are usually not submitted to any public repository, or are simply not available at all. In order to automate as much as possible the steps of building kinetic models, there is a growing requirement in the systems biology community for easily exchanging data in combination with models, which is the main motivation behind the development of KiMoSys.
DESCRIPTION: KiMoSys is a user-friendly platform that includes a public data repository of published experimental data, containing concentration data of metabolites and enzymes as well as flux data. It was designed to ensure data management, storage and sharing for the wider systems biology community. This community repository offers a web-based interface and upload facility to turn available data into publicly accessible, centralized and structured-format data files. Moreover, it compiles and integrates available kinetic models associated with the data. KiMoSys also integrates tools to facilitate the construction of kinetic models of large-scale metabolic networks, especially when systems biologists perform computational research.
CONCLUSIONS: KiMoSys is a web-based system that integrates a public data repository and its associated model(s) with computational tools, providing the systems biology community with a novel application that facilitates data storage and sharing, thus supporting the construction of ODE-based kinetic models and collaborative research projects. The web application, implemented using the Ruby on Rails framework, is freely available at http://kimosys.org, along with its full documentation.
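KiMoSys itself stores data and models rather than code, but the kind of ODE-based kinetic model the abstract refers to can be illustrated with a minimal, self-contained sketch: a single Michaelis-Menten reaction integrated with explicit Euler steps. The rate constants and step size are arbitrary illustration values, not taken from any repository entry.

```python
def michaelis_menten_rate(s, vmax, km):
    """Reaction rate v = Vmax * S / (Km + S)."""
    return vmax * s / (km + s)

def simulate(s0, vmax, km, dt=0.01, t_end=10.0):
    """Integrate dS/dt = -v with explicit Euler steps; return the final S."""
    s = s0
    for _ in range(int(t_end / dt)):
        s -= michaelis_menten_rate(s, vmax, km) * dt
        s = max(s, 0.0)  # a concentration cannot go negative
    return s

final = simulate(s0=5.0, vmax=1.0, km=0.5)
print(0.0 <= final < 5.0)  # True: substrate is consumed but stays non-negative
```

Production kinetic modelling would use a stiff ODE solver and rate laws with fitted parameters; the point here is only the model-building step the abstract describes.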
Affiliation(s)
- Rafael S Costa, Instituto de Engenharia de Sistemas e Computadores, Investigação e Desenvolvimento (INESC-ID), R Alves Redol 9, Lisboa, 1000-029, Portugal; Center for Intelligent Systems, LAETA, IDMEC, Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais 1, Lisboa, 1049-001, Portugal
- André Veríssimo, Center for Intelligent Systems, LAETA, IDMEC, Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais 1, Lisboa, 1049-001, Portugal
- Susana Vinga, Center for Intelligent Systems, LAETA, IDMEC, Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais 1, Lisboa, 1049-001, Portugal
59. WGS Analysis and Interpretation in Clinical and Public Health Microbiology Laboratories: What Are the Requirements and How Do Existing Tools Compare? Pathogens 2014; 3:437-58. [PMID: 25437808; PMCID: PMC4243455; DOI: 10.3390/pathogens3020437]
Abstract
Recent advances in DNA sequencing technologies have the potential to transform the field of clinical and public health microbiology, and in the last few years numerous case studies have demonstrated successful applications in this context. Among other considerations, a lack of user-friendly data analysis and interpretation tools has been frequently cited as a major barrier to routine use of these techniques. Here we consider the requirements of microbiology laboratories for the analysis, clinical interpretation and management of bacterial whole-genome sequence (WGS) data. Then we discuss relevant, existing WGS analysis tools. We highlight many essential and useful features that are represented among existing tools, but find that no single tool fulfils all of the necessary requirements. We conclude that to fully realise the potential of WGS analyses for clinical and public health microbiology laboratories of all scales, we will need to develop tools specifically with the needs of these laboratories in mind.
60. The BioDICE Taverna plugin for clustering and visualization of biological data: a workflow for molecular compounds exploration. J Cheminform 2014. [PMCID: PMC4036106; DOI: 10.1186/1758-2946-6-24]
Abstract
Background: In many experimental pipelines, clustering of multidimensional biological datasets is used to detect hidden structures in unlabelled input data. Taverna is a popular workflow management system that is used to design and execute scientific workflows and aid in silico experimentation. The availability of fast unsupervised methods for clustering and visualization in the Taverna platform is important to support data-driven scientific discovery in complex and explorative bioinformatics applications.
Results: This work presents a Taverna plugin, the Biological Data Interactive Clustering Explorer (BioDICE), that performs clustering of high-dimensional biological data and provides a nonlinear, topology-preserving projection for the visualization of the input data and their similarities. The core algorithm in the BioDICE plugin is the Fast Learning Self Organizing Map (FLSOM), an improved variant of the Self Organizing Map (SOM) algorithm. The plugin generates an interactive 2D map that allows the visual exploration of multidimensional data and the identification of groups of similar objects. The effectiveness of the plugin is demonstrated on a case study related to chemical compounds.
Conclusions: The number and variety of available tools, together with its extensibility, have made Taverna a popular choice for the development of scientific data workflows. This work presents a novel plugin, BioDICE, which adds a data-driven knowledge discovery component to Taverna. BioDICE provides an effective and powerful clustering tool that can be adopted for the explorative analysis of biological datasets.
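The core of BioDICE is FLSOM, a variant of the self-organizing map; the FLSOM-specific improvements are not described in the abstract, but the classic SOM update rule (pull the best-matching unit and its grid neighbours toward each input vector) can be sketched independently of the plugin. The grid size, learning rate and neighbourhood width below are arbitrary choices, not BioDICE defaults.

```python
import math
import random

def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_som(data, rows=3, cols=3, epochs=50, lr=0.5, sigma=1.0, seed=0):
    """Train a tiny SOM grid; returns weight vectors keyed by (row, col)."""
    rng = random.Random(seed)
    dim = len(data[0])
    w = {(r, c): [rng.random() for _ in range(dim)]
         for r in range(rows) for c in range(cols)}
    for _ in range(epochs):
        for x in data:
            bmu = min(w, key=lambda u: dist2(w[u], x))  # best-matching unit
            for u in w:
                # neighbourhood influence falls off with grid distance from the BMU
                g = math.exp(-dist2(u, bmu) / (2 * sigma ** 2))
                w[u] = [wi + lr * g * (xi - wi) for wi, xi in zip(w[u], x)]
    return w

data = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.0)]
weights = train_som(data)
print(len(weights))  # 9 units on the 3x3 grid
```

A full implementation would also decay the learning rate and neighbourhood width over time and project new samples onto the trained 2D map for visualization.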
61. McDonagh JL, Nath N, De Ferrari L, van Mourik T, Mitchell JBO. Uniting cheminformatics and chemical theory to predict the intrinsic aqueous solubility of crystalline druglike molecules. J Chem Inf Model 2014; 54:844-56. [PMID: 24564264; PMCID: PMC3965570; DOI: 10.1021/ci4005805]
Abstract
We present four models of solution free-energy prediction for druglike molecules utilizing cheminformatics descriptors and theoretically calculated thermodynamic values. We make predictions of solution free energy using physics-based theory alone and using machine learning/quantitative structure–property relationship (QSPR) models. We also develop machine learning models where the theoretical energies and cheminformatics descriptors are used as combined input. These models are used to predict solvation free energy. While direct theoretical calculation does not give accurate results in this approach, machine learning is able to give predictions with a root mean squared error (RMSE) of ∼1.1 log S units in a 10-fold cross-validation for our Drug-Like-Solubility-100 (DLS-100) dataset of 100 druglike molecules. We find that a model built using energy terms from our theoretical methodology as descriptors is marginally less predictive than one built on Chemistry Development Kit (CDK) descriptors. Combining both sets of descriptors allows a further but very modest improvement in the predictions. However, in some cases, this is a statistically significant enhancement. These results suggest that there is little complementarity between the chemical information provided by these two sets of descriptors, despite their different sources and methods of calculation. Our machine learning models are also able to predict the well-known Solubility Challenge dataset with an RMSE value of 0.9–1.0 log S units.
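The RMSE figures quoted above come from 10-fold cross-validation, a protocol that is easy to state concretely. The sketch below computes a cross-validated RMSE for a deliberately trivial mean-only baseline on synthetic data; it illustrates the evaluation metric only and has no connection to the authors' models or to the DLS-100 dataset.

```python
import math
import random

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

def k_fold_rmse(y, n_folds=10, seed=0):
    """Average RMSE of a mean-only baseline predictor over k folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    scores = []
    for k in range(n_folds):
        test = set(folds[k])
        train = [i for i in idx if i not in test]
        mean = sum(y[i] for i in train) / len(train)  # "fit" on the training fold
        scores.append(rmse([y[i] for i in folds[k]], [mean] * len(folds[k])))
    return sum(scores) / n_folds

rng = random.Random(1)
y = [rng.gauss(0.0, 1.0) for _ in range(100)]
score = k_fold_rmse(y)
print(0.5 < score < 1.5)  # True: close to the sample standard deviation
```

In a real QSPR study the mean predictor would be replaced by a model fitted on descriptors, with the fold split applied identically.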
Affiliation(s)
- James L McDonagh, Biomedical Sciences Research Complex and EaStCHEM, School of Chemistry, Purdie Building, University of St. Andrews, North Haugh, St. Andrews, Scotland, KY16 9ST, United Kingdom
62. SemEnAl: Using Semantics for Accelerating Environmental Analytical Model Discovery. Big Data Analytics 2014. [DOI: 10.1007/978-3-319-13820-6_8]
63. Dharuri H, Henneman P, Demirkan A, van Klinken JB, Mook-Kanamori DO, Wang-Sattler R, Gieger C, Adamski J, Hettne K, Roos M, Suhre K, Van Duijn CM, van Dijk KW, 't Hoen PAC. Automated workflow-based exploitation of pathway databases provides new insights into genetic associations of metabolite profiles. BMC Genomics 2013; 14:865. [PMID: 24320595; PMCID: PMC3879060; DOI: 10.1186/1471-2164-14-865]
Abstract
BACKGROUND: Genome-wide association studies (GWAS) have identified many common single nucleotide polymorphisms (SNPs) that associate with clinical phenotypes, but these SNPs usually explain just a small part of the heritability and have relatively modest effect sizes. In contrast, SNPs that associate with metabolite levels generally explain a higher percentage of the genetic variation and demonstrate larger effect sizes. Still, the discovery of SNPs associated with metabolite levels is challenging, since testing all metabolites measured in typical metabolomics studies against all SNPs comes with a severe multiple testing penalty. We have developed an automated workflow approach that utilizes prior knowledge of biochemical pathways present in databases like KEGG and BioCyc to generate a smaller SNP set relevant to the metabolite. This paper explores the opportunities and challenges in the analysis of GWAS of metabolomic phenotypes and provides novel insights into the genetic basis of metabolic variation through the re-analysis of published GWAS datasets.
RESULTS: Re-analysis of the published GWAS dataset from Illig et al. (Nature Genetics, 2010) using a pathway-based workflow (http://www.myexperiment.org/packs/319.html) confirmed previously identified hits and identified a new locus of human metabolic individuality, associating aldehyde dehydrogenase family 1 member L1 (ALDH1L1) with serine/glycine ratios in blood. Replication in an independent GWAS dataset of phospholipids (Demirkan et al., PLoS Genetics, 2012) identified two novel loci supported by additional literature evidence: GPAM (glycerol-3-phosphate acyltransferase) and CBS (cystathionine beta-synthase). In addition, the workflow approach provided novel insight into the affected pathways and the relevance of some of these gene-metabolite pairs in disease development and progression.
CONCLUSIONS: We demonstrate the utility of automated exploitation of background knowledge present in pathway databases for the analysis of GWAS datasets of metabolomic phenotypes. We report novel loci and potential biochemical mechanisms that contribute to our understanding of the genetic basis of metabolic variation and its relationship to disease development and progression.
Affiliation(s)
- Peter A C 't Hoen, Center for Human and Clinical Genetics, Leiden University Medical Center, S4-P, PO Box 9600, 2300 RC Leiden, Netherlands
64.
Affiliation(s)
- Geir Kjetil Sandve, Department of Informatics, University of Oslo, Blindern, Oslo, Norway; Centre for Cancer Biomedicine, University of Oslo, Blindern, Oslo, Norway
- Anton Nekrutenko, Department of Biochemistry and Molecular Biology and The Huck Institutes for the Life Sciences, Penn State University, University Park, Pennsylvania, United States of America
- James Taylor, Department of Biology and Department of Mathematics and Computer Science, Emory University, Atlanta, Georgia, United States of America
- Eivind Hovig, Department of Informatics, University of Oslo, Blindern, Oslo, Norway; Department of Tumor Biology, Institute for Cancer Research, The Norwegian Radium Hospital, Oslo University Hospital, Montebello, Oslo, Norway; Institute for Medical Informatics, The Norwegian Radium Hospital, Oslo University Hospital, Montebello, Oslo, Norway
65. Beisken S, Meinl T, Wiswedel B, de Figueiredo LF, Berthold M, Steinbeck C. KNIME-CDK: Workflow-driven cheminformatics. BMC Bioinformatics 2013; 14:257. [PMID: 24103053; PMCID: PMC3765822; DOI: 10.1186/1471-2105-14-257]
Abstract
Background Cheminformaticians have to routinely process and analyse libraries of small molecules. Among other things, this includes the standardisation of molecules, calculation of various descriptors, visualisation of molecular structures, and downstream analysis. For this purpose, scientific workflow platforms such as the Konstanz Information Miner can be used if provided with the right plug-in. A workflow-based cheminformatics tool provides the advantages of ease of use and interoperability between complementary cheminformatics packages within the same framework, hence facilitating the analysis process. Results KNIME-CDK comprises functions for molecule conversion to/from common formats, generation of signatures, fingerprints, and molecular properties. It is based on the Chemistry Development Toolkit and uses the Chemical Markup Language for persistence. A comparison with the cheminformatics plug-in RDKit shows that KNIME-CDK supports a similar range of chemical classes and adds new functionality to the framework. We describe the design and integration of the plug-in, and demonstrate the usage of the nodes on ChEBI, a library of small molecules of biological interest. Conclusions KNIME-CDK is an open-source plug-in for the Konstanz Information Miner, a free workflow platform. KNIME-CDK is built on top of the open-source Chemistry Development Toolkit and allows for efficient cross-vendor structural cheminformatics. Its ease of use and modularity enable researchers to automate routine tasks and data analysis, bringing complementary cheminformatics functionality to the workflow environment.
Affiliation(s)
- Stephan Beisken
- European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
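As a toy illustration of the fingerprint comparison mentioned in the KNIME-CDK abstract above: binary molecular fingerprints are commonly compared with the Tanimoto coefficient. This is a minimal pure-Python sketch, not KNIME-CDK's actual API (the plug-in is Java/CDK-based), and the bit positions are made up.

```python
# Toy fingerprint similarity: fingerprints are represented as sets of
# "on" bit positions; Tanimoto = |intersection| / |union|.
# This is an illustrative sketch, not KNIME-CDK or CDK code.

def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 1.0  # convention: two empty fingerprints are identical
    return len(fp_a & fp_b) / len(fp_a | fp_b)

# Hypothetical on-bit positions for two small molecules.
mol_a = {1, 4, 9, 17, 23}
mol_b = {1, 4, 9, 23, 42}

print(round(tanimoto(mol_a, mol_b), 2))  # 4 shared bits / 6 total = 0.67
```

A real KNIME-CDK workflow would compute such fingerprints from chemical structures via dedicated nodes; the arithmetic of the comparison is the same.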
66
Kouskoumvekaki I, Shublaq N, Brunak S. Facilitating the use of large-scale biological data and tools in the era of translational bioinformatics. Brief Bioinform 2013; 15:942-52. [PMID: 23908249] [DOI: 10.1093/bib/bbt055]
Abstract
As both the amount of generated biological data and the available processing power increase, computational experimentation is no longer exclusive to bioinformaticians but is spreading across all biomedical domains. For bioinformatics to realize its translational potential, domain experts need access to user-friendly solutions to navigate, integrate and extract information out of biological databases, as well as to combine tools and data resources in bioinformatics workflows. In this review, we present services that assist biomedical scientists in incorporating bioinformatics tools into their research. We review recent applications of Cytoscape, BioGPS and DAVID for data visualization, integration and functional enrichment. Moreover, we illustrate the use of Taverna, Kepler, GenePattern, and Galaxy as open-access workbenches for bioinformatics workflows. Finally, we mention services that facilitate the integration of biomedical ontologies and bioinformatics tools in computational workflows.
67
Ison J, Kalas M, Jonassen I, Bolser D, Uludag M, McWilliam H, Malone J, Lopez R, Pettifer S, Rice P. EDAM: an ontology of bioinformatics operations, types of data and identifiers, topics and formats. Bioinformatics 2013; 29:1325-32. [PMID: 23479348] [PMCID: PMC3654706] [DOI: 10.1093/bioinformatics/btt113]
Abstract
MOTIVATION Advancing the search, publication and integration of bioinformatics tools and resources demands consistent machine-understandable descriptions. A comprehensive ontology allowing such descriptions is therefore required. RESULTS EDAM is an ontology of bioinformatics operations (tool or workflow functions), types of data and identifiers, application domains and data formats. EDAM supports semantic annotation of diverse entities such as Web services, databases, programmatic libraries, standalone tools, interactive applications, data schemas, datasets and publications within bioinformatics. EDAM applies to organizing and finding suitable tools and data and to automating their integration into complex applications or workflows. It includes over 2200 defined concepts and has successfully been used for annotations and implementations. AVAILABILITY The latest stable version of EDAM is available in OWL format from http://edamontology.org/EDAM.owl and in OBO format from http://edamontology.org/EDAM.obo. It can be viewed online at the NCBO BioPortal and the EBI Ontology Lookup Service. For documentation and license please refer to http://edamontology.org. This article describes version 1.2 available at http://edamontology.org/EDAM_1.2.owl. CONTACT jison@ebi.ac.uk.
Affiliation(s)
- Jon Ison
- EMBL European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, UK.
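The EDAM abstract above notes that the ontology is distributed in OBO format, among others. As a hedged sketch of what consuming such a file involves, the snippet below parses OBO-style `[Term]` stanzas; the stanza text is a hypothetical example in generic OBO syntax, not copied from the real EDAM.obo.

```python
# Minimal parser for OBO-format [Term] stanzas, the flat-file syntax
# EDAM is distributed in. The sample stanzas below are invented for
# illustration and do not reproduce real EDAM identifiers.

obo_text = """\
[Term]
id: operation:0001
name: Sequence alignment

[Term]
id: data:0002
name: Sequence record
"""

def parse_obo_terms(text: str) -> dict:
    """Return {term id: term name} for each [Term] stanza."""
    terms, current = {}, {}
    for line in text.splitlines():
        line = line.strip()
        if line == "[Term]":
            current = {}
        elif line.startswith("id: "):
            current["id"] = line[4:]
        elif line.startswith("name: "):
            terms[current["id"]] = line[6:]
    return terms

print(parse_obo_terms(obo_text))
```

In practice one would use a dedicated ontology library rather than a hand-rolled parser, but the stanza structure is simple enough that the sketch conveys the format.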
68
McWilliam H, Li W, Uludag M, Squizzato S, Park YM, Buso N, Cowley AP, Lopez R. Analysis Tool Web Services from the EMBL-EBI. Nucleic Acids Res 2013; 41:W597-600. [PMID: 23671338] [PMCID: PMC3692137] [DOI: 10.1093/nar/gkt376]
Abstract
Since 2004 the European Bioinformatics Institute (EMBL-EBI) has provided access to a wide range of databases and analysis tools via Web Services interfaces. This comprises services to search across the databases available from the EMBL-EBI and to explore the network of cross-references present in the data (e.g. EB-eye), services to retrieve entry data in various data formats and to access the data in specific fields (e.g. dbfetch), and analysis tool services, for example, sequence similarity search (e.g. FASTA and NCBI BLAST), multiple sequence alignment (e.g. Clustal Omega and MUSCLE), pairwise sequence alignment and protein functional analysis (e.g. InterProScan and Phobius). The REST/SOAP Web Services (http://www.ebi.ac.uk/Tools/webservices/) interfaces to these databases and tools allow their integration into other tools, applications, web sites, pipeline processes and analytical workflows. To get users started using the Web Services, sample clients are provided covering a range of programming languages and popular Web Service tool kits, and a brief guide to Web Services technologies, including a set of tutorials, is available for those wishing to learn more and develop their own clients. Users of the Web Services are informed of improvements and updates via a range of methods.
Affiliation(s)
- Hamish McWilliam
- EMBL Outstation-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SD Cambridge, UK
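The abstract above mentions dbfetch among the EMBL-EBI REST services. As a sketch, the snippet below builds a dbfetch request URL from the service's documented query parameters (db, id, format, style); the network request itself is left out so the example stays offline, and the accession used is just an example.

```python
# Sketch of addressing the EMBL-EBI dbfetch REST service.
# Only the URL is constructed here; fetching is left to the caller.
from urllib.parse import urlencode

def dbfetch_url(db: str, entry_id: str, fmt: str = "fasta") -> str:
    """Build a dbfetch query URL for one database entry."""
    base = "https://www.ebi.ac.uk/Tools/dbfetch/dbfetch"
    params = {"db": db, "id": entry_id, "format": fmt, "style": "raw"}
    return base + "?" + urlencode(params)

url = dbfetch_url("uniprotkb", "P05067")
print(url)
# To actually retrieve the record, one could pass `url` to
# urllib.request.urlopen(url).read().
```

The same pattern (base endpoint plus encoded query parameters) applies to the other REST services listed in the entry, each with its own parameter set.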
69
Wolstencroft K, Haines R, Fellows D, Williams A, Withers D, Owen S, Soiland-Reyes S, Dunlop I, Nenadic A, Fisher P, Bhagat J, Belhajjame K, Bacall F, Hardisty A, Nieva de la Hidalga A, Balcazar Vargas MP, Sufi S, Goble C. The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud. Nucleic Acids Res 2013; 41:W557-61. [PMID: 23640334] [PMCID: PMC3692062] [DOI: 10.1093/nar/gkt328]
Abstract
The Taverna workflow tool suite (http://www.taverna.org.uk) is designed to combine distributed Web Services and/or local tools into complex analysis pipelines. These pipelines can be executed on local desktop machines or through larger infrastructure (such as supercomputers, Grids or cloud environments), using the Taverna Server. In bioinformatics, Taverna workflows are typically used in the areas of high-throughput omics analyses (for example, proteomics or transcriptomics), or for evidence gathering methods involving text mining or data mining. Through Taverna, scientists have access to several thousand different tools and resources that are freely available from a large range of life science institutions. Once constructed, the workflows are reusable, executable bioinformatics protocols that can be shared, reused and repurposed. A repository of public workflows is available at http://www.myexperiment.org. This article provides an update to the Taverna tool suite, highlighting new features and developments in the workbench and the Taverna Server.
70
Pérez M, Berlanga R, Sanz I, Aramburu MJ. BioUSeR: a semantic-based tool for retrieving Life Science web resources driven by text-rich user requirements. J Biomed Semantics 2013; 4:12. [PMID: 23635042] [PMCID: PMC3698192] [DOI: 10.1186/2041-1480-4-12]
Abstract
Background Open metadata registries are a fundamental tool for researchers in the Life Sciences trying to locate resources. While most current registries assume that resources are annotated with well-structured metadata, evidence shows that most resource annotations simply consist of informal free text. This reality must be taken into account in order to develop effective techniques for resource discovery in the Life Sciences. Results BioUSeR is a semantic-based tool aimed at retrieving Life Sciences resources described in free text. The retrieval process is driven by the user requirements, which consist of a target task and a set of facets of interest, both expressed in free text. BioUSeR is able to effectively exploit the available textual descriptions to find relevant resources by using semantic-aware techniques. Conclusions BioUSeR overcomes the limitations of current registries thanks to: (i) rich specification of user information needs, (ii) use of semantics to manage textual descriptions, and (iii) retrieval and ranking of resources based on user requirements.
Affiliation(s)
- María Pérez
- Department of Computer Science and Engineering, Universitat Jaume I, Castellón, Spain.
71
Wollbrett J, Larmande P, de Lamotte F, Ruiz M. Clever generation of rich SPARQL queries from annotated relational schema: application to Semantic Web Service creation for biological databases. BMC Bioinformatics 2013; 14:126. [PMID: 23586394] [PMCID: PMC3680174] [DOI: 10.1186/1471-2105-14-126]
Abstract
Background In recent years, a large amount of “-omics” data have been produced. However, these data are stored in many different species-specific databases that are managed by different institutes and laboratories. Biologists often need to find and assemble data from disparate sources to perform certain analyses. Searching for these data and assembling them is a time-consuming task. The Semantic Web helps to facilitate interoperability across databases. A common approach involves the development of wrapper systems that map a relational database schema onto existing domain ontologies. However, few attempts have been made to automate the creation of such wrappers. Results We developed a framework, named BioSemantic, for the creation of Semantic Web Services that are applicable to relational biological databases. This framework makes use of both Semantic Web and Web Services technologies and can be divided into two main parts: (i) the generation and semi-automatic annotation of an RDF view; and (ii) the automatic generation of SPARQL queries and their integration into Semantic Web Services backbones. We have used our framework to integrate genomic data from different plant databases. Conclusions BioSemantic is a framework that was designed to speed integration of relational databases. We present how it can be used to speed the development of Semantic Web Services for existing relational biological databases. Currently, it creates and annotates RDF views that enable the automatic generation of SPARQL queries. Web Services are also created and deployed automatically, and the semantic annotations of our Web Services are added automatically using SAWSDL attributes. BioSemantic is downloadable at http://southgreen.cirad.fr/?q=content/Biosemantic.
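To make concrete the kind of SPARQL query that BioSemantic's RDF views enable over relational plant databases, here is a hedged sketch. The prefix, class, and predicate names (`ex:Gene`, `ex:name`, `ex:locatedOn`) are hypothetical placeholders, not BioSemantic's actual vocabulary, and the query is built as a plain string rather than executed.

```python
# Illustrative generation of a SPARQL query over a hypothetical RDF
# view of a plant genomics database. Vocabulary names are invented;
# BioSemantic generates such queries automatically from annotated
# relational schemas.

def gene_query(gene_name: str) -> str:
    """Return a SPARQL query selecting a gene and its chromosome by name."""
    return f"""\
PREFIX ex: <http://example.org/plant#>
SELECT ?gene ?chromosome WHERE {{
  ?gene a ex:Gene ;
        ex:name "{gene_name}" ;
        ex:locatedOn ?chromosome .
}}"""

query = gene_query("Os01g0100100")
print(query)
```

In BioSemantic's workflow such queries are wrapped into Semantic Web Services, so callers never write SPARQL by hand; the sketch only shows what the generated query text looks like.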
72
Vaughan LK, Srinivasasainagendra V. Where in the genome are we? A cautionary tale of database use in genomics research. Front Genet 2013; 4:38. [PMID: 23519237] [PMCID: PMC3604632] [DOI: 10.3389/fgene.2013.00038]
Abstract
With the advent of high-throughput genomic technologies, the volume of available data is now staggering. In addition, databases that provide resources to annotate, translate, and connect biological data have grown exponentially in content and use. The availability of such data emphasizes the importance of bioinformatics and computational biology in genomics research and has led to the development of thousands of tools to integrate and utilize these resources. When utilizing such resources, however, researchers often overlook the principles of reproducible research. In this manuscript we provide selected case studies illustrating issues that may arise while working with genes and genetic polymorphisms. These case studies illustrate potential sources of error that can be introduced if the practices of reproducible research are not employed and non-concurrent databases are used. We also show examples of a lack of transparency regarding these databases in popular bioinformatics tools. These examples highlight that resources are constantly evolving, and in order to produce reproducible results, researchers should be aware of and connected to the correct release of the data, particularly when implementing computational tools.
Affiliation(s)
- Laura K Vaughan
- Section on Statistical Genetics, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA
73
74
Aranguren ME, Fernández-Breis JT, Mungall C, Antezana E, González AR, Wilkinson MD. OPPL-Galaxy, a Galaxy tool for enhancing ontology exploitation as part of bioinformatics workflows. J Biomed Semantics 2013; 4:2. [PMID: 23286517] [PMCID: PMC3643862] [DOI: 10.1186/2041-1480-4-2]
Abstract
BACKGROUND Biomedical ontologies are key elements for building up the Life Sciences Semantic Web. Reusing and building biomedical ontologies requires flexible and versatile tools to manipulate them efficiently, in particular for enriching their axiomatic content. The Ontology Pre Processor Language (OPPL) is an OWL-based language for automating the changes to be performed in an ontology. OPPL augments the ontologists' toolbox by providing a more efficient, and less error-prone, mechanism for enriching a biomedical ontology than that obtained by a manual treatment. RESULTS We present OPPL-Galaxy, a wrapper for using OPPL within Galaxy. The functionality delivered by OPPL (i.e. automated ontology manipulation) can be combined with the tools and workflows devised within the Galaxy framework, resulting in an enhancement of OPPL. Use cases are provided in order to demonstrate OPPL-Galaxy's capability for enriching, modifying and querying biomedical ontologies. CONCLUSIONS Coupling OPPL-Galaxy with other bioinformatics tools of the Galaxy framework results in a system that is more than the sum of its parts. OPPL-Galaxy opens a new dimension of analyses and exploitation of biomedical ontologies, including automated reasoning, paving the way towards advanced biological data analyses.
Affiliation(s)
- Mikel Egaña Aranguren
- Ontology Engineering Group, School of Computer Science, Technical University of Madrid (UPM), Boadilla del Monte, 28660, Spain
- Biological Informatics Group, Centre for Plant Biotechnology and Genomics (CBGP), Technical University of Madrid (UPM), Pozuelo de Alarcón, 28223, Spain
- Chris Mungall
- Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, US
- Erick Antezana
- Department of Biology, Norwegian University of Science and Technology (NTNU), Høgskoleringen 5, Trondheim, N-7491, Norway
- Alejandro Rodríguez González
- Biological Informatics Group, Centre for Plant Biotechnology and Genomics (CBGP), Technical University of Madrid (UPM), Pozuelo de Alarcón, 28223, Spain
- Mark D Wilkinson
- Biological Informatics Group, Centre for Plant Biotechnology and Genomics (CBGP), Technical University of Madrid (UPM), Pozuelo de Alarcón, 28223, Spain
75
Jimenez RC, Corpas M. Bioinformatics workflows and web services in systems biology made easy for experimentalists. Methods Mol Biol 2013; 1021:299-310. [PMID: 23715992] [DOI: 10.1007/978-1-62703-450-0_16]
Abstract
Workflows are useful for performing data analysis and integration in systems biology. Workflow management systems can help users create workflows without any previous knowledge of programming or web services. However, the computational skills required to build such workflows are usually above the level most biological experimentalists are comfortable with. In this chapter we introduce workflow management systems that reuse existing workflows instead of creating them, making it easier for experimentalists to perform computational tasks.
Affiliation(s)
- Rafael C Jimenez
- EMBL Outstation-European Bioinformatics Institute, Cambridge, UK
76
Bird CL, Willoughby C, Frey JG. Laboratory notebooks in the digital era: the role of ELNs in record keeping for chemistry and other sciences. Chem Soc Rev 2013; 42:8157-75. [DOI: 10.1039/c3cs60122f]
77
Jiménez RC, Vizcaíno JA. Proteomics data exchange and storage: the need for common standards and public repositories. Methods Mol Biol 2013; 1007:317-333. [PMID: 23666733] [DOI: 10.1007/978-1-62703-392-3_14]
Abstract
Both the existence of data standards and public databases or repositories have been key factors behind the development of the existing "omics" approaches. In this book chapter we first review the main existing mass spectrometry (MS)-based proteomics resources: PRIDE, PeptideAtlas, GPMDB, and Tranche. Second, we report on the current status of the different proteomics data standards developed by the Proteomics Standards Initiative (PSI): the formats mzML, mzIdentML, mzQuantML, TraML, and PSI-MI XML are then reviewed. Finally, we present an easy way to query and access MS proteomics data in the PRIDE database, as a representative of the existing repositories, using the workflow management system (WMS) tool Taverna. Two different publicly available workflows are explained and described.
78
Beck T, Free RC, Thorisson GA, Brookes AJ. Semantically enabling a genome-wide association study database. J Biomed Semantics 2012; 3:9. [PMID: 23244533] [PMCID: PMC3579732] [DOI: 10.1186/2041-1480-3-9]
Abstract
Background The amount of data generated from genome-wide association studies (GWAS) has grown rapidly, but considerations for GWAS phenotype data reuse and interchange have not kept pace. This impacts on the work of GWAS Central – a free and open access resource for the advanced querying and comparison of summary-level genetic association data. The benefits of employing ontologies for standardising and structuring data are widely accepted. The complex spectrum of observed human phenotypes (and traits), and the requirement for cross-species phenotype comparisons, calls for reflection on the most appropriate solution for the organisation of human phenotype data. The Semantic Web provides standards for the possibility of further integration of GWAS data and the ability to contribute to the web of Linked Data. Results A pragmatic consideration when applying phenotype ontologies to GWAS data is the ability to retrieve all data, at the most granular level possible, from querying a single ontology graph. We found the Medical Subject Headings (MeSH) terminology suitable for describing all traits (diseases and medical signs and symptoms) at various levels of granularity and the Human Phenotype Ontology (HPO) most suitable for describing phenotypic abnormalities (medical signs and symptoms) at the most granular level. Diseases within MeSH are mapped to HPO to infer the phenotypic abnormalities associated with diseases. Building on the rich semantic phenotype annotation layer, we are able to make cross-species phenotype comparisons and publish a core subset of GWAS data as RDF nanopublications. Conclusions We present a methodology for applying phenotype annotations to a comprehensive genome-wide association dataset and for ensuring compatibility with the Semantic Web. The annotations are used to assist with cross-species genotype and phenotype comparisons. However, further processing and deconstructions of terms may be required to facilitate automatic phenotype comparisons. The provision of GWAS nanopublications enables a new dimension for exploring GWAS data, by way of intrinsic links to related data resources within the Linked Data web. The value of such annotation and integration will grow as more biomedical resources adopt the standards of the Semantic Web.
Affiliation(s)
- Tim Beck
- Department of Genetics, University of Leicester, University Road, Leicester, UK.
79
Application of an integrative computational framework in trancriptomic data of atherosclerotic mice suggests numerous molecular players. Adv Bioinformatics 2012; 2012:453513. [PMID: 23193398] [PMCID: PMC3502768] [DOI: 10.1155/2012/453513]
Abstract
Atherosclerosis is a multifactorial disease involving a lot of genes and proteins recruited throughout its manifestation. The present study aims to exploit bioinformatic tools in order to analyze microarray data of atherosclerotic aortic lesions of ApoE knockout mice, a model widely used in atherosclerosis research. In particular, a dynamic analysis was performed among young and aged animals, resulting in a list of 852 significantly altered genes. Pathway analysis indicated alterations in critical cellular processes related to cell communication and signal transduction, immune response, lipid transport, and metabolism. Cluster analysis partitioned the significantly differentiated genes in three major clusters of similar expression profile. Promoter analysis applied to functional related groups of the same cluster revealed shared putative cis-elements potentially contributing to a common regulatory mechanism. Finally, by reverse engineering the functional relevance of differentially expressed genes with specific cellular pathways, putative genes acting as hubs, were identified, linking functionally disparate cellular processes in the context of traditional molecular description.
80
Rodrigues MR, Magalhães WCS, Machado M, Tarazona-Santos E. A graph-based approach for designing extensible pipelines. BMC Bioinformatics 2012; 13:163. [PMID: 22788675] [PMCID: PMC3496580] [DOI: 10.1186/1471-2105-13-163]
Abstract
Background In bioinformatics, it is important to build extensible and low-maintenance systems that are able to deal with the new tools and data formats that are constantly being developed. The traditional and simplest implementation of pipelines involves hardcoding the execution steps into programs or scripts. This approach can lead to problems when a pipeline is expanding because the incorporation of new tools is often error prone and time consuming. Current approaches to pipeline development such as workflow management systems focus on analysis tasks that are systematically repeated without significant changes in their course of execution, such as genome annotation. However, more dynamism in the pipeline composition is necessary when each execution requires a different combination of steps. Results We propose a graph-based approach to implement extensible and low-maintenance pipelines that is suitable for pipeline applications with multiple functionalities that require different combinations of steps in each execution. Here pipelines are composed automatically by compiling a specialised set of tools on demand, depending on the functionality required, instead of specifying every sequence of tools in advance. We represent the connectivity of pipeline components with a directed graph in which components are the graph edges, their inputs and outputs are the graph nodes, and the paths through the graph are pipelines. To that end, we developed special data structures and a pipeline system algorithm. We demonstrate the applicability of our approach by implementing a format conversion pipeline for the fields of population genetics and genetic epidemiology, but our approach is also helpful in other fields where the use of multiple software tools is necessary to perform comprehensive analyses, such as gene expression and proteomics analyses. The project code, documentation and the Java executables are available under an open source license at http://code.google.com/p/dynamic-pipeline. The system has been tested on Linux and Windows platforms. Conclusions Our graph-based approach enables the automatic creation of pipelines by compiling a specialised set of tools on demand, depending on the functionality required. It also allows the implementation of extensible and low-maintenance pipelines and contributes towards consolidating openness and collaboration in bioinformatics systems. It is targeted at pipeline developers and is suited for implementing applications with sequential execution steps and combined functionalities. In the format conversion application, the automatic combination of conversion tools increased both the number of possible conversions available to the user and the extensibility of the system to allow for future updates with new file formats.
Affiliation(s)
- Maíra R Rodrigues
- Departamento de Biologia Geral, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil.
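The graph model in the entry above (formats as nodes, conversion tools as edges, pipelines as paths) can be sketched in a few lines. This is an illustrative breadth-first search over an invented format graph, not the paper's Java implementation; the tool and format names are hypothetical.

```python
# Sketch of composing a conversion pipeline as a path through a
# directed format graph: nodes are file formats, edges are tools.
# Tool and format names are made up for illustration.
from collections import deque

# (source_format, target_format) -> hypothetical tool name
TOOLS = {
    ("vcf", "ped"): "vcf2ped",
    ("ped", "bed"): "ped2bed",
    ("vcf", "csv"): "vcf2csv",
}

def build_pipeline(src: str, dst: str):
    """BFS over the format graph; returns the ordered list of tools
    to run, or None if no conversion path exists."""
    queue = deque([(src, [])])
    seen = {src}
    while queue:
        fmt, path = queue.popleft()
        if fmt == dst:
            return path
        for (a, b), tool in TOOLS.items():
            if a == fmt and b not in seen:
                seen.add(b)
                queue.append((b, path + [tool]))
    return None

print(build_pipeline("vcf", "bed"))  # ['vcf2ped', 'ped2bed']
```

Because the search composes pipelines on demand, adding a new tool is just adding an edge to the table, which is the extensibility property the paper emphasises.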
81
Abstract
INTRODUCTION The development and use of web tools in chemistry has accumulated more than 15 years of history already. Powered by the advances in the Internet technologies, the current generation of web systems are starting to expand into areas, traditional for desktop applications. The web platforms integrate data storage, cheminformatics and data analysis tools. The ease of use and the collaborative potential of the web is compelling, despite the challenges. AREAS COVERED The topic of this review is a set of recently published web tools that facilitate predictive toxicology model building. The focus is on software platforms, offering web access to chemical structure-based methods, although some of the frameworks could also provide bioinformatics or hybrid data analysis functionalities. A number of historical and current developments are cited. In order to provide comparable assessment, the following characteristics are considered: support for workflows, descriptor calculations, visualization, modeling algorithms, data management and data sharing capabilities, availability of GUI or programmatic access and implementation details. EXPERT OPINION The success of the Web is largely due to its highly decentralized, yet sufficiently interoperable model for information access. The expected future convergence between cheminformatics and bioinformatics databases provides new challenges toward management and analysis of large data sets. The web tools in predictive toxicology will likely continue to evolve toward the right mix of flexibility, performance, scalability, interoperability, sets of unique features offered, friendly user interfaces, programmatic access for advanced users, platform independence, results reproducibility, curation and crowdsourcing utilities, collaborative sharing and secure access.
82
Abouelhoda M, Issa SA, Ghanem M. Tavaxy: integrating Taverna and Galaxy workflows with cloud computing support. BMC Bioinformatics 2012; 13:77. [PMID: 22559942] [PMCID: PMC3583125] [DOI: 10.1186/1471-2105-13-77]
Abstract
BACKGROUND Over the past decade, the workflow system paradigm has evolved into an efficient and user-friendly approach for developing complex bioinformatics applications. Two popular workflow systems that have gained acceptance in the bioinformatics community are Taverna and Galaxy. Each system has a large user base and supports an ever-growing repository of application workflows. However, workflows developed for one system cannot easily be imported into and executed on the other. This lack of interoperability stems from differences in the two systems' models of computation, workflow languages, and architectures; it limits the sharing of workflows between the user communities and leads to duplicated development effort. RESULTS In this paper, we present Tavaxy, a stand-alone system for creating and executing workflows based on an extensible set of reusable workflow patterns. Tavaxy offers a set of new features that simplify and enhance the development of sequence analysis applications: it allows the integration of existing Taverna and Galaxy workflows in a single environment, and it supports the use of cloud computing capabilities. The integration of existing Taverna and Galaxy workflows is supported seamlessly at both the run-time and design-time levels, based on the concepts of hierarchical workflows and workflow patterns. The use of cloud computing in Tavaxy is flexible: users can either instantiate the whole system on the cloud or delegate the execution of certain sub-workflows to the cloud infrastructure. CONCLUSIONS Tavaxy shortens the workflow development cycle by introducing workflow patterns that simplify workflow creation. It enables the re-use and integration of existing (sub-)workflows from Taverna and Galaxy, and allows the creation of hybrid workflows. Its additional features exploit recent advances in high-performance cloud computing to cope with increasing data sizes and analysis complexity. The system can be accessed either through a cloud-enabled web interface or downloaded and installed to run within the user's local environment. All resources related to Tavaxy are available at http://www.tavaxy.org.
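The "reusable workflow patterns" idea at the heart of Tavaxy can be sketched in a few lines. The pattern names and toy tools below are illustrative inventions, not Tavaxy's actual API: a workflow is composed from patterns such as sequence (pipe one step into the next) and parallel split (fan the same input out to several branches).

```python
# Hypothetical sketch of composition via workflow patterns, in the
# spirit of Tavaxy. None of these names are Tavaxy's real interface.

def sequence(*steps):
    """Pattern: run steps one after another, piping each output on."""
    def run(data):
        for step in steps:
            data = step(data)
        return data
    return run

def parallel_split(*branches):
    """Pattern: feed the same input to several independent branches."""
    def run(data):
        return [branch(data) for branch in branches]
    return run

# Toy "tools" standing in for Taverna/Galaxy sub-workflows.
reverse_complement = sequence(
    lambda seq: seq[::-1],                                   # reverse
    lambda seq: seq.translate(str.maketrans("ACGT", "TGCA")),  # complement
)

def gc_content(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)

# Compose the patterns into a small hybrid analysis.
analysis = parallel_split(reverse_complement, gc_content)
result = analysis("ATGC")   # one input, two branch outputs
```

Because each pattern returns an ordinary callable, patterns nest freely, which is the property that lets hierarchical (sub-)workflows from different systems be combined.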
|
83
|
Rybiński M, Lula M, Banasik P, Lasota S, Gambin A. Tav4SB: integrating tools for analysis of kinetic models of biological systems. BMC SYSTEMS BIOLOGY 2012; 6:25. [PMID: 22480273 PMCID: PMC3495710 DOI: 10.1186/1752-0509-6-25] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/15/2011] [Accepted: 04/05/2012] [Indexed: 11/25/2022]
Abstract
Background Progress in the modeling of biological systems relies strongly on the availability of specialized computer-aided tools. To that end, the Taverna Workbench eases the integration of software tools for life science research and provides a common workflow-based framework for computational experiments in biology. Results The Taverna services for Systems Biology (Tav4SB) project provides a set of new Web service operations that extend the functionality of the Taverna Workbench in the domain of systems biology. Tav4SB operations allow the user to perform numerical simulation or model checking of, respectively, the deterministic or stochastic semantics of biological models. On top of this functionality, Tav4SB enables the construction of high-level experiments; as an illustration of the possibilities offered by our project, we apply multi-parameter sensitivity analysis. A flexible plotting operation is also provided to visualize the results of model analysis. Tav4SB operations are executed in a simple grid environment that integrates heterogeneous software such as Mathematica, PRISM and the SBML ODE Solver. The user guide, contact information, full documentation of the available Web service operations, workflows and other additional resources can be found on the Tav4SB project's Web page: http://bioputer.mimuw.edu.pl/tav4sb/. Conclusions The Tav4SB Web service provides a set of integrated tools in a domain for which Web-based applications are still not as widely available as in other areas of computational biology. Moreover, it extends the dedicated hardware base for the computationally expensive task of simulating cellular models. Finally, it promotes the standardization of models and experiments, as well as the accessibility and usability of remote services.
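The "numerical simulation of deterministic semantics" that Tav4SB exposes as a Web service operation amounts to integrating a model's ODEs. The generic forward-Euler sketch below, for a toy decay reaction A -> B with rate k, only illustrates that idea; the function names, the integrator choice and the model are assumptions, not Tav4SB code.

```python
# Forward-Euler integration of dy/dt = rates(y) for a toy kinetic
# model; an illustration of deterministic simulation, not Tav4SB code.

def simulate(rates, y0, dt=0.001, t_end=1.0):
    """Integrate dy/dt = rates(y) from y0 to t_end with forward Euler."""
    y = list(y0)
    t = 0.0
    while t < t_end:
        dy = rates(y)
        y = [yi + dt * dyi for yi, dyi in zip(y, dy)]
        t += dt
    return y

k = 1.0

def decay(y):
    # A -> B with rate k: d[A]/dt = -k[A], d[B]/dt = +k[A]
    return [-k * y[0], k * y[0]]

a, b = simulate(decay, [1.0, 0.0])
# Euler conserves total mass here: a + b stays (numerically) at 1,
# and a approaches exp(-1) ~ 0.37 at t = 1.
```

A real service would accept the model in SBML and dispatch to a solver such as the SBML ODE Solver mentioned in the abstract; the hand-written rate function above merely stands in for that step.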
Affiliation(s)
- Mikołaj Rybiński
- Institute of Informatics, University of Warsaw, ul. Banacha 2, 02-097, Warsaw, Poland.
|
84
|
Parr CS, Guralnick R, Cellinese N, Page RD. Evolutionary informatics: unifying knowledge about the diversity of life. Trends Ecol Evol 2012; 27:94-103. [PMID: 22154516 DOI: 10.1016/j.tree.2011.11.001] [Citation(s) in RCA: 87] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2011] [Revised: 10/31/2011] [Accepted: 11/01/2011] [Indexed: 01/23/2023]
|
85
|
Michener WK, Jones MB. Ecoinformatics: supporting ecology as a data-intensive science. Trends Ecol Evol 2012; 27:85-93. [PMID: 22240191 DOI: 10.1016/j.tree.2011.11.016] [Citation(s) in RCA: 146] [Impact Index Per Article: 12.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2011] [Revised: 11/29/2011] [Accepted: 11/29/2011] [Indexed: 11/30/2022]
Abstract
Ecology is evolving rapidly and increasingly changing into a more open, accountable, interdisciplinary, collaborative and data-intensive science. Discovering, integrating and analyzing massive amounts of heterogeneous data are central to ecology as researchers address complex questions at scales from the gene to the biosphere. Ecoinformatics offers tools and approaches for managing ecological data and transforming the data into information and knowledge. Here, we review the state-of-the-art and recent advances in ecoinformatics that can benefit ecologists and environmental scientists as they tackle increasingly challenging questions that require voluminous amounts of data across disciplines and scales of space and time. We also highlight the challenges and opportunities that remain.
Affiliation(s)
- William K Michener
- University Libraries, University of New Mexico, Albuquerque, NM 87131, USA.
|
86
|
Hallinan J. Data mining for microbiologists. J Microbiol Methods 2012. [DOI: 10.1016/b978-0-08-099387-4.00002-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/12/2023]
|
87
|
Willighagen EL, Jeliazkova N, Hardy B, Grafström RC, Spjuth O. Computational toxicology using the OpenTox application programming interface and Bioclipse. BMC Res Notes 2011; 4:487. [PMID: 22075173 PMCID: PMC3264531 DOI: 10.1186/1756-0500-4-487] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2011] [Accepted: 11/10/2011] [Indexed: 11/10/2022] Open
Abstract
Background Toxicity is a complex phenomenon involving potential adverse effects on a range of biological functions. Predicting toxicity involves using a combination of experimental data (endpoints) and computational methods to generate a set of predictive models. Such models rely strongly on the ability to integrate information from many sources. The required integration of biological and chemical information sources requires, however, a common language to express our knowledge ontologically, and interoperating services to build reliable predictive toxicology applications. Findings This article describes progress in extending the integrative bio- and cheminformatics platform Bioclipse to interoperate with OpenTox, a semantic web framework that supports open data exchange and toxicology model building. The Bioclipse workbench environment enables functionality from OpenTox web services and easy access to OpenTox resources for evaluating the toxicity properties of query molecules. Relevant cases and interfaces based on ten neurotoxins are described to demonstrate the capabilities provided to the user. The integration takes advantage of semantic web technologies, thereby providing an open and simplifying communication standard. Additionally, the use of ontologies ensures proper interoperation and reliable integration of toxicity information from both experimental and computational sources. Conclusions A novel computational toxicity assessment platform was generated by integrating two open science platforms related to toxicology: Bioclipse, which combines a rich scriptable and graphical workbench environment for the integration of diverse sets of information sources, and OpenTox, a platform for interoperable toxicology data and computational services. The combination provides improved reliability and operability for handling large data sets through the use of the Open Standards from the OpenTox Application Programming Interface. This enables simultaneous access to a variety of distributed predictive toxicology databases, as well as algorithm and model resources, with the Bioclipse workbench handling the technical layers.
Affiliation(s)
- Egon L Willighagen
- Department of Pharmaceutical Bioinformatics, Uppsala University, Uppsala, Sweden.
|
88
|
Romano P, Giugno R, Pulvirenti A. Tools and collaborative environments for bioinformatics research. Brief Bioinform 2011; 12:549-61. [PMID: 21984743 PMCID: PMC3220874 DOI: 10.1093/bib/bbr055] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/10/2023] Open
Abstract
Advanced research requires intensive interaction among a multitude of actors, often possessing different expertise and usually working at a distance from each other. The field of collaborative research aims to establish suitable models and technologies to properly support these interactions. In this article, we first present the reasons for Bioinformatics' interest in this context, suggesting some research domains that could benefit from collaborative research. We then review the principles and some of the most relevant applications of social networking, with special attention to networks supporting scientific collaboration, and highlight some critical issues, such as the identification of users and the standardization of formats. We then introduce some systems for collaborative document creation, including wiki systems and tools for ontology development, and review some of the most interesting biological wikis. We also review the principles of Collaborative Development Environments for software and show some examples in Bioinformatics. Finally, we present the principles and some examples of Learning Management Systems. In conclusion, we outline some of the goals to be achieved in the short term for the exploitation of these technologies.
Affiliation(s)
- Paolo Romano
- Bioinformatics, National Cancer Research Institute (IST), Genoa, Italy.
|
89
|
Splendiani A, Gündel M, Austyn JM, Cavalieri D, Scognamiglio C, Brandizi M. Knowledge sharing and collaboration in translational research, and the DC-THERA Directory. Brief Bioinform 2011; 12:562-75. [PMID: 21969471 PMCID: PMC3220873 DOI: 10.1093/bib/bbr051] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022] Open
Abstract
Biomedical research relies increasingly on large collections of data sets and knowledge whose generation, representation and analysis often require large collaborative and interdisciplinary efforts. This dimension of ‘big data’ research calls for the development of computational tools to manage such a vast amount of data, as well as tools that can improve communication and access to information from collaborating researchers and from the wider community. Whenever research projects have a defined temporal scope, an additional issue of data management arises, namely how the knowledge generated within the project can be made available beyond its boundaries and life-time. DC-THERA is a European ‘Network of Excellence’ (NoE) that spawned a very large collaborative and interdisciplinary research community, focusing on the development of novel immunotherapies derived from fundamental research in dendritic cell immunobiology. In this article we introduce the DC-THERA Directory, which is an information system designed to support knowledge management for this research community and beyond. We present how the use of metadata and Semantic Web technologies can effectively help to organize the knowledge generated by modern collaborative research, how these technologies can enable effective data management solutions during and beyond the project lifecycle, and how resources such as the DC-THERA Directory fit into the larger context of e-science.
|
90
|
Mishima H, Sasaki K, Tanaka M, Tatebe O, Yoshiura KI. Agile parallel bioinformatics workflow management using Pwrake. BMC Res Notes 2011; 4:331. [PMID: 21899774 PMCID: PMC3180464 DOI: 10.1186/1756-0500-4-331] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2011] [Accepted: 09/08/2011] [Indexed: 12/20/2022] Open
Abstract
Background In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be too heavyweight for actual bioinformatics practice. We observe that the quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environment, are often prioritized in scientific workflow management. These features have a greater affinity with the agile software development method, which iterates through development phases after trial and error. Here, we show the application of the scientific workflow system Pwrake to bioinformatics workflows. Pwrake is a parallel workflow extension of Ruby's standard build tool Rake, whose flexibility has been demonstrated in the astronomy domain. We therefore hypothesized that Pwrake would also have advantages for actual bioinformatics workflows. Findings We implemented Pwrake workflows to process next-generation sequencing data using the Genome Analysis Toolkit (GATK) and Dindel. The GATK and Dindel workflows are typical examples of sequential and parallel workflows, respectively. We found that, in practice, scientific workflow development iterates over two phases: the workflow definition phase and the parameter adjustment phase. We introduced separate workflow definitions to help focus on each of the two developmental phases, as well as helper methods to simplify the descriptions. This approach increased iterative development efficiency. Moreover, we implemented combined workflows to demonstrate the modularity of the GATK and Dindel workflows. Conclusions Pwrake enables agile management of scientific workflows in the bioinformatics domain. Its internal domain-specific language design, built on Ruby, gives rakefiles the flexibility needed for writing scientific workflows. Furthermore, the readability and maintainability of rakefiles may facilitate sharing workflows among the scientific community. Workflows for GATK and Dindel are available at http://github.com/misshie/Workflows.
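Real Pwrake workflows are Ruby rakefiles; the Python toy below only mirrors the underlying Rake model the abstract describes (tasks declaring prerequisites, with independent prerequisites eligible for parallel execution) and invents all task names.

```python
# Hypothetical sketch of the Rake-style dependency model that Pwrake
# parallelises. Not Pwrake code; all task names are invented.

from concurrent.futures import ThreadPoolExecutor

tasks = {}  # task name -> (prerequisites, action)

def task(name, prereqs, action):
    tasks[name] = (prereqs, action)

def invoke(name, done=None):
    """Run `name` after recursively running all its prerequisites."""
    done = done if done is not None else set()
    if name in done:
        return
    prereqs, action = tasks[name]
    # Independent prerequisites could run concurrently, as Pwrake does.
    with ThreadPoolExecutor() as pool:
        list(pool.map(lambda p: invoke(p, done), prereqs))
    action()
    done.add(name)

log = []
task("align", [], lambda: log.append("align"))
task("call_variants", ["align"], lambda: log.append("call_variants"))
task("report", ["call_variants"], lambda: log.append("report"))
invoke("report")   # runs align, then call_variants, then report
```

The point of the rakefile design is that the same declarations serve both the "workflow definition" and the "parameter adjustment" phases: changing an action's parameters leaves the dependency graph untouched.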
Affiliation(s)
- Hiroyuki Mishima
- Department of Human Genetics, Nagasaki University Graduate School of Biomedical Sciences, 1-12-4 Sakamoto, Nagasaki, Nagasaki, Japan.
|
91
|
Jagla B, Wiswedel B, Coppée JY. Extending KNIME for next-generation sequencing data analysis. Bioinformatics 2011; 27:2907-9. [PMID: 21873641 DOI: 10.1093/bioinformatics/btr478] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]
Abstract
SUMMARY KNIME (Konstanz Information Miner) is a user-friendly and comprehensive open-source data integration, processing, analysis and exploration platform. We present here new functionality and workflows that open the door to performing next-generation sequencing analysis using the KNIME framework. AVAILABILITY All sources and compiled code are available via the KNIME update mechanism. Example workflows and descriptions are available through http://tech.knime.org/community/next-generation-sequencing. CONTACT bernd.jagla@pasteur.fr SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Bernd Jagla
- Departement Génomes et Génétique, Institut Pasteur, Plate-forme Transcriptome et Epigénome, 25 Rue du Docteur Roux, F-75015 Paris, France.
|
92
|
Affiliation(s)
- Jason R Swedlow
- Wellcome Trust Centre for Gene Regulation and Expression, College of Life Sciences, University of Dundee, Dundee, Scotland, UK.
|
93
|
|
94
|
Lushbough CM, Jennewein DM, Brendel VP. The BioExtract Server: a web-based bioinformatic workflow platform. Nucleic Acids Res 2011; 39:W528-32. [PMID: 21546552 PMCID: PMC3125737 DOI: 10.1093/nar/gkr286] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022] Open
Abstract
The BioExtract Server (bioextract.org) is an open, web-based system designed to aid researchers in the analysis of genomic data by providing a platform for the creation of bioinformatic workflows. Scientific workflows are created within the system by recording tasks performed by the user. These tasks may include querying multiple, distributed data sources, saving query results as searchable data extracts, and executing local and web-accessible analytic tools. The series of recorded tasks can then be saved as a reproducible, sharable workflow available for subsequent execution with the original or modified inputs and parameter settings. Integrated data resources include interfaces to the National Center for Biotechnology Information (NCBI) nucleotide and protein databases, the European Molecular Biology Laboratory (EMBL-Bank) non-redundant nucleotide database, the Universal Protein Resource (UniProt), and the UniProt Reference Clusters (UniRef) database. The system offers access to numerous preinstalled, curated analytic tools and also provides researchers with the option of selecting computational tools from a large list of web services including the European Molecular Biology Open Software Suite (EMBOSS), BioMoby, and the Kyoto Encyclopedia of Genes and Genomes (KEGG). The system further allows users to integrate local command line tools residing on their own computers through a client-side Java applet.
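The record-and-replay model described above (tasks recorded as the user performs them, then saved as a reproducible workflow that can be re-executed with modified inputs) can be sketched as follows; the class and step names are hypothetical, not BioExtract's actual interface.

```python
# Toy sketch of BioExtract-style record-and-replay workflows.
# Illustration only; names do not reflect BioExtract's real API.

class RecordedWorkflow:
    def __init__(self):
        self.steps = []          # recorded (name, function) pairs

    def record(self, name, fn):
        """Record a task (e.g. a query or an analytic tool run)."""
        self.steps.append((name, fn))
        return self

    def replay(self, data):
        """Re-execute the recorded tasks on (possibly new) input."""
        for _, fn in self.steps:
            data = fn(data)
        return data

wf = RecordedWorkflow()
# Stand-ins for "query a data source" and "filter the extract".
wf.record("normalise", lambda ids: [i.upper() for i in ids])
wf.record("keep_mrna", lambda ids: [i for i in ids if i.startswith("NM_")])

first = wf.replay(["nm_0001", "xr_0002"])
again = wf.replay(["nm_0003"])   # same saved workflow, new input
```

Separating the recorded step list from any particular input is what makes the saved workflow shareable and re-runnable with modified parameters.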
Affiliation(s)
- Carol M Lushbough
- Department of Computer Science, University of South Dakota, Vermillion, SD 57069, USA.
|
95
|
Webb AJ, Thorisson GA, Brookes AJ. An informatics project and online "Knowledge Centre" supporting modern genotype-to-phenotype research. Hum Mutat 2011; 32:543-50. [PMID: 21438073 DOI: 10.1002/humu.21469] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2011] [Accepted: 01/28/2011] [Indexed: 11/06/2022]
Abstract
Explosive growth in the generation of genotype-to-phenotype (G2P) data necessitates a concerted effort to tackle the logistical and informatics challenges this presents. The GEN2PHEN Project represents one such effort, with a broad strategy of uniting disparate G2P resources into a hybrid centralized-federated network. This is achieved through a holistic strategy focussed on three overlapping areas: data input standards and pipelines through which to submit and collect data (data in); federated, independent, extendable, yet interoperable database platforms on which to store and curate widely diverse datasets (data storage); and data formats and mechanisms with which to exchange, combine, and extract data (data exchange and output). To fully leverage this data network, we have constructed the "G2P Knowledge Centre" (http://www.gen2phen.org). This central platform provides holistic searching of the G2P data domain allied with facilities for data annotation and user feedback, access to extensive G2P and informatics resources, and tools for constructing online working communities centered on the G2P domain. Through the efforts of GEN2PHEN, and through combining data with broader community-derived knowledge, the Knowledge Centre opens up exciting possibilities for organizing, integrating, sharing, and interpreting new waves of G2P data in a collaborative fashion.
Affiliation(s)
- Adam J Webb
- Department of Genetics, University of Leicester, University Road, Leicester, United Kingdom.
|
96
|
Strijkers R, Cushing R, Vasyunin D, de Laat C, Belloum AS, Meijer R. Toward Executable Scientific Publications. Procedia Computer Science 2011. [DOI: 10.1016/j.procs.2011.04.074] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]
|
97
|
Vroling B, Sanders M, Baakman C, Borrmann A, Verhoeven S, Klomp J, Oliveira L, de Vlieg J, Vriend G. GPCRDB: information system for G protein-coupled receptors. Nucleic Acids Res 2011; 39:D309-19. [PMID: 21045054 PMCID: PMC3013641 DOI: 10.1093/nar/gkq1009] [Citation(s) in RCA: 115] [Impact Index Per Article: 8.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2010] [Accepted: 10/07/2010] [Indexed: 11/14/2022] Open
Abstract
The GPCRDB is a Molecular Class-Specific Information System (MCSIS) that collects, combines, validates and disseminates large amounts of heterogeneous data on G protein-coupled receptors (GPCRs). The GPCRDB contains experimental data on sequences, ligand-binding constants, mutations and oligomers, as well as many different types of computationally derived data such as multiple sequence alignments and homology models. The GPCRDB provides access to the data via a number of different access methods. It offers visualization and analysis tools, and a number of query systems. The data is updated automatically on a monthly basis. The GPCRDB can be found online at http://www.gpcr.org/7tm/.
Affiliation(s)
- Bas Vroling
- CMBI, NCMLS, Radboud University Nijmegen Medical Centre, Geert Grooteplein Zuid 26-28, 6525 GA Nijmegen, Department of Molecular Design and Informatics, MSD, Molenstraat 110, 5340 BH, Oss, The Netherlands and Department of Biophysics, Escola Paulista de Medicina, Federal University of São Paulo, São Paulo 04023-062, Brazil
|
98
|
|
99
|
Möller S, Krabbenhöft HN, Tille A, Paleino D, Williams A, Wolstencroft K, Goble C, Holland R, Belhachemi D, Plessy C. Community-driven computational biology with Debian Linux. BMC Bioinformatics 2010; 11 Suppl 12:S5. [PMID: 21210984 PMCID: PMC3040531 DOI: 10.1186/1471-2105-11-s12-s5] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022] Open
Abstract
Background The Open Source movement and its technologies are popular in the bioinformatics community because they provide freely available tools and resources for research. In order to feed the steady demand for updates on software and associated data, a service infrastructure is required for sharing and providing these tools to heterogeneous computing environments. Results The Debian Med initiative provides ready and coherent software packages for medical informatics and bioinformatics. These packages can be used together in Taverna workflows via the UseCase plugin to manage execution on local or remote machines. If such packages are available in cloud computing environments, the underlying hardware and the analysis pipelines can be shared along with the software. Conclusions Debian Med closes the gap between developers and users. It provides a simple method for offering new releases of software and data resources, thus provisioning a local infrastructure for computational biology. For geographically distributed teams it can ensure they are working on the same versions of tools, in the same conditions. This contributes to the world-wide networking of researchers.
Affiliation(s)
- Steffen Möller
- University Clinics of Schleswig-Holstein, Department of Dermatology, formerly University of Lübeck, Institute for Neuro- and Bioinformatics, Ratzeburger Allee 160, 23530 Lübeck, Germany.
|
100
|
Wilkinson MD, McCarthy L, Vandervalk B, Withers D, Kawas E, Samadian S. SADI, SHARE, and the in silico scientific method. BMC Bioinformatics 2010; 11 Suppl 12:S7. [PMID: 21210986 PMCID: PMC3040533 DOI: 10.1186/1471-2105-11-s12-s7] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
Abstract
Background The emergence and uptake of Semantic Web technologies by the Life Sciences provides exciting opportunities for exploring novel ways to conduct in silico science. Web Service Workflows are already becoming first-class objects in “the new way”, and serve as explicit, shareable, referenceable representations of how an experiment was done. In turn, Semantic Web Service projects aim to facilitate workflow construction by biological domain-experts such that workflows can be edited, re-purposed, and re-published by non-informaticians. However the aspects of the scientific method relating to explicit discourse, disagreement, and hypothesis generation have remained relatively impervious to new technologies. Results Here we present SADI and SHARE - a novel Semantic Web Service framework, and a reference implementation of its client libraries. Together, SADI and SHARE allow the semi- or fully-automatic discovery and pipelining of Semantic Web Services in response to ad hoc user queries. Conclusions The semantic behaviours exhibited by SADI and SHARE extend the functionalities provided by Description Logic Reasoners such that novel assertions can be automatically added to a data-set without logical reasoning, but rather by analytical or annotative services. This behaviour might be applied to achieve the “semantification” of those aspects of the in silico scientific method that are not yet supported by Semantic Web technologies. We support this suggestion using an example in the clinical research space.
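The automatic discovery and pipelining that SADI and SHARE perform can be caricatured as a search over services annotated with semantic input and output types: given what you have and what you want, a chain of services is found and executed. The registry, type names and services below are invented for illustration and do not reflect SADI's real interfaces, which are built on RDF and OWL.

```python
# Toy illustration of type-driven service chaining in the spirit of
# SADI/SHARE. All types and services here are invented.

registry = [
    # (consumes, produces, service)
    ("GeneID", "ProteinID", lambda g: f"prot-of-{g}"),
    ("ProteinID", "Structure", lambda p: f"structure-of-{p}"),
]

def discover_chain(have, want):
    """Breadth-first search for a service chain from `have` to `want`."""
    frontier = [(have, [])]
    seen = {have}
    while frontier:
        current, chain = frontier.pop(0)
        if current == want:
            return chain
        for consumes, produces, service in registry:
            if consumes == current and produces not in seen:
                seen.add(produces)
                frontier.append((produces, chain + [service]))
    return None   # no chain of registered services connects the types

def run_chain(chain, value):
    for service in chain:
        value = service(value)
    return value

chain = discover_chain("GeneID", "Structure")
answer = run_chain(chain, "BRCA1")
```

In SADI proper, the "types" are OWL class descriptions and the matching is done by a reasoner rather than string equality, but the overall shape (discover a chain from annotations, then pipeline the services) is the same.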
Affiliation(s)
- Mark D Wilkinson
- Heart + Lung Institute at St. Paul's Hospital, University of British Columbia, Vancouver, BC, Canada.
|