1
Lamprecht AL, Palmblad M, Ison J, Schwämmle V, Al Manir MS, Altintas I, Baker CJO, Ben Hadj Amor A, Capella-Gutierrez S, Charonyktakis P, Crusoe MR, Gil Y, Goble C, Griffin TJ, Groth P, Ienasescu H, Jagtap P, Kalaš M, Kasalica V, Khanteymoori A, Kuhn T, Mei H, Ménager H, Möller S, Richardson RA, Robert V, Soiland-Reyes S, Stevens R, Szaniszlo S, Verberne S, Verhoeven A, Wolstencroft K. Perspectives on automated composition of workflows in the life sciences. F1000Res 2021; 10:897. [PMID: 34804501 PMCID: PMC8573700 DOI: 10.12688/f1000research.54159.1]
Abstract
Scientific data analyses often combine several computational tools in automated pipelines, or workflows. Thousands of such workflows have been used in the life sciences, though their composition has remained a cumbersome manual process due to a lack of standards for annotation, assembly, and implementation. Recent technological advances have returned the long-standing vision of automated workflow composition into focus. This article summarizes a recent Lorentz Center workshop dedicated to automated composition of workflows in the life sciences. We survey previous initiatives to automate the composition process, and discuss the current state of the art and future perspectives. We start by drawing the "big picture" of the scientific workflow development life cycle, before surveying and discussing current methods, technologies and practices for semantic domain modelling, automation in workflow development, and workflow assessment. Finally, we derive a roadmap of individual and community-based actions to work toward the vision of automated workflow development in the forthcoming years. A central outcome of the workshop is a general description of the workflow life cycle in six stages: 1) scientific question or hypothesis, 2) conceptual workflow, 3) abstract workflow, 4) concrete workflow, 5) production workflow, and 6) scientific results. The transitions between stages are facilitated by diverse tools and methods, usually incorporating domain knowledge in some form. Formal semantic domain modelling is hard and often a bottleneck for the application of semantic technologies. However, life science communities have made considerable progress here in recent years and are continuously improving, renewing interest in the application of semantic technologies for workflow exploration, composition and instantiation. 
Combined with systematic benchmarking with reference data and large-scale deployment of production-stage workflows, such technologies enable a more systematic process of workflow development than we know today. We believe that this can lead to more robust, reusable, and sustainable workflows in the future.
Affiliation(s)
- Magnus Palmblad
  - Leiden University Medical Center, 2333 ZA, Leiden, The Netherlands
- Jon Ison
  - French Institute of Bioinformatics, 91057 Évry, France
- Ilkay Altintas
  - University of California San Diego, La Jolla, CA, 92093, USA
- Christopher J. O. Baker
  - University of New Brunswick, Saint John, E2L 4L5, Canada
  - IPSNP Computing Inc., Saint John, E2L 4S6, Canada
- Yolanda Gil
  - University of Southern California, Marina Del Rey, CA, 90292, USA
- Carole Goble
  - Department of Computer Science, The University of Manchester, Manchester, M13 9PL, UK
- Timothy J. Griffin
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, 55455, USA
- Paul Groth
  - University of Amsterdam, 1090 GH Amsterdam, The Netherlands
- Hans Ienasescu
  - Technical University of Denmark, 2800 Kongens Lyngby, Denmark
- Pratik Jagtap
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, 55455, USA
- Tobias Kuhn
  - VU Amsterdam, 1081 HV Amsterdam, The Netherlands
- Hailiang Mei
  - Sequencing Analysis Support Core, Leiden University Medical Center, 2333 ZC Leiden, The Netherlands
- Steffen Möller
  - IBIMA, Rostock University Medical Center, 18057 Rostock, Germany
- Stian Soiland-Reyes
  - Department of Computer Science, The University of Manchester, Manchester, M13 9PL, UK
  - Informatics Institute, University of Amsterdam, 1090 GH Amsterdam, The Netherlands
- Robert Stevens
  - Department of Computer Science, The University of Manchester, Manchester, M13 9PL, UK
- Suzan Verberne
  - Leiden Institute of Advanced Computer Science, Leiden University, 2333 BE Leiden, The Netherlands
- Aswin Verhoeven
  - Leiden University Medical Center, 2333 ZA, Leiden, The Netherlands
- Katherine Wolstencroft
  - Leiden Institute of Advanced Computer Science, Leiden University, 2333 BE Leiden, The Netherlands
2
Costa GCB, Braga R, David JMN, Campos F. A Scientific Software Product Line for the Bioinformatics domain. J Biomed Inform 2015; 56:239-64. [DOI: 10.1016/j.jbi.2015.05.014]
3
Malone J, Brown A, Lister AL, Ison J, Hull D, Parkinson H, Stevens R. The Software Ontology (SWO): a resource for reproducibility in biomedical data analysis, curation and digital preservation. J Biomed Semantics 2014; 5:25. [PMID: 25068035 PMCID: PMC4098953 DOI: 10.1186/2041-1480-5-25]
Abstract
Motivation: Biomedical ontologists to date have concentrated on ontological descriptions of biomedical entities such as gene products and their attributes, phenotypes and so on. Recently, effort has diversified to descriptions of the laboratory investigations by which these entities were produced. However, much biological insight is gained from the analysis of the data produced from these investigations, and there is a lack of adequate descriptions of the wide range of software that is central to bioinformatics. We need to describe how data are analyzed for discovery, audit trails, provenance and reproducibility.
Results: The Software Ontology (SWO) is a description of software used to store, manage and analyze data. Input to the SWO has come from beyond the life sciences, but its main focus is the life sciences. We used agile techniques to gather input for the SWO and keep engagement with our users. The result is an ontology that meets the needs of a broad range of users by describing software, its information processing tasks, data inputs and outputs, data formats, versions and so on. Recently, the SWO has incorporated EDAM, a vocabulary for describing data and related concepts in bioinformatics. The SWO is currently being used to describe software used in multiple biomedical applications.
Conclusion: The SWO is another element of the biomedical ontology landscape that is necessary for the description of biomedical entities and how they were discovered. An ontology of software used to analyze data produced by investigations in the life sciences can be made in such a way that it covers the important features requested and prioritized by its users. The SWO thus fits into the landscape of biomedical ontologies and is produced using techniques designed to keep it in line with users’ needs.
Availability: The Software Ontology is available under an Apache 2.0 license at http://theswo.sourceforge.net/; the Software Ontology blog can be read at http://softwareontology.wordpress.com.
Affiliation(s)
- James Malone
  - EMBL-EBI, Wellcome Trust Genome Campus, Cambridge, CB10 1SD, UK
- Andy Brown
  - School of Computer Science, University of Manchester, Oxford Road, Manchester, M13 9PL, UK
- Allyson L Lister
  - School of Computer Science, University of Manchester, Oxford Road, Manchester, M13 9PL, UK
- Jon Ison
  - EMBL-EBI, Wellcome Trust Genome Campus, Cambridge, CB10 1SD, UK
- Duncan Hull
  - School of Computer Science, University of Manchester, Oxford Road, Manchester, M13 9PL, UK
- Helen Parkinson
  - EMBL-EBI, Wellcome Trust Genome Campus, Cambridge, CB10 1SD, UK
- Robert Stevens
  - School of Computer Science, University of Manchester, Oxford Road, Manchester, M13 9PL, UK
4
Ison J, Kalas M, Jonassen I, Bolser D, Uludag M, McWilliam H, Malone J, Lopez R, Pettifer S, Rice P. EDAM: an ontology of bioinformatics operations, types of data and identifiers, topics and formats. Bioinformatics 2013; 29:1325-32. [PMID: 23479348 PMCID: PMC3654706 DOI: 10.1093/bioinformatics/btt113]
Abstract
MOTIVATION: Advancing the search, publication and integration of bioinformatics tools and resources demands consistent machine-understandable descriptions. A comprehensive ontology allowing such descriptions is therefore required.
RESULTS: EDAM is an ontology of bioinformatics operations (tool or workflow functions), types of data and identifiers, application domains and data formats. EDAM supports semantic annotation of diverse entities such as Web services, databases, programmatic libraries, standalone tools, interactive applications, data schemas, datasets and publications within bioinformatics. EDAM applies to organizing and finding suitable tools and data and to automating their integration into complex applications or workflows. It includes over 2200 defined concepts and has successfully been used for annotations and implementations.
AVAILABILITY: The latest stable version of EDAM is available in OWL format from http://edamontology.org/EDAM.owl and in OBO format from http://edamontology.org/EDAM.obo. It can be viewed online at the NCBO BioPortal and the EBI Ontology Lookup Service. For documentation and license please refer to http://edamontology.org. This article describes version 1.2 available at http://edamontology.org/EDAM_1.2.owl.
CONTACT: jison@ebi.ac.uk.
Affiliation(s)
- Jon Ison
  - EMBL European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, UK.
5
Abstract
The emergence of genomics tools for the evolutionary and comparative biology community led to a rapid explosion in the number of online resources targeted at this specialized community, including Web-based comparative genomics software, such as the Artemis Comparison Tool (WebACT); databases, such as PaleoDB, Global Biodiversity Information Facility, and TreeBase; and knowledge frameworks, such as the Evolution Ontology. Unfortunately, these providers are largely independent of one another and therefore the individual resources do not share any centralized plan for how the data or tools would or should be provided. As a result, there are a myriad of often incompatible technologies and frameworks being used by this community of providers. In this chapter, we explore approaches to online resource publication, both those already in use by the community, as well as new and emergent frameworks and standards. Exploration of the strengths and weaknesses of each approach, together with a brief exploration of the philosophy or informatics theory behind the varying approaches, will hopefully help readers as they navigate this data space. The discussion is constructed such that it lays the groundwork for exploration of a new global standard for data and knowledge representation--"The Semantic Web"--that holds promise of providing solutions to many of the complexities users face in their attempts to discover and integrate biodiversity data, and examples are provided.
Affiliation(s)
- Mark D Wilkinson
  - Department of Medical Genetics, University of British Columbia and PI Bioinformatics, Heart + Lung Institute at St. Paul's Hospital, Vancouver, BC, Canada.
|
6
|
Zhao X, Liu E, Clapworthy GJ, Viceconti M, Testi D. SOA-based digital library services and composition in biomedical applications. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2012; 106:219-233. [PMID: 20846740 DOI: 10.1016/j.cmpb.2010.08.009] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/22/2009] [Revised: 07/21/2010] [Accepted: 08/10/2010] [Indexed: 05/29/2023]
Abstract
Carefully collected, high-quality data are crucial in biomedical visualization, and it is important that the user community has ready access both to these data and to the high-performance computing resources needed by the complex computational algorithms that will process them. Biological researchers generally require data, tools and algorithms from multiple providers to achieve their goals. This paper illustrates our response to the problems that result from this. The Living Human Digital Library (LHDL) project presented in this paper has taken advantage of Web Services to build a biomedical digital library infrastructure that allows clinicians and researchers not only to preserve, trace and share data resources, but also to collaborate at the data-processing level.
Affiliation(s)
- Xia Zhao
  - Department of Computer Science & Technology, University of Bedfordshire, United Kingdom
7
A semantic approach for the requirement-driven discovery of web resources in the Life Sciences. Knowl Inf Syst 2012. [DOI: 10.1007/s10115-012-0498-5]
8
Scientific Workflow, Provenance, and Data Modeling Challenges and Approaches. Journal on Data Semantics 2012. [DOI: 10.1007/s13740-012-0004-y]
9
Zhao A, Ma Y. Constructing Service Semantic Link Network Based on the Probabilistic Graphical Model. Int J Comput Intell Syst 2012. [DOI: 10.1080/18756891.2012.747660]
10
de A.R. Gonçalves JC, de Oliveira D, Ocaña KACS, Ogasawara E, Mattoso M. Using Domain-Specific Data to Enhance Scientific Workflow Steering Queries. Lecture Notes in Computer Science 2012. [DOI: 10.1007/978-3-642-34222-6_12]
11
Lamprecht AL, Naujokat S, Margaria T, Steffen B. Semantics-based composition of EMBOSS services. J Biomed Semantics 2011; 2 Suppl 1:S5. [PMID: 21388574 PMCID: PMC3105497 DOI: 10.1186/2041-1480-2-s1-s5]
Abstract
Background: More than in other domains, the heterogeneous services world in bioinformatics demands a methodology to classify and relate resources in a manner accessible to both humans and machines. The Semantic Web, which is meant to address exactly this challenge, is currently one of the most ambitious projects in computer science. Collective efforts within the community have already led to a basis of standards for semantic service descriptions and meta-information. In combination with process synthesis and planning methods, such knowledge about types and services can facilitate the automatic composition of workflows for particular research questions.
Results: In this study we apply the synthesis methodology that is available in the Bio-jETI workflow management framework for the semantics-based composition of EMBOSS services. EMBOSS (European Molecular Biology Open Software Suite) is a collection of 350 tools (March 2010) for various sequence analysis tasks, and thus a rich source of services and types that imply comprehensive domain models for planning and synthesis approaches. We use and compare two different setups of our EMBOSS synthesis domain: 1) a manually defined domain setup where an intuitive, high-level, semantically meaningful nomenclature is applied to describe the input/output behavior of the individual EMBOSS tools and their classifications, and 2) a domain setup where this information has been automatically derived from the EMBOSS Ajax Command Definition (ACD) files and the EMBRACE Data and Methods ontology (EDAM). Our experiments demonstrate that these domain models in combination with our synthesis methodology greatly simplify working with the large, heterogeneous, and hence manually intractable EMBOSS collection. However, they also show that with the information that can be derived from the (current) ACD files and EDAM ontology alone, some essential connections between services cannot be recognized.
Conclusions: Our results show that adequate domain modeling requires incorporating as much domain knowledge as possible, far beyond the mere technical aspects of the different types and services. Finding or defining semantically appropriate service and type descriptions is a difficult task, but the bioinformatics community appears to be on the right track towards a Life Science Semantic Web, which will eventually allow automatic service composition methods to unfold their full potential.
Affiliation(s)
- Anna-Lena Lamprecht
  - Chair for Programming Systems, Technical University Dortmund, Dortmund, D-44227, Germany.
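The synthesis idea summarized in the Lamprecht et al. (2011) abstract, chaining services whose annotated input and output types connect, can be sketched in a few lines. The Python below is a minimal illustration under simplifying assumptions: every service has exactly one input type and one output type, and a plain breadth-first search over types stands in for Bio-jETI's own synthesis algorithm. The service names and type labels are hypothetical; they are not actual EMBOSS tools or EDAM terms.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical service catalogue: service name -> (input type, output type).
# Single-input/single-output services keep the sketch small.
SERVICES: Dict[str, Tuple[str, str]] = {
    "fetch_entry": ("AccessionID", "NucleotideSequence"),
    "translate": ("NucleotideSequence", "ProteinSequence"),
    "find_homologs": ("ProteinSequence", "SequenceSet"),
    "align": ("SequenceSet", "SequenceAlignment"),
    "build_tree": ("SequenceAlignment", "PhylogeneticTree"),
}


def synthesize(source: str, target: str,
               services: Dict[str, Tuple[str, str]]) -> Optional[List[str]]:
    """Return a shortest chain of services turning `source` into `target`,
    or None if no chain exists (breadth-first search over data types)."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        current, path = queue.popleft()
        if current == target:
            return path
        # Expand every service whose input type matches the current type.
        for name, (inp, out) in services.items():
            if inp == current and out not in seen:
                seen.add(out)
                queue.append((out, path + [name]))
    return None


if __name__ == "__main__":
    print(synthesize("AccessionID", "PhylogeneticTree", SERVICES))
```

Deleting a single annotation, for example the hypothetical `translate` entry, makes `synthesize` return `None` for the same request. This mirrors in miniature the abstract's observation that incomplete annotations can hide essential connections between services.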
12
Afzal H, Eales J, Stevens R, Nenadic G. Mining semantic networks of bioinformatics e-resources from the literature. J Biomed Semantics 2011; 2 Suppl 1:S4. [PMID: 21388573 PMCID: PMC3105496 DOI: 10.1186/2041-1480-2-s1-s4]
Abstract
BACKGROUND: There have been a number of recent efforts (e.g. BioCatalogue, BioMoby) to systematically catalogue bioinformatics tools, services and datasets. These efforts rely on manual curation, making it difficult to cope with the huge influx of electronic resources provided by the bioinformatics community. We present a text mining approach that uses the literature to automatically extract descriptions and semantically profile bioinformatics resources, making them available for discovery and exploration through semantic networks of related resources.
RESULTS: The method identifies mentions of resources in the literature and assigns a set of co-occurring terminological entities (descriptors) to represent them. We have processed 2,691 full-text bioinformatics articles and extracted profiles of 12,452 resources containing associated descriptors with binary and tf*idf weights. Since such representations are typically sparse (on average 13.77 features per resource), we used lexical kernel metrics to identify semantically related resources via descriptor smoothing. Resources are then clustered or linked into semantic networks, giving users (bioinformaticians, curators and service/tool crawlers) the ability to explore algorithms, tools, services and datasets based on their relatedness. Manual exploration of links between a set of 18 well-known bioinformatics resources suggests that the method was able to identify and group semantically related entities.
CONCLUSIONS: The results show that the method can reconstruct interesting functional links between resources (e.g. linking data types and algorithms), in particular when tf*idf-like weights are used for profiling. This demonstrates the potential of combining literature mining and simple lexical kernel methods to model relatedness between resource descriptors, particularly when there are few features, thus potentially improving the resource description, discovery and exploration process. The resource profiles are available at http://gnode1.mib.man.ac.uk/bioinf/semnets.html.
Affiliation(s)
- Hammad Afzal
  - College of Telecommunication Engineering, National University of Sciences and Technology, Islamabad, Pakistan
  - Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland
- James Eales
  - School of Computer Science, University of Manchester, Oxford Road, Manchester, M13 9PL, UK
- Robert Stevens
  - School of Computer Science, University of Manchester, Oxford Road, Manchester, M13 9PL, UK
- Goran Nenadic
  - School of Computer Science, University of Manchester, Oxford Road, Manchester, M13 9PL, UK
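The profiling step in the Afzal et al. abstract weights each resource's descriptors and then links resources whose profiles are similar. The sketch below assumes one common tf*idf variant (raw descriptor count times log(N/df)) and uses plain cosine similarity where the paper uses lexical kernel metrics with descriptor smoothing. The resource names, descriptors and the 0.2 linking threshold are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical resource profiles: resource -> descriptors co-occurring with its mentions.
PROFILES: Dict[str, List[str]] = {
    "aligner_x": ["sequence alignment", "protein sequence", "scoring matrix"],
    "aligner_y": ["sequence alignment", "dna sequence", "scoring matrix"],
    "pathway_db": ["metabolic pathway", "enzyme", "protein sequence"],
}


def tfidf_profiles(profiles: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Weight each descriptor by its count in a profile times log(N / df)."""
    n = len(profiles)
    # Document frequency: number of resource profiles containing each descriptor.
    df = Counter(d for descriptors in profiles.values() for d in set(descriptors))
    return {
        name: {d: count * math.log(n / df[d]) for d, count in Counter(descriptors).items()}
        for name, descriptors in profiles.items()
    }


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse weight vectors (0.0 if either is all zeros)."""
    dot = sum(w * v.get(d, 0.0) for d, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def link(profiles: Dict[str, List[str]], threshold: float = 0.2) -> List[Tuple[str, str, float]]:
    """Return (resource, resource, similarity) edges at or above `threshold`."""
    weighted = tfidf_profiles(profiles)
    names = sorted(weighted)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            score = cosine(weighted[a], weighted[b])
            if score >= threshold:
                edges.append((a, b, round(score, 3)))
    return edges


if __name__ == "__main__":
    print(link(PROFILES))
```

With these toy profiles only the two hypothetical aligners are linked (similarity ≈ 0.378); `pathway_db` shares a single descriptor with `aligner_x` and stays below the threshold (≈ 0.146). A descriptor attached to every resource would receive an idf of zero and contribute nothing, which is why tf*idf-style weights emphasize the more distinctive descriptors.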
13
Web Service management system for bioinformatics research: a case study. Service Oriented Computing and Applications 2011. [DOI: 10.1007/s11761-011-0076-9]
14
Miles A, Zhao J, Klyne G, White-Cooper H, Shotton D. OpenFlyData: an exemplar data web integrating gene expression data on the fruit fly Drosophila melanogaster. J Biomed Inform 2011; 43:752-61. [PMID: 20382263 DOI: 10.1016/j.jbi.2010.04.004]
Abstract
MOTIVATION: Integrating heterogeneous data across distributed sources is a major requirement for in silico bioinformatics supporting translational research. For example, genome-scale data on patterns of gene expression in the fruit fly Drosophila melanogaster are widely used in functional genomic studies in many organisms to inform candidate gene selection and validate experimental results. However, current data integration solutions tend to be heavyweight and require significant initial and ongoing investment of effort. Development of a common Web-based data integration infrastructure (a.k.a. data web), using Semantic Web standards, promises to alleviate these difficulties, but little is known about the feasibility, costs, risks or practical means of migrating to such an infrastructure.
RESULTS: We describe the development of OpenFlyData, a proof-of-concept system integrating gene expression data on D. melanogaster, combining Semantic Web standards with lightweight approaches to Web programming based on Web 2.0 design patterns. To support researchers designing and validating functional genomic studies, OpenFlyData includes user-facing search applications providing intuitive access to and comparison of gene expression data from FlyAtlas, the BDGP in situ database, and FlyTED, using data from FlyBase to expand and disambiguate gene names. OpenFlyData's services are also openly accessible and are available for reuse by other bioinformaticians and application developers. Semi-automated methods and tools were developed to support labour- and knowledge-intensive tasks involved in deploying SPARQL services. These include methods for generating ontologies and relational-to-RDF mappings for relational databases, which we illustrate using the FlyBase Chado database schema, and methods for mapping gene identifiers between databases. The advantages of using Semantic Web standards for biomedical data integration are discussed, as are open issues. In particular, although the performance of open source SPARQL implementations is sufficient to query gene expression data directly from user-facing applications such as Web-based data fusions (a.k.a. mashups), we found open SPARQL endpoints to be vulnerable to denial-of-service-type problems, which must be mitigated to ensure reliability of services based on this standard. These results are relevant to data integration activities in translational bioinformatics.
AVAILABILITY: The gene expression search applications and SPARQL endpoints developed for OpenFlyData are deployed at http://openflydata.org. FlyUI, a library of JavaScript widgets providing re-usable user-interface components for Drosophila gene expression data, is available at http://flyui.googlecode.com. Software and ontologies to support transformation of data from FlyBase, FlyAtlas, BDGP and FlyTED to RDF are available at http://openflydata.googlecode.com. SPARQLite, an implementation of the SPARQL protocol, is available at http://sparqlite.googlecode.com. All software is provided under the GPL version 3 open source license.
Affiliation(s)
- Alistair Miles
  - Department of Zoology, University of Oxford, Oxford OX1 3PS, UK
15
Bhagat J, Tanoh F, Nzuobontane E, Laurent T, Orlowski J, Roos M, Wolstencroft K, Aleksejevs S, Stevens R, Pettifer S, Lopez R, Goble CA. BioCatalogue: a universal catalogue of web services for the life sciences. Nucleic Acids Res 2010; 38:W689-94. [PMID: 20484378 PMCID: PMC2896129 DOI: 10.1093/nar/gkq394]
Abstract
The use of Web Services to enable programmatic access to on-line bioinformatics is becoming increasingly important in the Life Sciences. However, their number, distribution and the variable quality of their documentation can make their discovery and subsequent use difficult. A Web Services registry with information on available services will help to bring together service providers and their users. The BioCatalogue (http://www.biocatalogue.org/) provides a common interface for registering, browsing and annotating Web Services to the Life Science community. Services in the BioCatalogue can be described and searched in multiple ways based upon their technical types, bioinformatics categories, user tags, service providers or data inputs and outputs. They are also subject to constant monitoring, allowing the identification of service problems and changes and the filtering-out of unavailable or unreliable resources. The system is accessible via a human-readable 'Web 2.0'-style interface and a programmatic Web Service interface. The BioCatalogue follows a community approach in which all services can be registered, browsed and incrementally documented with annotations by any member of the scientific community.
Collapse
Affiliation(s)
- Jiten Bhagat
- School of Computer Science, The University of Manchester, Manchester, M13 9PL, EMBL European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, UK Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, PL-02-109 Warsaw, Poland Informatics Institute, University of Amsterdam, Amsterdam, 1098 SJ and Human Genetics Department, Leiden University Medical Centre, NL-2333 ZA Leiden, Netherlands
| | - Franck Tanoh
- School of Computer Science, The University of Manchester, Manchester, M13 9PL, EMBL European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, UK Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, PL-02-109 Warsaw, Poland Informatics Institute, University of Amsterdam, Amsterdam, 1098 SJ and Human Genetics Department, Leiden University Medical Centre, NL-2333 ZA Leiden, Netherlands
| | - Eric Nzuobontane
- School of Computer Science, The University of Manchester, Manchester, M13 9PL, EMBL European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, UK Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, PL-02-109 Warsaw, Poland Informatics Institute, University of Amsterdam, Amsterdam, 1098 SJ and Human Genetics Department, Leiden University Medical Centre, NL-2333 ZA Leiden, Netherlands
| | - Thomas Laurent
- School of Computer Science, The University of Manchester, Manchester, M13 9PL, EMBL European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, UK Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, PL-02-109 Warsaw, Poland Informatics Institute, University of Amsterdam, Amsterdam, 1098 SJ and Human Genetics Department, Leiden University Medical Centre, NL-2333 ZA Leiden, Netherlands
| | - Jerzy Orlowski
- School of Computer Science, The University of Manchester, Manchester, M13 9PL, EMBL European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, UK Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, PL-02-109 Warsaw, Poland Informatics Institute, University of Amsterdam, Amsterdam, 1098 SJ and Human Genetics Department, Leiden University Medical Centre, NL-2333 ZA Leiden, Netherlands
| | - Marco Roos
- School of Computer Science, The University of Manchester, Manchester, M13 9PL, EMBL European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, UK Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, PL-02-109 Warsaw, Poland Informatics Institute, University of Amsterdam, Amsterdam, 1098 SJ and Human Genetics Department, Leiden University Medical Centre, NL-2333 ZA Leiden, Netherlands
- Katy Wolstencroft
- Sergejs Aleksejevs
- Robert Stevens
- Steve Pettifer
- Rodrigo Lopez
- Carole A. Goble
16. Pettifer S, Ison J, Kalas M, Thorne D, McDermott P, Jonassen I, Liaquat A, Fernández JM, Rodriguez JM, Pisano DG, Blanchet C, Uludag M, Rice P, Bartaseviciute E, Rapacki K, Hekkelman M, Sand O, Stockinger H, Clegg AB, Bongcam-Rudloff E, Salzemann J, Breton V, Attwood TK, Cameron G, Vriend G. The EMBRACE web service collection. Nucleic Acids Res 2010; 38:W683-8. [PMID: 20462862 PMCID: PMC2896104 DOI: 10.1093/nar/gkq297] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.4] [Received: 02/02/2010] [Revised: 03/29/2010] [Accepted: 04/07/2010] [Indexed: 12/03/2022]
Abstract
The EMBRACE (European Model for Bioinformatics Research and Community Education) web service collection is the culmination of a 5-year project that set out to investigate issues involved in developing and deploying web services for use in the life sciences. The project concluded that in order for web services to achieve widespread adoption, standards must be defined for the choice of web service technology, for semantically annotating both service function and the data exchanged, and a mechanism for discovering services must be provided. Building on this, the project developed: EDAM, an ontology for describing life science web services; BioXSD, a schema for exchanging data between services; and a centralized registry (http://www.embraceregistry.net) that collects together around 1000 services developed by the consortium partners. This article presents the current status of the collection and its associated recommendations and standards definitions.
Affiliation(s)
- Steve Pettifer
- School of Computer Science, The University of Manchester, Manchester, M13 9PL, UK.
17. Hartung M, Loebe F, Herre H, Rahm E. Management of evolving semantic grid metadata within a collaborative platform. Inf Sci (N Y) 2010. [DOI: 10.1016/j.ins.2009.08.008] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
18. Chen H, Xie G. The use of web ontology languages and other semantic web tools in drug discovery. Expert Opin Drug Discov 2010; 5:413-23. [DOI: 10.1517/17460441003762709] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 11/05/2022]
19. Tsiknakis M, Sfakianakis S, Zacharioudakis G, Umakis L, Kanterakis A, Potamias G, Kafetzopoulos D. A semantically aware platform for the authoring and secure enactment of bioinformatics workflows. Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2010; 2009:5625-8. [PMID: 19964401 DOI: 10.1109/iembs.2009.5333787] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/09/2022]
Abstract
Recent advances in the field of bioinformatics present a number of challenges in the secure and efficient management and analysis of biological data resources. Workflow technologies aim to assist scientists and domain experts in the design of complex, long-running, data- and computing-intensive experiments that involve many data processing and analysis tasks with the objective of generating new knowledge or formulating new hypotheses. In this paper we present a bioinformatics workflow authoring and execution environment that intends to greatly facilitate the whole lifecycle of such experiments. Emphasis is placed on the security and ethical requirements of these scenarios and the corresponding technological response. In addition, we present our semantic framework used for supporting specific user requirements related to the reasoning and inference capabilities of the environment.
Affiliation(s)
- M Tsiknakis
- Institute of Computer Science, Foundation for Research and Technology-Hellas, Heraklion, Crete, Greece.
20. Martín-Requena V, Ríos J, García M, Ramírez S, Trelles O. jORCA: easily integrating bioinformatics Web Services. Bioinformatics 2010; 26:553-9. [DOI: 10.1093/bioinformatics/btp709] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Indexed: 11/13/2022]
21. Semantically-Guided Workflow Construction in Taverna: The SADI and BioMoby Plug-Ins. Lecture Notes in Computer Science 2010. [DOI: 10.1007/978-3-642-16558-0_26] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Indexed: 02/04/2023]
22. Magallanes: a web services discovery and automatic workflow composition tool. BMC Bioinformatics 2009; 10:334. [PMID: 19832968 PMCID: PMC2771019 DOI: 10.1186/1471-2105-10-334] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Received: 06/01/2009] [Accepted: 10/15/2009] [Indexed: 11/25/2022]
Abstract
Background To aid in bioinformatics data processing and analysis, an increasing number of web-based applications are being deployed. Although this is a positive circumstance in general, the proliferation of tools makes it difficult to find the right tool, or more importantly, the right set of tools that can work together to solve real complex problems. Results Magallanes (Magellan) is a versatile, platform-independent Java library of algorithms aimed at discovering bioinformatics web services and associated data types. A second important feature of Magallanes is its ability to connect available and compatible web services into workflows that process data sequentially to reach a desired output from a given input. Magallanes' capabilities can be exploited either through an API or directly through a graphical user interface. The Magallanes API is freely available for academic use and, together with the Magallanes application, has been tested on MS-Windows™ XP and Unix-like operating systems. Detailed implementation information, including user manuals and tutorials, is available at . Conclusion Different implementations of the same client (web page, desktop applications, web services, etc.) have been deployed and are currently in use in real installations such as the National Institute of Bioinformatics (Spain) and the ACGT-EU project. This shows the potential utility and versatility of the software library, including the integration of novel tools in the domain, and provides strong evidence that it facilitates the automatic discovery and composition of workflows.
23. Sutherland K, McLeod K, Ferguson G, Burger A. Knowledge-driven enhancements for task composition in bioinformatics. BMC Bioinformatics 2009; 10 Suppl 10:S12. [PMID: 19796396 PMCID: PMC2755820 DOI: 10.1186/1471-2105-10-s10-s12] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 01/09/2023]
Abstract
BACKGROUND A key application area of semantic technologies is the fast-developing field of bioinformatics. Sealife was a project within this field with the aim of creating semantics-based web browsing capabilities for the Life Sciences. This includes meaningfully linking significant terms from the text of a web page to executable web services. It also involves the semantic mark-up of biological terms, linking them to biomedical ontologies, then discovering and executing services based on terms that interest the user. RESULTS A system was produced which allows a user to identify terms of interest on a web page and subsequently connects these to a choice of web services which can make use of these inputs. Elements of Artificial Intelligence Planning build on this to present a choice of higher level goals, which can then be broken down to construct a workflow. An Argumentation System was implemented to evaluate the results produced by three different gene expression databases. An evaluation of these modules was carried out on users from a variety of backgrounds. Users with little knowledge of web services were able to achieve tasks that used several services in much less time than they would have taken to do this manually. The Argumentation System was also considered a useful resource and feedback was collected on the best way to present results. CONCLUSION Overall the system represents a move forward in helping users to both construct workflows and analyse results by incorporating specific domain knowledge into the software. It also provides a mechanism by which web pages can be linked to web services. However, this work covers a specific domain and much co-ordinated effort is needed to make all web services available for use in such a way, i.e. the integration of underlying knowledge is a difficult but essential task.
Affiliation(s)
- Karen Sutherland
- Department of Computer Science, Heriot-Watt University, Edinburgh, UK
- Kenneth McLeod
- Department of Computer Science, Heriot-Watt University, Edinburgh, UK
- Gus Ferguson
- Department of Computer Science, Heriot-Watt University, Edinburgh, UK
- Albert Burger
- Department of Computer Science, Heriot-Watt University, Edinburgh, UK; MRC Human Genetics Unit, Institute for Genetics and Molecular Medicine, Edinburgh, UK
24. Prosdocimi F, Chisham B, Pontelli E, Thompson JD, Stoltzfus A. Initial implementation of a comparative data analysis ontology. Evol Bioinform Online 2009; 5:47-66. [PMID: 19812726 PMCID: PMC2747124 DOI: 10.4137/ebo.s2320] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Indexed: 11/15/2022]
Abstract
Comparative analysis is used throughout biology. When entities under comparison (e.g. proteins, genomes, species) are related by descent, evolutionary theory provides a framework that, in principle, allows N-ary comparisons of entities, while controlling for non-independence due to relatedness. Powerful software tools exist for specialized applications of this approach, yet it remains under-utilized in the absence of a unifying informatics infrastructure. A key step in developing such an infrastructure is the definition of a formal ontology. The analysis of use cases and existing formalisms suggests that a significant component of evolutionary analysis involves a core problem of inferring a character history, relying on key concepts: “Operational Taxonomic Units” (OTUs), representing the entities to be compared; “character-state data” representing the observations compared among OTUs; “phylogenetic tree”, representing the historical path of evolution among the entities; and “transitions”, the inferred evolutionary changes in states of characters that account for observations. Using the Web Ontology Language (OWL), we have defined these and other fundamental concepts in a Comparative Data Analysis Ontology (CDAO). CDAO has been evaluated for its ability to represent token data sets and to support simple forms of reasoning. With further development, CDAO will provide a basis for tools (for semantic transformation, data retrieval, validation, integration, etc.) that make it easier for software developers and biomedical researchers to apply evolutionary methods of inference to diverse types of data, so as to integrate this powerful framework for reasoning into their research.
Affiliation(s)
- Francisco Prosdocimi
- Department of Structural Biology and Genomics, Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC), F-67400 Illkirch, France
25. Hall W, De Roure D, Shadbolt N. The evolution of the Web and implications for eResearch. Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences 2009; 367:991-1001. [PMID: 19087929 DOI: 10.1098/rsta.2008.0252] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/27/2023]
Abstract
The hypertext visionaries foresaw the potential of richly interlinked global information systems for advancing human knowledge. The Web provided the infrastructure to enable those ideas to become a reality, and it quickly became a platform for collaborative research and data sharing. As the Web has evolved, new ways of using it for eResearch have emerged, such as the social networking facilities enabled by Web 2.0 technologies. The next generation of the Web, the so-called Semantic Web, is now on the horizon and will again enable new types of collaborative research to emerge. If we are to understand and anticipate these new modes of collaboration, we need a discipline that studies the Web as a whole. Web science is this discipline.
Affiliation(s)
- Wendy Hall
- School of Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ, UK.