1
Greenbaum D, Rozowsky J, Stodden V, Gerstein M. Structuring supplemental materials in support of reproducibility. Genome Biol 2017; 18:64. [PMID: 28381262 PMCID: PMC5382465 DOI: 10.1186/s13059-017-1205-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 01/11/2023] Open
Abstract
Supplements are increasingly important to the scientific record, particularly in genomics. However, they are often underutilized. Optimally, supplements should make results findable, accessible, interoperable, and reusable (i.e., "FAIR"). Moreover, properly off-loading to them the data and detail in a paper could make the main text more readable. We propose a hierarchical organization for supplements, with some parts paralleling and "shadowing" the main text and other elements branching off from it, and we suggest a specific formatting to make this structure explicit. Furthermore, sections of the supplement could be presented in multiple scientific "dialects", including machine-readable and lay-friendly formats.
Affiliation(s)
- Dov Greenbaum
  - Zvi Meitar Institute for Legal Implications of Emerging Technologies, Radzyner Law School, Interdisciplinary Center, Herzliya, Israel
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
- Joel Rozowsky
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
- Victoria Stodden
  - Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign, 501 E Daniel St, Champaign, IL, 61820, USA
- Mark Gerstein
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
  - Integrated Graduate Program in Physical and Engineering Biology, Yale University, New Haven, CT, 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
  - Department of Computer Science, Yale University, New Haven, CT, 06520, USA
2
Mısırlı G, Hallinan J, Pocock M, Lord P, McLaughlin JA, Sauro H, Wipat A. Data Integration and Mining for Synthetic Biology Design. ACS Synth Biol 2016; 5:1086-1097. [PMID: 27110921 DOI: 10.1021/acssynbio.5b00295] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 02/08/2023]
Abstract
One aim of synthetic biologists is to create novel and predictable biological systems from simpler modular parts. This approach is currently hampered by a lack of well-defined and characterized parts and devices. However, there is a wealth of existing biological information, which can be used to identify and characterize biological parts, and their design constraints in the literature and numerous biological databases. However, this information is spread among these databases in many different formats. New computational approaches are required to make this information available in an integrated format that is more amenable to data mining. A tried and tested approach to this problem is to map disparate data sources into a single data set, with common syntax and semantics, to produce a data warehouse or knowledge base. Ontologies have been used extensively in the life sciences, providing this common syntax and semantics as a model for a given biological domain, in a fashion that is amenable to computational analysis and reasoning. Here, we present an ontology for applications in synthetic biology design, SyBiOnt, which facilitates the modeling of information about biological parts and their relationships. SyBiOnt was used to create the SyBiOntKB knowledge base, incorporating and building upon existing life sciences ontologies and standards. The reasoning capabilities of ontologies were then applied to automate the mining of biological parts from this knowledge base. We propose that this approach will be useful to speed up synthetic biology design and ultimately help facilitate the automation of the biological engineering life cycle.
Affiliation(s)
- Göksel Mısırlı
  - School of Computing Science, Newcastle University, NE1 7RU Newcastle upon Tyne, United Kingdom
- Jennifer Hallinan
  - School of Computing Science, Newcastle University, NE1 7RU Newcastle upon Tyne, United Kingdom
- Matthew Pocock
  - School of Computing Science, Newcastle University, NE1 7RU Newcastle upon Tyne, United Kingdom
  - Turing Ate My Hamster Ltd, NE27 0RT Newcastle upon Tyne, United Kingdom
- Phillip Lord
  - School of Computing Science, Newcastle University, NE1 7RU Newcastle upon Tyne, United Kingdom
- Herbert Sauro
  - Department of Bioengineering, University of Washington, Seattle, Washington 98105, United States
- Anil Wipat
  - School of Computing Science, Newcastle University, NE1 7RU Newcastle upon Tyne, United Kingdom
3
Arntzen MØ, Boddie P, Frick R, Koehler CJ, Thiede B. Consolidation of proteomics data in the Cancer Proteomics database. Proteomics 2015; 15:3765-71. [DOI: 10.1002/pmic.201500144] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 04/13/2015] [Revised: 06/30/2015] [Accepted: 08/24/2015] [Indexed: 11/09/2022]
Affiliation(s)
- Magnus Ø. Arntzen
  - Biotechnology Centre of Oslo, University of Oslo, Oslo, Norway
  - Department of Chemistry, Biotechnology, and Food Science, Norwegian University of Life Sciences, Ås, Norway
- Paul Boddie
  - Biotechnology Centre of Oslo, University of Oslo, Oslo, Norway
- Rahel Frick
  - Biotechnology Centre of Oslo, University of Oslo, Oslo, Norway
- Christian J. Koehler
  - Biotechnology Centre of Oslo, University of Oslo, Oslo, Norway
  - Department of Biosciences, University of Oslo, Oslo, Norway
- Bernd Thiede
  - Biotechnology Centre of Oslo, University of Oslo, Oslo, Norway
  - Department of Biosciences, University of Oslo, Oslo, Norway
4
Rebholz-Schuhmann D, Grabmüller C, Kavaliauskas S, Croset S, Woollard P, Backofen R, Filsell W, Clark D. A case study: semantic integration of gene-disease associations for type 2 diabetes mellitus from literature and biomedical data resources. Drug Discov Today 2013; 19:882-9. [PMID: 24201223 DOI: 10.1016/j.drudis.2013.10.024] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 07/17/2012] [Revised: 09/24/2013] [Accepted: 10/28/2013] [Indexed: 10/26/2022]
Abstract
In the Semantic Enrichment of the Scientific Literature (SESL) project, researchers from academia and from life science and publishing companies collaborated in a pre-competitive way to integrate and share information for type 2 diabetes mellitus (T2DM) in adults. This case study exposes benefits from semantic interoperability after integrating the scientific literature with biomedical data resources, such as UniProt Knowledgebase (UniProtKB) and the Gene Expression Atlas (GXA). We annotated scientific documents in a standardized way, by applying public terminological resources for diseases and proteins, and other text-mining approaches. Eventually, we compared the genetic causes of T2DM across the data resources to demonstrate the benefits from the SESL triple store. Our solution enables publishers to distribute their content with little overhead into remote data infrastructures, such as into any Virtual Knowledge Broker.
Affiliation(s)
- Dietrich Rebholz-Schuhmann
  - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
  - Computerlinguistik, Universität Zürich, Binzmühlestrasse 14, 8050 Zürich, Switzerland
- Christoph Grabmüller
  - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Silvestras Kavaliauskas
  - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Samuel Croset
  - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Peter Woollard
  - GlaxoSmithKline, GlaxoSmithKline Medicines Research Centre, Gunnels Wood Road, Stevenage SG1 2NY, UK
- Rolf Backofen
  - Albert-Ludwigs-University Freiburg, Fahnenbergplatz, D-79085 Freiburg, Germany
- Wendy Filsell
  - Unilever R&D, Colworth Science Park, Sharnbrook MK44 1LQ, UK
- Dominic Clark
  - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
5
Chatr-Aryamontri A, Winter A, Perfetto L, Briganti L, Licata L, Iannuccelli M, Castagnoli L, Cesareni G, Tyers M. Benchmarking of the 2010 BioCreative Challenge III text-mining competition by the BioGRID and MINT interaction databases. BMC Bioinformatics 2011; 12 Suppl 8:S8. [PMID: 22151178 PMCID: PMC3269943 DOI: 10.1186/1471-2105-12-s8-s8] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 12/25/2022] Open
Abstract
Background: The vast amount of data published in the primary biomedical literature represents a challenge for the automated extraction and codification of individual data elements. Biological databases that rely solely on manual extraction by expert curators are unable to comprehensively annotate the information dispersed across the entire biomedical literature. The development of efficient tools based on natural language processing (NLP) systems is essential for the selection of relevant publications, identification of data attributes and partially automated annotation. One of the tasks of the Biocreative 2010 Challenge III was devoted to the evaluation of NLP systems developed to identify articles for curation and extraction of protein-protein interaction (PPI) data.
Results: The Biocreative 2010 competition addressed three tasks: gene normalization, article classification and interaction method identification. The BioGRID and MINT protein interaction databases both participated in the generation of the test publication set for gene normalization, annotated the development and test sets for article classification, and curated the test set for interaction method classification. These test datasets served as a gold standard for the evaluation of data extraction algorithms.
Conclusion: The development of efficient tools for extraction of PPI data is a necessary step to achieve full curation of the biomedical literature. NLP systems can in the first instance facilitate expert curation by refining the list of candidate publications that contain PPI data; more ambitiously, NLP approaches may be able to directly extract relevant information from full-text articles for rapid inspection by expert curators. Close collaboration between biological databases and NLP systems developers will continue to facilitate the long-term objectives of both disciplines.
6
Retrieval, alignment, and clustering of computational models based on semantic annotations. Mol Syst Biol 2011; 7:512. [PMID: 21772260 PMCID: PMC3159965 DOI: 10.1038/msb.2011.41] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Received: 09/21/2010] [Accepted: 05/31/2011] [Indexed: 01/17/2023] Open
Abstract
As the number of computational systems biology models increases, new methods are needed to explore their content and build connections with experimental data. In this Perspective article, the authors propose a flexible semantic framework that can help achieve these aims. The exploding number of computational models produced by Systems Biologists over the last years is an invitation to structure and exploit this new wealth of information. Researchers would like to trace models relevant to specific scientific questions, to explore their biological content, to align and combine them, and to match them with experimental data. To automate these processes, it is essential to consider semantic annotations, which describe their biological meaning. As a prerequisite for a wide range of computational methods, we propose general and flexible similarity measures for Systems Biology models computed from semantic annotations. By using these measures and a large extensible ontology, we implement a platform that can retrieve, cluster, and align Systems Biology models and experimental data sets. At present, its major application is the search for relevant models in the BioModels Database, starting from initial models, data sets, or lists of biological concepts. Beyond similarity searches, the representation of models by semantic feature vectors may pave the way for visualisation, exploration, and statistical analysis of large collections of models and corresponding data.