1
Arabi-Jeshvaghani F, Javadi-Zarnaghi F, Löchel HF, Martin R, Heider D. LAMPPrimerBank, a manually curated database of experimentally validated loop-mediated isothermal amplification primers for detection of respiratory pathogens. Infection 2023; 51:1809-1818. [PMID: 37828369 DOI: 10.1007/s15010-023-02100-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Accepted: 09/13/2023] [Indexed: 10/14/2023]
Abstract
PURPOSE AND METHODS: The emergence of coronavirus disease 2019 (COVID-19) has once again affirmed the significant threat of respiratory infections to global public health and the utmost importance of prompt diagnosis in managing and mitigating any pandemic. The nucleic acid amplification test (NAAT) is the primary detection method for most pathogens. Loop-mediated isothermal amplification (LAMP) is a rapid, simple, sensitive, and specific example of isothermal NAAT performed using a set of four to six primers. Primer design is a fundamental step in LAMP assays, with several complexities and experimental screening requirements. To address this challenge, an online database is presented here. Its workflow comprises three steps: literature aggregation, data curation, and database and website implementation. RESULTS: LAMPPrimerBank (https://lampprimerbank.mathematik.uni-marburg.de) is a manually curated database dedicated to experimentally validated LAMP primers, the particulars of their assays, and accompanying literature, with a primary emphasis on respiratory pathogens. LAMPPrimerBank, with its user-friendly web interface and an open application programming interface, enables fast and easy exploration, comparison, and export of LAMP primer sequences and their respective information from the widely scattered literature. LAMPPrimerBank currently comprises LAMP primers for diagnosing viral, bacterial, and fungal respiratory pathogens. Additionally, to address the challenge of false-positive results generated by nonspecific amplifications, LAMPPrimerBank computationally predicted and visualized the sizes of LAMP products for recorded primer sets in the database. CONCLUSION: LAMPPrimerBank, as a pioneering database in the rapidly expanding field of isothermal NAAT, endeavors to confront two challenges of LAMP: primer design and discrimination of false-positive results.
Affiliation(s)
- Fatemeh Arabi-Jeshvaghani
  - Department of Cell and Molecular Biology & Microbiology, Faculty of Biological Science and Technology, University of Isfahan, Isfahan, Iran
- Fatemeh Javadi-Zarnaghi
  - Department of Cell and Molecular Biology & Microbiology, Faculty of Biological Science and Technology, University of Isfahan, Isfahan, Iran
- Hannah Franziska Löchel
  - Department of Data Science in Biomedicine, Faculty of Mathematics and Computer Science, University of Marburg, Marburg, Germany
- Roman Martin
  - Department of Data Science in Biomedicine, Faculty of Mathematics and Computer Science, University of Marburg, Marburg, Germany
- Dominik Heider
  - Department of Data Science in Biomedicine, Faculty of Mathematics and Computer Science, University of Marburg, Marburg, Germany
2
Lubiana T, Lopes R, Medeiros P, Silva JC, Goncalves ANA, Maracaja-Coutinho V, Nakaya HI. Ten quick tips for harnessing the power of ChatGPT in computational biology. PLoS Comput Biol 2023; 19:e1011319. [PMID: 37561669 PMCID: PMC10414555 DOI: 10.1371/journal.pcbi.1011319] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Indexed: 08/12/2023] Open
Affiliation(s)
- Tiago Lubiana
  - School of Pharmaceutical Sciences, University of São Paulo, São Paulo, Brazil
- Rafael Lopes
  - Department of Epidemiology of Microbial Diseases and Public Health Modeling Unit, Yale School of Public Health, New Haven, Connecticut, United States of America
- Juan Carlo Silva
  - School of Pharmaceutical Sciences, University of São Paulo, São Paulo, Brazil
- Vinicius Maracaja-Coutinho
  - Advanced Center for Chronic Diseases, Universidad de Chile, Santiago, Chile
  - Centro de Modelamiento Molecular, Biofísica y Bioinformática (CM2B2), Facultad de Ciencias Químicas y Farmacéuticas, Universidad de Chile, Santiago, Chile
  - ANID Anillo ACT210004 SYSTEMIX, Rancagua, Chile
  - Anillo Inflammation in HIV/AIDS (InflammAIDS), Santiago, Chile
  - Beagle Bioinformatics, São Paulo, Brasil & Santiago, Chile
- Helder I. Nakaya
  - School of Pharmaceutical Sciences, University of São Paulo, São Paulo, Brazil
  - Hospital Israelita Albert Einstein, São Paulo, Brazil
3
Mazein A, Acencio ML, Balaur I, Rougny A, Welter D, Niarakis A, Ramirez Ardila D, Dogrusoz U, Gawron P, Satagopam V, Gu W, Kremer A, Schneider R, Ostaszewski M. A guide for developing comprehensive systems biology maps of disease mechanisms: planning, construction and maintenance. Frontiers in Bioinformatics 2023; 3:1197310. [PMID: 37426048 PMCID: PMC10325725 DOI: 10.3389/fbinf.2023.1197310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Accepted: 06/09/2023] [Indexed: 07/11/2023] Open
Abstract
As a conceptual model of disease mechanisms, a disease map integrates available knowledge and is applied for data interpretation, predictions and hypothesis generation. It is possible to model disease mechanisms on different levels of granularity and adjust the approach to the goals of a particular project. This rich environment, together with requirements for high-quality network reconstruction, makes it challenging for new curators and groups to be quickly introduced to the development methods. In this review, we offer a step-by-step guide for developing a disease map within its mainstream pipeline, which involves using the CellDesigner tool for creating and editing diagrams and the MINERVA Platform for online visualisation and exploration. We also describe how the Neo4j graph database environment can be used for efficiently managing and querying such a resource. To assess interoperability and reproducibility, we apply the FAIR principles.
Affiliation(s)
- Alexander Mazein
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Marcio Luis Acencio
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Irina Balaur
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Danielle Welter
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Anna Niarakis
  - Université Paris-Saclay, Laboratoire Européen de Recherche Pour la Polyarthrite Rhumatoïde–Genhotel, University Evry, Evry, France
  - Lifeware Group, Inria Saclay-Ile de France, Palaiseau, France
- Diana Ramirez Ardila
  - ITTM Information Technology for Translational Medicine, Esch-sur-Alzette, Luxemburg
- Ugur Dogrusoz
  - Computer Engineering Department, Bilkent University, Ankara, Türkiye
- Piotr Gawron
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Venkata Satagopam
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg
  - ELIXIR Luxembourg, Belvaux, Luxembourg
- Wei Gu
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg
  - ELIXIR Luxembourg, Belvaux, Luxembourg
- Andreas Kremer
  - ITTM Information Technology for Translational Medicine, Esch-sur-Alzette, Luxemburg
- Reinhard Schneider
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg
  - ELIXIR Luxembourg, Belvaux, Luxembourg
- Marek Ostaszewski
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg
  - ELIXIR Luxembourg, Belvaux, Luxembourg
4
Gavgani HN, Grotewold E, Gray J. Methodology for Constructing a Knowledgebase for Plant Gene Regulation Information. Methods Mol Biol 2023; 2698:277-300. [PMID: 37682481 DOI: 10.1007/978-1-0716-3354-0_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/09/2023]
Abstract
The amount of biological data is growing at a rapid pace as many high-throughput omics technologies and data pipelines are developed. This is resulting in the growth of databases for DNA and protein sequences, gene expression, protein accumulation, structural, and localization information. The diversity and multi-omics nature of such bioinformatic data require well-designed databases for flexible organization and presentation. Besides general-purpose online bioinformatic databases, users need narrowly focused online databases to quickly access a meaningful collection of related data for their research. Here, we describe the methodology used to implement a plant gene regulatory knowledgebase, with data, query, and tool features, as well as the ability to expand to accommodate future datasets. We exemplify this methodology for the GRASSIUS knowledgebase, but it is applicable to developing and updating similar plant gene regulatory knowledgebases. GRASSIUS organizes and presents gene regulatory data from grass species with a central focus on maize (Zea mays). The main classes of data presented include not only the families of transcription factors (TFs) and co-regulators (CRs) but also protein-DNA interaction data, where available.
Affiliation(s)
- Hadi Nayebi Gavgani
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, USA
  - Dandelions Therapeutics Inc., San Francisco, CA, USA
- Erich Grotewold
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, USA
- John Gray
  - Department of Biological Sciences, University of Toledo, Toledo, OH, USA
5
Abstract
Applying computational statistics or machine learning methods to data is a key component of many scientific studies, in any field, but alone might not be sufficient to generate robust and reliable outcomes and results. Before applying any discovery method, preprocessing steps are necessary to prepare the data for the computational analysis. In this framework, data cleaning and feature engineering are key pillars of any scientific study involving data analysis and should be adequately designed and performed from the first phases of the project. We call "feature" a variable describing a particular trait of a person or an observation, usually recorded as a column in a dataset. Even though they are pivotal, these data cleaning and feature engineering steps are sometimes done poorly or inefficiently, especially by beginners and inexperienced researchers. For this reason, we propose here our quick tips for data cleaning and feature engineering on how to carry out these important preprocessing steps correctly, avoiding common mistakes and pitfalls. Although we designed these guidelines with bioinformatics and health informatics scenarios in mind, we believe they can more generally be applied to any scientific area. We therefore target these guidelines to any researchers or practitioners wanting to perform data cleaning or feature engineering. We believe our simple recommendations can help researchers and scholars perform better computational analyses that can lead, in turn, to more solid outcomes and more reliable discoveries.
Affiliation(s)
- Davide Chicco
  - Institute of Health Policy Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
- Luca Oneto
  - Dipartimento di Informatica Bioingegneria Robotica e Ingegneria dei Sistemi, Università di Genova, Genoa, Italy
  - ZenaByte S.r.l., Genoa, Italy
- Erica Tavazzi
  - Dipartimento di Ingegneria dell’Informazione, Università di Padova, Padua, Italy
6
Yoon A, Kim J, Donaldson DR. Big data curation framework: Curation actions and challenges. J Inf Sci 2022. [DOI: 10.1177/01655515221133528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
Big data curation represents an emerging topic of inquiry but is still in an early phase along its adoption curve. The term big data itself is a nebulous concept, and the differences between small data curation and big data curation are nuanced. The goal of this research is to provide a theoretical framework that identifies big data curation actions and associated curation challenges. This study is based on the practices of big data research and data curation, identified by systematically examining the literature. The outcome of the study includes the big data curation framework, which provides an overview of curation activities and of the concerns that are essential to performing such activities. The study also provides practical implications for libraries, archives, data repositories and other information organisations that are concerned with the issue of big data curation, as big data presents a multidimensional array of exigencies in relation to the mission of those organisations.
Affiliation(s)
- Ayoung Yoon
  - Department of Library and Information Science, School of Informatics and Computing, Indiana University–Purdue University Indianapolis (IUPUI), USA
- Jihyun Kim
  - Department of Library & Information Science, Ewha Womans University, South Korea
- Devan Ray Donaldson
  - Department of Information and Library Science, Luddy School of Informatics, Computing, and Engineering, Indiana University Bloomington, USA
7
Matentzoglu N, Goutte-Gattat D, Tan SZK, Balhoff JP, Carbon S, Caron AR, Duncan WD, Flack JE, Haendel M, Harris NL, Hogan WR, Hoyt CT, Jackson RC, Kim H, Kir H, Larralde M, McMurry JA, Overton JA, Peters B, Pilgrim C, Stefancsik R, Robb SMC, Toro S, Vasilevsky NA, Walls R, Mungall CJ, Osumi-Sutherland D. Ontology Development Kit: a toolkit for building, maintaining and standardizing biomedical ontologies. Database (Oxford) 2022; 2022:6754192. [PMID: 36208225 PMCID: PMC9547537 DOI: 10.1093/database/baac087] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 07/07/2022] [Revised: 08/19/2022] [Accepted: 09/23/2022] [Indexed: 11/21/2022]
Abstract
Similar to managing software packages, managing the ontology life cycle involves multiple complex workflows such as preparing releases, continuous quality control checking and dependency management. To manage these processes, a diverse set of tools is required, from command-line utilities to powerful ontology-engineering environments. Particularly in the biomedical domain, which has developed a set of highly diverse yet inter-dependent ontologies, standardizing release practices and metadata and establishing shared quality standards are crucial to enable interoperability. The Ontology Development Kit (ODK) provides a set of standardized, customizable and automatically executable workflows, and packages all required tooling in a single Docker image. In this paper, we provide an overview of how the ODK works, show how it is used in practice and describe how we envision it driving standardization efforts in our community. Database URL: https://github.com/INCATools/ontology-development-kit.
Affiliation(s)
- Damien Goutte-Gattat
  - Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge, CB2 3DY, UK
- Shawn Zheng Kai Tan
  - Samples Phenotypes and Ontologies Team (SPOT), European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- James P Balhoff
  - RENCI, University of North Carolina, Chapel Hill, NC, North Carolina 27517, USA
- Seth Carbon
  - Berkeley Bioinformatics Open-source Projects (BBOP), Lawrence Berkeley National Laboratory (LBNL), 1 Cyclotron Road, Mailstop 977-0257, Berkeley, CA 94720, USA
- Anita R Caron
  - Samples Phenotypes and Ontologies Team (SPOT), European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- William D Duncan
  - Berkeley Bioinformatics Open-source Projects (BBOP), Lawrence Berkeley National Laboratory (LBNL), 1 Cyclotron Road, Mailstop 977-0257, Berkeley, CA 94720, USA
  - College of Dentistry; Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, William D. Duncan: 1395 Center Dr, Gainesville, William R. Hogan: 1600 SW Archer Rd, Gainesville, FL 32610, USA
- Joe E Flack
  - School of Medicine, Johns Hopkins University, 733 N Broadway, Baltimore, Baltimore, MD 21205, USA
- Melissa Haendel
  - University of Colorado Anschutz Medical Campus, 13001 E 17th Pl, Aurora, CO 80045, USA
- Nomi L Harris
  - Berkeley Bioinformatics Open-source Projects (BBOP), Lawrence Berkeley National Laboratory (LBNL), 1 Cyclotron Road, Mailstop 977-0257, Berkeley, CA 94720, USA
- William R Hogan
  - College of Dentistry; Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, William D. Duncan: 1395 Center Dr, Gainesville, William R. Hogan: 1600 SW Archer Rd, Gainesville, FL 32610, USA
- Charles Tapley Hoyt
  - Laboratory of Systems Pharmacology, Harvard Medical School, 200 Longwood Avenue Armenise Building Room 109, Boston, MA 02115, USA
- Rebecca C Jackson
  - Bend Informatics LLC, 5305 River Rd North, Ste B, Keizer, OR 97303, USA
- Huseyin Kir
  - Samples Phenotypes and Ontologies Team (SPOT), European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Martin Larralde
  - Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstraße 1, Heidelberg 69117, Germany
- Julie A McMurry
  - University of Colorado Anschutz Medical Campus, 13001 E 17th Pl, Aurora, CO 80045, USA
- Bjoern Peters
  - Institute for Allergy & Immunology, La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
- Clare Pilgrim
  - Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge, CB2 3DY, UK
- Ray Stefancsik
  - Samples Phenotypes and Ontologies Team (SPOT), European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Sofia MC Robb
  - Stowers Institute for Medical Research, 1000 E. 50th St., Kansas City, MO 64110, USA
- Sabrina Toro
  - University of Colorado Anschutz Medical Campus, 13001 E 17th Pl, Aurora, CO 80045, USA
- Nicole A Vasilevsky
  - University of Colorado Anschutz Medical Campus, 13001 E 17th Pl, Aurora, CO 80045, USA
- Ramona Walls
  - Critical Path Institute, 1730 E River Road, Tucson, AZ 85718, USA
- Christopher J Mungall
  - Berkeley Bioinformatics Open-source Projects (BBOP), Lawrence Berkeley National Laboratory (LBNL), 1 Cyclotron Road, Mailstop 977-0257, Berkeley, CA 94720, USA
8
Hemedan AA, Niarakis A, Schneider R, Ostaszewski M. Boolean modelling as a logic-based dynamic approach in systems medicine. Comput Struct Biotechnol J 2022; 20:3161-3172. [PMID: 35782730 PMCID: PMC9234349 DOI: 10.1016/j.csbj.2022.06.035] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 03/15/2022] [Revised: 06/14/2022] [Accepted: 06/14/2022] [Indexed: 11/17/2022] Open
Abstract
Molecular mechanisms of health and disease are often represented as systems biology diagrams, and the coverage of such representations constantly increases. These static diagrams can be transformed into dynamic models, allowing for in silico simulations and predictions. Boolean modelling is an approach based on an abstract representation of the system. It emphasises the qualitative modelling of biological systems in which each biomolecule can take two possible values: zero for absent or inactive, one for present or active. Because of this approximation, Boolean modelling is applicable to large diagrams, making it possible to capture their dynamic properties. We review Boolean models of disease mechanisms and compare a range of methods and tools used for analysis processes. We explain the methodology of Boolean analysis, focusing on its application in disease modelling. Finally, we discuss its practical application in analysing signal transduction and gene regulatory pathways in health and disease.
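As a minimal illustration of the two-valued abstraction described in this abstract, the following Python sketch simulates a small Boolean network. The four-node network, its node names and its update rules are hypothetical and chosen only for illustration; they are not taken from the reviewed paper, and synchronous updating is assumed here as just one of several possible update schemes.

```python
from typing import Callable, Dict, List

State = Dict[str, bool]
Rules = Dict[str, Callable[[State], bool]]

# Hypothetical toy network (an assumption for illustration, not from the paper).
# True means present/active, False means absent/inactive.
RULES: Rules = {
    "ligand": lambda s: s["ligand"],        # external input keeps its value
    "receptor": lambda s: s["ligand"],      # activated by the ligand
    "inhibitor": lambda s: s["receptor"],   # induced by the receptor (negative feedback)
    "target": lambda s: s["receptor"] and not s["inhibitor"],
}


def step(state: State, rules: Rules) -> State:
    """Synchronous update: every node is recomputed from the previous state."""
    return {node: rule(state) for node, rule in rules.items()}


def simulate(initial: State, rules: Rules, max_steps: int = 20) -> List[State]:
    """Iterate until a previously visited state recurs or max_steps is reached."""
    trajectory = [initial]
    seen = {tuple(sorted(initial.items()))}
    state = initial
    for _ in range(max_steps):
        state = step(state, rules)
        trajectory.append(state)
        key = tuple(sorted(state.items()))
        if key in seen:
            break  # the trajectory has entered an attractor
        seen.add(key)
    return trajectory


if __name__ == "__main__":
    start = {"ligand": True, "receptor": False, "inhibitor": False, "target": False}
    for t, s in enumerate(simulate(start, RULES)):
        print(t, {name: int(value) for name, value in s.items()})
```

In this toy network the target is switched on transiently and then switched off by the feedback inhibitor, after which the state no longer changes. Stable states of this kind (attractors) are the sort of dynamic property that Boolean analysis aims to capture.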
Affiliation(s)
- Ahmed Abdelmonem Hemedan
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Anna Niarakis
  - Université Paris-Saclay, Laboratoire Européen de Recherche pour la Polyarthrite rhumatoïde – Genhotel, Univ Evry, Evry, France
  - Lifeware Group, Inria, Saclay-île de France, 91120 Palaiseau, France
- Reinhard Schneider
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Marek Ostaszewski
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
  - Corresponding author at: Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 6 Avenue du Swing, L-4367 Belvaux, Luxembourg
9
Fitzpatrick R, Stefan MI. Validation Through Collaboration: Encouraging Team Efforts to Ensure Internal and External Validity of Computational Models of Biochemical Pathways. Neuroinformatics 2022; 20:277-284. [PMID: 35543917 PMCID: PMC9537119 DOI: 10.1007/s12021-022-09584-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/17/2022] [Indexed: 01/09/2023]
Abstract
Computational modelling of biochemical reaction pathways is an increasingly important part of neuroscience research. In order to be useful, computational models need to be valid in two senses: First, they need to be consistent with experimental data and able to make testable predictions (external validity). Second, they need to be internally consistent and independently reproducible (internal validity). Here, we discuss both types of validity and provide a brief overview of tools and technologies used to ensure they are met. We also suggest the introduction of new collaborative technologies to ensure model validity: an incentivised experimental database for external validity and reproducibility audits for internal validity. Both rely on FAIR principles and on collaborative science practices.
Affiliation(s)
- Richard Fitzpatrick
  - Centre for Discovery Brain Sciences, University of Edinburgh, Edinburgh, UK
  - School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Melanie I. Stefan
  - Centre for Discovery Brain Sciences, University of Edinburgh, Edinburgh, UK
  - ZJU-UoE Institute, Zhejiang University, Haining, China
10
Carey MA, Dräger A, Beber ME, Papin JA, Yurkovich JT. Community standards to facilitate development and address challenges in metabolic modeling. Mol Syst Biol 2021; 16:e9235. [PMID: 32845080 PMCID: PMC8411906 DOI: 10.15252/msb.20199235] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Indexed: 01/09/2023] Open
Abstract
Standardization of data and models facilitates effective communication, especially in computational systems biology. However, both the development and consistent use of standards and resources remain challenging. As a result, the amount, quality, and format of the information contained within systems biology models are not consistent and therefore present challenges for widespread use and communication. Here, we focused on these standards, resources, and challenges in the field of constraint-based metabolic modeling by conducting a community-wide survey. We used this feedback to (i) outline the major challenges that our field faces and to propose solutions and (ii) identify a set of features that defines what a "gold standard" metabolic network reconstruction looks like concerning content, annotation, and simulation capabilities. We anticipate that this community-driven outline will help the long-term development of community-inspired resources as well as produce high-quality, accessible models within our field. More broadly, we hope that these efforts can serve as blueprints for other computational modeling communities to ensure the continued development of both practical, usable standards and reproducible, knowledge-rich models.
Affiliation(s)
- Maureen A Carey
  - Division of Infectious Diseases and International Health, Department of Medicine, University of Virginia, Charlottesville, VA, USA
- Andreas Dräger
  - Computational Systems Biology of Infection and Antimicrobial-Resistant Pathogens, Institute for Biomedical Informatics (IBMI), University of Tübingen, Tübingen, Germany
  - Department of Computer Science, University of Tübingen, Tübingen, Germany
  - German Center for Infection Research (DZIF), partner site Tübingen, Tübingen, Germany
- Moritz E Beber
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet, Denmark
- Jason A Papin
  - Division of Infectious Diseases and International Health, Department of Medicine, University of Virginia, Charlottesville, VA, USA
  - Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, USA
11
Hatos A, Quaglia F, Piovesan D, Tosatto SCE. APICURON: a database to credit and acknowledge the work of biocurators. Database (Oxford) 2021; 2021:baab019. [PMID: 33882120 PMCID: PMC8060004 DOI: 10.1093/database/baab019] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/02/2021] [Revised: 03/12/2021] [Accepted: 04/12/2021] [Indexed: 11/14/2022]
Abstract
APICURON is an open and freely accessible resource that tracks and credits the work of biocurators across multiple participating knowledgebases. Biocuration is essential to extract knowledge from research data and make it available in a structured and standardized way to the scientific community. However, processing biological data, mainly from literature, requires a huge effort that is difficult to attribute and quantify. APICURON collects biocuration events from third-party resources and aggregates this information, spotlighting biocurator contributions. APICURON promotes biocurator engagement by implementing gamification concepts such as badges, medals and leaderboards and at the same time provides a monitoring service for registered resources and for biocurators themselves. APICURON adopts a data model that is flexible enough to represent and track the majority of biocuration activities. Biocurators are identified through their Open Researcher and Contributor ID (ORCID). The definitions of curation events, scoring systems and rules for assigning badges and medals are resource-specific and easily customizable. Registered resources can transfer curation activities on the fly through a secure and robust Application Programming Interface (API). Here, we show how simple and effective it is to connect a resource to APICURON, describing the DisProt database of intrinsically disordered proteins as a use case. We believe APICURON will provide biological knowledgebases with a service to recognize and credit the effort of their biocurators, monitor their activity and promote curator engagement. Database URL: https://apicuron.org.
Affiliation(s)
- András Hatos
  - Department of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padova 35131, Italy
- Federica Quaglia
  - Department of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padova 35131, Italy
- Damiano Piovesan
  - Department of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padova 35131, Italy
- Silvio C E Tosatto
  - Department of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padova 35131, Italy
12
Langenstein M, Hermjakob H, Llinares MB. A decoupled, modular and scriptable architecture for tools to curate data platforms. Bioinformatics 2021; 37:3693-3694. [PMID: 33830216 PMCID: PMC8545344 DOI: 10.1093/bioinformatics/btab233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2020] [Revised: 03/12/2021] [Accepted: 04/07/2021] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Curation is essential for any data platform to maintain the quality of the data it provides. Today, more effective curation tools are often vital to keep up with the rapid growth of existing, maintenance-requiring databases and the amount of newly published information that needs to be surveyed. However, curation interfaces are often complex and challenging to develop further. Therefore, opportunities for experimentation with curation workflows may be lost due to a lack of development resources or a reluctance to change sensitive production systems. RESULTS: We propose a decoupled, modular and scriptable architecture to build new curation tools on top of existing platforms. Our architecture treats the existing platform as a black box. It therefore relies only on the platform's public application programming interfaces (APIs) and web application instead of requiring any changes to the existing infrastructure. As a case study, we have implemented this architecture in cmd-iaso, a curation tool for the identifiers.org registry. With cmd-iaso, we also show that the proposed design's flexibility can be utilised to streamline and enhance the curator's workflow with the platform's existing web interface. AVAILABILITY: The cmd-iaso curation tool is implemented in Python 3.7+ and supports Linux, macOS and Windows. Its source code and documentation are freely available from https://github.com/identifiers-org/cmd-iaso. It is also published as a Docker container at https://hub.docker.com/r/identifiersorg/cmd-iaso. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Momo Langenstein
  - European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Cambridge, UK CB10 1SD
- Henning Hermjakob
  - European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Cambridge, UK CB10 1SD
- Manuel Bernal Llinares
  - European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Cambridge, UK CB10 1SD
13
Thessen AE, Bogdan P, Patterson DJ, Casey TM, Hinojo-Hinojo C, de Lange O, Haendel MA. From Reductionism to Reintegration: Solving society's most pressing problems requires building bridges between data types across the life sciences. PLoS Biol 2021; 19:e3001129. [PMID: 33770077 PMCID: PMC7997011 DOI: 10.1371/journal.pbio.3001129] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/18/2022] Open
Abstract
Decades of reductionist approaches in biology have achieved spectacular progress, but the proliferation of subdisciplines, each with its own technical and social practices regarding data, impedes the growth of the multidisciplinary and interdisciplinary approaches now needed to address pressing societal challenges. Data integration is key to a reintegrated biology able to address global issues such as climate change, biodiversity loss, and sustainable ecosystem management. We identify major challenges to data integration and present a vision for a "Data as a Service"-oriented architecture to promote reuse of data for discovery. The proposed architecture includes standards development, new tools and services, and strategies for career-development and sustainability.
Affiliation(s)
- Anne E. Thessen
- Department of Environmental and Molecular Toxicology, Oregon State University, Corvallis, Oregon, United States of America
- Paul Bogdan
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California, United States of America
- Theresa M. Casey
- Department of Animal Sciences, Purdue University, West Lafayette, Indiana, United States of America
- César Hinojo-Hinojo
- Department of Earth System Science, University of California, Irvine, California, United States of America
- Orlando de Lange
- Department of Electrical Engineering, University of Washington, Seattle, Washington, United States of America
- Melissa A. Haendel
- Department of Environmental and Molecular Toxicology, Oregon State University, Corvallis, Oregon, United States of America
|
14
|
Bastian FB, Roux J, Niknejad A, Comte A, Fonseca Costa SS, de Farias TM, Moretti S, Parmentier G, de Laval VR, Rosikiewicz M, Wollbrett J, Echchiki A, Escoriza A, Gharib WH, Gonzales-Porta M, Jarosz Y, Laurenczy B, Moret P, Person E, Roelli P, Sanjeev K, Seppey M, Robinson-Rechavi M. The Bgee suite: integrated curated expression atlas and comparative transcriptomics in animals. Nucleic Acids Res 2021; 49:D831-D847. [PMID: 33037820 PMCID: PMC7778977 DOI: 10.1093/nar/gkaa793] [Citation(s) in RCA: 76] [Impact Index Per Article: 25.3] [Received: 07/14/2020] [Revised: 08/24/2020] [Accepted: 09/15/2020] [Indexed: 01/24/2023] Open
Abstract
Bgee is a database to retrieve and compare gene expression patterns in multiple animal species, produced by integrating multiple data types (RNA-Seq, Affymetrix, in situ hybridization, and EST data). It is based exclusively on curated healthy wild-type expression data (e.g., no gene knock-out, no treatment, no disease), to provide a comparable reference of normal gene expression. Curation includes very large datasets such as GTEx (re-annotation of samples as ‘healthy’ or not) as well as many small ones. Data are integrated and made comparable between species thanks to consistent data annotation and processing, and to calls of presence/absence of expression, along with expression scores. As a result, Bgee is capable of detecting the conditions of expression of any single gene, accommodating any data type and species. Bgee provides several tools for analyses, allowing, e.g., automated comparisons of gene expression patterns within and between species, retrieval of the preferred conditions of expression of any gene, or enrichment analyses of conditions with expression of sets of genes. Bgee release 14.1 includes 29 animal species, and is available at https://bgee.org/ and through its Bioconductor R package BgeeDB.
Affiliation(s)
- Frederic B Bastian
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Julien Roux
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Anne Niknejad
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Aurélie Comte
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Sara S Fonseca Costa
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Tarcisio Mendes de Farias
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Sébastien Moretti
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Gilles Parmentier
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Valentine Rech de Laval
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Marta Rosikiewicz
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Julien Wollbrett
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Amina Echchiki
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Angélique Escoriza
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Walid H Gharib
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Mar Gonzales-Porta
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Yohan Jarosz
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Balazs Laurenczy
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Philippe Moret
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Emilie Person
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Patrick Roelli
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Komal Sanjeev
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Mathieu Seppey
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Marc Robinson-Rechavi
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
|