1
Waterhouse RM, Adam-Blondon AF, Balech B, Barta E, Ying Shi Chua P, Di Cola V, Heil KF, Hughes GM, Jermiin LS, Kalaš M, Lanfear J, Pafilis E, Palagi PM, Papageorgiou AC, Paupério J, Psomopoulos F, Raes N, Burgin J, Gabaldón T. The ELIXIR Biodiversity Community: Understanding short- and long-term changes in biodiversity. F1000Res 2024; 12:ELIXIR-499. [PMID: 38882711 PMCID: PMC11179050 DOI: 10.12688/f1000research.133724.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/09/2024] [Indexed: 06/18/2024]
Abstract
Biodiversity loss is now recognised as one of the major challenges for humankind to address over the next few decades. Unless major actions are taken, the sixth mass extinction will lead to catastrophic effects on the Earth's biosphere and on human health and well-being. ELIXIR can help address the technical challenges of biodiversity science by leveraging its suite of services and expertise to enable data management and analysis activities that enhance our understanding of life on Earth and facilitate biodiversity preservation and restoration. This white paper, prepared by the ELIXIR Biodiversity Community, summarises the current status and responses, and presents a set of plans, both technical and community-oriented, that should enhance both how ELIXIR Services are applied in the biodiversity field and how ELIXIR builds connections across the many other infrastructures active in this area. We discuss the areas of highest priority, how they can be implemented in cooperation with the ELIXIR Platforms, and their connections to existing ELIXIR Communities and international consortia. The article provides a preliminary blueprint for a Biodiversity Community in ELIXIR and is an appeal to identify and involve new stakeholders.
Affiliation(s)
- Robert M. Waterhouse
- Department of Ecology and Evolution, SIB Swiss Institute of Bioinformatics, Université de Lausanne, Lausanne, Vaud, 1015, Switzerland
- Anne-Françoise Adam-Blondon
- INRAE, BioinfOmics, Plant Bioinformatics Facility, Université Paris-Saclay, Gif-sur-Yvette, Île-de-France, 78026, France
- Bachir Balech
- Istituto di Biomembrane, Bioenergetica e Biotecnologie Molecolari, Bari, 70126, Italy
- Endre Barta
- Institute of Genetics and Biotechnology, Magyar Agrár- és Élettudományi Egyetem, Gödöllő, Pest County, Hungary
- Valeria Di Cola
- SIB Swiss Institute of Bioinformatics, Lausanne, Vaud, 1015, Switzerland
- Graham M. Hughes
- School of Biology and Environmental Science, University College Dublin, Dublin, Leinster, Ireland
- Lars S. Jermiin
- Systems Biology Ireland, School of Medicine, University College Dublin, Dublin, Leinster, Ireland
- School of Mathematical and Statistical Sciences, University of Galway, Galway, Ireland
- Matúš Kalaš
- Department of Informatics, Universitetet i Bergen, Bergen, Hordaland, Norway
- Jerry Lanfear
- ELIXIR, Wellcome Genome Campus, Hinxton, England, CB10 1SD, UK
- Evangelos Pafilis
- Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, 71003, Greece
- Patricia M. Palagi
- SIB Swiss Institute of Bioinformatics, Lausanne, Vaud, 1015, Switzerland
- Joana Paupério
- EMBL-EBI, Wellcome Genome Campus, Hinxton, England, CB10 1SD, UK
- Fotis Psomopoulos
- Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
- Niels Raes
- Naturalis Biodiversity Center, Leiden, South Holland, The Netherlands
- Josephine Burgin
- EMBL-EBI, Wellcome Genome Campus, Hinxton, England, CB10 1SD, UK
- Toni Gabaldón
- Institut de Recerca Biomèdica, Barcelona, Catalonia, Spain
- Centro Nacional de Supercomputación, Barcelona, Catalonia, Spain
2
Caucheteur D, May Pendlington Z, Roncaglia P, Gobeill J, Mottin L, Matentzoglu N, Agosti D, Osumi-Sutherland D, Parkinson H, Ruch P. COVoc and COVTriage: novel resources to support literature triage. Bioinformatics 2023; 39:6895097. [PMID: 36511598 PMCID: PMC9825781 DOI: 10.1093/bioinformatics/btac800] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2022] [Revised: 10/28/2022] [Accepted: 12/12/2022] [Indexed: 12/15/2022]
Abstract
MOTIVATION Since early 2020, the coronavirus disease 2019 (COVID-19) pandemic has confronted the biomedical community with an unprecedented challenge. The rapid spread of COVID-19 and the ease of transmission seen worldwide are due to increased population flow and international trade. Front-line medical care, treatment research and vaccine development also require rapid and informative interpretation of the literature and COVID-19 data produced around the world, with 177 500 papers published between January 2020 and November 2021, i.e. almost 8500 papers per month. To extract knowledge and enable interoperability across resources, we developed the COVID-19 Vocabulary (COVoc), an application ontology related to research on this pandemic. The main objective of COVoc development was to enable seamless navigation from biomedical literature to core databases and tools of ELIXIR, a European-wide intergovernmental organization for life sciences. RESULTS This collaborative work provided data integration into SIB Literature services, an application ontology (COVoc) and a triage service, COVTriage, which is based on annotation processing and searches for COVID-related information across pre-defined aspects with daily updates. Thanks to its interoperability potential, COVoc lends itself to wider applications, hopefully through further connections with other novel COVID-19 ontologies, as has been established with the Coronavirus Infectious Disease Ontology. AVAILABILITY AND IMPLEMENTATION The data are available at https://github.com/EBISPOT/covoc and the service at https://candy.hesge.ch/COVTriage.
Affiliation(s)
- Zoë May Pendlington
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, UK
- Paola Roncaglia
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, UK
- Julien Gobeill
- SIB Text Mining Group, Swiss Institute of Bioinformatics, Geneva 1206, Switzerland
- BiTeM Group, Information Sciences, HES-SO/HEG Genève, Carouge 1227, Switzerland
- Luc Mottin
- SIB Text Mining Group, Swiss Institute of Bioinformatics, Geneva 1206, Switzerland
- BiTeM Group, Information Sciences, HES-SO/HEG Genève, Carouge 1227, Switzerland
- Department of Microbiology and Molecular Medicine, Faculty of Medicine, University of Geneva, Geneva 1205, Switzerland
- Nicolas Matentzoglu
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, UK
- Semanticly Ltd, London, WC2H 9JQ, UK
- Donat Agosti
- SIB Text Mining Group, Swiss Institute of Bioinformatics, Geneva 1206, Switzerland
- Plazi, Bern 3007, Switzerland
- David Osumi-Sutherland
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, UK
- Helen Parkinson
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, UK
- Patrick Ruch
- SIB Text Mining Group, Swiss Institute of Bioinformatics, Geneva 1206, Switzerland
- BiTeM Group, Information Sciences, HES-SO/HEG Genève, Carouge 1227, Switzerland
3
Burgin J, Ahamed A, Cummins C, Devraj R, Gueye K, Gupta D, Gupta V, Haseeb M, Ihsan M, Ivanov E, Jayathilaka S, Balavenkataraman Kadhirvelu V, Kumar M, Lathi A, Leinonen R, Mansurova M, McKinnon J, O’Cathail C, Paupério J, Pesant S, Rahman N, Rinck G, Selvakumar S, Suman S, Vijayaraja S, Waheed Z, Woollard P, Yuan D, Zyoud A, Burdett T, Cochrane G. The European Nucleotide Archive in 2022. Nucleic Acids Res 2022; 51:D121-D125. [PMID: 36399492 PMCID: PMC9825583 DOI: 10.1093/nar/gkac1051] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Received: 09/30/2022] [Revised: 10/21/2022] [Accepted: 10/25/2022] [Indexed: 11/19/2022]
Abstract
The European Nucleotide Archive (ENA; https://www.ebi.ac.uk/ena), maintained by the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI), offers data producers an open and supported platform for the management, archiving, publication and dissemination of their data, and offers the scientific community as a whole a globally comprehensive data set through a host of data discovery and retrieval tools. Here, we describe recent updates to the ENA's submission and retrieval services as well as focused efforts to improve connectivity, reusability and interoperability of ENA data and metadata.
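As an illustration of the kind of data discovery and retrieval tool the ENA offers, the sketch below builds (but does not send) a search URL for the ENA Portal API. It assumes the search endpoint at `https://www.ebi.ac.uk/ena/portal/api/search` and its `result`, `query`, `fields`, `format` and `limit` parameters. The helper name `build_ena_search_url`, the query `tax_tree(9606)` and the field names are illustrative choices. Check them against the current Portal API documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed ENA Portal API search endpoint; verify against the ENA documentation.
ENA_PORTAL_SEARCH = "https://www.ebi.ac.uk/ena/portal/api/search"


def build_ena_search_url(result, query, fields, fmt="tsv", limit=10):
    """Return a Portal API search URL as a string; no request is made."""
    params = {
        "result": result,            # data type to search, e.g. "read_run"
        "query": query,              # Portal API query expression
        "fields": ",".join(fields),  # comma-separated list of return fields
        "format": fmt,               # output format, e.g. "tsv" or "json"
        "limit": limit,              # maximum number of records to return
    }
    # urlencode percent-escapes characters such as "(", ")" and ",".
    return f"{ENA_PORTAL_SEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    url = build_ena_search_url(
        result="read_run",
        query="tax_tree(9606)",
        fields=["run_accession", "fastq_ftp"],
    )
    print(url)
```

The request is kept separate from URL construction so that the query can be inspected or logged before anything is sent. Passing the printed URL to any HTTP client would perform the actual search.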
Affiliation(s)
- Josephine Burgin
- To whom correspondence should be addressed. Tel: +44 1223 49 4246; Fax: +44 1223 494 468.
- Alisha Ahamed
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Carla Cummins
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Rajkumar Devraj
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Khadim Gueye
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Dipayan Gupta
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Vikas Gupta
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Muhammad Haseeb
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Maira Ihsan
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Eugene Ivanov
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Suran Jayathilaka
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Manish Kumar
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ankur Lathi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Rasko Leinonen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Milena Mansurova
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jasmine McKinnon
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Colman O’Cathail
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Joana Paupério
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Stéphane Pesant
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Nadim Rahman
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Gabriele Rinck
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Sandeep Selvakumar
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Swati Suman
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Senthilnathan Vijayaraja
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Zahra Waheed
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Peter Woollard
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- David Yuan
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ahmad Zyoud
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Tony Burdett
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Guy Cochrane
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
4
Agosti D, Benichou L, Addink W, Arvanitidis C, Catapano T, Cochrane G, Dillen M, Döring M, Georgiev T, Gérard I, Groom Q, Kishor P, Kroh A, Kvaček J, Mergen P, Mietchen D, Paupério J, Sautter G, Penev L. Recommendations for use of annotations and persistent identifiers in taxonomy and biodiversity publishing. Research Ideas and Outcomes 2022. [DOI: 10.3897/rio.8.e97374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
The paper summarises many years of discussions and experience of biodiversity publishers, organisations, research projects and individual researchers, and proposes recommendations for implementation of persistent identifiers for article metadata, structural elements (sections, subsections, figures, tables, references, supplementary materials and others) and data specific to biodiversity (taxonomic treatments, treatment citations, taxon names, material citations, gene sequences, specimens, scientific collections) in taxonomy and biodiversity publishing. The paper proposes best practices on how identifiers should be used in the different cases and on how they can be minted, cited, and expressed in the backend article XML to facilitate conversion to and further re-use of the article content as FAIR data. The paper also discusses several specific routes for post-publication re-use of semantically enhanced content through large biodiversity data aggregators such as the Global Biodiversity Information Facility (GBIF), the International Nucleotide Sequence Database Collaboration (INSDC) and others, and proposes specifications of both identifiers and XML tags to be used for that purpose. A summary table provides an account and overview of the recommendations. The guidelines are supported with examples from the existing publishing practices.
5
Meeus S, Addink W, Agosti D, Arvanitidis C, Balech B, Dillen M, Dimitrova M, González-Aranda JM, Holetschek J, Islam S, Jeppesen T, Mietchen D, Nicolson N, Penev L, Robertson T, Ruch P, Trekels M, Groom Q. Recommendations for interoperability among infrastructures. Research Ideas and Outcomes 2022. [DOI: 10.3897/rio.8.e96180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Abstract
The BiCIKL project was born from a vision that biodiversity data are most useful if they are presented as a network of data that can be integrated and viewed from different starting points. BiCIKL’s goal is to realise that vision by linking biodiversity data infrastructures, particularly for literature, molecular sequences, specimens, nomenclature and analytics. To make those links, we need to better understand the existing infrastructures, their limitations, the nature of the data they hold, the services they provide and, particularly, how they can interoperate. With those aims in mind, in the autumn of 2021, 74 people from the biodiversity data community took part in twelve hackathon topics to assess the current state of interoperability between infrastructures holding biodiversity data. These topics examined interoperability from several angles. Some were research subjects that required interoperability to get results, some examined modalities of access and the use and implementation of standards, while others tested technologies and workflows to improve linkage of different data types.
These topics, and the interoperability issues uncovered by the hackathon participants, inspired the following recommendations for infrastructures, related to (1) the use of data brokers, (2) building communities and trust, (3) cloud computing as a collaborative tool, (4) standards and (5) multiple modalities of access:
- If direct linking cannot be supported between infrastructures, explore using data brokers to store links.
- Cooperate with open linkage brokers to provide a simple way to allow two-way links between infrastructures, without having to coordinate among many different organisations.
- Facilitate and encourage the external reporting of issues related to their infrastructure and its interoperability.
- Facilitate and encourage requests for new features related to their infrastructure and its interoperability.
- Provide development roadmaps openly.
- Provide a mechanism for anyone to ask for help.
- Discuss issues in an open forum.
- Provide cloud-based environments to allow external participants to contribute and test changes to features.
- Consider the opportunities that cloud computing brings as a means to enable shared management of the infrastructure.
- Promote the sharing of knowledge around big data technologies amongst partners, using cloud computing as a training environment.
- Invest in standards compliance and work with standards organisations to develop new standards and extend existing ones.
- Report on and review standards compliance within an infrastructure, with metrics that give credit for work on standards compliance and development.
- Provide as many different modalities of access as possible.
- Avoid requiring personal contacts to download data.
- Provide a full description of an API and the data it serves.
Finally, the hackathons were an ideal meeting opportunity to build, diversify and extend the BiCIKL community further, and to ensure the alignment of the community with a common vision on how best to link data from specimens, samples, sequences, taxonomic names and taxonomic literature.
6
Gupta V, Paupério J, Burgin J, Jayathilaka S, Cochrane G. ENA Source Attribute Helper: An Application Programming Interface to facilitate accurate reference to biological source data. F1000Res 2022. [DOI: 10.12688/f1000research.123934.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/20/2022]
Abstract
Background: Metadata attributes of sequences that accurately reference their biological sources, such as specimens or other materials of origin, and link with natural history collections are essential to facilitate connections between different fields in the life sciences and to promote reusability of data. However, the metadata used to reference the biological source of sequences available within molecular data repositories are not always well structured or comprehensive. Methods: Within the scope of the Horizon 2020 project Biodiversity Community Integrated Knowledge Library (BiCIKL), we have developed a tool, the European Nucleotide Archive (ENA) Source Attribute Helper Application Programming Interface (API), to help users accurately report biological source-related sequence and sample attributes. This tool currently focuses on the attributes that identify the specimens, cultures or other materials from which the sequence data were derived, and uses curated data to obtain the unique codes for the institutions and collections holding the vouchers. The API's main functions include the presentation of metadata associated with queried institutions or collections, validation of institution and collection codes in the attribute strings provided by the user, and the construction of an attribute string based on user-entered data. However, the API does not support searching for voucher specimen codes, as these need to be obtained directly from the voucher institutions. We describe the API and discuss use cases for its different endpoints. The API is available at https://www.ebi.ac.uk/ena/sah/api/. Conclusions: We expect the API to promote and support the initial submission and any subsequent curation of biological source attributes, and thereby contribute to better links between sequence data and natural history collections, and in turn to taxonomy and biodiversity research, increasing the discoverability, reusability and impact of data.
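The validation and construction functions described in this abstract can be illustrated with a small offline sketch. It does not call the ENA Source Attribute Helper API. Instead, a hypothetical toy registry (`REGISTRY`, with made-up institution codes `XINST` and `YINST`) stands in for the curated institution and collection data. The sketch also assumes an INSDC-style `institution-code:[collection-code:]specimen-id` layout for a specimen voucher string. The function names `construct_voucher` and `validate_voucher` are hypothetical. Like the API, the sketch never checks the specimen identifier itself.

```python
# Hypothetical toy registry standing in for the curated institution and
# collection codes used by the real service; these codes are made up.
REGISTRY = {
    "XINST": {"Herp", "Mamm"},  # institution with two registered collections
    "YINST": set(),             # institution with no registered collections
}


def construct_voucher(institution, specimen_id, collection=None):
    """Join the parts as 'institution[:collection]:specimen_id'."""
    parts = [institution]
    if collection:
        parts.append(collection)
    parts.append(specimen_id)
    return ":".join(parts)


def validate_voucher(value):
    """Check only the institution and collection codes of a voucher string.

    Returns a list of problems. An empty list means the codes were found in
    the toy registry. The specimen identifier is never checked, because it
    must be obtained from the voucher institution itself.
    """
    # Split at most twice: institution, optional collection, specimen id.
    parts = value.split(":", 2)
    if len(parts) == 1:
        return ["no institution code present"]
    institution = parts[0]
    if institution not in REGISTRY:
        return [f"unknown institution code: {institution}"]
    if len(parts) == 3 and parts[1] not in REGISTRY[institution]:
        return [f"unknown collection code for {institution}: {parts[1]}"]
    return []


if __name__ == "__main__":
    voucher = construct_voucher("XINST", "12345", collection="Herp")
    print(voucher, validate_voucher(voucher))
```

Because the separator is a colon, this simple parser reads a two-part string as an institution code plus a specimen identifier. A specimen identifier that itself contains colons would therefore be ambiguous. A real service that relies on curated code lists is better placed to resolve such cases.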