1
|
Bernasconi A, Canakoglu A, Masseroli M, Ceri S. META-BASE: A Novel Architecture for Large-Scale Genomic Metadata Integration. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:543-557. [PMID: 32750853 DOI: 10.1109/tcbb.2020.2998954] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 06/11/2023]
Abstract
The integration of genomic metadata is, at the same time, an important, difficult, and well-recognized challenge. It is important because a wealth of public data repositories is available to drive biological and clinical research; combining information from various heterogeneous and widely dispersed sources is paramount to a number of biological discoveries. It is difficult because the domain is complex and there is no agreement among the various metadata definitions, which refer to different vocabularies and ontologies. It is well-recognized in the bioinformatics community because, in common practice, repositories are accessed one by one, learning their specific metadata definitions as a result of long and tedious efforts, and such practice is error-prone. In this paper, we describe META-BASE, an architecture for integrating metadata extracted from a variety of genomic data sources, based upon a structured transformation process. We present a variety of innovative techniques for data extraction, cleaning, normalization and enrichment. We propose a general, open and extensible pipeline that can easily incorporate any number of new data sources, and propose the resulting repository (already integrating several important sources), which is exposed by means of practical user interfaces to respond to biological researchers' needs.
|
2
|
Gallo A, Perrone G. Current Approaches for Advancement in Understanding the Molecular Mechanisms of Mycotoxin Biosynthesis. Int J Mol Sci 2021; 22:ijms22157878. [PMID: 34360643 PMCID: PMC8346063 DOI: 10.3390/ijms22157878] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/30/2021] [Revised: 07/20/2021] [Accepted: 07/21/2021] [Indexed: 12/17/2022] Open
Abstract
Filamentous fungi are able to synthesise a remarkable range of secondary metabolites, which play various key roles in the interaction between fungi and the rest of the biosphere, determining their ecological fitness. Many of them can have a beneficial activity to be exploited, as well as a negative impact on human and animal health, as in the case of mycotoxins contaminating large quantities of food, feed, and agricultural products worldwide and posing serious health and economic risks. The elucidation of the molecular aspects of mycotoxin biosynthesis has been greatly sped up over the past decade due to the advent of next-generation sequencing technologies, which greatly reduced the cost of genome sequencing and related omic analyses. Here, we briefly highlight the recent progress in the use and integration of omic approaches for the study of mycotoxin biosynthesis. Particular attention has been paid to genomic and transcriptomic approaches for the identification and characterisation of biosynthetic gene clusters of mycotoxins and the understanding of the regulatory pathways activated in response to physiological and environmental factors leading to their production. The latest innovations in genome-editing technology have also provided a more powerful tool for the complete explanation of regulatory and biosynthesis pathways. Finally, we address the crucial issue of the interpretation of the combined omics data on the biology of the mycotoxigenic fungi. These data are rapidly expanding and require the development of resources for more efficient integration, as well as the completeness and the availability of intertwined data for the research community.
Affiliation(s)
- Antonia Gallo
  - Institute of Sciences of Food Production (ISPA) National Research Council (CNR), 73100 Lecce, Italy
  - Correspondence: (A.G.); (G.P.)
- Giancarlo Perrone
  - Institute of Sciences of Food Production (ISPA) National Research Council (CNR), 70126 Bari, Italy
  - Correspondence: (A.G.); (G.P.)
|
3
|
Jurburg SD, Keil P, Singh BK, Chase JM. All together now: Limitations and recommendations for the simultaneous analysis of all eukaryotic soil sequences. Mol Ecol Resour 2021; 21:1759-1771. [PMID: 33943001 DOI: 10.1111/1755-0998.13401] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/04/2021] [Revised: 03/31/2021] [Accepted: 04/13/2021] [Indexed: 02/06/2023]
Abstract
The soil environment contains a large, but historically underexplored, reservoir of biodiversity. Sequencing prokaryotic marker genes has become commonplace for the discovery and characterization of soil bacteria and archaea. Increasingly, this approach is also applied to eukaryotic marker genes to characterize the diversity and distribution of soil eukaryotes. However, understanding the properties and limitations of eukaryotic marker sequences is essential for correctly analysing, interpreting, and synthesizing the resulting data. Here, we illustrate several biases from sequencing data that affect measurements of biodiversity that arise from variation in morphology, taxonomy and phylogeny between organisms, as well as from sampling designs. We recommend analytical approaches to overcome these limitations, and outline how the benchmarking and standardization of sequencing protocols may improve the comparability of the data.
Affiliation(s)
- Stephanie D Jurburg
  - German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
  - Institute of Biology, Leipzig University, Leipzig, Germany
- Petr Keil
  - German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
  - Department of Computer Science, Martin Luther University, Halle-Wittenberg, Halle, Germany
  - Faculty of Environmental Sciences, Czech University of Life Sciences Prague, Praha-Suchdol, Czech Republic
- Brajesh K Singh
  - Hawkesbury Institute for the Environment, and Global Centre for Land-Based Innovation, Western Sydney University, Penrith, NSW, Australia
- Jonathan M Chase
  - German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
  - Department of Computer Science, Martin Luther University, Halle-Wittenberg, Halle, Germany
|
4
|
Lindsay EC, Metcalfe NB, Llewellyn MS. The potential role of the gut microbiota in shaping host energetics and metabolic rate. J Anim Ecol 2020; 89:2415-2426. [PMID: 32858775 DOI: 10.1111/1365-2656.13327] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 12/19/2019] [Accepted: 07/07/2020] [Indexed: 12/14/2022]
Abstract
It is increasingly recognized that symbiotic microbiota (especially those present in the gut) have important influences on the functioning of their host. Here, we review the interplay between this microbial community and the growth, metabolic rate and nutritional energy harvest of the host. We show how recent developments in experimental and analytical methods have allowed much easier characterization of the nature, and increasingly the functioning, of the gut microbiota. Manipulation studies that remove or augment gut microorganisms or transfer them between hosts have allowed unprecedented insights into their impact. Whilst much of the information to date has come from studies of laboratory model organisms, recent studies have used a more diverse range of host species, including those living in natural conditions, revealing their ecological relevance. The gut microbiota can provide the host with dietary nutrients that would be otherwise unobtainable, as well as allow the host flexibility in its capacity to cope with changing environments. The composition of the gut microbial community of a species can vary seasonally or when the host moves between environments (e.g. fresh and sea water in the case of migratory fish). It can also change with host diet choice, metabolic rate (or demands) and life stage. These changes in gut microbial community composition enable the host to live within different environments, adapt to seasonal changes in diet and maintain performance throughout its entire life history, highlighting the ecological relevance of the gut microbiota. Whilst it is evident that gut microbes can underpin host metabolic plasticity, the causal nature of associations between particular microorganisms and host performance is not always clear unless a manipulative approach has been used. 
Many studies have focussed on a correlative approach by characterizing microbial community composition, but there is now a need for more experimental studies in both wild and laboratory-based environments, to reveal the true role of gut microbiota in influencing the functioning of their hosts, including its capacity to tolerate environmental change. We highlight areas where these would be particularly fruitful in the context of ecological energetics.
Affiliation(s)
- Elle C Lindsay
  - Institute of Biodiversity, Animal Health and Comparative Medicine, University of Glasgow, Glasgow, UK
- Neil B Metcalfe
  - Institute of Biodiversity, Animal Health and Comparative Medicine, University of Glasgow, Glasgow, UK
- Martin S Llewellyn
  - Institute of Biodiversity, Animal Health and Comparative Medicine, University of Glasgow, Glasgow, UK
|
5
|
Jurburg SD, Konzack M, Eisenhauer N, Heintz-Buschart A. The archives are half-empty: an assessment of the availability of microbial community sequencing data. Commun Biol 2020; 3:474. [PMID: 32859925 PMCID: PMC7455719 DOI: 10.1038/s42003-020-01204-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 06/22/2020] [Accepted: 08/03/2020] [Indexed: 02/07/2023] Open
Abstract
As DNA sequencing has become more popular, the public genetic repositories where sequences are archived have experienced explosive growth. These repositories now hold invaluable collections of sequences, e.g., for microbial ecology, but whether these data are reusable has not been evaluated. We assessed the availability and state of 16S rRNA gene amplicon sequences archived in public genetic repositories (SRA, EBI, and DDBJ). We screened 26,927 publications in 17 microbiology journals, identifying 2,015 16S rRNA gene sequencing studies. Of these, 7.2% had not made their data public at the time of analysis. Among a subset of 635 studies sequencing the same gene region, 40.3% contained data which was not available or not reusable, and an additional 25.5% contained faults in data formatting or data labeling, creating obstacles for data reuse. Our study reveals gaps in data availability, identifies major contributors to data loss, and offers suggestions for improving data archiving practices.
Affiliation(s)
- Stephanie D Jurburg
  - German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Deutscher Platz 5e, 04103, Leipzig, Germany
  - Leipzig University, Institute of Biology, Deutscher Platz 5e, 04103, Leipzig, Germany
- Maximilian Konzack
  - German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Deutscher Platz 5e, 04103, Leipzig, Germany
  - Martin Luther University Halle-Wittenberg, Halle, Germany
- Nico Eisenhauer
  - German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Deutscher Platz 5e, 04103, Leipzig, Germany
  - Leipzig University, Institute of Biology, Deutscher Platz 5e, 04103, Leipzig, Germany
- Anna Heintz-Buschart
  - German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Deutscher Platz 5e, 04103, Leipzig, Germany
  - Helmholtz Centre for Environmental Research GmbH - UFZ, Halle, Germany
|
6
|
Bolduc B, Hodgkins SB, Varner RK, Crill PM, McCalley CK, Chanton JP, Tyson GW, Riley WJ, Palace M, Duhaime MB, Hough MA, Saleska SR, Sullivan MB, Rich VI. The IsoGenie database: an interdisciplinary data management solution for ecosystems biology and environmental research. PeerJ 2020. [DOI: 10.7717/peerj.9467] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/20/2022] Open
Abstract
Modern microbial and ecosystem sciences require diverse interdisciplinary teams that are often challenged in “speaking” to one another due to different languages and data product types. Here we introduce the IsoGenie Database (IsoGenieDB; https://isogenie-db.asc.ohio-state.edu/), a de novo developed data management and exploration platform, as a solution to this challenge of accurately representing and integrating heterogeneous environmental and microbial data across ecosystem scales. The IsoGenieDB is a public and private data infrastructure designed to store and query data generated by the IsoGenie Project, a ~10 year DOE-funded project focused on discovering ecosystem climate feedbacks in a thawing permafrost landscape. The IsoGenieDB provides (i) a platform for IsoGenie Project members to explore the project’s interdisciplinary datasets across scales through the inherent relationships among data entities, (ii) a framework to consolidate and harmonize the datasets needed by the team’s modelers, and (iii) a public venue that leverages the same spatially explicit, disciplinarily integrated data structure to share published datasets. The IsoGenieDB is also being expanded to cover the NASA-funded Archaea to Atmosphere (A2A) project, which scales the findings of IsoGenie to a broader suite of Arctic peatlands, via the umbrella A2A Database (A2A-DB). The IsoGenieDB’s expandability and flexible architecture allow it to serve as an example ecosystems database.
Affiliation(s)
- Benjamin Bolduc
  - Department of Microbiology, The Ohio State University, Columbus, OH, USA
- Ruth K. Varner
  - Earth Systems Research Center, Institute for the Study of Earth, Oceans and Space, University of New Hampshire, Durham, NH, USA
  - Department of Earth Sciences, College of Engineering and Physical Sciences, University of New Hampshire, Durham, NH, USA
- Patrick M. Crill
  - Department of Geological Sciences and Bolin Centre for Climate Research, Stockholm University, Stockholm, Sweden
- Carmody K. McCalley
  - Thomas H. Gosnell School of Life Sciences, Rochester Institute of Technology, Rochester, NY, USA
- Jeffrey P. Chanton
  - Department of Earth, Ocean, and Atmospheric Science, Florida State University, Tallahassee, FL, USA
- Gene W. Tyson
  - Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
- William J. Riley
  - Climate and Ecosystem Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Michael Palace
  - Earth Systems Research Center, Institute for the Study of Earth, Oceans and Space, University of New Hampshire, Durham, NH, USA
  - Department of Earth Sciences, College of Engineering and Physical Sciences, University of New Hampshire, Durham, NH, USA
- Melissa B. Duhaime
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
- Moira A. Hough
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, USA
- Scott R. Saleska
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, USA
- Matthew B. Sullivan
  - Department of Microbiology, The Ohio State University, Columbus, OH, USA
  - Department of Civil, Environmental and Geodetic Engineering, The Ohio State University, Columbus, OH, USA
- Virginia I. Rich
  - Department of Microbiology, The Ohio State University, Columbus, OH, USA
|
7
|
Canakoglu A, Bernasconi A, Colombo A, Masseroli M, Ceri S. GenoSurf: metadata driven semantic search system for integrated genomic datasets. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2020; 2019:5670757. [PMID: 31820804 PMCID: PMC6902006 DOI: 10.1093/database/baz132] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 07/28/2019] [Revised: 10/04/2019] [Accepted: 10/21/2019] [Indexed: 01/18/2023]
Abstract
Many valuable resources developed by world-wide research institutions and consortia describe genomic datasets that are both open and available for secondary research, but their metadata search interfaces are heterogeneous, not interoperable and sometimes with very limited capabilities. We implemented GenoSurf, a multi-ontology semantic search system providing access to a consolidated collection of metadata attributes found in the most relevant genomic datasets; values of 10 attributes are semantically enriched by making use of the most suited available ontologies. The user of GenoSurf provides as input the search terms, sets the desired level of ontological enrichment and obtains as output the identity of matching data files at the various sources. Search is facilitated by drop-down lists of matching values; aggregate counts describing resulting files are updated in real time while the search terms are progressively added. In addition to the consolidated attributes, users can perform keyword-based searches on the original (raw) metadata, which are also imported; GenoSurf supports the interplay of attribute-based and keyword-based search through well-defined interfaces. Currently, GenoSurf integrates about 40 million metadata of several major valuable data sources, including three providers of clinical and experimental data (TCGA, ENCODE and Roadmap Epigenomics) and two sources of annotation data (GENCODE and RefSeq); it can be used as a standalone resource for targeting the genomic datasets at their original sources (identified with their accession IDs and URLs), or as part of an integrated query answering system for performing complex queries over genomic regions and metadata.
Affiliation(s)
- Arif Canakoglu
  - Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milan, Italy
- Anna Bernasconi
  - Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milan, Italy
- Andrea Colombo
  - Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milan, Italy
- Marco Masseroli
  - Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milan, Italy
- Stefano Ceri
  - Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milan, Italy
|
8
|
Harjes J, Link A, Weibulat T, Triebel D, Rambold G. FAIR digital objects in environmental and life sciences should comprise workflow operation design data and method information for repeatability of study setups and reproducibility of results. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2020; 2020:5894776. [PMID: 32815545 PMCID: PMC7439577 DOI: 10.1093/database/baaa059] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/19/2020] [Revised: 07/01/2020] [Accepted: 07/07/2020] [Indexed: 12/23/2022]
Abstract
Repeatability of study setups and reproducibility of research results by underlying data are major requirements in science. Until now, abstract models for describing the structural logic of studies in environmental sciences are lacking and tools for data management are insufficient. Mandatory for repeatability and reproducibility is the use of sophisticated data management solutions going beyond data file sharing. Particularly, it implies maintenance of coherent data along workflows. Design data concern elements from elementary domains of operations being transformation, measurement and transaction. Operation design elements and method information are specified for each consecutive workflow segment from field to laboratory campaigns. The strict linkage of operation design element values, operation values and objects is essential. For enabling coherence of corresponding objects along consecutive workflow segments, the assignment of unique identifiers and the specification of their relations are mandatory. The abstract model presented here addresses these aspects, and the software DiversityDescriptions (DWB-DD) facilitates the management of thusly connected digital data objects and structures. DWB-DD allows for an individual specification of operation design elements and their linking to objects. Two workflow design use cases, one for DNA barcoding and another for cultivation of fungal isolates, are given. To publish those structured data, standard schema mapping and XML-provision of digital objects are essential. Schemas useful for this mapping include the Ecological Markup Language, the Schema for Meta-omics Data of Collection Objects and the Standard for Structured Descriptive Data. Data pipelines with DWB-DD include the mapping and conversion between schemas and functions for data publishing and archiving according to the Open Archival Information System standard. 
The setting allows for repeatability of study setups, reproducibility of study results and for supporting work groups to structure and maintain their data from the beginning of a study. The theory of ‘FAIR++’ digital objects is introduced.
Affiliation(s)
- Janno Harjes
  - University of Bayreuth, Universitätsstraße 30, 95440 Bayreuth, Germany
- Anton Link
  - Staatliche Naturwissenschaftliche Sammlungen Bayerns, Menzinger Straße 67, 80638 München, Germany
- Tanja Weibulat
  - Staatliche Naturwissenschaftliche Sammlungen Bayerns, Menzinger Straße 67, 80638 München, Germany
  - German Federation for Biological Data e. V., Campus Ring 1, 28759 Bremen, Germany
- Dagmar Triebel
  - Staatliche Naturwissenschaftliche Sammlungen Bayerns, Menzinger Straße 67, 80638 München, Germany
  - German Federation for Biological Data e. V., Campus Ring 1, 28759 Bremen, Germany
- Gerhard Rambold
  - University of Bayreuth, Universitätsstraße 30, 95440 Bayreuth, Germany
|
9
|
|
10
|
Bernasconi A, Canakoglu A, Ceri S. Exploiting Conceptual Modeling for Searching Genomic Metadata: A Quantitative and Qualitative Empirical Study. LECTURE NOTES IN COMPUTER SCIENCE 2019. [DOI: 10.1007/978-3-030-34146-6_8] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 01/02/2023]
|