1
Tahir A, Draxler A, Stelzer T, Blaschke A, Laky B, Széll M, Binar J, Bartak V, Bragagna L, Maqboul L, Herzog T, Thell R, Wagner KH. A comprehensive IDA and SWATH-DIA Lipidomics and Metabolomics dataset: SARS-CoV-2 case control study. Sci Data 2024; 11:998. [PMID: 39266559 PMCID: PMC11393081 DOI: 10.1038/s41597-024-03822-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2023] [Accepted: 08/27/2024] [Indexed: 09/14/2024] Open
Abstract
A significant hurdle in untargeted lipid/metabolomics research lies in the absence of reliable, cross-validated spectral libraries, leading to a considerable portion of LC-MS features being labeled as unknowns. Despite continuous advancement in annotation tools and libraries, it is important to safeguard, publish and share acquired data through public repositories. Embracing this trend of data sharing not only promotes efficient resource utilization but also paves the way for future repurposing and in-depth analysis, ultimately advancing our comprehension of Covid-19 and other diseases. In this work, we generated an extensive MS dataset of 39 Covid-19 infected patients versus 39 age- and gender-matched healthy controls. We implemented state-of-the-art acquisition techniques including IDA and SWATH-DIA to ensure a thorough insight into the lipidome and metabolome, ensuring a repurposable dataset.
Affiliation(s)
- Ammar Tahir
- Department of Pharmaceutical Sciences, Division of Pharmacognosy, University of Vienna, Vienna, Austria.
- Section of Biomedical Sciences, Department of Health Sciences, FH Campus Wien, University of Applied Sciences, Vienna, Austria.
- Agnes Draxler
- Department of Nutritional Sciences, University of Vienna, Vienna, Austria
- Vienna Doctoral School for Pharmaceutical, Nutritional and Sport Sciences (PhaNuSpo), University of Vienna, Vienna, Austria
- Department of Health Sciences, FH Campus Wien, University of Applied Sciences, Vienna, Austria
- Tamara Stelzer
- Department of Nutritional Sciences, University of Vienna, Vienna, Austria
- Vienna Doctoral School for Pharmaceutical, Nutritional and Sport Sciences (PhaNuSpo), University of Vienna, Vienna, Austria
- Brenda Laky
- Medical University of Vienna, Vienna, Austria
- Austrian Society of Regenerative Medicine, Vienna, Austria
- Sigmund Freud University Vienna, Vienna, Austria
- Marton Széll
- Klinik Donaustadt, Emergency Department, Vienna, Austria
- Jessica Binar
- Section of Biomedical Sciences, Department of Health Sciences, FH Campus Wien, University of Applied Sciences, Vienna, Austria
- Department of Nutritional Sciences, University of Vienna, Vienna, Austria
- Viktoria Bartak
- Department of Nutritional Sciences, University of Vienna, Vienna, Austria
- Laura Bragagna
- Department of Nutritional Sciences, University of Vienna, Vienna, Austria
- Vienna Doctoral School for Pharmaceutical, Nutritional and Sport Sciences (PhaNuSpo), University of Vienna, Vienna, Austria
- Lina Maqboul
- Department of Nutritional Sciences, University of Vienna, Vienna, Austria
- Research Platform Active Ageing, University of Vienna, Vienna, Austria
- Theresa Herzog
- Klinik Donaustadt, Emergency Department, Vienna, Austria
- Rainer Thell
- Klinik Donaustadt, Emergency Department, Vienna, Austria
- Karl-Heinz Wagner
- Department of Nutritional Sciences, University of Vienna, Vienna, Austria
- Research Platform Active Ageing, University of Vienna, Vienna, Austria
2
Choudhary N, Pucker B. Conserved amino acid residues and gene expression patterns associated with the substrate preferences of the competing enzymes FLS and DFR. PLoS One 2024; 19:e0305837. [PMID: 39196921 PMCID: PMC11356453 DOI: 10.1371/journal.pone.0305837] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2024] [Accepted: 06/05/2024] [Indexed: 08/30/2024] Open
Abstract
BACKGROUND Flavonoids, an important class of specialized metabolites, are synthesized from phenylalanine and present in almost all plant species. Different branches of flavonoid biosynthesis lead to products like flavones, flavonols, anthocyanins, and proanthocyanidins. Dihydroflavonols form the branching point towards the production of non-colored flavonols via flavonol synthase (FLS) and colored anthocyanins via dihydroflavonol 4-reductase (DFR). Despite the wealth of publicly accessible data, there remains a gap in understanding the mechanisms that mitigate competition between FLS and DFR for the shared substrate, dihydroflavonols. RESULTS An angiosperm-wide comparison of FLS and DFR sequences revealed the amino acids at positions associated with the substrate specificity in both enzymes. A global analysis of the phylogenetic distribution of these amino acid residues revealed that monocots generally possess FLS with Y132 (FLSY) and DFR with N133 (DFRN). In contrast, dicots generally possess FLSH and DFRN, DFRD, and DFRA. DFRA, which restricts substrate preference to dihydrokaempferol, previously believed to be unique to strawberry species, is found to be more widespread in angiosperms and has evolved independently multiple times. Generally, angiosperm FLS appears to prefer dihydrokaempferol, whereas DFR appears to favor dihydroquercetin or dihydromyricetin. Moreover, in the FLS-DFR competition, the dominance of one over the other is observed, with typically only one gene being expressed at any given time. CONCLUSION This study illustrates how almost mutually exclusive gene expression and substrate-preference determining residues could mitigate competition between FLS and DFR, delineates the evolution of these enzymes, and provides insights into mechanisms directing the metabolic flux of the flavonoid biosynthesis, with potential implications for ornamental plants and molecular breeding strategies.
Affiliation(s)
- Nancy Choudhary
- Institute of Plant Biology & BRICS, Plant Biotechnology and Bioinformatics, TU Braunschweig, Braunschweig, Germany
- Boas Pucker
- Institute of Plant Biology & BRICS, Plant Biotechnology and Bioinformatics, TU Braunschweig, Braunschweig, Germany
3
Edfeldt K, Edwards AM, Engkvist O, Günther J, Hartley M, Hulcoop DG, Leach AR, Marsden BD, Menge A, Misquitta L, Müller S, Owen DR, Schütt KT, Skelton N, Steffen A, Tropsha A, Vernet E, Wang Y, Wellnitz J, Willson TM, Clevert DA, Haibe-Kains B, Schiavone LH, Schapira M. A data science roadmap for open science organizations engaged in early-stage drug discovery. Nat Commun 2024; 15:5640. [PMID: 38965235 PMCID: PMC11224410 DOI: 10.1038/s41467-024-49777-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Accepted: 06/12/2024] [Indexed: 07/06/2024] Open
Abstract
The Structural Genomics Consortium is an international open science research organization with a focus on accelerating early-stage drug discovery, namely hit discovery and optimization. We, as many others, believe that artificial intelligence (AI) is poised to be a main accelerator in the field. The question is then how to best benefit from recent advances in AI and how to generate, format and disseminate data to enable future breakthroughs in AI-guided drug discovery. We present here the recommendations of a working group composed of experts from both the public and private sectors. Robust data management requires precise ontologies and standardized vocabulary while a centralized database architecture across laboratories facilitates data integration into high-value datasets. Lab automation and opening electronic lab notebooks to data mining push the boundaries of data sharing and data modeling. Important considerations for building robust machine-learning models include transparent and reproducible data processing, choosing the most relevant data representation, defining the right training and test sets, and estimating prediction uncertainty. Beyond data-sharing, cloud-based computing can be harnessed to build and disseminate machine-learning models. Important vectors of acceleration for hit and chemical probe discovery will be (1) the real-time integration of experimental data generation and modeling workflows within design-make-test-analyze (DMTA) cycles openly, and at scale and (2) the adoption of a mindset where data scientists and experimentalists work as a unified team, and where data science is incorporated into the experimental design.
Affiliation(s)
- Kristina Edfeldt
- Structural Genomics Consortium, Department of Medicine, Karolinska University Hospital and Karolinska Institutet, Stockholm, Sweden
- Aled M Edwards
- Structural Genomics Consortium, University of Toronto, Toronto, ON, Canada
- Ola Engkvist
- Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden & Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Judith Günther
- Bayer AG Research and Development, Computational Molecular Design, Berlin, Germany
- Matthew Hartley
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
- David G Hulcoop
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire, UK
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK
- Andrew R Leach
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
- Brian D Marsden
- Centre for Medicines Discovery, NDM, University of Oxford, Oxford, UK
- Amelie Menge
- Institute of Pharmaceutical Chemistry, Johann Wolfgang Goethe University, Frankfurt am Main, 60438, Germany & Structural Genomics Consortium (SGC), Buchmann Institute for Life Sciences, Johann Wolfgang Goethe University, Frankfurt am Main, Germany
- Leonie Misquitta
- National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Susanne Müller
- Institute of Pharmaceutical Chemistry, Johann Wolfgang Goethe University, Frankfurt am Main, 60438, Germany & Structural Genomics Consortium (SGC), Buchmann Institute for Life Sciences, Johann Wolfgang Goethe University, Frankfurt am Main, Germany
- Dafydd R Owen
- Pfizer Worldwide Research, Development & Medical, Cambridge, MA, USA
- Kristof T Schütt
- Pfizer, Worldwide Research, Development and Medical, Machine Learning & Computational Sciences, Berlin, Germany
- Nicholas Skelton
- Department of Discovery Chemistry, Genentech, Inc., South San Francisco, CA, USA
- Andreas Steffen
- Pfizer, Worldwide Research, Development and Medical, Machine Learning & Computational Sciences, Berlin, Germany
- Alexander Tropsha
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina, USA
- Erik Vernet
- Digital Science & Innovation, Novo Nordisk A/S, Maaloev, Denmark
- Yanli Wang
- National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- James Wellnitz
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina, USA
- Timothy M Willson
- Structural Genomics Consortium, UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Djork-Arné Clevert
- Pfizer, Worldwide Research, Development and Medical, Machine Learning & Computational Sciences, Berlin, Germany.
- Benjamin Haibe-Kains
- Structural Genomics Consortium, University of Toronto, Toronto, ON, Canada.
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada.
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada.
- Vector Institute for Artificial Intelligence, Toronto, ON, Canada.
- Matthieu Schapira
- Structural Genomics Consortium, University of Toronto, Toronto, ON, Canada.
- Department of Pharmacology & Toxicology, University of Toronto, Toronto, ON, Canada.
4
Cunha-Oliveira T, Ioannidis JPA, Oliveira PJ. Best practices for data management and sharing in experimental biomedical research. Physiol Rev 2024; 104:1387-1408. [PMID: 38451234 PMCID: PMC11380994 DOI: 10.1152/physrev.00043.2023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Revised: 02/07/2024] [Accepted: 02/29/2024] [Indexed: 03/08/2024] Open
Abstract
Effective data management is crucial for scientific integrity and reproducibility, a cornerstone of scientific progress. Well-organized and well-documented data enable validation and building on results. Data management encompasses activities including organization, documentation, storage, sharing, and preservation. Robust data management establishes credibility, fostering trust within the scientific community and benefiting researchers' careers. In experimental biomedicine, comprehensive data management is vital due to the typically intricate protocols, extensive metadata, and large datasets. Low-throughput experiments, in particular, require careful management to address variations and errors in protocols and raw data quality. Transparent and accountable research practices rely on accurate documentation of procedures, data collection, and analysis methods. Proper data management ensures long-term preservation and accessibility of valuable datasets. Well-managed data can be revisited, contributing to cumulative knowledge and potential new discoveries. Publicly funded research has an added responsibility for transparency, resource allocation, and avoiding redundancy. Meeting funding agency expectations increasingly requires rigorous methodologies, adherence to standards, comprehensive documentation, and widespread sharing of data, code, and other auxiliary resources. This review provides critical insights into raw and processed data, metadata, high-throughput versus low-throughput datasets, a common language for documentation, experimental and reporting guidelines, efficient data management systems, sharing practices, and relevant repositories. We systematically present available resources and optimal practices for wide use by experimental biomedical researchers.
Affiliation(s)
- Teresa Cunha-Oliveira
- Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal
- Center for Innovative Biomedicine and Biotechnology, University of Coimbra, Coimbra, Portugal
- John P A Ioannidis
- Meta-Research Innovation Center at Stanford (METRICS), Stanford, California, United States
- Department of Statistics, Stanford University, Stanford, California, United States
- Paulo J Oliveira
- Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal
- Center for Innovative Biomedicine and Biotechnology, University of Coimbra, Coimbra, Portugal
5
Lobato S, Castillo-Granada AL, Bucio-Pacheco M, Salomón-Soto VM, Álvarez-Valenzuela R, Meza-Inostroza PM, Villegas-Vizcaíno R. PM 2.5, component cause of severe metabolically abnormal obesity: An in silico, observational and analytical study. Heliyon 2024; 10:e28936. [PMID: 38601536 PMCID: PMC11004224 DOI: 10.1016/j.heliyon.2024.e28936] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Revised: 03/26/2024] [Accepted: 03/27/2024] [Indexed: 04/12/2024] Open
Abstract
Obesity is currently one of the most alarming pathological conditions due to the progressive increase in its prevalence. In the last decade, it has been associated with fine particulate matter suspended in the air (PM2.5). The purpose of this study was to explore the mechanistic interaction of PM2.5 with a high-fat diet (HFD) through the differential regulation of transcriptional signatures, aiming to identify the association of these particles with metabolically abnormal obesity. The research design was observational, using bioinformatic methods and an explanatory approach based on Rothman's causal model. We propose three new transcriptional signatures in murine adipose tissue. The sum of transcriptional differences between the group exposed to an HFD and PM2.5, compared to the control group, were 0.851, 0.265, and -0.047 (p > 0.05). The HFD group increased body mass by 20% with two positive biomarkers of metabolic impact. The group exposed to PM2.5 maintained a similar weight to the control group but exhibited three positive biomarkers. Enriched biological pathways (p < 0.05) included PPAR signaling, small molecule transport, adipogenesis genes, cytokine-cytokine receptor interaction, and HIF-1 signaling. Transcriptional regulation predictions revealed CpG islands and common transcription factors. We propose three new transcriptional signatures: FAT-PM2.5-CEJUS, FAT-PM2.5-UP, and FAT-PM2.5-DN, whose transcriptional regulation profile in adipocytes was statistically similar by dietary intake and HFD and exposure to PM2.5 in mice; suggesting a mechanistic interaction between both factors. However, HFD-exposed murines developed moderate metabolically abnormal obesity, and PM2.5-exposed murines developed severe abnormal metabolism without obesity. Therefore, in Rothman's terms, it is concluded that HFD is a sufficient cause of the development of obesity, and PM2.5 is a component cause of severe abnormal metabolism of obesity. 
These signatures would be integrated into a systemic biological process that would induce transcriptional regulation in trans, activating obesogenic biological pathways, restricting lipid mobilization pathways, decreasing adaptive thermogenesis and angiogenesis, and altering vascular tone thus inducing a severe metabolically abnormal obesity.
Affiliation(s)
- Sagrario Lobato
- Departamento de Investigación en Salud, Servicios de Salud del Estado de Puebla, 15 South Street 302, Puebla, Mexico
- Promoción y Educación para la Salud, Universidad Abierta y a Distancia de México. Universidad Avenue 1200, 1st Floor, quadrant 1-2, Xoco, Benito Juarez, 03330, Mexico City, Mexico
- Educación Superior, Centro de Estudios, “Justo Sierra”, Surutato, Badiraguato, Mexico
- A. Lourdes Castillo-Granada
- Educación Superior, Centro de Estudios, “Justo Sierra”, Surutato, Badiraguato, Mexico
- Facultad de Estudios Superiores Zaragoza, Universidad Nacional Autónoma de México, Guelatao Avenue 66, Ejército de Oriente Indeco II ISSSTE, Iztapalapa, 09230, Mexico City, Mexico
- Marcos Bucio-Pacheco
- Educación Superior, Centro de Estudios, “Justo Sierra”, Surutato, Badiraguato, Mexico
- Facultad de Biología, Universidad Autónoma de Sinaloa, Americas Avenue, Universitarios Blvd., University City, 80040, Culiacán Rosales, Mexico
6
Tone EB, Henrich CC. Principles, policies, and practices: Thoughts on their integration over the rise of the developmental psychopathology perspective and into the future. Dev Psychopathol 2024:1-9. [PMID: 38415398 DOI: 10.1017/s0954579424000257] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/29/2024]
Abstract
Developmental psychopathology has, since the late 20th century, offered an influential integrative framework for conceptualizing psychological health, distress, and dysfunction across the lifespan. Leaders in the field have periodically generated predictions about its future and have proposed ways to increase the macroparadigm's impact. In this paper, we examine, using articles sampled from each decade of the journal Development and Psychopathology's existence as a rough guide, the degree to which the themes that earlier predictions have emphasized have come to fruition and the ways in which the field might further capitalize on the strengths of this approach to advance knowledge and practice in psychology. We focus in particular on two key themes. First, we explore the degree to which researchers have capitalized on the framework's capacity for principled flexibility to generate novel work that integrates neurobiological and/or social-contextual factors measured at multiple levels and offer ideas for moving this kind of work forward. Second, we discuss how extensively articles have emphasized implications for intervention or prevention and how the field might amplify the voice of developmental psychopathology in applied settings.
Affiliation(s)
- Erin B Tone
- Department of Psychology, Georgia State University, Atlanta, GA, USA
7
VanBuren R, Nguyen A, Marks RA, Mercado C, Pardo A, Pardo J, Schuster J, Aubin BS, Wilson ML, Rhee SY. Variability in drought gene expression datasets highlight the need for community standardization. bioRxiv 2024:2024.02.04.578814. [PMID: 38370805 PMCID: PMC10871248 DOI: 10.1101/2024.02.04.578814] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2024]
Abstract
Physiologically relevant drought stress is difficult to apply consistently, and the heterogeneity in experimental design, growth conditions, and sampling schemes makes it challenging to compare water deficit studies in plants. Here, we re-analyzed hundreds of drought gene expression experiments across diverse model and crop species and quantified the variability across studies. We found that drought studies are surprisingly uncomparable, even when accounting for differences in genotype, environment, drought severity, and method of drying. Many studies, including most Arabidopsis work, lack high-quality phenotypic and physiological datasets to accompany gene expression, making it impossible to assess the severity or in some cases the occurrence of water deficit stress events. From these datasets, we developed supervised learning classifiers that can accurately predict if RNA-seq samples have experienced a physiologically relevant drought stress, and suggest this can be used as a quality control for future studies. Together, our analyses highlight the need for more community standardization, and the importance of paired physiology data to quantify stress severity for reproducibility and future data analyses.
8
Riley M, Robinson K, Kilkenny MF, Leggat SG. The knowledge and reuse practices of researchers utilising government health information assets, Victoria, Australia, 2008-2020. PLoS One 2024; 19:e0297396. [PMID: 38300890 PMCID: PMC10833579 DOI: 10.1371/journal.pone.0297396] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Accepted: 01/04/2024] [Indexed: 02/03/2024] Open
Abstract
BACKGROUND Using government health datasets for secondary purposes is widespread; however, little is known on researchers' knowledge and reuse practices within Australia. OBJECTIVES To explore researchers' knowledge and experience of governance processes, and their data reuse practices, when using Victorian government health datasets for research between 2008-2020. METHOD A cross-sectional quantitative survey was conducted with authors who utilised selected Victorian, Australia, government health datasets for peer-reviewed research published between 2008-2020. Information was collected on researchers': data reuse practices; knowledge of government health information assets; perceptions of data trustworthiness for reuse; and demographic characteristics. RESULTS When researchers used government health datasets, 45% linked their data, 45% found the data access process easy and 27% found it difficult. Government-curated datasets were significantly more difficult to access compared to other-agency curated datasets (p = 0.009). Many respondents received their data in less than six months (58%), in aggregated or de-identified form (76%). Most reported performing their own data validation checks (70%). To assist in data reuse, almost 71% of researchers utilised (or created) contextual documentation, 69% a data dictionary, and 62% limitations documentation. Almost 20% of respondents were not aware if data quality information existed for the dataset they had accessed. Researchers reported data was managed by custodians with rigorous confidentiality/privacy processes (94%) and good data quality processes (76%), yet half lacked knowledge of what these processes entailed. Many respondents (78%) were unaware if dataset owners had obtained consent from the dataset subjects for research applications of the data. CONCLUSION Confidentiality/privacy processes and quality control activities undertaken by data custodians were well-regarded. 
Many respondents included data linkage to additional government datasets in their research. Ease of data access was variable. Some documentation types were well provided and used, but improvement is required for the provision of data quality statements and limitations documentation. Provision of information on participants' informed consent in a dataset is required.
Affiliation(s)
- Merilyn Riley
- Department of Public Health, School of Psychology and Public Health, La Trobe University, Melbourne, Australia
- Kerin Robinson
- Department of Public Health, School of Psychology and Public Health, La Trobe University, Melbourne, Australia
- Monique F. Kilkenny
- Stroke and Ageing Research, Department of Medicine, School of Clinical Sciences at Monash Health, Monash University, Victoria, Australia
- Stroke Division, The Florey Institute of Neuroscience and Mental Health, Melbourne Brain Centre, University of Melbourne, Victoria, Australia
- Sandra G. Leggat
- Department of Public Health, School of Psychology and Public Health, La Trobe University, Melbourne, Australia
- School of Public Health and Tropical Medicine, James Cook University, Townsville, Australia
9
Wolff K, Friedhoff R, Schwarzer F, Pucker B. Data literacy in genome research. J Integr Bioinform 2023; 20:jib-2023-0033. [PMID: 38047760 PMCID: PMC10777367 DOI: 10.1515/jib-2023-0033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2023] [Accepted: 11/15/2023] [Indexed: 12/05/2023] Open
Abstract
With an ever increasing amount of research data available, it becomes constantly more important to possess data literacy skills to benefit from this valuable resource. An integrative course was developed to teach students the fundamentals of data literacy through an engaging genome sequencing project. Each cohort of students performed planning of the experiment, DNA extraction, nanopore sequencing, genome sequence assembly, prediction of genes in the assembled sequence, and assignment of functional annotation terms to predicted genes. Students learned how to communicate science through writing a protocol in the form of a scientific paper, providing comments during a peer-review process, and presenting their findings as part of an international symposium. Many students enjoyed the opportunity to own a project and to work towards a meaningful objective.
Affiliation(s)
- Katharina Wolff
- Plant Biotechnology and Bioinformatics, Institute of Plant Biology & BRICS, TU Braunschweig, Braunschweig, Germany
- Ronja Friedhoff
- Plant Biotechnology and Bioinformatics, Institute of Plant Biology & BRICS, TU Braunschweig, Braunschweig, Germany
- Friderieke Schwarzer
- Plant Biotechnology and Bioinformatics, Institute of Plant Biology & BRICS, TU Braunschweig, Braunschweig, Germany
- Boas Pucker
- Plant Biotechnology and Bioinformatics, Institute of Plant Biology & BRICS, TU Braunschweig, Braunschweig, Germany
10
Riley M, Robinson K, Kilkenny MF, Leggat SG. The suitability of government health information assets for secondary use in research: A fit-for-purpose analysis. Health Inf Manag J 2023; 52:157-166. [PMID: 35471919 DOI: 10.1177/18333583221078377] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
BACKGROUND Governments have responsibility for ensuring the quality and fitness-for-purpose of personal health data provided to them. While these health information assets are used widely for research, this secondary usage has received minimal research attention. OBJECTIVE This study aimed to investigate the secondary uses, in research, of population health and administrative datasets (information assets) of the Department of Health (DoH), Victoria, Australia. The objectives were to (i) identify research based on these datasets published between 2008 and 2020; (ii) describe the data quality studies published between 2008 and 2020 for each dataset and (iii) evaluate "fitness-for-purpose" of the published research. METHOD Using a modified scoping review, research publications from 2008 to 2020 based on information assets related to health service provision and containing person-level data were reviewed. Publications were summarised by data quality and purpose-categories based on a taxonomy of data use. Fitness-for-purpose was evaluated by comparing the publicly stated purpose(s) for which each information asset was collected, with the purpose(s) assigned to the published research. RESULTS Of the >1000 information assets, 28 were utilised in 756 publications: 54% were utilised for general research purposes, 14% for patient safety, 10% for quality of care and 39% included data quality-related publications. Almost 85% of publications used information assets that were fit-for-purpose. CONCLUSION The DoH information assets were used widely for secondary purposes, with the majority identified as fit-for-purpose. We recommend that data custodians, including governments, provide information on data quality and transparency on data use of their health information assets.
Affiliation(s)
- Merilyn Riley
- Department of Public Health, School of Psychology and Public Health, La Trobe University, Melbourne, VIC, Australia
- Kerin Robinson
- Department of Public Health, School of Psychology and Public Health, La Trobe University, Melbourne, VIC, Australia
- Monique F Kilkenny
- Department of Medicine, School of Clinical Sciences at Monash Health, Monash University, Clayton, VIC, Australia
- Florey Institute of Neuroscience and Mental Health, Parkville, VIC, Australia
- Sandra G Leggat
- Department of Public Health, School of Psychology and Public Health, La Trobe University, Melbourne, VIC, Australia
- School of Public Health and Tropical Medicine, James Cook University, Townsville, Australia
11
Ahmed M, Kim HJ, Kim DR. Maximizing the utility of public data. Front Genet 2023; 14:1106631. [PMID: 37065493 PMCID: PMC10102460 DOI: 10.3389/fgene.2023.1106631] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2022] [Accepted: 03/21/2023] [Indexed: 04/03/2023] Open
Abstract
The human genome project galvanized the scientific community around an ambitious goal. Upon completion, the project delivered several discoveries, and a new era of research commenced. More importantly, novel technologies and analysis methods materialized during the project period. The cost reduction allowed many more labs to generate high-throughput datasets. The project also served as a model for other extensive collaborations that generated large datasets. These datasets were made public and continue to accumulate in repositories. As a result, the scientific community should consider how these data can be utilized effectively for the purposes of research and the public good. A dataset can be re-analyzed, curated, or integrated with other forms of data to enhance its utility. We highlight three important areas to achieve this goal in this brief perspective. We also emphasize the critical requirements for these strategies to be successful. We draw on our own experience and others in using publicly available datasets to support, develop, and extend our research interest. Finally, we underline the beneficiaries and discuss some risks involved in data reuse.
Affiliation(s)
- Mahmoud Ahmed
  - Department of Biochemistry and Convergence Medical Sciences, Institute of Health Sciences, College of Medicine, Gyeongsang National University, Jinju, Republic of Korea
- Hyun Joon Kim
  - Department of Anatomy and Convergence Medical Sciences, Institute of Health Sciences, College of Medicine, Gyeongsang National University, Jinju, Republic of Korea
- Deok Ryong Kim
  - Department of Biochemistry and Convergence Medical Sciences, Institute of Health Sciences, College of Medicine, Gyeongsang National University, Jinju, Republic of Korea
  - *Correspondence: Deok Ryong Kim,
|
12
|
Lee B, Hwang S, Kim PG, Ko G, Jang K, Kim S, Kim JH, Jeon J, Kim H, Jung J, Yoon BH, Byeon I, Jang I, Song W, Choi J, Kim SY. Introduction of the Korea BioData Station (K-BDS) for sharing biological data. Genomics Inform 2023; 21:e12. [PMID: 37037470 PMCID: PMC10085736 DOI: 10.5808/gi.22073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Accepted: 03/06/2023] [Indexed: 04/03/2023] Open
Abstract
A wave of new technologies has created opportunities for the cost-effective generation of high-throughput profiles of biological systems, foreshadowing a "data-driven science" era. The large variety of data available from biological research is also a rich resource that can be used for innovative endeavors. However, we are facing considerable challenges in big data deposition, integration, and translation due to the complexity of biological data and its production at unprecedented exponential rates. To address these problems, in 2020, the Korean government officially announced a national strategy to collect and manage the biological data produced through national R&D fund allocations and provide the collected data to researchers. To this end, the Korea Bioinformation Center (KOBIC) developed a new biological data repository, the Korea BioData Station (K-BDS), for sharing data from individual researchers and research programs to create a data-driven biological study environment. The K-BDS is dedicated to providing free open access to a suite of featured data resources in support of worldwide activities in both academia and industry.
|
13
|
Mante J, Abam J, Samineni SP, Pötzsch IM, Beal J, Myers CJ. Excel-SBOL Converter: Creating SBOL from Excel Templates and Vice Versa. ACS Synth Biol 2023; 12:340-346. [PMID: 36595709 DOI: 10.1021/acssynbio.2c00521] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2023]
Abstract
Standards support synthetic biology research by enabling the exchange of component information. However, using formal representations, such as the Synthetic Biology Open Language (SBOL), typically requires either a thorough understanding of these standards or a suite of tools developed in concurrence with the ontologies. Since these tools may be a barrier for use by many practitioners, the Excel-SBOL Converter was developed to facilitate the use of SBOL and integration into existing workflows. The converter consists of two Python libraries: one that converts Excel templates to SBOL and another that converts SBOL to an Excel workbook. Both libraries can be used either directly or via a SynBioHub plugin.
Affiliation(s)
- Jeanet Mante
  - University of Colorado Boulder, Boulder, Colorado 80309, United States
- Julian Abam
  - University of Colorado Boulder, Boulder, Colorado 80309, United States
- Sai P Samineni
  - University of Colorado Boulder, Boulder, Colorado 80309, United States
- Jacob Beal
  - Raytheon BBN Technologies, Cambridge, Massachusetts 02138, United States
- Chris J Myers
  - University of Colorado Boulder, Boulder, Colorado 80309, United States
|
14
|
The sheep miRNAome: Characterization and distribution of miRNAs in 21 tissues. Gene X 2023; 851:146998. [DOI: 10.1016/j.gene.2022.146998] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Revised: 10/10/2022] [Accepted: 10/18/2022] [Indexed: 11/06/2022] Open
|
15
|
Rey-Campos M, Ríos-Castro R, Gallardo-Escárate C, Novoa B, Figueras A. Exploring the Potential of Metatranscriptomics to Describe Microbial Communities and Their Effects in Molluscs. Int J Mol Sci 2022; 23:ijms232416029. [PMID: 36555669 PMCID: PMC9784687 DOI: 10.3390/ijms232416029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2022] [Revised: 12/01/2022] [Accepted: 12/13/2022] [Indexed: 12/23/2022] Open
Abstract
Metatranscriptomics has emerged as a very useful technology for the study of microbiomes from RNA-seq reads. This method provides additional information compared to the sequencing of ribosomal genes because the gene expression can also be analysed. In this work, we used the metatranscriptomic approach to study the whole microbiome of mussels, including bacteria, viruses, fungi, and protozoans, by mapping the RNA-seq reads to custom assembly databases (including the genomes of microorganisms publicly available). This strategy allowed us not only to describe the diversity of microorganisms but also to relate the host transcriptome and microbiome, finding the genes more affected by the pathogen load. Although some bacteria abundant in the metatranscriptomic analysis were undetectable by 16S rRNA sequencing, a common core of the taxa was detected by both methodologies (62% of the metatranscriptomic detections were also identified by 16S rRNA sequencing, the Oceanospirillales, Flavobacteriales and Vibrionales orders being the most relevant). However, the differences in the microbiome composition were observed among different tissues of Mytilus galloprovincialis, with the fungal kingdom being especially diverse, or among molluscan species. These results confirm the potential of a meta-analysis of transcriptome data to obtain new information on the molluscs' microbiome.
Affiliation(s)
- Magalí Rey-Campos
  - Institute of Marine Research (IIM), National Research Council (CSIC), Eduardo Cabello 6, 36208 Vigo, Spain
- Raquel Ríos-Castro
  - Institute of Marine Research (IIM), National Research Council (CSIC), Eduardo Cabello 6, 36208 Vigo, Spain
- Cristian Gallardo-Escárate
  - Interdisciplinary Center for Aquaculture Research (INCAR), University of Concepción, Concepción P.O. Box 160-C, Chile
- Beatriz Novoa
  - Institute of Marine Research (IIM), National Research Council (CSIC), Eduardo Cabello 6, 36208 Vigo, Spain
- Antonio Figueras
  - Institute of Marine Research (IIM), National Research Council (CSIC), Eduardo Cabello 6, 36208 Vigo, Spain
  - Correspondence:
|
16
|
Mwinami NV, Dulle FW, Mtega WP. Data preservation practices for enhancing agricultural research data usage among agricultural researchers in Tanzania. Journal of Librarianship and Information Science 2022. [DOI: 10.1177/09610006221138110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/13/2022]
Abstract
The objective of this study was to investigate the role of research data preservation for enhanced data usage among agricultural researchers in Tanzania. Specifically, the study aimed to examine the data preservation methods used by agriculture researchers, find out how long agriculture researchers preserve their agriculture research data, and determine factors that influence agriculture researchers on their choice of data preservation methods for use. The study employed a cross-sectional research design. The study employed both qualitative and quantitative approaches. A survey was conducted to collect data in 11 research institutions. A simple random sampling technique was used to select 204 respondents from the study area while purposive sampling techniques were used to select 11 agriculture research institutions including 10 Tanzanian Agricultural Research Institution (TARI) centers, and Sokoine University of Agriculture (SUA). Also, 12 respondents were selected purposively for an in-depth interview as key informants. The study adopted Data Curation Centre (DCC) Lifecycle Model to explain data preservation process. Findings indicated that a majority of more than 90% of researchers preferred to preserve their data using different storage devices such as field notebooks, computers, and institutional libraries. Moreover, findings indicated that about 74% of agricultural researchers preferred to preserve their data for more than 6 years after the end of the project. Findings also indicated factors that influence researchers in the choice of data preservation methods to be easy to reach, cost-effective storage devices, support to use the devices, adequate infrastructure for data preservation, and reliable power supply. It can be concluded that there is yet a great role of research data preservation in enhancing data usage among researchers in Tanzania. 
It is recommended that the government should establish an agricultural research data bank to guarantee permanent availability of data at all times when needed.
|
17
|
New Insights into the Identity of the DFNA58 Gene. Genes (Basel) 2022; 13:genes13122274. [PMID: 36553541 PMCID: PMC9777997 DOI: 10.3390/genes13122274] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2022] [Revised: 11/21/2022] [Accepted: 11/28/2022] [Indexed: 12/12/2022] Open
Abstract
Hearing loss is the most common sensory deficit, affecting 466 million people worldwide. The vast and diverse genes involved reflect the complexity of auditory physiology, which requires the use of animal models in order to gain a fuller understanding. Among the loci with a yet-to-be validated gene is the DFNA58, in which ~200 Kb genomic duplication, including three protein-coding genes (PLEK, CNRIP1, and PPP3R1's exon1), was found to segregate with autosomal dominant hearing loss. Through whole genome sequencing, the duplication was found to be in tandem and inserted in an intergenic region, without the disruption of the topological domains. Reanalysis of transcriptomes data studies (zebrafish and mouse), and RT-qPCR analysis of adult zebrafish target organs, in order to access their orthologues expression, highlighted promising results with Cnrip1a, corroborated by zebrafish in situ hybridization and immunofluorescence. Mouse data also suggested Cnrip1 as the best candidate for a relevant role in auditory physiology, and its importance in hearing seems to have remained conserved but the cell type exerting its function might have changed, from hair cells to spiral ganglion neurons.
|
18
|
Bilbao-Arribas M, Jugo BM. Transcriptomic meta-analysis reveals unannotated long non-coding RNAs related to the immune response in sheep. Front Genet 2022; 13:1067350. [PMID: 36482891 PMCID: PMC9725098 DOI: 10.3389/fgene.2022.1067350] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2022] [Accepted: 11/08/2022] [Indexed: 11/23/2022] Open
Abstract
Long non-coding RNAs (lncRNAs) are involved in several biological processes, including the immune system response to pathogens and vaccines. The annotation and functional characterization of lncRNAs is more advanced in humans than in livestock species. Here, we take advantage of the increasing number of high-throughput functional experiments deposited in public databases in order to uniformly analyse, profile unannotated lncRNAs and integrate 422 ovine RNA-seq samples from the ovine immune system. We identified 12302 unannotated lncRNA genes with support from independent CAGE-seq and histone modification ChIP-seq assays. Unannotated lncRNAs showed low expression levels and sequence conservation across other mammal species. There were differences in expression levels depending on the genomic location-based lncRNA classification. Differential expression analyses between unstimulated and samples stimulated with pathogen infection or vaccination resulted in hundreds of lncRNAs with changed expression. Gene co-expression analyses revealed immune gene-enriched clusters associated with immune system activation and related to interferon signalling, antiviral response or endoplasmic reticulum stress. Besides, differential co-expression networks were constructed in order to find condition-specific relationships between coding genes and lncRNAs. Overall, using a diverse set of immune system samples and bioinformatic approaches we identify several ovine lncRNAs associated with the response to an external stimulus. These findings help in the improvement of the ovine lncRNA catalogue and provide sheep-specific evidence for the implication in the general immune response for several lncRNAs.
|
19
|
Wood-Charlson EM, Crockett Z, Erdmann C, Arkin AP, Robinson CB. Ten simple rules for getting and giving credit for data. PLoS Comput Biol 2022; 18:e1010476. [PMID: 36173960 PMCID: PMC9521804 DOI: 10.1371/journal.pcbi.1010476] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022] Open
Affiliation(s)
- Elisha M. Wood-Charlson
  - Environmental Genomics and Systems Biology Division, E.O. Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
  - * E-mail:
- Zachary Crockett
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, United States of America
- Chris Erdmann
  - American Geophysical Union, Washington, DC, United States of America
- Adam P. Arkin
  - Environmental Genomics and Systems Biology Division, E.O. Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- Carly B. Robinson
  - U.S. Department of Energy Office of Scientific and Technical Information, Oak Ridge, Tennessee, United States of America
|
20
|
Muñoz-Tamayo R, Nielsen BL, Gagaoua M, Gondret F, Krause ET, Morgavi DP, Olsson IAS, Pastell M, Taghipoor M, Tedeschi L, Veissier I, Nawroth C. Seven steps to enhance Open Science practices in animal science. PNAS Nexus 2022; 1:pgac106. [PMID: 36741429 PMCID: PMC9896936 DOI: 10.1093/pnasnexus/pgac106] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 03/16/2022] [Accepted: 06/30/2022] [Indexed: 04/14/2023]
Abstract
The Open Science movement aims at ensuring accessibility, reproducibility, and transparency of research. The adoption of Open Science practices in animal science, however, is still at an early stage. To move ahead as a field, we here provide seven practical steps to embrace Open Science in animal science. We hope that this paper contributes to the shift in research practices of animal scientists towards open, reproducible, and transparent science, enabling the field to gain additional public trust and deal with future challenges to guarantee reliable research. Although the paper targets primarily animal science researchers, the steps discussed here are also applicable to other research domains.
Affiliation(s)
- Rafael Muñoz-Tamayo
  - INRAE, AgroParisTech, Université Paris-Saclay, UMR Modélisation Systémique Appliquée aux Ruminants, 75005 Paris, France
- Birte L Nielsen
  - Universities Federation for Animal Welfare (UFAW), The Old School, Brewhouse Hill, Wheathampstead, Hertfordshire AL4 8AN, UK
- E Tobias Krause
  - Institute of Animal Welfare and Animal Husbandry, Friedrich-Loeffler-Institut, Dörnbergstr. 25/27, 29223 Celle, Germany
- Diego P Morgavi
  - Université Clermont Auvergne, INRAE, VetAgro Sup, UMR Herbivores, F-63122 Saint-Genes-Champanelle, France
- I Anna S Olsson
  - i3S—Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Rua Alfredo Allen 208, 4200-180 Porto, Portugal
- Matti Pastell
  - Natural Resources Institute Finland (Luke), Production Systems, Latokartanonkaari 9, FI-00790 Helsinki, Finland
- Masoomeh Taghipoor
  - INRAE, AgroParisTech, Université Paris-Saclay, UMR Modélisation Systémique Appliquée aux Ruminants, 75005 Paris, France
- Luis Tedeschi
  - Department of Animal Science, Texas A&M University, College Station, TX 77843-2471, USA
- Isabelle Veissier
  - Université Clermont Auvergne, INRAE, VetAgro Sup, UMR Herbivores, F-63122 Saint-Genes-Champanelle, France
- Christian Nawroth
  - Institute of Behavioural Physiology, Research Institute for Farm Animal Biology (FBN), Wilhelm-Stahl-Allee 2, 18196 Dummerstorf, Germany
|
21
|
Beier S, Fiebig A, Pommier C, Liyanage I, Lange M, Kersey PJ, Weise S, Finkers R, Koylass B, Cezard T, Courtot M, Contreras-Moreira B, Naamati G, Dyer S, Scholz U. Recommendations for the formatting of Variant Call Format (VCF) files to make plant genotyping data FAIR. F1000Res 2022; 11. [PMID: 35811804 PMCID: PMC9218589 DOI: 10.12688/f1000research.109080.2] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Accepted: 05/17/2022] [Indexed: 11/20/2022] Open
Abstract
In this opinion article, we discuss the formatting of files from (plant) genotyping studies, in particular the formatting of metadata in Variant Call Format (VCF) files. The flexibility of the VCF format specification facilitates its use as a generic interchange format across domains but can lead to inconsistency between files in the presentation of metadata. To enable fully autonomous machine actionable data flow, generic elements need to be further specified. We strongly support the merits of the FAIR principles and see the need to facilitate them also through technical implementation specifications. They form a basis for the proposed VCF extensions here. We have learned from the existing application of VCF that the definition of relevant metadata using controlled standards, vocabulary and the consistent use of cross-references via resolvable identifiers (machine-readable) are particularly necessary and propose their encoding. VCF is an established standard for the exchange and publication of genotyping data. Other data formats are also used to capture variant data (for example, the HapMap and the gVCF formats), but none currently have the reach of VCF. For the sake of simplicity, we will only discuss VCF and our recommendations for its use, but these recommendations could also be applied to gVCF. However, the part of the VCF standard relating to metadata (as opposed to the actual variant calls) defines a syntactic format but no vocabulary, unique identifier or recommended content. In practice, often only sparse descriptive metadata is included. When descriptive metadata is provided, proprietary metadata fields are frequently added that have not been agreed upon within the community which may limit long-term and comprehensive interoperability. To address this, we propose recommendations for supplying and encoding metadata, focusing on use cases from plant sciences. We expect there to be overlap, but also divergence, with the needs of other domains.
Affiliation(s)
- Sebastian Beier
  - Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, 06466, Germany
  - Institute of Bio- and Geosciences, Bioinformatics (IBG-4), Forschungszentrum Jülich GmbH, Jülich, 52425, Germany
- Anne Fiebig
  - Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, 06466, Germany
- Cyril Pommier
  - BioinfOmics, Plant bioinformatics facility, Université Paris-Saclay, INRAE, Versailles, France
- Isuru Liyanage
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Matthias Lange
  - Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, 06466, Germany
- Stephan Weise
  - Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, 06466, Germany
- Richard Finkers
  - Plant Breeding, Wageningen University & Research, Wageningen, The Netherlands
  - Gennovation B.V., Wageningen, The Netherlands
- Baron Koylass
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Timothee Cezard
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Mélanie Courtot
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
  - Ontario Institute for Cancer Research, Toronto, Canada
- Bruno Contreras-Moreira
  - Laboratorio de Biología Computacional y Estructural, Estación Experimental Aula Dei-CSIC, Zaragoza, 50059, Spain
- Guy Naamati
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Sarah Dyer
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Uwe Scholz
  - Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, 06466, Germany
|
22
|
Igumbor JO, Bosire EN, Vicente-Crespo M, Igumbor EU, Olalekan UA, Chirwa TF, Kinyanjui SM, Kyobutungi C, Fonn S. Considerations for an integrated population health databank in Africa: lessons from global best practices. Wellcome Open Res 2021; 6:214. [PMID: 35224211 PMCID: PMC8844538 DOI: 10.12688/wellcomeopenres.17000.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 08/12/2021] [Indexed: 12/17/2022] Open
Abstract
Background: The rising digitisation and proliferation of data sources and repositories cannot be ignored. This trend expands opportunities to integrate and share population health data. Such platforms have many benefits, including the potential to efficiently translate information arising from such data to evidence needed to address complex global health challenges. There are pockets of quality data on the continent that may benefit from greater integration. Integration of data sources is however under-explored in Africa. The aim of this article is to identify the requirements and provide practical recommendations for developing a multi-consortia public and population health data-sharing framework for Africa. Methods: We conducted a narrative review of global best practices and policies on data sharing and its optimisation. We searched eight databases for publications and undertook an iterative snowballing search of articles cited in the identified publications. The Leximancer software © enabled content analysis and selection of a sample of the most relevant articles for detailed review. Themes were developed through immersion in the extracts of selected articles using inductive thematic analysis. We also performed interviews with public and population health stakeholders in Africa to gather their experiences, perceptions, and expectations of data sharing. Results: Our findings described global stakeholder experiences on research data sharing. We identified some challenges and measures to harness available resources and incentivise data sharing. We further highlight progress made by the different groups in Africa and identified the infrastructural requirements and considerations when implementing data sharing platforms. Furthermore, the review suggests key reforms required, particularly in the areas of consenting, privacy protection, data ownership, governance, and data access. 
Conclusions: The findings underscore the critical role of inclusion, social justice, public good, data security, accountability, legislation, reciprocity, and mutual respect in developing a responsive, ethical, durable, and integrated research data sharing ecosystem.
Affiliation(s)
- Jude O. Igumbor
  - School of Public Health, University of the Witwatersrand, Johannesburg, Gauteng, 2193, South Africa
- Edna N. Bosire
  - School of Public Health, University of the Witwatersrand, Johannesburg, Gauteng, 2193, South Africa
- Marta Vicente-Crespo
  - School of Public Health, University of the Witwatersrand, Johannesburg, Gauteng, 2193, South Africa
  - African Population and Health Research Centre, Nairobi, Kenya
- Ehimario U. Igumbor
  - Nigeria Centre for Disease Control, Abuja, Nigeria
  - School of Public Health, University of the Western Cape, Cape Town, Western Cape, South Africa
- Uthman A. Olalekan
  - Warwick-Centre for Applied Health Research and Delivery (WCAHRD), Division of Health Sciences, Warwick Medical School, University of Warwick, Coventry, UK
- Tobias F. Chirwa
  - School of Public Health, University of the Witwatersrand, Johannesburg, Gauteng, 2193, South Africa
- Sharon Fonn
  - School of Public Health, University of the Witwatersrand, Johannesburg, Gauteng, 2193, South Africa
|
23
|
The Changes in the p53 Protein across the Animal Kingdom Point to Its Involvement in Longevity. Int J Mol Sci 2021; 22:ijms22168512. [PMID: 34445220 PMCID: PMC8395165 DOI: 10.3390/ijms22168512] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 07/02/2021] [Revised: 07/29/2021] [Accepted: 07/30/2021] [Indexed: 12/14/2022] Open
Abstract
Recently, the quest for the mythical fountain of youth has produced extensive research programs that aim to extend the healthy lifespan of humans. Despite advances in our understanding of the aging process, the surprisingly extended lifespan and cancer resistance of some animal species remain unexplained. The p53 protein plays a crucial role in tumor suppression, tissue homeostasis, and aging. Long-lived, cancer-free African elephants have 20 copies of the TP53 gene, including 19 retrogenes (38 alleles), which are partially active, whereas humans possess only one copy of TP53 and have an estimated cancer mortality rate of 11–25%. The mechanism through which p53 contributes to the resolution of the Peto’s paradox in Animalia remains vague. Thus, in this work, we took advantage of the available datasets and inspected the p53 amino acid sequence of phylogenetically related organisms that show variations in their lifespans. We discovered new correlations between specific amino acid deviations in p53 and the lifespans across different animal species. We found that species with extended lifespans have certain characteristic amino acid substitutions in the p53 DNA-binding domain that alter its function, as depicted from the Phenotypic Annotation of p53 Mutations, using the PROVEAN tool or SWISS-MODEL workflow. In addition, the loop 2 region of the human p53 DNA-binding domain was identified as the longest region that was associated with longevity. The 3D model revealed variations in the loop 2 structure in long-lived species when compared with human p53. Our findings show a direct association between specific amino acid residues in p53 protein, changes in p53 functionality, and the extended animal lifespan, and further highlight the importance of p53 protein in aging.
|
24
|
Imker HJ, Luong H, Mischo WH, Schlembach MC, Wiley C. An examination of data reuse practices within highly cited articles of faculty at a research university. Journal of Academic Librarianship 2021. [DOI: 10.1016/j.acalib.2021.102369] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/23/2022]
|