Reference Citation Analysis

For: Oestreich M, Chen D, Schultze JL, Fritz M, Becker M. Privacy considerations for sharing genomics data. EXCLI J 2021;20:1243-1260. [PMID: 34345236 PMCID: PMC8326502 DOI: 10.17179/excli2021-4002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/19/2021] [Accepted: 07/07/2021] [Indexed: 01/23/2023]

Cited by Other Article(s)

Chicco D, Fabris A, Jurman G. The Venus score for the assessment of the quality and trustworthiness of biomedical datasets. BioData Min 2025;18:1. [PMID: 39780220 PMCID: PMC11716409 DOI: 10.1186/s13040-024-00412-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2024] [Accepted: 12/02/2024] [Indexed: 01/11/2025]

Abstract

Biomedical datasets are the mainstays of computational biology and health informatics projects, and can be found on multiple data platforms online or obtained from wet-lab biologists and physicians. The quality and trustworthiness of these datasets, however, can sometimes be poor, in turn producing bad results that can harm patients and data subjects. To address this problem, policy-makers, researchers, and consortia have proposed diverse regulations, guidelines, and scores to assess the quality and increase the reliability of datasets. Although generally useful, these are often incomplete and impractical. The guidelines of Datasheets for Datasets, in particular, are too numerous; the requirements of the Kaggle Dataset Usability Score focus on non-scientific requisites (for example, including a cover image); and the European Union Artificial Intelligence Act (EU AI Act) sets forth sparse and general data governance requirements, which we tailored to datasets for biomedical AI. Against this backdrop, we introduce our new Venus score to assess the data quality and trustworthiness of biomedical datasets. Our score ranges from 0 to 10 and consists of ten questions that anyone developing a bioinformatics, medical informatics, or cheminformatics dataset should answer before release. In this study, we first describe the EU AI Act, Datasheets for Datasets, and the Kaggle Dataset Usability Score, presenting their requirements and their drawbacks. To do so, we reverse-engineer the weights of the influential Kaggle Score for the first time and report them in this study. We distill the most important data governance requirements into ten questions tailored to the biomedical domain, which together make up the Venus score. We apply the Venus score to twelve datasets from multiple subdomains, including electronic health records, medical imaging, microarray and bulk RNA-seq gene expression, cheminformatics, physiologic electrogram signals, and medical text.
Analyzing the results, we surface fine-grained strengths and weaknesses of popular datasets, as well as aggregate trends. Most notably, we find a widespread tendency to gloss over sources of data inaccuracy and noise, which may hinder the reliable exploitation of data and, consequently, research results. Overall, our results confirm the applicability and utility of the Venus score to assess the trustworthiness of biomedical data.

Carraro C, Montgomery JV, Klimmt J, Paquet D, Schultze JL, Beyer MD. Tackling neurodegeneration in vitro with omics: a path towards new targets and drugs. Front Mol Neurosci 2024;17:1414886. [PMID: 38952421 PMCID: PMC11215216 DOI: 10.3389/fnmol.2024.1414886] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2024] [Accepted: 06/04/2024] [Indexed: 07/03/2024]

Abstract

Drug discovery is a generally inefficient and capital-intensive process. For neurodegenerative diseases (NDDs), the development of novel therapeutics is particularly urgent considering the long list of late-stage drug candidate failures. Although our knowledge of the pathogenic mechanisms driving neurodegeneration is growing, additional efforts are required to achieve a better and ultimately complete understanding of the pathophysiological underpinnings of NDDs. Beyond the etiology of NDDs being heterogeneous and multifactorial, this process is further complicated by the fact that current experimental models only partially recapitulate the major phenotypes observed in humans. In such a scenario, multi-omic approaches have the potential to accelerate the identification of new or repurposed drugs against a multitude of the underlying mechanisms driving NDDs. One major advantage of implementing multi-omic approaches in the drug discovery process is that these overarching tools can disentangle disease states and model perturbations through the comprehensive characterization of distinct molecular layers (i.e., genome, transcriptome, proteome) down to single-cell resolution. The use of omics technologies to drive drug discovery in the neuroscience field is still nascent but, owing to recent advances in their affordability and scalability, rapidly expanding. Combined with increasingly advanced in vitro models, which have particularly benefited from the introduction of human iPSCs, multi-omics are shaping a new paradigm in drug discovery for NDDs, from disease characterization to therapeutics prediction and experimental screening. In this review, we discuss examples, main advantages, and open challenges in the use of multi-omic approaches for the in vitro discovery of targets and therapies against NDDs.

Thomas M, Mackes N, Preuss-Dodhy A, Wieland T, Bundschus M. Assessing Privacy Vulnerabilities in Genetic Data Sets: Scoping Review. JMIR Bioinformatics and Biotechnology 2024;5:e54332. [PMID: 38935957 PMCID: PMC11165293 DOI: 10.2196/54332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 03/26/2024] [Accepted: 03/29/2024] [Indexed: 06/29/2024]

Pijnacker R, van den Beld M, van der Zwaluw K, Verbruggen A, Coipan C, Segura AH, Mughini-Gras L, Franz E, Bosch T. Comparing Multiple Locus Variable-Number Tandem Repeat Analyses with Whole-Genome Sequencing as Typing Method for Salmonella Enteritidis Surveillance in The Netherlands, January 2019 to March 2020. Microbiol Spectr 2022;10:e0137522. [PMID: 36121225 PMCID: PMC9603844 DOI: 10.1128/spectrum.01375-22] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/19/2022] [Accepted: 08/29/2022] [Indexed: 12/30/2022]

Abstract

In the Netherlands, whole-genome sequencing (WGS) was implemented as a routine typing tool for Salmonella Enteritidis isolates in 2019. Multiple locus variable-number tandem repeat analysis (MLVA) was performed in parallel. The objective was to determine the concordance of MLVA and WGS as typing methods for S. Enteritidis isolates. We included S. Enteritidis isolates from patients that were subtyped using MLVA and WGS-based core-genome Multilocus Sequence Typing (cgMLST) as part of the national laboratory surveillance of Salmonella from January 2019 to March 2020. The concordance of clustering based on MLVA and cgMLST (with a cgMLST cluster threshold of ≤5 alleles) was assessed using the Fowlkes-Mallows (FM) index, and their discriminatory power using Simpson's diversity index. Of 439 isolates in total, 404 (92%) were typed as 32 clusters based on MLVA, with a median size of 4 isolates (range: 2 to 141 isolates). Based on cgMLST, 313 (71%) isolates were typed as 48 clusters, with a median size of 3 isolates (range: 2 to 39 isolates). The FM index was 0.34 on a scale from 0 to 1, where a higher value indicates greater similarity between the typing methods. The Simpson's diversity indices of MLVA and cgMLST were 0.860 and 0.974, respectively. The median cgMLST distance between isolates with the same MLVA type was 27 alleles (interquartile range [IQR]: 17 to 34 alleles), compared with 2 alleles within cgMLST clusters (IQR: 1 to 5 alleles). This study shows the higher discriminatory power of WGS over MLVA and a poor concordance between the two typing methods regarding clustering of S. Enteritidis isolates.

IMPORTANCE Salmonella is the most frequently reported agent causing foodborne outbreaks and the second most common zoonosis in the European Union. The incidence of the dominant serotype, Enteritidis, has increased in recent years. To differentiate between Salmonella isolates, traditional typing methods such as pulsed-field gel electrophoresis (PFGE) and multiple locus variable-number tandem repeat analysis (MLVA) are increasingly being replaced by whole-genome sequencing (WGS). This study compared MLVA and WGS-based core-genome Multilocus Sequence Typing (cgMLST) as typing tools for S. Enteritidis isolates that were collected as part of the national Salmonella surveillance in the Netherlands. We found a higher discriminatory power of WGS-based cgMLST over MLVA, as well as a poor concordance between the two typing methods regarding clustering of S. Enteritidis isolates. This is especially relevant for cluster delineation in outbreak investigations and confirmation of the outbreak source in trace-back investigations.
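The abstract above names its clustering threshold and two summary statistics without giving formulas. The sketch below shows one way to compute them, under stated assumptions: cgMLST clusters are formed by single linkage at the ≤5-allele threshold (the abstract does not state the linkage method), Simpson's diversity index takes the Hunter-Gaston form commonly used for discriminatory power, and the Fowlkes-Mallows index uses the standard pair-counting definition. The function names and the six-isolate dataset are hypothetical and are not taken from the study.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def single_linkage_clusters(dist, threshold=5):
    """Assign cluster labels by linking any two isolates whose allele
    distance is <= threshold; links chain transitively (single linkage).
    Uses a union-find structure; returns one root label per isolate."""
    parent = list(range(len(dist)))

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(len(dist)), 2):
        if dist[i][j] <= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(dist))]


def simpsons_diversity(types):
    """Simpson's index of diversity, Hunter-Gaston form (an assumption):
    D = 1 - sum(n_j * (n_j - 1)) / (N * (N - 1)),
    the probability that two isolates drawn without replacement
    are assigned different types."""
    n = len(types)
    counts = Counter(types).values()
    return 1 - sum(c * (c - 1) for c in counts) / (n * (n - 1))


def fowlkes_mallows(labels_a, labels_b):
    """Pair-counting Fowlkes-Mallows index: FM = TP / sqrt((TP+FP)(TP+FN)),
    where TP counts isolate pairs grouped together by both methods."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            tp += 1
        elif same_a:
            fp += 1  # together under method A only
        elif same_b:
            fn += 1  # together under method B only
    denom = sqrt((tp + fp) * (tp + fn))
    return tp / denom if denom else 0.0


# Hypothetical toy data (six isolates), not from the study.
mlva_types = ["A", "A", "A", "B", "B", "C"]
cgmlst_dist = [
    [0, 2, 30, 40, 41, 50],
    [2, 0, 29, 39, 40, 49],
    [30, 29, 0, 3, 35, 45],
    [40, 39, 3, 0, 36, 44],
    [41, 40, 35, 36, 0, 20],
    [50, 49, 45, 44, 20, 0],
]
cg_labels = single_linkage_clusters(cgmlst_dist, threshold=5)

print(round(simpsons_diversity(mlva_types), 3))  # 0.733
print(round(simpsons_diversity(cg_labels), 3))   # 0.867
print(round(fowlkes_mallows(mlva_types, cg_labels), 3))  # 0.354
```

In this toy example the finer cgMLST partition scores higher on Simpson's index, and the FM index is low because MLVA type A groups isolates that are 29 to 30 alleles apart. Pair counting is O(N²), which stays cheap at the 439 isolates reported in the study; scikit-learn's `fowlkes_mallows_score` offers a ready-made alternative for the FM index.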