1
Gierend K, Krüger F, Genehr S, Hartmann F, Siegel F, Waltemath D, Ganslandt T, Zeleke AA. Provenance Information for Biomedical Data and Workflows: Scoping Review. J Med Internet Res 2024; 26:e51297. [PMID: 39178413 PMCID: PMC11380065 DOI: 10.2196/51297] [Received: 07/27/2023] [Revised: 05/30/2024] [Accepted: 06/17/2024] [Indexed: 08/25/2024] Open
Abstract
BACKGROUND The record of the origin and the history of data, known as provenance, holds importance. Provenance information leads to higher interpretability of scientific results and enables reliable collaboration and data sharing. However, the lack of comprehensive evidence on provenance approaches hinders the uptake of good scientific practice in clinical research. OBJECTIVE This scoping review aims to identify approaches and criteria for provenance tracking in the biomedical domain. We reviewed the state-of-the-art frameworks, associated artifacts, and methodologies for provenance tracking. METHODS This scoping review followed the methodological framework developed by Arksey and O'Malley. We searched the PubMed and Web of Science databases for English-language articles published from 2006 to 2022. Title and abstract screening were carried out by 4 independent reviewers using the Rayyan screening tool. A majority vote was required for consent on the eligibility of papers based on the defined inclusion and exclusion criteria. Full-text reading and screening were performed independently by 2 reviewers, and information was extracted into a pretested template for the 5 research questions. Disagreements were resolved by a domain expert. The study protocol has previously been published. RESULTS The search resulted in a total of 764 papers. Of 624 identified, deduplicated papers, 66 (10.6%) studies fulfilled the inclusion criteria. We identified diverse provenance-tracking approaches ranging from practical provenance processing and managing to theoretical frameworks distinguishing diverse concepts and details of data and metadata models, provenance components, and notations. A substantial majority investigated underlying requirements to varying extents and validation intensities but lacked completeness in provenance coverage. Mostly, cited requirements concerned the knowledge about data integrity and reproducibility. 
Moreover, these revolved around robust data quality assessments, consistent policies for sensitive data protection, improved user interfaces, and automated ontology development. We found that different stakeholder groups benefit from the availability of provenance information. Thereby, we recognized that the term provenance is subjected to an evolutionary and technical process with multifaceted meanings and roles. Challenges included organizational and technical issues linked to data annotation, provenance modeling, and performance, amplified by subsequent matters such as enhanced provenance information and quality principles. CONCLUSIONS As data volumes grow and computing power increases, the challenge of scaling provenance systems to handle data efficiently and assist complex queries intensifies, necessitating automated and scalable solutions. With rising legal and scientific demands, there is an urgent need for greater transparency in implementing provenance systems in research projects, despite the challenges of unresolved granularity and knowledge bottlenecks. We believe that our recommendations enable quality and guide the implementation of auditable and measurable provenance approaches as well as solutions in the daily tasks of biomedical scientists. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID) RR2-10.2196/31750.
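The majority-vote screening step described in the METHODS above can be illustrated with a minimal sketch. The function name, the three outcome labels, and the handling of ties are assumptions made for illustration; the abstract does not state how a 2-2 split among the 4 reviewers was treated, so ties are simply flagged here.

```python
from collections import Counter


def screen_by_majority(votes):
    """Classify one paper from reviewer votes (True = eligible).

    A strict majority decides. Ties are returned as "conflict"
    (assumption: tie handling is not described in the abstract).
    """
    counts = Counter(votes)
    if counts[True] * 2 > len(votes):
        return "include"
    if counts[False] * 2 > len(votes):
        return "exclude"
    return "conflict"


# Hypothetical votes from 4 reviewers for three papers.
decisions = {
    "paper_a": screen_by_majority([True, True, True, False]),
    "paper_b": screen_by_majority([True, False, False, False]),
    "paper_c": screen_by_majority([True, True, False, False]),
}
print(decisions)
```

Returning an explicit "conflict" label keeps undecided papers visible so they can be passed to a further resolution step rather than being silently included or excluded.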
Affiliation(s)
- Kerstin Gierend
  - Department of Biomedical Informatics, Mannheim Institute for intelligent Systems in Medicine, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Frank Krüger
  - Faculty of Engineering, Wismar University of Applied Sciences, Wismar, Germany
  - Institute of Communications Engineering, University of Rostock, Rostock, Germany
- Sascha Genehr
  - Institute of Communications Engineering, University of Rostock, Rostock, Germany
- Francisca Hartmann
  - Department of Biomedical Informatics, Mannheim Institute for intelligent Systems in Medicine, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Fabian Siegel
  - Department of Biomedical Informatics, Mannheim Institute for intelligent Systems in Medicine, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Dagmar Waltemath
  - Department of Medical Informatics, University Medicine Greifswald, Greifswald, Germany
- Thomas Ganslandt
  - Chair of Medical Informatics, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
2
Towett G, Snead RS, Marczika J, Prada I. Discursive framework for a multi-disease digital health passport in Africa: a perspective. Global Health 2024; 20:64. [PMID: 39164710 PMCID: PMC11337601 DOI: 10.1186/s12992-024-01067-3] [Received: 03/29/2024] [Accepted: 08/05/2024] [Indexed: 08/22/2024] Open
Abstract
Africa's dual burden of rising incidence of infectious diseases and increasing prevalence of non-communicable diseases (NCDs), such as cardiovascular diseases and diabetes, demands innovative approaches to disease surveillance, response, and cross-border health management in response to growing economic integration and global connectivity. In this context, we propose a discursive framework for the development and implementation of a multi-disease digital health passport (MDDHP) in Africa. The MDDHP would serve as a secure platform for storing and sharing individual health data, offering a comprehensive solution to track and respond to infectious diseases, facilitate the management of NCDs, and improve healthcare access across borders. Empowering individuals to proactively manage their health and improve overall outcomes is a key aspect of the MDDHP. In the paper, we examine the key elements necessary to effectively implement MDDHP, focusing on minimizing risks, maintaining efficacy, and driving its adoption while also taking into consideration the unique contexts of the continent. The paper is intended to provide an understanding of the key principles involved and contribute to the discussion on the development and successful implementation of MDDHP in Africa.
Affiliation(s)
- Gideon Towett
  - The Self Research Institute, Broken Arrow, Oklahoma, USA
  - Department of Biochemistry, Microbiology and Biotechnology, Kenyatta University, Nairobi, Kenya
- Julia Marczika
  - The Self Research Institute, Broken Arrow, Oklahoma, USA
- Isaac Prada
  - The Self Research Institute, Broken Arrow, Oklahoma, USA
3
Tegegne HA, Freeth FT, Bogaardt C, Taylor E, Reinhardt J, Collineau L, Prada JM, Hénaux V. Implementation of One Health surveillance systems: Opportunities and challenges - lessons learned from the OH-EpiCap application. One Health 2024; 18:100704. [PMID: 38496337 PMCID: PMC10940803 DOI: 10.1016/j.onehlt.2024.100704] [Received: 10/23/2023] [Accepted: 03/04/2024] [Indexed: 03/19/2024] Open
Abstract
As the complexity of health systems has increased over time, there is an urgent need for developing multi-sectoral and multi-disciplinary collaborations within the domain of One Health (OH). Despite the efforts to promote collaboration in health surveillance and overcome professional silos, implementing OH surveillance systems in practice remains challenging for multiple reasons. In this study, we describe the lessons learned from the evaluation of OH surveillance using OH-EpiCap (an online evaluation tool for One Health epidemiological surveillance capacities and capabilities), the challenges identified with the implementation of OH surveillance, and the main barriers that contribute to its sub-optimal functioning, as well as possible solutions to address them. We conducted eleven case studies targeting the multi-sectoral surveillance systems for antimicrobial resistance in Portugal and France, Salmonella in France, Germany, and the Netherlands, Listeria in The Netherlands, Finland and Norway, Campylobacter in Norway and Sweden, and psittacosis in Denmark. These evaluations facilitated the identification of common strengths and weaknesses, focusing on the organization and functioning of existing collaborations and their impacts on the surveillance system. Lack of operational and shared leadership, adherence to FAIR data principles, sharing of techniques, and harmonized indicators led to poor organization and sub-optimal functioning of OH surveillance systems. In the majority of studied systems, the effectiveness, operational costs, behavioral changes, and population health outcomes brought by the OH surveillance over traditional surveillance (i.e. compartmentalized into sectors) have not been evaluated. To this end, the establishment of a formal governance body with representatives from each sector could assist in overcoming long-standing barriers. 
Moreover, demonstrating the impacts of OH-ness of surveillance may facilitate the implementation of OH surveillance systems.
Affiliation(s)
- Henok Ayalew Tegegne
  - University of Lyon - ANSES, Laboratory of Lyon, Epidemiology and Support to Surveillance Unit, 69007 Lyon, France
- Frederick T.A. Freeth
  - University of Surrey, School of Veterinary Medicine, Guildford, GU2 7XH Surrey, United Kingdom
- Carlijn Bogaardt
  - University of Surrey, School of Veterinary Medicine, Guildford, GU2 7XH Surrey, United Kingdom
- Emma Taylor
  - University of Surrey, School of Veterinary Medicine, Guildford, GU2 7XH Surrey, United Kingdom
- Johana Reinhardt
  - ANSES, Risk Assessment Department, Animal Health, Welfare, Feed and Vectors Risk Assessment Unit, 94700 Maisons-Alfort, France
- Lucie Collineau
  - University of Lyon - ANSES, Laboratory of Lyon, Epidemiology and Support to Surveillance Unit, 69007 Lyon, France
- Joaquin M. Prada
  - University of Surrey, School of Veterinary Medicine, Guildford, GU2 7XH Surrey, United Kingdom
- Viviane Hénaux
  - University of Lyon - ANSES, Laboratory of Lyon, Epidemiology and Support to Surveillance Unit, 69007 Lyon, France
4
Ke Y, Yang R, Liu N. Comparing Open-Access Database and Traditional Intensive Care Studies Using Machine Learning: Bibliometric Analysis Study. J Med Internet Res 2024; 26:e48330. [PMID: 38630522 PMCID: PMC11063894 DOI: 10.2196/48330] [Received: 04/19/2023] [Revised: 08/01/2023] [Accepted: 01/14/2024] [Indexed: 04/19/2024] Open
Abstract
BACKGROUND Intensive care research has predominantly relied on conventional methods like randomized controlled trials. However, the increasing popularity of open-access, free databases in the past decade has opened new avenues for research, offering fresh insights. Leveraging machine learning (ML) techniques enables the analysis of trends in a vast number of studies. OBJECTIVE This study aims to conduct a comprehensive bibliometric analysis using ML to compare trends and research topics in traditional intensive care unit (ICU) studies and those done with open-access databases (OADs). METHODS We used ML for the analysis of publications in the Web of Science database in this study. Articles were categorized into "OAD" and "traditional intensive care" (TIC) studies. OAD studies were included in the Medical Information Mart for Intensive Care (MIMIC), eICU Collaborative Research Database (eICU-CRD), Amsterdam University Medical Centers Database (AmsterdamUMCdb), High Time Resolution ICU Dataset (HiRID), and Pediatric Intensive Care database. TIC studies included all other intensive care studies. Uniform manifold approximation and projection was used to visualize the corpus distribution. The BERTopic technique was used to generate 30 topic-unique identification numbers and to categorize topics into 22 topic families. RESULTS A total of 227,893 records were extracted. After exclusions, 145,426 articles were identified as TIC and 1301 articles as OAD studies. TIC studies experienced exponential growth over the last 2 decades, culminating in a peak of 16,378 articles in 2021, while OAD studies demonstrated a consistent upsurge since 2018. Sepsis, ventilation-related research, and pediatric intensive care were the most frequently discussed topics. TIC studies exhibited broader coverage than OAD studies, suggesting a more extensive research scope. CONCLUSIONS This study analyzed ICU research, providing valuable insights from a large number of publications. 
OAD studies complement TIC studies, focusing on predictive modeling, while TIC studies capture essential qualitative information. Integrating both approaches in a complementary manner is the future direction for ICU research. Additionally, natural language processing techniques offer a transformative alternative for literature review and bibliometric analysis.
Affiliation(s)
- Yuhe Ke
  - Division of Anesthesiology and Perioperative Medicine, Singapore General Hospital, Singapore, Singapore
- Rui Yang
  - Centre for Quantitative Medicine, Duke-NUS Medical School, National University of Singapore, Singapore, Singapore
- Nan Liu
  - Centre for Quantitative Medicine, Duke-NUS Medical School, National University of Singapore, Singapore, Singapore
5
Ouwerkerk J, Rasche H, Spalding JD, Hiltemann S, Stubbs AP. FAIR data retrieval for sensitive clinical research data in Galaxy. Gigascience 2024; 13:giad099. [PMID: 38280189 PMCID: PMC10821763 DOI: 10.1093/gigascience/giad099] [Received: 06/28/2023] [Revised: 10/16/2023] [Accepted: 11/01/2023] [Indexed: 01/29/2024] Open
Abstract
BACKGROUND In clinical research, data have to be accessible and reproducible, but the generated data are becoming larger and analysis complex. Here we propose a platform for Findable, Accessible, Interoperable, and Reusable (FAIR) data access and creating reproducible findings. Standardized access to a major genomic repository, the European Genome-Phenome Archive (EGA), has been achieved with API services like PyEGA3. We aim to provide a FAIR data analysis service in Galaxy by retrieving genomic data from the EGA and provide a generalized "omics" platform for FAIR data analysis. RESULTS To demonstrate this, we implemented an end-to-end Galaxy workflow to replicate the findings from an RD-Connect synthetic dataset Beyond the 1 Million Genomes (synB1MG) available from the EGA. We developed the PyEGA3 connector within Galaxy to easily download multiple datasets from the EGA. We added the gene.iobio tool, a diagnostic environment for precision genomics, to Galaxy and demonstrate that it provides a more dynamic and interpretable view for trio analysis results. We developed a Galaxy trio analysis workflow to determine the pathogenic variants from the synB1MG trios using the GEMINI and gene.iobio tool. The complete workflow is available at WorkflowHub, and an associated tutorial was created in the Galaxy Training Network, which helps researchers unfamiliar with Galaxy to run the workflow. CONCLUSIONS We showed the feasibility of reusing data from the EGA in Galaxy via PyEGA3 and validated the workflow by rediscovering spiked-in variants in synthetic data. Finally, we improved existing tools in Galaxy and created a workflow for trio analysis to demonstrate the value of FAIR genomics analysis in Galaxy.
Affiliation(s)
- Jasper Ouwerkerk
  - Clinical Bioinformatics Group, Department of Pathology, Erasmus Medical Center, 3015 CN, Rotterdam, the Netherlands
- Helena Rasche
  - Clinical Bioinformatics Group, Department of Pathology, Erasmus Medical Center, 3015 CN, Rotterdam, the Netherlands
- Saskia Hiltemann
  - Clinical Bioinformatics Group, Department of Pathology, Erasmus Medical Center, 3015 CN, Rotterdam, the Netherlands
- Andrew P Stubbs
  - Clinical Bioinformatics Group, Department of Pathology, Erasmus Medical Center, 3015 CN, Rotterdam, the Netherlands
6
Cen HS, Dandamudi S, Lei X, Weight C, Desai M, Gill I, Duddalwar V. Diversity in Renal Mass Data Cohorts: Implications for Urology AI Researchers. Oncology 2023; 102:574-584. [PMID: 38104555 PMCID: PMC11178677 DOI: 10.1159/000535841] [Received: 09/30/2023] [Accepted: 12/08/2023] [Indexed: 12/19/2023]
Abstract
INTRODUCTION We examine the heterogeneity and distribution of the cohort populations in two publicly used radiological image cohorts, the Cancer Genome Atlas Kidney Renal Clear Cell Carcinoma (TCIA TCGA KIRC) collection and 2019 MICCAI Kidney Tumor Segmentation Challenge (KiTS19), and deviations in real-world population renal cancer data from the National Cancer Database (NCDB) Participant User Data File (PUF) and tertiary center data. PUF data are used as an anchor for prevalence rate bias assessment. Specific gene expression and, therefore, biology of RCC differ by self-reported race, especially between the African American and Caucasian populations. AI algorithms learn from datasets, but if the dataset misrepresents the population, reinforcing bias may occur. Ignoring these demographic features may lead to inaccurate downstream effects, thereby limiting the translation of these analyses to clinical practice. Consciousness of model training biases is vital to patient care decisions when using models in clinical settings. METHODS Data elements evaluated included gender, demographics, reported pathologic grading, and cancer staging. American Urological Association risk levels were used. Poisson regression was performed to estimate the population-based and sample-specific estimation for prevalence rate and corresponding 95% confidence interval. SAS 9.4 was used for data analysis. RESULTS Compared to PUF, KiTS19 and TCGA KIRC oversampled Caucasian by 9.5% (95% CI, -3.7 to 22.7%) and 15.1% (95% CI, 1.5 to 28.8%), undersampled African American by -6.7% (95% CI, -10% to -3.3%), and -5.5% (95% CI, -9.3% to -1.8%). Tertiary also undersampled African American by -6.6% (95% CI, -8.7% to -4.6%). The tertiary cohort largely undersampled aggressive cancers by -14.7% (95% CI, -20.9% to -8.4%). No statistically significant difference was found among PUF, TCGA, and KiTS19 in aggressive rate; however, heterogeneities in risk are notable. 
CONCLUSION Heterogeneities between cohorts need to be considered in future AI training and cross-validation for renal masses.
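To make the over- and undersampling figures in the RESULTS above easier to read, the following is a minimal sketch of how a difference in prevalence between a sample cohort and a reference population can be reported with a 95% confidence interval. It is not the authors' method: the study used Poisson regression in SAS 9.4, whereas this sketch swaps in a simple normal-approximation (Wald) interval for a difference of two proportions. The function name and all counts are hypothetical.

```python
import math


def prevalence_difference(k_sample, n_sample, k_ref, n_ref, z=1.96):
    """Difference in prevalence (sample minus reference) with a Wald CI.

    Simplified stand-in for the Poisson regression used in the study
    (assumption: normal approximation is acceptable for illustration).
    """
    p_sample = k_sample / n_sample
    p_ref = k_ref / n_ref
    diff = p_sample - p_ref
    # Standard error of a difference between two independent proportions.
    se = math.sqrt(
        p_sample * (1 - p_sample) / n_sample + p_ref * (1 - p_ref) / n_ref
    )
    return diff, diff - z * se, diff + z * se


# Hypothetical counts: 30 of 200 patients in a sample cohort versus
# 2000 of 10000 in a reference population belong to a given group.
diff, low, high = prevalence_difference(30, 200, 2000, 10000)
print(f"difference = {diff:.1%} (95% CI, {low:.1%} to {high:.1%})")
```

A negative difference corresponds to undersampling relative to the reference; an interval that spans zero, as in the KiTS19 Caucasian estimate quoted above, means the difference cannot be distinguished from no difference at that confidence level.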
Affiliation(s)
- Harmony Selena Cen
  - Keck School of Medicine, University of Southern California, Los Angeles, California, USA
- Siddhartha Dandamudi
  - College of Human Medicine, Michigan State University, East Lansing, Michigan, USA
- Xiaomeng Lei
  - Keck School of Medicine, University of Southern California, Los Angeles, California, USA
- Chris Weight
  - Urologic Oncology, Cleveland Clinic, Cleveland, Ohio, USA
- Mihir Desai
  - Keck School of Medicine, University of Southern California, Los Angeles, California, USA
- Inderbir Gill
  - Keck School of Medicine, University of Southern California, Los Angeles, California, USA
- Vinay Duddalwar
  - Keck School of Medicine, University of Southern California, Los Angeles, California, USA
7
Ting JM, Tamayo-Mendoza T, Petersen SR, Van Reet J, Ahmed UA, Snell NJ, Fisher JD, Stern M, Oviedo F. Frontiers in nonviral delivery of small molecule and genetic drugs, driven by polymer chemistry and machine learning for materials informatics. Chem Commun (Camb) 2023; 59:14197-14209. [PMID: 37955165 DOI: 10.1039/d3cc04705a] [Indexed: 11/14/2023]
Abstract
Materials informatics (MI) has immense potential to accelerate the pace of innovation and new product development in biotechnology. Close collaborations between skilled physical and life scientists with data scientists are being established in pursuit of leveraging MI tools in automation and artificial intelligence (AI) to predict material properties in vitro and in vivo. However, the scarcity of large, standardized, and labeled materials data for connecting structure-function relationships represents one of the largest hurdles to overcome. In this Highlight, focus is brought to emerging developments in polymer-based therapeutic delivery platforms, where teams generate large experimental datasets around specific therapeutics and successfully establish a design-to-deployment cycle of specialized nanocarriers. Three select collaborations demonstrate how custom-built polymers protect and deliver small molecules, nucleic acids, and proteins, representing ideal use-cases for machine learning to understand how molecular-level interactions impact drug stabilization and release. We conclude with our perspectives on how MI innovations in automation efficiencies and digitalization of data, coupled with fundamental insight and creativity from the polymer science community, can accelerate translation of more gene therapies into lifesaving medicines.
8
Inau ET, Sack J, Waltemath D, Zeleke AA. Initiatives, Concepts, and Implementation Practices of the Findable, Accessible, Interoperable, and Reusable Data Principles in Health Data Stewardship: Scoping Review. J Med Internet Res 2023; 25:e45013. [PMID: 37639292 PMCID: PMC10495848 DOI: 10.2196/45013] [Received: 12/13/2022] [Revised: 03/25/2023] [Accepted: 04/14/2023] [Indexed: 08/29/2023] Open
Abstract
BACKGROUND Thorough data stewardship is a key enabler of comprehensive health research. Processes such as data collection, storage, access, sharing, and analytics require researchers to follow elaborate data management strategies properly and consistently. Studies have shown that findable, accessible, interoperable, and reusable (FAIR) data leads to improved data sharing in different scientific domains. OBJECTIVE This scoping review identifies and discusses concepts, approaches, implementation experiences, and lessons learned in FAIR initiatives in health research data. METHODS The Arksey and O'Malley stage-based methodological framework for scoping reviews was applied. PubMed, Web of Science, and Google Scholar were searched to access relevant publications. Articles written in English, published between 2014 and 2020, and addressing FAIR concepts or practices in the health domain were included. The 3 data sources were deduplicated using a reference management software. In total, 2 independent authors reviewed the eligibility of each article based on defined inclusion and exclusion criteria. A charting tool was used to extract information from the full-text papers. The results were reported using the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines. RESULTS A total of 2.18% (34/1561) of the screened articles were included in the final review. The authors reported FAIRification approaches, which include interpolation, inclusion of comprehensive data dictionaries, repository design, semantic interoperability, ontologies, data quality, linked data, and requirement gathering for FAIRification tools. Challenges and mitigation strategies associated with FAIRification, such as high setup costs, data politics, technical and administrative issues, privacy concerns, and difficulties encountered in sharing health data despite its sensitive nature were also reported. 
We found various workflows, tools, and infrastructures designed by different groups worldwide to facilitate the FAIRification of health research data. We also uncovered a wide range of problems and questions that researchers are trying to address by using the different workflows, tools, and infrastructures. Although the concept of FAIR data stewardship in the health research domain is relatively new, almost all continents have been reached by at least one network trying to achieve health data FAIRness. Documented outcomes of FAIRification efforts include peer-reviewed publications, improved data sharing, facilitated data reuse, return on investment, and new treatments. Successful FAIRification of data has informed the management and prognosis of various diseases such as cancer, cardiovascular diseases, and neurological diseases. Efforts to FAIRify data on a wider variety of diseases have been ongoing since the COVID-19 pandemic. CONCLUSIONS This work summarises projects, tools, and workflows for the FAIRification of health research data. The comprehensive review shows that implementing the FAIR concept in health data stewardship carries the promise of improved research data management and transparency in the era of big data and open research publishing. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID) RR2-10.2196/22505.
Affiliation(s)
- Esther Thea Inau
  - Department of Medical Informatics, Institute for Community Medicine, University Medicine Greifswald, Greifswald, Germany
- Jean Sack
  - International Health Department, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
- Dagmar Waltemath
  - Department of Medical Informatics, Institute for Community Medicine, University Medicine Greifswald, Greifswald, Germany
- Atinkut Alamirrew Zeleke
  - Department of Medical Informatics, Institute for Community Medicine, University Medicine Greifswald, Greifswald, Germany
9
Wolfien M, Ahmadi N, Fitzer K, Grummt S, Heine KL, Jung IC, Krefting D, Kühn A, Peng Y, Reinecke I, Scheel J, Schmidt T, Schmücker P, Schüttler C, Waltemath D, Zoch M, Sedlmayr M. Ten Topics to Get Started in Medical Informatics Research. J Med Internet Res 2023; 25:e45948. [PMID: 37486754 PMCID: PMC10407648 DOI: 10.2196/45948] [Received: 01/23/2023] [Revised: 03/29/2023] [Accepted: 04/11/2023] [Indexed: 07/25/2023] Open
Abstract
The vast and heterogeneous data being constantly generated in clinics can provide great wealth for patients and research alike. The quickly evolving field of medical informatics research has contributed numerous concepts, algorithms, and standards to facilitate this development. However, these difficult relationships, complex terminologies, and multiple implementations can present obstacles for people who want to get active in the field. With a particular focus on medical informatics research conducted in Germany, we present in our Viewpoint a set of 10 important topics to improve the overall interdisciplinary communication between different stakeholders (eg, physicians, computational experts, experimentalists, students, patient representatives). This may lower the barriers to entry and offer a starting point for collaborations at different levels. The suggested topics are briefly introduced, then general best practice guidance is given, and further resources for in-depth reading or hands-on tutorials are recommended. In addition, the topics are set to cover current aspects and open research gaps of the medical informatics domain, including data regulations and concepts; data harmonization and processing; and data evaluation, visualization, and dissemination. In addition, we give an example on how these topics can be integrated in a medical informatics curriculum for higher education. By recognizing these topics, readers will be able to (1) set clinical and research data into the context of medical informatics, understanding what is possible to achieve with data or how data should be handled in terms of data privacy and storage; (2) distinguish current interoperability standards and obtain first insights into the processes leading to effective data transfer and analysis; and (3) value the use of newly developed technical approaches to utilize the full potential of clinical data.
Collapse
Affiliation(s)
- Markus Wolfien
- Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Center for Scalable Data Analytics and Artificial Intelligence, Dresden, Germany
| | - Najia Ahmadi
- Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
| | - Kai Fitzer
- Core Unit Data Integration Center, University Medicine Greifswald, Greifswald, Germany
| | - Sophia Grummt
- Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
| | - Kilian-Ludwig Heine
- Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
| | - Ian-C Jung
- Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
| | - Dagmar Krefting
- Department of Medical Informatics, University Medical Center, Goettingen, Germany
| | - Andreas Kühn
- Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
| | - Yuan Peng
- Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
| | - Ines Reinecke
- Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
| | - Julia Scheel
- Department of Systems Biology and Bioinformatics, University of Rostock, Rostock, Germany
| | - Tobias Schmidt
- Institute for Medical Informatics, University of Applied Sciences Mannheim, Mannheim, Germany
| | - Paul Schmücker
- Institute for Medical Informatics, University of Applied Sciences Mannheim, Mannheim, Germany
| | - Christina Schüttler
- Central Biobank Erlangen, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
| | - Dagmar Waltemath
- Core Unit Data Integration Center, University Medicine Greifswald, Greifswald, Germany
- Department of Medical Informatics, University Medicine Greifswald, Greifswald, Germany
| | - Michele Zoch
- Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
| | - Martin Sedlmayr
- Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Center for Scalable Data Analytics and Artificial Intelligence, Dresden, Germany
| |
|
10
|
Sonin J, Becker A, Nipp K. Designing health outcomes through patient data ownership. J Hosp Med 2023. [PMID: 37321927 DOI: 10.1002/jhm.13148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2022] [Revised: 05/23/2023] [Accepted: 05/24/2023] [Indexed: 06/17/2023]
Affiliation(s)
- Juhan Sonin
- University of Illinois at Champaign-Urbana, Champaign, Illinois, USA
- GoInvo.com (LLC), Arlington, Massachusetts, USA
- Mechanical Engineering, Massachusetts Institute of Technology (MIT), Cambridge, Massachusetts, United States
| | - Annie Becker
- University of North Carolina Gillings School for Global Public Health, Chapel Hill, North Carolina, USA
- University of Washington School of Public Health, Seattle, Washington, USA
| | - Kim Nipp
- University of Toronto, Toronto, Ontario, Canada
- University of British Columbia, Vancouver, British Columbia, Canada
| |
|
11
|
Eminaga O, Lee TJ, Ge J, Shkolyar E, Laurie M, Long J, Hockman LG, Liao JC. Conceptual framework and documentation standards of cystoscopic media content for artificial intelligence. J Biomed Inform 2023; 142:104369. [PMID: 37088456 PMCID: PMC10643098 DOI: 10.1016/j.jbi.2023.104369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2022] [Revised: 04/03/2023] [Accepted: 04/18/2023] [Indexed: 04/25/2023]
Abstract
BACKGROUND The clinical documentation of cystoscopy includes visual and textual materials. However, the secondary use of visual cystoscopic data for educational and research purposes remains limited due to inefficient data management in routine clinical practice. METHODS A conceptual framework was designed to document cystoscopy in a standardized manner with three major sections: data management, annotation management, and utilization management. A Swiss-cheese model was proposed for quality control and root cause analyses. We defined the infrastructure required to implement the framework with respect to FAIR (findable, accessible, interoperable, reusable) principles. We applied two scenarios exemplifying data sharing for research and educational projects to ensure compliance with FAIR principles. RESULTS The framework was successfully implemented while following FAIR principles. The cystoscopy atlas produced from the framework could be presented in an educational web portal; a total of 68 full-length qualitative videos and corresponding annotation data were sharable for artificial intelligence projects covering frame classification and segmentation problems at case, lesion, and frame levels. CONCLUSION Our study shows that the proposed framework facilitates the storage of visual documentation in a standardized manner and enables FAIR data for education and artificial intelligence research.
Affiliation(s)
- Okyaz Eminaga
- Department of Urology, Stanford University School of Medicine, Stanford, USA; Center for Artificial Intelligence and Medical Imaging, Stanford University School of Medicine, Stanford, CA, USA.
| | - Timothy Jiyong Lee
- Department of Urology, Stanford University School of Medicine, Stanford, USA
| | - Jessie Ge
- Department of Urology, Stanford University School of Medicine, Stanford, USA
| | - Eugene Shkolyar
- Department of Urology, Stanford University School of Medicine, Stanford, USA
| | - Mark Laurie
- Department of Urology, Stanford University School of Medicine, Stanford, USA
| | - Jin Long
- Center for Artificial Intelligence and Medical Imaging, Stanford University School of Medicine, Stanford, CA, USA
| | | | - Joseph C Liao
- Department of Urology, Stanford University School of Medicine, Stanford, USA; Center for Artificial Intelligence and Medical Imaging, Stanford University School of Medicine, Stanford, CA, USA.
| |
|
12
|
Eminaga O, Lee TJ, Ge J, Shkolyar E, Laurie M, Long J, Hockman LG, Liao JC. Conceptual Framework and Documentation Standards of Cystoscopic Media Content for Artificial Intelligence. ARXIV 2023:arXiv:2301.05991v2. [PMID: 36713258 PMCID: PMC9882574] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/31/2023]
Abstract
BACKGROUND The clinical documentation of cystoscopy includes visual and textual materials. However, the secondary use of visual cystoscopic data for educational and research purposes remains limited due to inefficient data management in routine clinical practice. METHODS A conceptual framework was designed to document cystoscopy in a standardized manner with three major sections: data management, annotation management, and utilization management. A Swiss-cheese model was proposed for quality control and root cause analyses. We defined the infrastructure required to implement the framework with respect to FAIR (findable, accessible, interoperable, re-usable) principles. We applied two scenarios exemplifying data sharing for research and educational projects to ensure the compliance with FAIR principles. RESULTS The framework was successfully implemented while following FAIR principles. The cystoscopy atlas produced from the framework could be presented in an educational web portal; a total of 68 full-length qualitative videos and corresponding annotation data were sharable for artificial intelligence projects covering frame classification and segmentation problems at case, lesion and frame levels. CONCLUSION Our study shows that the proposed framework facilitates the storage of the visual documentation in a standardized manner and enables FAIR data for education and artificial intelligence research.
Affiliation(s)
- Okyaz Eminaga
- Department of Urology, Stanford University School of Medicine, Stanford
- Center for Artificial Intelligence and Medical Imaging, Stanford University School of Medicine, Stanford, CA
| | | | - Jessie Ge
- Department of Urology, Stanford University School of Medicine, Stanford
| | - Eugene Shkolyar
- Department of Urology, Stanford University School of Medicine, Stanford
| | - Mark Laurie
- Department of Urology, Stanford University School of Medicine, Stanford
| | - Jin Long
- Center for Artificial Intelligence and Medical Imaging, Stanford University School of Medicine, Stanford, CA
| | | | - Joseph C. Liao
- Department of Urology, Stanford University School of Medicine, Stanford
- Center for Artificial Intelligence and Medical Imaging, Stanford University School of Medicine, Stanford, CA
| |
|
13
|
Thomas DM, Kleinberg S, Brown AW, Crow M, Bastian ND, Reisweber N, Lasater R, Kendall T, Shafto P, Blaine R, Smith S, Ruiz D, Morrell C, Clark N. Machine learning modeling practices to support the principles of AI and ethics in nutrition research. Nutr Diabetes 2022; 12:48. [PMID: 36456550 PMCID: PMC9715415 DOI: 10.1038/s41387-022-00226-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 10/22/2022] [Revised: 10/28/2022] [Accepted: 11/15/2022] [Indexed: 12/03/2022] Open
Abstract
BACKGROUND Nutrition research is relying more on artificial intelligence and machine learning models to understand, diagnose, predict, and explain data. While artificial intelligence and machine learning models provide powerful modeling tools, failure to use careful and well-thought-out modeling processes can lead to misleading conclusions and concerns surrounding ethics and bias. METHODS Based on our experience as reviewers and journal editors in nutrition and obesity, we identified the most frequently omitted best practices from statistical modeling and how these same practices extend to machine learning models. We next addressed areas required for implementation of machine learning that are not included in commercial software packages. RESULTS Here, we provide a tutorial on best artificial intelligence and machine learning modeling practices that can reduce potential ethical problems with a checklist and guiding principles to aid nutrition researchers in developing, evaluating, and implementing artificial intelligence and machine learning models in nutrition research. CONCLUSION The quality of AI/ML modeling in nutrition research requires iterative and tailored processes to mitigate against potential ethical problems or to predict conclusions that are free of bias.
Affiliation(s)
- Diana M. Thomas
- Department of Mathematical Sciences, United States Military Academy, West Point, NY 10996 USA
| | - Samantha Kleinberg
- Department of Computer Science, Stevens Institute of Technology, Hoboken, NJ 07030 USA
| | - Andrew W. Brown
- Department of Biostatistics, University of Arkansas for Medical Sciences, Little Rock, AR 72205 USA
- Arkansas Children’s Research Institute, Little Rock, AR 72202 USA
| | - Mason Crow
- Department of Mathematical Sciences, United States Military Academy, West Point, NY 10996 USA
| | - Nathaniel D. Bastian
- Army Cyber Institute, United States Military Academy, West Point, NY 10996 USA
| | - Nicholas Reisweber
- Department of Mathematical Sciences, United States Military Academy, West Point, NY 10996 USA
| | - Robert Lasater
- Department of Mathematical Sciences, United States Military Academy, West Point, NY 10996 USA
| | - Thomas Kendall
- Department of Mathematical Sciences, United States Military Academy, West Point, NY 10996 USA
| | - Patrick Shafto
- Department of Mathematics and Computer Science, Rutgers University, Newark, NJ 07102 USA
| | - Raymond Blaine
- Department of Electrical Engineering and Computer Science, United States Military Academy, West Point, NY 10996 USA
| | - Sarah Smith
- Department of Electrical Engineering and Computer Science, United States Military Academy, West Point, NY 10996 USA
| | - Daniel Ruiz
- Department of Electrical Engineering and Computer Science, United States Military Academy, West Point, NY 10996 USA
| | - Christopher Morrell
- Department of Electrical Engineering and Computer Science, United States Military Academy, West Point, NY 10996 USA
| | - Nicholas Clark
- Department of Mathematical Sciences, United States Military Academy, West Point, NY 10996 USA
| |
|
14
|
“Who Is the FAIRest of Them All?” Authors, Entities, and Journals Regarding FAIR Data Principles. PUBLICATIONS 2022. [DOI: 10.3390/publications10030031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Open
Abstract
The perceived need to improve the infrastructure supporting the re-use of scholarly data since the second decade of the 21st century led to the design of a concise number of principles and metrics, named FAIR Data Principles. This paper, part of an extended study, intends to identify the main authors, entities, and scientific journals linked to research conducted within the FAIR Data Principles. The research was developed by means of a qualitative approach, using documentary research and a constant comparison method for codification and categorization of the sampled data. The sample studied showed that most authors were located in the Netherlands, with Europe accounting for more than 70% of the number of authors considered. Most of these are researchers and work in higher education institutions. These entities can be found in most of the territorial-administrative areas under consideration, with the USA being the country with more entities and Europe being the world region where they are more numerous. The journal with more texts in the used sample was Insights, with 2020 being the year when more texts were published. Two of the most prominent authors present in the sample texts were located in the Netherlands, while the other two were in France and Australia.
|
15
|
Niarakis A, Waltemath D, Glazier J, Schreiber F, Keating SM, Nickerson D, Chaouiya C, Siegel A, Noël V, Hermjakob H, Helikar T, Soliman S, Calzone L. Addressing barriers in comprehensiveness, accessibility, reusability, interoperability and reproducibility of computational models in systems biology. Brief Bioinform 2022; 23:bbac212. [PMID: 35671510 PMCID: PMC9294410 DOI: 10.1093/bib/bbac212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2022] [Revised: 04/20/2022] [Accepted: 05/06/2022] [Indexed: 11/14/2022] Open
Abstract
Computational models are often employed in systems biology to study the dynamic behaviours of complex systems. With the rise in the number of computational models, finding ways to improve the reusability of these models and their ability to reproduce virtual experiments becomes critical. Correct and effective model annotation in community-supported and standardised formats is necessary for this improvement. Here, we present recent efforts toward a common framework for annotated, accessible, reproducible and interoperable computational models in biology, and discuss key challenges of the field.
Affiliation(s)
- Anna Niarakis
- Université Paris-Saclay, Laboratoire Européen de Recherche pour la Polyarthrite rhumatoïde - Genhotel, Univ Evry, Evry, France
- Lifeware Group, Inria, Saclay-île de France, 91120 Palaiseau, France
| | - Dagmar Waltemath
- Department of Medical Informatics, University Medicine Greifswald, Greifswald, Germany
| | - James Glazier
- Biocomplexity Institute and Department of Intelligent Systems Engineering, Indiana University, Bloomington, IN, USA
| | - Falk Schreiber
- Department of Computer and Information Science, University of Konstanz, Konstanz, Germany
- Faculty of Information Technology, Monash University, Clayton, Australia
| | | | - David Nickerson
- Auckland Bioengineering Institute, University of Auckland, Auckland, New Zealand
| | | | - Anne Siegel
- Univ Rennes, CNRS, Inria - IRISA lab. Rennes
| | - Vincent Noël
- Institut Curie, PSL Research University, Paris, France
- INSERM, U900, Paris, France
- MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Paris, France
| | - Henning Hermjakob
- EMBL-European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK
| | - Tomáš Helikar
- Department of Biochemistry, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA
| | - Sylvain Soliman
- Lifeware Group, Inria, Saclay-île de France, 91120 Palaiseau, France
| | - Laurence Calzone
- Institut Curie, PSL Research University, Paris, France
- INSERM, U900, Paris, France
- MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Paris, France
| |
|
16
|
Van Meenen J, Leysen H, Chen H, Baccarne R, Walter D, Martin B, Maudsley S. Making Biomedical Sciences publications more accessible for machines. MEDICINE, HEALTH CARE, AND PHILOSOPHY 2022; 25:179-190. [PMID: 35039972 DOI: 10.1007/s11019-022-10069-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Accepted: 01/08/2022] [Indexed: 06/14/2023]
Abstract
With the rapidly expanding catalogue of scientific publications, especially within the Biomedical Sciences field, it is becoming increasingly difficult for researchers to search for, read or even interpret emerging scientific findings. PubMed, just one of the current biomedical data repositories, comprises over 33 million citations for biomedical research, and over 2500 publications are added each day. To further strengthen the impact of biomedical research, we suggest that there should be more synergy between publications and machines. By bringing machines into the realm of research and publication, we can greatly augment the assessment, investigation and cataloging of the biomedical literary corpus. The effective application of machine-based manuscript assessment and interpretation is now crucial, and potentially stands as the most effective way for researchers to comprehend and process the tsunami of biomedical data and literature. Many biomedical manuscripts are currently published online in poorly searchable document types, with figures and data presented in formats that are partially inaccessible to machine-based approaches. The structure and format of biomedical manuscripts should be adapted to facilitate machine-assisted interrogation of this important literary corpus. In this context, it is important to embrace the concept that biomedical scientists should also write manuscripts that can be read by machines. It is likely that an enhanced human-machine synergy in reading biomedical publications will greatly enhance biomedical data retrieval and reveal novel insights into complex datasets.
Affiliation(s)
- Joris Van Meenen
- Receptor Biology Lab, Department of Biomedical Sciences, University of Antwerp, Wilrijk, 2610, Antwerp, Belgium
- Antwerp Research Group for Ocular Science, Department of Translational Neurosciences, University of Antwerp, Wilrijk, 2610, Antwerp, Belgium
| | - Hanne Leysen
- Receptor Biology Lab, Department of Biomedical Sciences, University of Antwerp, Wilrijk, 2610, Antwerp, Belgium
| | - Hongyu Chen
- Weill Cornell Medical College, New York, NY, USA
| | - Rudi Baccarne
- Anet Library Automation, University of Antwerp, Wilrijk, 2610, Antwerp, Belgium
| | - Deborah Walter
- Receptor Biology Lab, Department of Biomedical Sciences, University of Antwerp, Wilrijk, 2610, Antwerp, Belgium
| | - Bronwen Martin
- Faculty of Pharmaceutical, Veterinary and Biomedical Sciences, University of Antwerp, Wilrijk, 2610, Antwerp, Belgium
| | - Stuart Maudsley
- Receptor Biology Lab, Department of Biomedical Sciences, University of Antwerp, Wilrijk, 2610, Antwerp, Belgium.
| |
|
17
|
Françoise M, Frambourt C, Goodwin P, Haggerty F, Jacques M, Lama ML, Leroy C, Martin A, Calderon RM, Robert J, Schulz-Ruthenberg E, Tafur L, Nasser M, Stüwe L. Evidence based policy making during times of uncertainty through the lens of future policy makers: four recommendations to harmonise and guide health policy making in the future. Arch Public Health 2022; 80:140. [PMID: 35585647 PMCID: PMC9115540 DOI: 10.1186/s13690-022-00898-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/16/2022] [Accepted: 05/10/2022] [Indexed: 11/10/2022] Open
Abstract
The Covid-19 pandemic has not only outlined the importance of using evidence in the healthcare policy making process but also the complexity that exists between policymakers and the scientific community. As a matter of fact, scientific data is just one of many other concurrent factors, including economic, social and cultural, that may provide the rationale for policy making. The pandemic has also raised citizens' awareness and represented an unprecedented moment of willingness to access and understand the evidence underpinning health policies. This commentary provides policy recommendations to improve evidence-based policy making in health, through the lens of a young generation of public policy students and future policymakers, enrolled in a 24-hour course at Sciences Po Paris entitled "Evidence-based policy-making in health: theory and practice(s)". Four out of 11 recommendations were prioritised and presented in this commentary which target both policymakers and the scientific community to make better use of evidence-based policy making in health. First, policy makers and scientists should build trusting partnerships with citizens and engage them, especially those facing our target health care issues or systems. Second, while artificial intelligence raises new opportunities in healthcare, its use in contexts of uncertainty should be addressed by policymakers in terms of liability and ethics. Third, conflicts of interest must be disclosed as much as possible and effectively managed to (re)build a trust relationship between policymakers, the scientific community and citizens, implying the need for risk management tools and cross border disclosure mechanisms. Last, well-designed and secure health information systems need to be implemented, following the FAIR (findable, accessible, interoperable and reusable) principles for health data. This will take us a step further from data to 'policy wisdom'. Overall, these recommendations identified and formulated by students highlight some key issues that need to be rethought in the health policy cycle through elements like institutional incentives, cultural changes and dialogue between policy makers and the scientific community. This input from a younger generation of students highlights the importance of making the conversation on evidence-based policy making in health accessible to all generations and backgrounds.
Affiliation(s)
- Mona Nasser
- Peninsula Dental School, Faculty of Health, University of Plymouth, Plymouth, UK
| | | |
|
18
|
Pires J, Huisman JS, Bonhoeffer S, Van Boeckel TP. Increase in antimicrobial resistance in Escherichia coli in food animals between 1980 and 2018 assessed using genomes from public databases. J Antimicrob Chemother 2021; 77:646-655. [PMID: 34894245 DOI: 10.1093/jac/dkab451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2021] [Accepted: 11/09/2021] [Indexed: 11/14/2022] Open
Abstract
BACKGROUND Next-generation sequencing has considerably increased the number of genomes available in the public domain. However, efforts to use these genomes for surveillance of antimicrobial resistance have thus far been limited and geographically heterogeneous. We inferred global resistance trends in Escherichia coli in food animals using genomes from public databases. METHODS We retrieved 7632 E. coli genomes from public databases (NCBI, PATRIC and EnteroBase) and screened for antimicrobial resistance genes (ARGs) using ResFinder. Selection bias towards resistance, virulence or specific strains was accounted for by screening BioProject descriptions. Temporal trends for MDR, resistance to antimicrobial classes and ARG prevalence were inferred using generalized linear models for all genomes, including those not subjected to selection bias. RESULTS MDR increased by 1.6 times between 1980 and 2018, as genomes carried, on average, ARGs conferring resistance to 2.65 antimicrobials in swine, 2.22 in poultry and 1.58 in bovines. Highest resistance levels were observed for tetracyclines (42.2%-69.1%), penicillins (19.4%-47.5%) and streptomycin (28.6%-56.6%). Resistance trends were consistent after accounting for selection bias, although lower mean absolute resistance estimates were associated with genomes not subjected to selection bias (difference of 3.16%±3.58% across years, hosts and antimicrobial classes). We observed an increase in extended-spectrum cephalosporin ARG blaCMY-2 and a progressive substitution of tetB by tetA. Estimates of resistance prevalence inferred from genomes in the public domain were in good agreement with reports from systematic phenotypic surveillance. CONCLUSIONS Our analysis illustrates the potential of using the growing volume of genomes in public databases to track AMR trends globally.
Affiliation(s)
- João Pires
- Institute for Environmental Decisions, ETH Zurich, Zurich, Switzerland
| | - Jana S Huisman
- Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | | | - Thomas P Van Boeckel
- Institute for Environmental Decisions, ETH Zurich, Zurich, Switzerland
- Center for Disease Dynamics, Economics & Policy, New Delhi, India
| |
|
19
|
Alvarez-Romero C, Martinez-Garcia A, Ternero Vega J, Díaz-Jimènez P, Jimènez-Juan C, Nieto-Martín MD, Román Villarán E, Kovacevic T, Bokan D, Hromis S, Djekic Malbasa J, Beslać S, Zaric B, Gencturk M, Sinaci AA, Ollero Baturone M, Parra Calderón CL. Predicting 30-days Readmission Risk for COPD Patients Care through a Federated Machine Learning Architecture on FAIR Data: Development and Validation Study (Preprint). JMIR Med Inform 2021; 10:e35307. [PMID: 35653170 PMCID: PMC9204581 DOI: 10.2196/35307] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/30/2021] [Revised: 03/16/2022] [Accepted: 04/21/2022] [Indexed: 12/16/2022] Open
Abstract
Background Owing to the nature of health data, their sharing and reuse for research are limited by legal, technical, and ethical implications. In this sense, to address that challenge and facilitate and promote the discovery of scientific knowledge, the Findable, Accessible, Interoperable, and Reusable (FAIR) principles help organizations to share research data in a secure, appropriate, and useful way for other researchers. Objective The objective of this study was the FAIRification of existing health research data sets and applying a federated machine learning architecture on top of the FAIRified data sets of different health research performing organizations. The entire FAIR4Health solution was validated through the assessment of a federated model for real-time prediction of 30-day readmission risk in patients with chronic obstructive pulmonary disease (COPD). Methods The application of the FAIR principles on health research data sets in 3 different health care settings enabled a retrospective multicenter study for the development of specific federated machine learning models for the early prediction of 30-day readmission risk in patients with COPD. This predictive model was generated upon the FAIR4Health platform. Finally, an observational prospective study with 30 days follow-up was conducted in 2 health care centers from different countries. The same inclusion and exclusion criteria were used in both retrospective and prospective studies. Results Clinical validation was demonstrated through the implementation of federated machine learning models on top of the FAIRified data sets from different health research performing organizations. The federated model for predicting the 30-day hospital readmission risk was trained using retrospective data from 4944 patients with COPD. 
The assessment of the predictive model was performed using the data of 100 recruited (22 from Spain and 78 from Serbia) out of 2070 observed (records viewed) patients during the observational prospective study, which was executed from April 2021 to September 2021. Significant accuracy (0.98) and precision (0.25) of the predictive model generated upon the FAIR4Health platform were observed. Therefore, the generated prediction of 30-day readmission risk was confirmed in 87% (87/100) of cases. Conclusions Implementing a FAIR data policy in health research performing organizations to facilitate data sharing and reuse is relevant and needed, following the discovery, access, integration, and analysis of health research data. The FAIR4Health project proposes a technological solution in the health domain to facilitate alignment with the FAIR principles.
Affiliation(s)
- Celia Alvarez-Romero
- Computational Health Informatics Group, Institute of Biomedicine of Seville, Virgen del Rocío University Hospital, Consejo Superior de Investigaciones Científicas, University of Seville, Seville, Spain
| | - Alicia Martinez-Garcia
- Computational Health Informatics Group, Institute of Biomedicine of Seville, Virgen del Rocío University Hospital, Consejo Superior de Investigaciones Científicas, University of Seville, Seville, Spain
| | - Jara Ternero Vega
- Internal Medicine Department, Virgen del Rocío University Hospital, Seville, Spain
| | - Pablo Díaz-Jimènez
- Internal Medicine Department, Virgen del Rocío University Hospital, Seville, Spain
| | - Carlos Jimènez-Juan
- Internal Medicine Department, Virgen del Rocío University Hospital, Seville, Spain
| | | | - Esther Román Villarán
- Computational Health Informatics Group, Institute of Biomedicine of Seville, Virgen del Rocío University Hospital, Consejo Superior de Investigaciones Científicas, University of Seville, Seville, Spain
| | - Tomi Kovacevic
- Institute for Pulmonary Diseases of Vojvodina, Sremska Kamenica
- Medical Faculty, University of Novi Sad, Novi Sad
| | - Darijo Bokan
- Institute for Pulmonary Diseases of Vojvodina, Sremska Kamenica
| | - Sanja Hromis
- Institute for Pulmonary Diseases of Vojvodina, Sremska Kamenica
- Medical Faculty, University of Novi Sad, Novi Sad
| | - Jelena Djekic Malbasa
- Institute for Pulmonary Diseases of Vojvodina, Sremska Kamenica
- Medical Faculty, University of Novi Sad, Novi Sad
| | - Suzana Beslać
- Institute for Pulmonary Diseases of Vojvodina, Sremska Kamenica
| | - Bojan Zaric
- Institute for Pulmonary Diseases of Vojvodina, Sremska Kamenica
- Medical Faculty, University of Novi Sad, Novi Sad
| | - Mert Gencturk
- Software Research & Development and Consultancy Corporation, Ankara, Turkey
| | - A Anil Sinaci
- Software Research & Development and Consultancy Corporation, Ankara, Turkey
| | | | - Carlos Luis Parra Calderón
- Computational Health Informatics Group, Institute of Biomedicine of Seville, Virgen del Rocío University Hospital, Consejo Superior de Investigaciones Científicas, University of Seville, Seville, Spain
| |
|