1
Hu J, Zhao C, Shi C, Zhao Z, Ren Z. Speech-based recognition and estimating severity of PTSD using machine learning. J Affect Disord 2024; 362:859-868. [PMID: 39009320 DOI: 10.1016/j.jad.2024.07.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Revised: 05/31/2024] [Accepted: 07/11/2024] [Indexed: 07/17/2024]
Abstract
BACKGROUND Traditional methodologies for diagnosing post-traumatic stress disorder (PTSD) primarily rely on interviews, incurring considerable costs and lacking objective indices. Integrating biomarkers and machine learning techniques into this diagnostic process has the potential to facilitate accurate PTSD assessment by clinicians. METHODS We assembled a dataset encompassing recordings from 76 individuals diagnosed with PTSD and 60 healthy controls. Leveraging the openSmile framework, we extracted acoustic features from these recordings and employed a random forest algorithm for feature selection. Subsequently, these selected features were utilized as inputs for six distinct classification models and a regression model. RESULTS Classification models employing a feature set of 18 elements yielded robust binary prediction outcomes for PTSD. Notably, the RF model achieved peak accuracy at 0.975 with the highest AUC of 1.0. The regression model exhibited significant predictive capability for PCL-5 scores (MSE = 0.90, MAE = 0.76, R² = 0.10, p < 0.001). Noteworthy was the correlation coefficient of 0.33 (p < 0.01) between predicted and actual values. LIMITATIONS Firstly, the process of feature selection may compromise the stability of models, potentially leading to overestimated results. Secondly, it is hard to elucidate the nature of the biological mechanisms behind the differences between PTSD patients and healthy individuals. Lastly, the regression model has limited predictive power for PTSD severity. CONCLUSIONS Distinct speech patterns differentiate PTSD patients and controls. Classification models accurately discern both groups. The regression model gauges PTSD severity, but further validation on larger datasets is needed.
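As a rough consistency check on the regression figures quoted in this abstract, the sketch below squares the reported correlation and takes the square root of the reported MSE. It assumes that R² is close to the squared Pearson correlation between predicted and actual values, which holds exactly only for a least-squares line evaluated on its own fitting data; the abstract does not say how R² was computed, so this is an illustration rather than a reproduction of the authors' analysis.

```python
import math

# Figures taken from the abstract above.
r = 0.33            # correlation between predicted and actual PCL-5 scores
r2_reported = 0.10  # reported R-squared
mse = 0.90          # reported mean squared error

# Assumption: R-squared is approximately the squared correlation.
r_squared = r ** 2

# RMSE is on the same scale as the (possibly standardized) PCL-5 scores.
rmse = math.sqrt(mse)

print(f"r squared = {r_squared:.3f}, reported R2 = {r2_reported:.2f}")
print(f"RMSE = {rmse:.3f}")
```

Under that assumption the two numbers are of similar size (about 0.11 versus 0.10), which fits the abstract's own remark that the regression model explains only a small share of the variance in severity.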
Affiliation(s)
- Jiawei Hu
- School of Psychology, Central China Normal University, Key Laboratory of Human Development and Mental Health of Hubei Province, Wuhan 430079, China; Key Laboratory of Adolescent CyberPsychology and Behavior(CCNU), National Intelligent Society Governance Experiment Base (Education), Ministry of Education, Wuhan 430079, China
- Chunxiao Zhao
- School of Medical Humanities, Hubei University of Chinese Medicine, Wuhan 430065, China
- Congrong Shi
- School of Educational Science, Anhui Normal University, Wuhu 241000, China
- Ziyi Zhao
- School of Psychology, Central China Normal University, Key Laboratory of Human Development and Mental Health of Hubei Province, Wuhan 430079, China; Key Laboratory of Adolescent CyberPsychology and Behavior(CCNU), National Intelligent Society Governance Experiment Base (Education), Ministry of Education, Wuhan 430079, China
- Zhihong Ren
- School of Psychology, Central China Normal University, Key Laboratory of Human Development and Mental Health of Hubei Province, Wuhan 430079, China; Key Laboratory of Adolescent CyberPsychology and Behavior(CCNU), National Intelligent Society Governance Experiment Base (Education), Ministry of Education, Wuhan 430079, China.
2
Razavi M, Ziyadidegan S, Mahmoudzadeh A, Kazeminasab S, Baharlouei E, Janfaza V, Jahromi R, Sasangohar F. Machine Learning, Deep Learning, and Data Preprocessing Techniques for Detecting, Predicting, and Monitoring Stress and Stress-Related Mental Disorders: Scoping Review. JMIR Ment Health 2024; 11:e53714. [PMID: 39167782 PMCID: PMC11375388 DOI: 10.2196/53714] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Revised: 05/01/2024] [Accepted: 05/17/2024] [Indexed: 08/23/2024] Open
Abstract
BACKGROUND Mental stress and its consequent mental health disorders (MDs) constitute a significant public health issue. With the advent of machine learning (ML), there is potential to harness computational techniques for better understanding and addressing mental stress and MDs. This comprehensive review seeks to elucidate the current ML methodologies used in this domain to pave the way for enhanced detection, prediction, and analysis of mental stress and its subsequent MDs. OBJECTIVE This review aims to investigate the scope of ML methodologies used in the detection, prediction, and analysis of mental stress and its consequent MDs. METHODS Using a rigorous scoping review process with PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines, this investigation delves into the latest ML algorithms, preprocessing techniques, and data types used in the context of stress and stress-related MDs. RESULTS A total of 98 peer-reviewed publications were examined for this review. The findings highlight that support vector machine, neural network, and random forest models consistently exhibited superior accuracy and robustness among all ML algorithms examined. Physiological parameters such as heart rate measurements and skin response are prevalently used as stress predictors due to their rich explanatory information concerning stress and stress-related MDs, as well as the relative ease of data acquisition. The application of dimensionality reduction techniques, including mappings, feature selection, filtering, and noise reduction, is frequently observed as a crucial step preceding the training of ML algorithms. CONCLUSIONS The synthesis of this review identified significant research gaps and outlined future directions for the field. These encompass areas such as model interpretability, model personalization, the incorporation of naturalistic settings, and real-time processing capabilities for the detection and prediction of stress and stress-related MDs.
Affiliation(s)
- Moein Razavi
- Department of Industrial and Systems Engineering, Texas A&M University, College Station, TX, United States
- Department of Computer Science and Engineering, Texas A&M University, College Station, TX, United States
- Samira Ziyadidegan
- Department of Industrial and Systems Engineering, Texas A&M University, College Station, TX, United States
- Ahmadreza Mahmoudzadeh
- Zachry Department of Civil and Environmental Engineering, Texas A&M University, College Station, TX, United States
- Saber Kazeminasab
- Harvard Medical School, Harvard University, Boston, MA, United States
- Elaheh Baharlouei
- Department of Computer Science, University of Houston, Houston, TX, United States
- Vahid Janfaza
- Department of Computer Science and Engineering, Texas A&M University, College Station, TX, United States
- Reza Jahromi
- Department of Industrial and Systems Engineering, Texas A&M University, College Station, TX, United States
- Department of Computer Science and Engineering, Texas A&M University, College Station, TX, United States
- Farzan Sasangohar
- Department of Industrial and Systems Engineering, Texas A&M University, College Station, TX, United States
3
Ying G, Perez-Lao A, Adrien T, Maraganore D, Marra D, Smith G. TICS-M scores in an oldest-old normative cohort identified by computable phenotype. Clin Neuropsychol 2024:1-12. [PMID: 38997666 DOI: 10.1080/13854046.2024.2374894] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2024] [Accepted: 06/27/2024] [Indexed: 07/14/2024]
Abstract
Objective: To (1) examine the distribution of Telephone Interview for Cognitive Status modified (TICS-m) scores in oldest-old individuals (age 85 and above) identified as cognitively healthy by a previously validated electronic health records-based computable phenotype (CP) and (2) to compare different cutoff scores for cognitive impairment in this population. Method: CP identified 24,024 persons, 470 were contacted and 252 consented and completed the assessment. Associations of TICS-m score with age, sex, and educational categories (<10 years, 11-15 years, and >16 years) were examined. The number of participants perceived as impaired was studied with commonly used cutoff scores (27-31). Results: TICS-m score ranged from 18 to 44 with a mean of 32.6 (SD = 4.7) in older adults aged 85-99 years old. A linear regression model including (range-restricted) age, education, and sex, showed beta estimates comparable to previous findings. Different cutoff scores (27 to 31) generated slightly lower MCI and dementia prevalence rates of participants meeting the criteria for the impairments than studies of younger elderly using traditional recruitment methods. Conclusions: The use of validated computable phenotype to identify a normative cohort generated a normative distribution for the TICS-m consistent with prior findings from more effortful approaches to cohort identification and established expected TICS-m performance in the oldest-old population.
Affiliation(s)
- Gelan Ying
- Department of Clinical and Health Psychology, University of Florida, Gainesville, FL, USA
- Ambar Perez-Lao
- Department of Clinical and Health Psychology, University of Florida, Gainesville, FL, USA
- Tamare Adrien
- Department of Clinical and Health Psychology, University of Florida, Gainesville, FL, USA
- Demetrius Maraganore
- Department of Neurology, Tulane University School of Medicine, New Orleans, LA, USA
- David Marra
- VA Boston Health Care System, Boston, MA, USA
- Glenn Smith
- Department of Clinical and Health Psychology, University of Florida, Gainesville, FL, USA
4
Maier A, Hartung M, Abovsky M, Adamowicz K, Bader G, Baier S, Blumenthal D, Chen J, Elkjaer M, Garcia-Hernandez C, Helmy M, Hoffmann M, Jurisica I, Kotlyar M, Lazareva O, Levi H, List M, Lobentanzer S, Loscalzo J, Malod-Dognin N, Manz Q, Matschinske J, Mee M, Oubounyt M, Pastrello C, Pico A, Pillich R, Poschenrieder J, Pratt D, Pržulj N, Sadegh S, Saez-Rodriguez J, Sarkar S, Shaked G, Shamir R, Trummer N, Turhan U, Wang RS, Zolotareva O, Baumbach J. Drugst.One - a plug-and-play solution for online systems medicine and network-based drug repurposing. Nucleic Acids Res 2024; 52:W481-W488. [PMID: 38783119 PMCID: PMC11223884 DOI: 10.1093/nar/gkae388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Revised: 04/08/2024] [Accepted: 04/29/2024] [Indexed: 05/25/2024] Open
Abstract
In recent decades, the development of new drugs has become increasingly expensive and inefficient, and the molecular mechanisms of most pharmaceuticals remain poorly understood. In response, computational systems and network medicine tools have emerged to identify potential drug repurposing candidates. However, these tools often require complex installation and lack intuitive visual network mining capabilities. To tackle these challenges, we introduce Drugst.One, a platform that assists specialized computational medicine tools in becoming user-friendly, web-based utilities for drug repurposing. With just three lines of code, Drugst.One turns any systems biology software into an interactive web tool for modeling and analyzing complex protein-drug-disease networks. Demonstrating its broad adaptability, Drugst.One has been successfully integrated with 21 computational systems medicine tools. Available at https://drugst.one, Drugst.One has significant potential for streamlining the drug discovery process, allowing researchers to focus on essential aspects of pharmaceutical treatment research.
Affiliation(s)
- Andreas Maier
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Michael Hartung
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Mark Abovsky
- Division of Orthopaedic Surgery, Schroeder Arthritis Institute, Toronto, Canada
- Data Science Discovery Centre for Chronic Diseases, Krembil Research Institute, Toronto, ON M5T 0S8, Canada
- Klaudia Adamowicz
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Gary D Bader
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- The Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, ON, Canada
- Sylvie Baier
- Data Science in Systems Biology, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
- David B Blumenthal
- Department Artificial Intelligence in Biomedical Engineering (AIBE), Friedrich-Alexander University Erlangen-Nürnberg (FAU), 91052 Erlangen, Germany
- Jing Chen
- Department of Medicine, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Maria L Elkjaer
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Department of Neurology, Odense University Hospital, Odense, Denmark
- Institute of Clinical Research, University of Southern Denmark, Odense, Denmark
- Institute of Molecular Medicine, University of Southern Denmark, Odense, Denmark
- Mohamed Helmy
- Vaccine and Infectious Disease Organization (VIDO), University of Saskatchewan, Canada
- School of Public Health, University of Saskatchewan, Canada
- Department of Computer Science, University of Saskatchewan, Canada
- Department of Computer Science, Lakehead University, Canada
- Department of Computer Science, Idaho State University, USA
- Bioinformatics Institute (BII), A*STAR, Singapore
- Markus Hoffmann
- Data Science in Systems Biology, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
- Institute for Advanced Study, Technical University of Munich, Germany
- National Institute of Diabetes, Digestive, and Kidney Diseases, Bethesda, MD 20892, USA
- Igor Jurisica
- Division of Orthopaedic Surgery, Schroeder Arthritis Institute, Toronto, Canada
- Data Science Discovery Centre for Chronic Diseases, Krembil Research Institute, Toronto, ON M5T 0S8, Canada
- Departments of Medical Biophysics and Computer Science, University of Toronto, Toronto, Canada
- Institute of Neuroimmunology, Slovak Academy of Sciences, Bratislava, Slovakia
- Max Kotlyar
- Division of Orthopaedic Surgery, Schroeder Arthritis Institute, Toronto, Canada
- Data Science Discovery Centre for Chronic Diseases, Krembil Research Institute, Toronto, ON M5T 0S8, Canada
- Olga Lazareva
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Junior Clinical Cooperation Unit Multiparametric methods for early detection of prostate cancer, German Cancer Research Center (DKFZ), Heidelberg, Germany
- European Molecular Biology Laboratory, Genome Biology Unit, 69117 Heidelberg, Germany
- Hagai Levi
- Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel
- Markus List
- Data Science in Systems Biology, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
- Sebastian Lobentanzer
- Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Joseph Loscalzo
- Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
- Quirin Manz
- Data Science in Systems Biology, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
- Julian Matschinske
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Data Science in Systems Biology, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
- Miles Mee
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Mhaned Oubounyt
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Chiara Pastrello
- Division of Orthopaedic Surgery, Schroeder Arthritis Institute, Toronto, Canada
- Data Science Discovery Centre for Chronic Diseases, Krembil Research Institute, Toronto, ON M5T 0S8, Canada
- Alexander R Pico
- Institute of Data Science and Biotechnology, Gladstone Institutes, 1650 Owens Street, San Francisco, 94158 California, USA
- Rudolf T Pillich
- Department of Medicine, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Julian M Poschenrieder
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Data Science in Systems Biology, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
- Dexter Pratt
- Department of Medicine, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Nataša Pržulj
- Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain
- Department of Computer Science, University College London, London WC1E 6BT, UK
- ICREA, Pg. Lluís Companys 23, 08010 Barcelona, Spain
- Sepideh Sadegh
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Data Science in Systems Biology, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
- Department of Clinical Genetics, Odense University Hospital, Odense, Denmark
- Clinical Genome Center, Department of Clinical Research, University of Southern Denmark, Odense, Denmark
- Julio Saez-Rodriguez
- Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Suryadipto Sarkar
- Department Artificial Intelligence in Biomedical Engineering (AIBE), Friedrich-Alexander University Erlangen-Nürnberg (FAU), 91052 Erlangen, Germany
- Gideon Shaked
- Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel
- Ron Shamir
- Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel
- Nico Trummer
- Data Science in Systems Biology, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
- Ugur Turhan
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Rui-Sheng Wang
- Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
- Olga Zolotareva
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Data Science in Systems Biology, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
- Jan Baumbach
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Computational Biomedicine Lab, Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
5
Farag N, Noë A, Patrinos D, Zawati MH. Mapping the Apps: Ethical and Legal Issues with Crowdsourced Smartphone Data using mHealth Applications. Asian Bioeth Rev 2024; 16:437-470. [PMID: 39022376 PMCID: PMC11250705 DOI: 10.1007/s41649-024-00296-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Revised: 04/03/2024] [Accepted: 04/14/2024] [Indexed: 07/20/2024] Open
Abstract
More than 5 billion people in the world own a smartphone. More than half of these have been used to collect and process health-related data. As such, the existing volume of potentially exploitable health data is unprecedentedly large and growing rapidly. Mobile health applications (apps) on smartphones are some of the worst offenders and are increasingly being used for gathering and exchanging significant amounts of personal health data from the public. This data is often utilized for health research purposes and for algorithm training. While there are advantages to utilizing this data for expanding health knowledge, there are associated risks for the users of these apps, such as privacy concerns and the protection of their data. Consequently, gaining a deeper comprehension of how apps collect and crowdsource data is crucial. To explore how apps are crowdsourcing data and to identify potential ethical, legal, and social issues (ELSI), we conducted an examination of the Apple App Store and the Google Play Store in North America and Europe to identify apps that could potentially gather health data through crowdsourcing. Subsequently, we analyzed their privacy policies, terms of use, and other related documentation to gain insights into the utilization of users' data and the possibility of repurposing it for research or algorithm training purposes. More specifically, we reviewed privacy policies to identify clauses pertaining to the following key categories: research, data sharing, privacy/confidentiality, commercialization, and return of findings. Based on the results of this app search, we developed an App Atlas that presents apps which crowdsource data for research or algorithm training. We identified 46 apps available in the European and Canadian markets that either openly crowdsource health data for research or algorithm training or retain the legal or technical capability to do so. This app search showed an overall lack of consistency and transparency in privacy policies that poses challenges to user comprehensibility, trust, and informed consent. A significant proportion of applications presented contradictions or exhibited considerable ambiguity. For instance, the vast majority of privacy policies in the App Atlas contain ambiguous or contradictory language regarding the sharing of users' data with third parties. This raises a number of ethico-legal concerns which will require further academic and policy attention to ensure a balance between protecting individual interests and maximizing the scientific utility of crowdsourced data. This article represents a key first step in better understanding these concerns and bringing attention to this important issue. Supplementary Information The online version contains supplementary material available at 10.1007/s41649-024-00296-3.
Affiliation(s)
- Nada Farag
- Centre of Genomics and Policy, McGill University, Montreal, Canada
- Alycia Noë
- Centre of Genomics and Policy, McGill University, Montreal, Canada
- Dimitri Patrinos
- Centre of Genomics and Policy, McGill University, Montreal, Canada
- Ma’n H. Zawati
- Centre of Genomics and Policy, McGill University, Montreal, Canada
6
Lyu C, Joehanes R, Huan T, Levy D, Li Y, Wang M, Liu X, Liu C, Ma J. Enhancing selection of alcohol consumption-associated genes by random forest. Br J Nutr 2024; 131:2058-2067. [PMID: 38606596 PMCID: PMC11216877 DOI: 10.1017/s0007114524000795] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/13/2024]
Abstract
Machine learning methods have been used in identifying omics markers for a variety of phenotypes. We aimed to examine whether a supervised machine learning algorithm can improve identification of alcohol-associated transcriptomic markers. In this study, we analysed array-based, whole-blood derived expression data for 17 873 gene transcripts in 5508 Framingham Heart Study participants. By using the Boruta algorithm, a supervised random forest (RF)-based feature selection method, we selected twenty-five alcohol-associated transcripts. In a testing set (30 % of entire study participants), AUC (area under the receiver operating characteristics curve) of these twenty-five transcripts were 0·73, 0·69 and 0·66 for non-drinkers v. moderate drinkers, non-drinkers v. heavy drinkers and moderate drinkers v. heavy drinkers, respectively. The AUC of the selected transcripts by the Boruta method were comparable to those identified using conventional linear regression models, for example, AUC of 1958 transcripts identified by conventional linear regression models (false discovery rate < 0·2) were 0·74, 0·66 and 0·65, respectively. With Bonferroni correction for the twenty-five Boruta method-selected transcripts and three CVD risk factors (i.e. at P < 6·7e-4), we observed thirteen transcripts were associated with obesity, three transcripts with type 2 diabetes and one transcript with hypertension. For example, we observed that alcohol consumption was inversely associated with the expression of DOCK4, IL4R, and SORT1, and DOCK4 and SORT1 were positively associated with obesity, and IL4R was inversely associated with hypertension. In conclusion, using a supervised machine learning method, the RF-based Boruta algorithm, we identified novel alcohol-associated gene transcripts.
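The Bonferroni threshold quoted in this abstract (P < 6·7e-4) can be reproduced with a short calculation. The sketch below assumes a conventional family-wise significance level of 0.05, which the abstract does not state explicitly, divided across the twenty-five selected transcripts and three CVD risk factors.

```python
# Assumption: family-wise alpha of 0.05 (not stated in the abstract).
alpha = 0.05

# Counts taken from the abstract.
n_transcripts = 25   # Boruta-selected transcripts
n_risk_factors = 3   # obesity, type 2 diabetes, hypertension

# Bonferroni correction: divide alpha by the number of tests performed.
n_tests = n_transcripts * n_risk_factors
threshold = alpha / n_tests

print(f"{n_tests} tests, per-test threshold = {threshold:.1e}")
```

With these inputs the per-test threshold rounds to the 6·7e-4 reported in the abstract, which suggests the assumed alpha of 0.05 matches what the authors used.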
Affiliation(s)
- Chenglin Lyu
- Department of Biostatistics, Boston University School of Public Health, Boston, MA
- Department of Anatomy and Neurobiology, Boston University Chobanian & Avedisian School of Medicine, Boston, MA
- Roby Joehanes
- Framingham Heart Study and Population Sciences Branch, NHLBI, Framingham, MA
- Tianxiao Huan
- Framingham Heart Study and Population Sciences Branch, NHLBI, Framingham, MA
- Daniel Levy
- Framingham Heart Study and Population Sciences Branch, NHLBI, Framingham, MA
- Yi Li
- Department of Biostatistics, Boston University School of Public Health, Boston, MA
- Mengyao Wang
- Department of Biostatistics, Boston University School of Public Health, Boston, MA
- Xue Liu
- Department of Biostatistics, Boston University School of Public Health, Boston, MA
- Chunyu Liu
- Department of Biostatistics, Boston University School of Public Health, Boston, MA
- Jiantao Ma
- Nutrition Epidemiology and Data Science, Friedman School of Nutrition Science and Policy, Tufts University, Boston, MA
7
Rubinic I, Kurtov M, Rubinic I, Likic R, Dargan PI, Wood DM. Artificial intelligence in clinical pharmacology: A case study and scoping review of large language models and bioweapon potential. Br J Clin Pharmacol 2024; 90:620-628. [PMID: 37658550 DOI: 10.1111/bcp.15899] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 07/22/2023] [Revised: 08/23/2023] [Accepted: 08/24/2023] [Indexed: 09/03/2023] Open
Abstract
This paper aims to explore the possibility of employing large language models (LLMs) - a type of artificial intelligence (AI) - in clinical pharmacology, with a focus on its possible misuse in bioweapon development. Additionally, ethical considerations, legislation and potential risk reduction measures are analysed. The existing literature is reviewed to investigate the potential misuse of AI and LLMs in bioweapon creation. The search includes articles from PubMed, Scopus and Web of Science Core Collection that were identified using a specific protocol. To explore the regulatory landscape, the OECD.ai platform was used. The review highlights the dual-use vulnerability of AI and LLMs, with a focus on bioweapon development. Subsequently, a case study is used to illustrate the potential of AI manipulation resulting in harmful substance synthesis. Existing regulations inadequately address the ethical concerns tied to AI and LLMs. Mitigation measures are proposed, including technical solutions (explainable AI), establishing ethical guidelines through collaborative efforts, and implementing policy changes to create a comprehensive regulatory framework. The integration of AI and LLMs into clinical pharmacology presents invaluable opportunities, while also introducing significant ethical and safety considerations. Addressing the dual-use nature of AI requires robust regulations, as well as adopting a strategic approach grounded in technical solutions and ethical values following the principles of transparency, accountability and safety. Additionally, AI's potential role in developing countermeasures against novel hazardous substances is underscored. By adopting a proactive approach, the potential benefits of AI and LLMs can be fully harnessed while minimizing the associated risks.
Affiliation(s)
- Igor Rubinic
- University of Rijeka School of Medicine, Rijeka, Croatia
- Clinical Hospital Centre Rijeka, Rijeka, Croatia
- Ivan Rubinic
- School of Engineering, University of Rijeka, Rijeka, Croatia
- Robert Likic
- University of Zagreb School of Medicine, Zagreb, Croatia
- Clinical Hospital Centre Zagreb, Zagreb, Croatia
- Paul I Dargan
- Faculty of Life Sciences and Medicine, King's College London, London, UK
- Clinical Toxicology, Guy's and St Thomas' NHS Foundation Trust, London, UK
- David M Wood
- Faculty of Life Sciences and Medicine, King's College London, London, UK
- Clinical Toxicology, Guy's and St Thomas' NHS Foundation Trust, London, UK
8
Bhuvaneshwar K, Gusev Y. Translational bioinformatics and data science for biomarker discovery in mental health: an analytical review. Brief Bioinform 2024; 25:bbae098. [PMID: 38493340 PMCID: PMC10944574 DOI: 10.1093/bib/bbae098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Revised: 01/23/2024] [Accepted: 02/18/2024] [Indexed: 03/18/2024] Open
Abstract
Translational bioinformatics and data science play a crucial role in biomarker discovery, as they enable translational research and help to bridge the gap between bench research and bedside clinical applications. Thanks to newer and faster molecular profiling technologies and falling costs, there are many opportunities for researchers to explore the molecular and physiological mechanisms of diseases. Biomarker discovery enables researchers to better characterize patients, enables early detection and intervention/prevention and predicts treatment responses. Due to increasing prevalence and rising treatment costs, mental health (MH) disorders have become an important venue for biomarker discovery with the goal of improved patient diagnostics, treatment and care. Exploration of underlying biological mechanisms is the key to the understanding of pathogenesis and pathophysiology of MH disorders. In an effort to better understand the underlying mechanisms of MH disorders, we reviewed the major accomplishments in the MH space from a bioinformatics and data science perspective, summarized existing knowledge derived from molecular and cellular data and described challenges and areas of opportunities in this space.
Affiliation(s)
- Krithika Bhuvaneshwar
- Innovation Center for Biomedical Informatics (ICBI), Georgetown University, Washington DC, 20007, USA
- Yuriy Gusev
- Innovation Center for Biomedical Informatics (ICBI), Georgetown University, Washington DC, 20007, USA
9
Bernier A, Knoppers BM, Bermudez P, Beauvais MJS, Thorogood A. Open Data governance at the Canadian Open Neuroscience Platform (CONP): From the Walled Garden to the Arboretum. Gigascience 2024; 13:giad114. [PMID: 38217404 PMCID: PMC10787360 DOI: 10.1093/gigascience/giad114] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Revised: 11/14/2023] [Accepted: 12/10/2023] [Indexed: 01/15/2024] Open
Abstract
Scientific research communities pursue dual imperatives in implementing strategies to share their data. These communities attempt to maximize the accessibility of biomedical data for downstream research use, in furtherance of open science objectives. Simultaneously, such communities safeguard the interests of research participants through data stewardship measures and the integration of suitable risk disclosures to the informed consent process. The Canadian Open Neuroscience Platform (CONP) convened an Ethics and Governance Committee composed of experts in bioethics, neuroethics, and law to develop holistic policy tools, organizational approaches, and technological supports to align the open governance of data with ethical and legal norms. The CONP has adopted novel platform governance methods that favor full data openness, legitimated through the use of robust deidentification processes and informed consent practices. The experience of the CONP is articulated as a potential template for other open science efforts to further build upon. This experience highlights informed consent guidance, deidentification practices, ethicolegal metadata, platform-level norms, and commercialization and publication policies as the principal pillars of a practicable approach to the governance of open data. The governance approach adopted by the CONP stands as a viable model for the broader neuroscience and open science communities to adopt for sharing data in full open access.
Affiliation(s)
- Alexander Bernier
- Centre of Genomics and Policy, Department of Human Genetics, Faculty of Medicine and Health Sciences, McGill University, 740, Dr Penfield Ave, suite 5200, Montréal, Québec H3A 0G1, Canada
| | - Bartha M Knoppers
- Centre of Genomics and Policy, Department of Human Genetics, Faculty of Medicine and Health Sciences, McGill University, 740, Dr Penfield Ave, suite 5200, Montréal, Québec H3A 0G1, Canada
| | - Patrick Bermudez
- McGill Centre for Integrative Neuroscience, Montreal Neurological Institute, McGill University, Montréal, Québec H3A 2B4, Canada
| | - Michael J S Beauvais
- Faculty of Law, University of Toronto, Falconer Hall, 84 Queens Park, Toronto, Ontario M5S 2C5, Canada
| | - Adrian Thorogood
- The Terry Fox Research Institute, 110 Pine Ave W, Montreal, Quebec H2W 1R7, Canada
|
10
|
Yang X, Huang K, Yang D, Zhao W, Zhou X. Biomedical Big Data Technologies, Applications, and Challenges for Precision Medicine: A Review. GLOBAL CHALLENGES (HOBOKEN, NJ) 2024; 8:2300163. [PMID: 38223896 PMCID: PMC10784210 DOI: 10.1002/gch2.202300163] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/02/2023] [Revised: 09/20/2023] [Indexed: 01/16/2024]
Abstract
The explosive growth of biomedical Big Data presents both significant opportunities and challenges in the realm of knowledge discovery and translational applications within precision medicine. Efficient management, analysis, and interpretation of big data can pave the way for groundbreaking advancements in precision medicine. However, the unprecedented strides in the automated collection of large-scale molecular and clinical data have also introduced formidable challenges in terms of data analysis and interpretation, necessitating the development of novel computational approaches. Some potential challenges include the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues. This overview article focuses on the recent progress and breakthroughs in the application of big data within precision medicine. Key aspects are summarized, including content, data sources, technologies, tools, challenges, and existing gaps. Nine fields are discussed: data warehouse and data management, electronic medical record, biomedical imaging informatics, artificial intelligence-aided surgical design and surgery optimization, omics data, health monitoring data, knowledge graph, public health informatics, and security and privacy.
Affiliation(s)
- Xue Yang
- Department of Pancreatic Surgery and West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu 610041, China
| | - Kexin Huang
- Department of Pancreatic Surgery and West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu 610041, China
| | - Dewei Yang
- College of Advanced Manufacturing Engineering, Chongqing University of Posts and Telecommunications, Chongqing, Chongqing 400000, China
| | - Weiling Zhao
- Center for Systems Medicine, School of Biomedical Informatics, UTHealth at Houston, Houston, TX 77030, USA
| | - Xiaobo Zhou
- Center for Systems Medicine, School of Biomedical Informatics, UTHealth at Houston, Houston, TX 77030, USA
|
11
|
Alizadeh M, Sampaio Moura N, Schledwitz A, Patil SA, Ravel J, Raufman JP. Gastroenterology Fellowship and Postdoctoral Training in Omics and Statistics-Part I: Why Is It Needed? Dig Dis Sci 2024; 69:18-21. [PMID: 37919514 PMCID: PMC10878129 DOI: 10.1007/s10620-023-08136-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2023] [Accepted: 09/27/2023] [Indexed: 11/04/2023]
Abstract
A multitude of federally and industry-funded efforts are underway to generate and collect human, animal, microbial, and other sources of data on an unprecedented scale; the results are commonly referred to as "big data." Often vaguely defined, big data refers to large and complex datasets consisting of myriad datatypes that can be integrated to address complex questions. Big data offers a wealth of information that can be accessed only by those who pose the right questions and have sufficient technical know-how and analytical skills. The intersection of the gut-brain axis, the intestinal microbiome and multi-ome, and several other interconnected organ systems poses particular challenges and opportunities for those engaged in gastrointestinal and liver research. Unfortunately, there is currently a shortage of clinicians, scientists, and physician-scientists with the training needed to use and analyze big data at the scale necessary for widespread implementation of precision medicine. Here, we review the importance of training in the use of big data, the perils of insufficient training, and potential solutions that exist or can be developed to address the dearth of individuals in GI and hepatology research with the necessary level of big data expertise.
Affiliation(s)
- Madeline Alizadeh
- The Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, 20201, USA
| | - Natalia Sampaio Moura
- Division of Gastroenterology and Hepatology, Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, 21201, USA
| | - Alyssa Schledwitz
- Division of Gastroenterology and Hepatology, Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, 21201, USA
| | - Seema A Patil
- Division of Gastroenterology and Hepatology, Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, 21201, USA
| | - Jacques Ravel
- The Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, 20201, USA
| | - Jean-Pierre Raufman
- Division of Gastroenterology and Hepatology, Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, 21201, USA.
- VA Maryland Healthcare System, Baltimore, MD, 21201, USA.
- Marlene and Stewart Greenebaum Cancer Center, University of Maryland School of Medicine, Baltimore, MD, 21201, USA.
- Department of Biochemistry and Molecular Biology, University of Maryland School of Medicine, Baltimore, MD, 21201, USA.
|
12
|
Bergman DR, Norton KA, Jain HV, Jackson T. Connecting Agent-Based Models with High-Dimensional Parameter Spaces to Multidimensional Data Using SMoRe ParS: A Surrogate Modeling Approach. Bull Math Biol 2023; 86:11. [PMID: 38159216 PMCID: PMC10757706 DOI: 10.1007/s11538-023-01240-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Accepted: 11/22/2023] [Indexed: 01/03/2024]
Abstract
Across a broad range of disciplines, agent-based models (ABMs) are increasingly utilized for replicating, predicting, and understanding complex systems and their emergent behavior. In the biological and biomedical sciences, researchers employ ABMs to elucidate complex cellular and molecular interactions across multiple scales under varying conditions. Data generated at these multiple scales, however, presents a computational challenge for robust analysis with ABMs. Indeed, calibrating ABMs remains an open topic of research due to their own high-dimensional parameter spaces. In response to these challenges, we extend and validate our novel methodology, Surrogate Modeling for Reconstructing Parameter Surfaces (SMoRe ParS), arriving at a computationally efficient framework for connecting high dimensional ABM parameter spaces with multidimensional data. Specifically, we modify SMoRe ParS to initially confine high dimensional ABM parameter spaces using unidimensional data, namely, single time-course information of in vitro cancer cell growth assays. Subsequently, we broaden the scope of our approach to encompass more complex ABMs and constrain parameter spaces using multidimensional data. We explore this extension with in vitro cancer cell inhibition assays involving the chemotherapeutic agent oxaliplatin. For each scenario, we validate and evaluate the effectiveness of our approach by comparing how well ABM simulations match the experimental data when using SMoRe ParS-inferred parameters versus parameters inferred by a commonly used direct method. In so doing, we show that our approach of using an explicitly formulated surrogate model as an interlocutor between the ABM and the experimental data effectively calibrates the ABM parameter space to multidimensional data. Our method thus provides a robust and scalable strategy for leveraging multidimensional data to inform multiscale ABMs and explore the uncertainty in their parameters.
Affiliation(s)
- Daniel R Bergman
- Department of Mathematics, University of Michigan, 530 Church Street, Ann Arbor, MI, 48109, USA
| | - Kerri-Ann Norton
- Computational Biology Laboratory, Computer Science Program, Bard College, 30 Campus Road, Annandale-on-Hudson, NY, 12504, USA
| | - Harsh Vardhan Jain
- Department of Mathematics & Statistics, University of Minnesota Duluth, 1117 University Drive, Duluth, MN, 55812, USA
| | - Trachette Jackson
- Department of Mathematics, University of Michigan, 530 Church Street, Ann Arbor, MI, 48109, USA.
|
13
|
Suarjana IWG, Sudirham, Salam I, Aditama MHR. Artificial intelligence in public health: the potential and ethical considerations of artificial intelligence in public health. J Public Health (Oxf) 2023; 45:e834-e835. [PMID: 37477239 DOI: 10.1093/pubmed/fdad116] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/15/2023] [Accepted: 06/27/2023] [Indexed: 07/22/2023] Open
Affiliation(s)
- I Wayan Gede Suarjana
- Department of Public Health, Universitas Negeri Manado, Tondano, North Sulawesi 95618, Indonesia
| | - Sudirham
- Department of Public Health, Universitas Negeri Manado, Tondano, North Sulawesi 95618, Indonesia
| | - Ilham Salam
- Department of Public Health, Universitas Negeri Manado, Tondano, North Sulawesi 95618, Indonesia
| | - Mint Husen Raya Aditama
- Department of Guidance and Counseling, Universitas Negeri Manado, Tondano, North Sulawesi 95618, Indonesia
|
14
|
Jönsson H, Ahlström H, Kullberg J. Spatial mapping of tumor heterogeneity in whole-body PET-CT: a feasibility study. Biomed Eng Online 2023; 22:110. [PMID: 38007471 PMCID: PMC10675915 DOI: 10.1186/s12938-023-01173-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Accepted: 11/17/2023] [Indexed: 11/27/2023] Open
Abstract
BACKGROUND Tumor heterogeneity is recognized as a predictor of treatment response and patient outcome. Quantification of tumor heterogeneity across all scales may therefore provide critical insight that ultimately improves cancer management. METHODS An image registration-based framework for the study of tumor heterogeneity in whole-body images was evaluated on a dataset of 490 FDG-PET-CT images of lung cancer, lymphoma, and melanoma patients. Voxel-, lesion- and subject-level features were extracted from the subjects' segmented lesion masks and mapped to female and male template spaces for voxel-wise analysis. Resulting lesion feature maps of the three subsets of cancer patients were studied visually and quantitatively. Lesion volumes and lesion distances in subject spaces were compared with resulting properties in template space. The strength of the association between subject and template space for these properties was evaluated with Pearson's correlation coefficient. RESULTS Spatial heterogeneity in terms of lesion frequency distribution in the body, metabolic activity, and lesion volume was seen between the three subsets of cancer patients. Lesion feature maps showed anatomical locations with low versus high mean feature value among lesions sampled in space and also highlighted sites with high variation between lesions in each cancer subset. Spatial properties of the lesion masks in subject space correlated strongly with the same properties measured in template space (lesion volume, R = 0.986, p < 0.001; total metabolic volume, R = 0.988, p < 0.001; maximum within-patient lesion distance, R = 0.997, p < 0.001). Lesion volume and total metabolic volume increased on average from subject to template space (lesion volume, 3.1 ± 52 ml; total metabolic volume, 53.9 ± 229 ml). Pair-wise lesion distance decreased on average by 0.1 ± 1.6 cm and maximum within-patient lesion distance increased on average by 0.5 ± 2.1 cm from subject to template space. 
CONCLUSIONS Spatial tumor heterogeneity between subsets of interest in cancer cohorts can successfully be explored in whole-body PET-CT images within the proposed framework. Whole-body studies are, however, especially prone to suffer from regional variation in lesion frequency, and thus statistical power, due to the non-uniform distribution of lesions across a large field of view.
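The subject-versus-template comparison described in this abstract rests on Pearson's correlation coefficient and a mean change between spaces. The following is a minimal sketch of that calculation, not the authors' code; the function name `pearson_r` and the five lesion volumes are hypothetical values chosen only for illustration.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()  # centre each sample on its mean
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


# Hypothetical lesion volumes (ml) measured in subject space and after
# mapping to template space; these are NOT data from the study.
subject_ml = [1.2, 4.5, 10.0, 22.3, 57.8]
template_ml = [1.5, 4.1, 11.2, 24.0, 60.3]

r = pearson_r(subject_ml, template_ml)
# Mean change from subject to template space (positive = volume increased).
mean_change = float(np.mean(np.subtract(template_ml, subject_ml)))

print(f"R = {r:.4f}, mean change = {mean_change:.2f} ml")
```

A high R only shows that the ordering and relative size of lesions are preserved by the mapping; the mean change captures any systematic inflation or shrinkage, which is why the abstract reports both.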
Affiliation(s)
- Hanna Jönsson
- Section of Radiology, Department of Surgical Sciences, Uppsala University, 751 85, Uppsala, Sweden.
| | - Håkan Ahlström
- Section of Radiology, Department of Surgical Sciences, Uppsala University, 751 85, Uppsala, Sweden
- Antaros Medical AB, BioVenture Hub, 431 53, Mölndal, Sweden
| | - Joel Kullberg
- Section of Radiology, Department of Surgical Sciences, Uppsala University, 751 85, Uppsala, Sweden
- Antaros Medical AB, BioVenture Hub, 431 53, Mölndal, Sweden
|
15
|
Zass L, Johnston K, Benkahla A, Chaouch M, Kumuthini J, Radouani F, Mwita LA, Alsayed N, Allie T, Sathan D, Masamu U, Seuneu Tchamga MS, Tamuhla T, Samtal C, Nembaware V, Gill Z, Ahmed S, Hamdi Y, Fadlelmola F, Tiffin N, Mulder N. Developing Clinical Phenotype Data Collection Standards for Research in Africa. Glob Health Epidemiol Genom 2023; 2023:6693323. [PMID: 37766808 PMCID: PMC10522421 DOI: 10.1155/2023/6693323] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2022] [Revised: 06/30/2023] [Accepted: 07/21/2023] [Indexed: 09/29/2023] Open
Abstract
Modern biomedical research is characterised by its high-throughput and interdisciplinary nature. Multiproject and consortium-based collaborations requiring meaningful analysis of multiple heterogeneous phenotypic datasets have become the norm; however, such analysis remains a challenge in many regions across the world. An increasing number of data harmonisation efforts are being undertaken by multistudy collaborations through either prospective standardised phenotype data collection or retrospective phenotype harmonisation. In this regard, the Phenotype Harmonisation Working Group (PHWG) of the Human Heredity and Health in Africa (H3Africa) consortium aimed to facilitate phenotype standardisation by promoting the use of existing data collection standards (hosted by PhenX), adapting existing data collection standards for appropriate use in low- and middle-income regions such as Africa, and developing novel data collection standards where relevant gaps were identified. Ultimately, the PHWG produced 11 data collection kits consisting of 82 protocols, of which 38 were existing protocols, 17 were adapted, and 27 were novel. The data collection kits will facilitate phenotype standardisation and harmonisation not only in Africa but also across the larger research community. In addition, the PHWG aims to feed back adapted and novel protocols to existing reference platforms such as PhenX.
Affiliation(s)
- Lyndon Zass
- Computational Biology Division, Department of Integrative Biomedical Sciences, IDM, University of Cape Town, Cape Town, South Africa
| | - Katherine Johnston
- Computational Biology Division, Department of Integrative Biomedical Sciences, IDM, University of Cape Town, Cape Town, South Africa
| | - Alia Benkahla
- Laboratory of BioInformatics, BioMathematics and BioStatistics LR16IPT09, Institut Pasteur de Tunis, Tunis, Tunisia
| | - Melek Chaouch
- Laboratory of BioInformatics, BioMathematics and BioStatistics LR16IPT09, Institut Pasteur de Tunis, Tunis, Tunisia
| | - Judit Kumuthini
- South African National Bioinformatics Institute (SANBI), Life Sciences Building, University of Western Cape, Bellville, Cape Town, South Africa
| | - Fouzia Radouani
- Chlamydiae & Mycoplasmas Laboratory Research Department, Institut Pasteur du Maroc, 20360 Casablanca, Morocco
| | - Liberata Alexander Mwita
- Muhimbili Sickle Cell Program, Department of Hematology and Blood Transfusion, Muhimbili University of Health and Allied Sciences, Dar-es-Salaam, Tanzania
| | - Nihad Alsayed
- Kush Centre for Genomics & Biomedical Informatics, Biotechnology Perspectives Organization, Khartoum 11111, Sudan
| | - Taryn Allie
- Computational Biology Division, Department of Integrative Biomedical Sciences, IDM, University of Cape Town, Cape Town, South Africa
| | - Dassen Sathan
- Software Information Systems Department, FOICDT, University of Mauritius, Reduit, Mauritius
| | - Upendo Masamu
- Muhimbili Sickle Cell Program, Department of Hematology and Blood Transfusion, Muhimbili University of Health and Allied Sciences, Dar-es-Salaam, Tanzania
| | | | - Tsaone Tamuhla
- Computational Biology Division, Department of Integrative Biomedical Sciences, IDM, University of Cape Town, Cape Town, South Africa
| | - Chaimae Samtal
- Laboratory of Biotechnology, Environment, Agri-Food and Health, Faculty of Sciences Dhar El Mahraz-Sidi Mohammed Ben Abdellah University, Fez 30000, Morocco
| | - Victoria Nembaware
- Division of Human Genetics, Department of Pathology, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
| | - Zoe Gill
- Computational Biology Division, Department of Integrative Biomedical Sciences, IDM, University of Cape Town, Cape Town, South Africa
- Department of Molecular Biology, Johannes Gutenberg University, Mainz, Germany
| | - Samah Ahmed
- Kush Centre for Genomics & Biomedical Informatics, Biotechnology Perspectives Organization, Khartoum 11111, Sudan
| | - Yosr Hamdi
- Laboratory of Biomedical Genomics and Oncogenetics, Institut Pasteur de Tunis, University of Tunis El Manar, Tunis, Tunisia
- Laboratory of Human and Experimental Pathology, Institut Pasteur de Tunis, Tunis, Tunisia
| | - Faisal Fadlelmola
- Kush Centre for Genomics & Biomedical Informatics, Biotechnology Perspectives Organization, Khartoum 11111, Sudan
| | - Nicki Tiffin
- Computational Biology Division, Department of Integrative Biomedical Sciences, IDM, University of Cape Town, Cape Town, South Africa
- South African National Bioinformatics Institute (SANBI), Life Sciences Building, University of Western Cape, Bellville, Cape Town, South Africa
- Wellcome Centre for Infectious Disease Research in Africa, Institute of Infectious Diseases and Molecular Medicine, Faculty of Cape Town, University of Cape Town, Cape Town, South Africa
| | - Nicola Mulder
- Computational Biology Division, Department of Integrative Biomedical Sciences, IDM, University of Cape Town, Cape Town, South Africa
- Wellcome Centre for Infectious Disease Research in Africa, Institute of Infectious Diseases and Molecular Medicine, Faculty of Cape Town, University of Cape Town, Cape Town, South Africa
|
16
|
Sinha K, Ghosh N, Sil PC. A Review on the Recent Applications of Deep Learning in Predictive Drug Toxicological Studies. Chem Res Toxicol 2023; 36:1174-1205. [PMID: 37561655 DOI: 10.1021/acs.chemrestox.2c00375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/12/2023]
Abstract
Drug toxicity prediction is an important step in ensuring patient safety during drug design studies. While traditional preclinical studies have historically relied on animal models to evaluate toxicity, recent advances in deep-learning approaches have shown great promise in advancing drug safety science and reducing animal use in preclinical studies. However, deep-learning-based approaches also face challenges in handling large biological data sets, model interpretability, and regulatory acceptance. In this review, we provide an overview of recent developments in deep-learning-based approaches for predicting drug toxicity, highlighting their potential advantages over traditional methods and the need to address their limitations. Deep-learning models have demonstrated excellent performance in predicting toxicity outcomes from various data sources such as chemical structures, genomic data, and high-throughput screening assays. The potential of deep learning for automated feature engineering is also discussed. This review emphasizes the need to address ethical concerns related to the use of deep learning in drug toxicity studies, including the reduction of animal use and ensuring regulatory acceptance. Furthermore, emerging applications of deep learning in drug toxicity prediction, such as predicting drug-drug interactions and toxicity in rare subpopulations, are highlighted. The integration of deep-learning-based approaches with traditional methods is discussed as a way to develop more reliable and efficient predictive models for drug safety assessment, paving the way for safer and more effective drug discovery and development. Overall, this review highlights the critical role of deep learning in predictive toxicology and drug safety evaluation, emphasizing the need for continued research and development in this rapidly evolving field. By addressing the limitations of traditional methods, leveraging the potential of deep learning for automated feature engineering, and addressing ethical concerns, deep-learning-based approaches have the potential to revolutionize drug toxicity prediction and improve patient safety in drug discovery and development.
Affiliation(s)
- Krishnendu Sinha
- Department of Zoology, Jhargram Raj College, Jhargram 721507, West Bengal, India
| | - Nabanita Ghosh
- Department of Zoology, Maulana Azad College, Kolkata 700013, West Bengal, India
| | - Parames C Sil
- Division of Molecular Medicine, Bose Institute, Kolkata 700054, West Bengal, India
|
17
|
Cunha FF, Blüml V, Zopf LM, Walter A, Wagner M, Weninger WJ, Thomaz LA, Tavora LMN, da Silva Cruz LA, Faria SMM. Lossy Image Compression in a Preclinical Multimodal Imaging Study. J Digit Imaging 2023; 36:1826-1850. [PMID: 37038039 PMCID: PMC10406799 DOI: 10.1007/s10278-023-00800-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2022] [Revised: 02/20/2023] [Accepted: 02/21/2023] [Indexed: 04/12/2023] Open
Abstract
The growing use of multimodal high-resolution volumetric data in pre-clinical studies leads to challenges in managing and handling the large volume of these datasets. Unlike in the clinical context, there are currently no standard guidelines regulating the use of image compression in pre-clinical contexts as a way to alleviate this problem. In this work, the authors study the application of lossy image coding to compress high-resolution volumetric biomedical data. The impact of compression on the metrics and interpretation of volumetric data was quantified for a correlated multimodal imaging study to characterize murine tumor vasculature, using volumetric high-resolution episcopic microscopy (HREM), micro-computed tomography (µCT), and micro-magnetic resonance imaging (µMRI). The effects of compression were assessed by measuring the task-specific performance of several biomedical experts who interpreted and labeled multiple data volumes compressed to different degrees. We defined trade-offs between data volume reduction and preservation of visual information, which ensured the preservation of relevant vasculature morphology at maximum compression efficiency across scales. Using the Jaccard Index (JI) and the average Hausdorff Distance (HD) after vasculature segmentation, we could demonstrate that, in this study, compression yielding a 256-fold reduction in data size kept the error induced by compression below the inter-observer variability, with minimal impact on the assessment of the tumor vasculature across scales.
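The two segmentation-agreement metrics named in this abstract can be illustrated on a toy example. This is a minimal sketch rather than the study's pipeline: the function names are hypothetical, the masks are tiny 2D arrays instead of volumes, and "average Hausdorff distance" is taken here to mean the mean of the two directed average nearest-neighbour distances in voxel units, which is one of several definitions in use and an assumption on our part.

```python
import numpy as np


def jaccard_index(mask_a, mask_b):
    """Intersection over union of two binary masks of equal shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # convention chosen here: two empty masks agree fully
    return float(np.logical_and(a, b).sum() / union)


def average_hausdorff(mask_a, mask_b):
    """Mean of the two directed average nearest-neighbour distances (voxels)."""
    pts_a = np.argwhere(np.asarray(mask_a, dtype=bool)).astype(float)
    pts_b = np.argwhere(np.asarray(mask_b, dtype=bool)).astype(float)
    if len(pts_a) == 0 or len(pts_b) == 0:
        raise ValueError("both masks must contain at least one foreground voxel")
    # Pairwise Euclidean distances; fine for small masks, memory-hungry for large ones.
    d = np.linalg.norm(pts_a[:, None, :] - pts_b[None, :, :], axis=-1)
    return float(0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean()))


# Toy masks standing in for segmentations of an original and a compressed image.
reference = np.zeros((4, 4), dtype=bool)
reference[1:3, 1:3] = True   # 4 foreground voxels
compressed = np.zeros((4, 4), dtype=bool)
compressed[1:3, 1:4] = True  # 6 foreground voxels, one extra column

ji = jaccard_index(reference, compressed)
ahd = average_hausdorff(reference, compressed)
print(f"JI = {ji:.3f}, average HD = {ahd:.3f} voxels")
```

Comparing these values against the same metrics computed between two human annotators of the uncompressed data is the kind of check the abstract describes when it states that compression error stayed below inter-observer variability.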
Affiliation(s)
- Francisco F. Cunha
- Instituto de Telecomunicações, Morro do Lena—Alto do Vieiro, Leiria, Portugal
- University of Coimbra, Coimbra, Portugal
| | - Valentin Blüml
- Vienna BioCenter Core Facilities GmbH, 1030 Vienna, Austria
| | - Lydia M. Zopf
- Ludwig Boltzmann Institute for Experimental and Clinical Traumatology Vienna, Vienna, Austria
| | - Andreas Walter
- Centre of Optical Technologies, Aalen University, Aalen, Germany
| | - Michael Wagner
- Institute of Applied Research, Aalen University, Aalen, Germany
| | - Wolfgang J. Weninger
- Division of Anatomy, Center for Anatomy & Cell Biology, Medical University of Vienna, Vienna, Austria
| | - Lucas A. Thomaz
- Instituto de Telecomunicações, Morro do Lena—Alto do Vieiro, Leiria, Portugal
- School of Technology and Management, Polytechnic of Leiria, Morro do Lena—Alto do Vieiro, Leiria, Portugal
| | - Luís M. N. Tavora
- Instituto de Telecomunicações, Morro do Lena—Alto do Vieiro, Leiria, Portugal
- School of Technology and Management, Polytechnic of Leiria, Morro do Lena—Alto do Vieiro, Leiria, Portugal
| | - Luis A. da Silva Cruz
- University of Coimbra, Coimbra, Portugal
- Department of Electrical and Computer Engineering, University of Coimbra, Coimbra, Portugal
- Instituto de Telecomunicações, University of Coimbra, Coimbra, Portugal
| | - Sergio M. M. Faria
- Instituto de Telecomunicações, Morro do Lena—Alto do Vieiro, Leiria, Portugal
- School of Technology and Management, Polytechnic of Leiria, Morro do Lena—Alto do Vieiro, Leiria, Portugal
|
18
|
Smith G, Miller A, Marra DE, Wu Y, Bian J, Maraganore DM, Anton S. Evaluation of a Computable Phenotype for Successful Cognitive Aging. Mayo Clin Proc Innov Qual Outcomes 2023; 7:212-221. [PMID: 37304063 PMCID: PMC10250575 DOI: 10.1016/j.mayocpiqo.2023.04.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023] Open
Abstract
Objective To establish, apply, and evaluate a computable phenotype for the recruitment of individuals with successful cognitive aging. Participants and Methods Interviews with 10 aging experts identified electronic health record (EHR)-available variables representing successful aging among individuals aged 85 years and older. On the basis of the identified variables, we developed a rule-based computable phenotype algorithm composed of 17 eligibility criteria. Starting September 1, 2019, we applied the computable phenotype algorithm to all living persons aged 85 years and older at the University of Florida Health, which identified 24,024 individuals. This sample comprised 13,841 (58%) women, 13,906 (58%) Whites, and 16,557 (69%) non-Hispanics. A priori permission to be contacted for research had been obtained for 11,898 individuals, of whom 470 responded to study announcements and 333 consented to evaluation. Then, we contacted those who consented to evaluate whether their cognitive and functional status clinically met our successful cognitive aging criteria of a modified Telephone Interview for Cognitive Status score of more than 27 and Geriatric Depression Scale of less than 6. The study was completed on December 31, 2022. Results The computable phenotype identified 45% of living persons aged 85 years and older in the University of Florida Health EHR database as successfully aged. Approximately 4% of these responded to study announcements and 333 consented, of whom 218 (65%) met successful cognitive aging criteria through direct evaluation. Conclusion The study evaluated a computable phenotype algorithm for the recruitment of individuals for a successful aging study using large-scale EHRs. Our study provides proof of concept of using big data and informatics as aids for the recruitment of individuals for prospective cohort studies.
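The clinical confirmation rule and the recruitment percentages in this abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function and variable names are hypothetical, the 17 EHR eligibility criteria are not reproduced because the abstract does not list them, and reading the roughly 4% response figure as 470 of the 11,898 contactable individuals is our assumption.

```python
def meets_successful_aging_criteria(tics_m_score, gds_score):
    """Direct-evaluation rule from the abstract: modified TICS > 27 and GDS < 6."""
    return tics_m_score > 27 and gds_score < 6


def pct(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


# Counts as reported in the abstract.
identified = 24_024    # flagged by the computable phenotype
women = 13_841
white = 13_906
non_hispanic = 16_557
contactable = 11_898   # a priori permission to be contacted
responded = 470
consented = 333
confirmed = 218        # met criteria on direct evaluation

print("women:", pct(women, identified), "%")
print("White:", pct(white, identified), "%")
print("non-Hispanic:", pct(non_hispanic, identified), "%")
# Assumption: the ~4% response rate is taken relative to contactable individuals.
print("responded:", pct(responded, contactable), "%")
print("confirmed among consented:", pct(confirmed, consented), "%")

# Two hypothetical participants, for illustration only.
print(meets_successful_aging_criteria(tics_m_score=30, gds_score=2))  # True
print(meets_successful_aging_criteria(tics_m_score=27, gds_score=2))  # False: score must exceed 27
```

The 65% confirmation rate among consenting participants is the figure that speaks to the phenotype's practical value for recruitment, since it measures how often an EHR-based flag held up under direct evaluation.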
Affiliation(s)
- Glenn Smith
- Department of Clinical and Health Psychology, University of Florida, Gainesville
| | - Amber Miller
- Department of Neurology, College of Medicine, University of Florida, Gainesville
| | - David E. Marra
- Department of Psychology, VA Boston Healthcare System, Boston, MA
| | - Yonghui Wu
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville
| | - Jiang Bian
- Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville
| | | | - Stephen Anton
- Department of Clinical and Health Psychology, University of Florida, Gainesville
- Department of Physiology and Aging, University of Florida, Gainesville
|
19
|
Maier A, Hartung M, Abovsky M, Adamowicz K, Bader GD, Baier S, Blumenthal DB, Chen J, Elkjaer ML, Garcia-Hernandez C, Helmy M, Hoffmann M, Jurisica I, Kotlyar M, Lazareva O, Levi H, List M, Lobentanzer S, Loscalzo J, Malod-Dognin N, Manz Q, Matschinske J, Mee M, Oubounyt M, Pico AR, Pillich RT, Poschenrieder JM, Pratt D, Pržulj N, Sadegh S, Saez-Rodriguez J, Sarkar S, Shaked G, Shamir R, Trummer N, Turhan U, Wang R, Zolotareva O, Baumbach J. Drugst.One - A plug-and-play solution for online systems medicine and network-based drug repurposing. ARXIV 2023:arXiv:2305.15453v2. [PMID: 37332567 PMCID: PMC10274948] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/20/2023]
Abstract
In recent decades, the development of new drugs has become increasingly expensive and inefficient, and the molecular mechanisms of most pharmaceuticals remain poorly understood. In response, computational systems and network medicine tools have emerged to identify potential drug repurposing candidates. However, these tools often require complex installation and lack intuitive visual network mining capabilities. To tackle these challenges, we introduce Drugst.One, a platform that assists specialized computational medicine tools in becoming user-friendly, web-based utilities for drug repurposing. With just three lines of code, Drugst.One turns any systems biology software into an interactive web tool for modeling and analyzing complex protein-drug-disease networks. Demonstrating its broad adaptability, Drugst.One has been successfully integrated with 21 computational systems medicine tools. Available at https://drugst.one, Drugst.One has significant potential for streamlining the drug discovery process, allowing researchers to focus on essential aspects of pharmaceutical treatment research.
Affiliation(s)
- Andreas Maier
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | - Michael Hartung
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | - Mark Abovsky
- Division of Orthopaedic Surgery, Schroeder Arthritis Institute, and Data Science Discovery Centre, Osteoarthritis Research Program, Krembil Research Institute, UHN, Toronto, Canada
- Data Science Discovery Centre for Chronic Diseases, Krembil Research Institute, University Health Network, 60 Leonard Avenue, 5KD-407, Toronto, ON, M5T 0S8, Canada
| | - Klaudia Adamowicz
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | - Gary D Bader
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- The Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, ON, Canada
| | - Sylvie Baier
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
| | - David B Blumenthal
- Department Artificial Intelligence in Biomedical Engineering (AIBE), Friedrich-Alexander University Erlangen-Nürnberg (FAU), 91052 Erlangen, Germany
| | - Jing Chen
- Department of Medicine, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
| | - Maria L Elkjaer
- Department of Neurology, Odense University Hospital, Odense, Denmark
- Institute of Clinical Research, University of Southern Denmark, Odense, Denmark
- Institute of Molecular Medicine, University of Southern Denmark, Odense, Denmark
| | | | - Mohamed Helmy
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
| | - Markus Hoffmann
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
- Institute for Advanced Study (Lichtenbergstrasse 2a, D-85748 Garching, Germany), Technical University of Munich, Germany
- National Institute of Diabetes, Digestive, and Kidney Diseases, Bethesda, MD 20892, United States of America
| | - Igor Jurisica
- Division of Orthopaedic Surgery, Schroeder Arthritis Institute, and Data Science Discovery Centre, Osteoarthritis Research Program, Krembil Research Institute, UHN, Toronto, Canada
- Data Science Discovery Centre for Chronic Diseases, Krembil Research Institute, University Health Network, 60 Leonard Avenue, 5KD-407, Toronto, ON, M5T 0S8, Canada
- Departments of Medical Biophysics and Computer Science, University of Toronto, Toronto, Canada
- Institute of Neuroimmunology, Slovak Academy of Sciences, Bratislava, Slovakia
| | - Max Kotlyar
- Division of Orthopaedic Surgery, Schroeder Arthritis Institute, and Data Science Discovery Centre, Osteoarthritis Research Program, Krembil Research Institute, UHN, Toronto, Canada
- Data Science Discovery Centre for Chronic Diseases, Krembil Research Institute, University Health Network, 60 Leonard Avenue, 5KD-407, Toronto, ON, M5T 0S8, Canada
| | - Olga Lazareva
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Junior Clinical Cooperation Unit Multiparametric methods for early detection of prostate cancer, German Cancer Research Center (DKFZ), Heidelberg, Germany
- European Molecular Biology Laboratory, Genome Biology Unit, 69117 Heidelberg, Germany
| | - Hagai Levi
- Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel
| | - Markus List
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
| | - Sebastian Lobentanzer
- Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
| | - Joseph Loscalzo
- Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA
| | | | - Quirin Manz
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
| | - Julian Matschinske
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
| | - Miles Mee
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
| | - Mhaned Oubounyt
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | - Alexander R Pico
- Institute of Data Science and Biotechnology, Gladstone Institutes, 1650 Owens Street, San Francisco, 94158, California, USA
| | - Rudolf T Pillich
- Department of Medicine, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
| | - Julian M Poschenrieder
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
| | - Dexter Pratt
- Department of Medicine, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
| | - Nataša Pržulj
- Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain
- Department of Computer Science, University College London, London WC1E 6BT, UK
- ICREA, Pg. Lluís Companys 23, 08010 Barcelona, Spain
| | - Sepideh Sadegh
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
- Department of Clinical Genetics, Odense University Hospital, Odense, Denmark
| | - Julio Saez-Rodriguez
- Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
| | - Suryadipto Sarkar
- Department Artificial Intelligence in Biomedical Engineering (AIBE), Friedrich-Alexander University Erlangen-Nürnberg (FAU), 91052 Erlangen, Germany
| | - Gideon Shaked
- Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel
| | - Ron Shamir
- Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel
| | - Nico Trummer
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
| | - Ugur Turhan
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | - Ruisheng Wang
- Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA
| | - Olga Zolotareva
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
| | - Jan Baumbach
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Computational Biomedicine Lab, Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
|
20
|
Wang DC, Xu WD, Wang SN, Wang X, Leng W, Fu L, Liu XY, Qin Z, Huang AF. Lupus nephritis or not? A simple and clinically friendly machine learning pipeline to help diagnosis of lupus nephritis. Inflamm Res 2023:10.1007/s00011-023-01755-7. [PMID: 37300586 DOI: 10.1007/s00011-023-01755-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Revised: 05/17/2023] [Accepted: 05/30/2023] [Indexed: 06/12/2023] Open
Abstract
OBJECTIVE Diagnosis of lupus nephritis (LN) is a complex process that usually requires renal biopsy. We aim to establish a machine learning pipeline to aid the diagnosis of LN. METHODS A cohort of 681 systemic lupus erythematosus (SLE) patients without LN and 786 SLE patients with LN was established, and a total of 95 clinical and laboratory variables and 17 meteorological indicators were collected. After tenfold cross-validation, the patients were divided into a training set and a test set. The features selected by a collective feature selection method combining mutual information (MI) and MultiSURF were used to construct logistic regression, decision tree, random forest, naive Bayes, support vector machine (SVM), light gradient boosting (LGB), extreme gradient boosting (XGB), and artificial neural network (ANN) models; the models were compared and verified in post-analysis. RESULTS The collective feature selection method screened out antistreptolysin (ASO), retinol binding protein (RBP), lupus anticoagulant 1 (LA1), LA2, proteinuria, and other features. The hyperparameter-optimized XGB model (ROC: AUC = 0.995; PRC: AUC = 1.000, APS = 1.000; balanced accuracy: 0.990) performed best, followed by LGB (ROC: AUC = 0.992; PRC: AUC = 0.997, APS = 0.977; balanced accuracy: 0.957). The naive Bayes model performed worst (ROC: AUC = 0.799; PRC: AUC = 0.822, APS = 0.823; balanced accuracy: 0.693). In the composite feature importance bar plots, ASO, RF, Up/Ucr, and other features played important roles in LN. CONCLUSION We developed and validated a new and simple machine learning pathway for the diagnosis of LN, in particular the XGB model based on ASO, LA1, LA2, proteinuria, and other features screened out by collective feature selection.
Affiliation(s)
- Da-Cheng Wang
- Department of Evidence-Based Medicine, Southwest Medical University, 1 Xianglin Road, Luzhou, Sichuan, China
| | - Wang-Dong Xu
- Department of Evidence-Based Medicine, Southwest Medical University, 1 Xianglin Road, Luzhou, Sichuan, China
| | - Shen-Nan Wang
- Luzhou Meteorological Bureau, 3 Songshan Road, Luzhou, Sichuan, China
| | - Xiang Wang
- Luzhou Meteorological Bureau, 3 Songshan Road, Luzhou, Sichuan, China
| | - Wei Leng
- Luzhou Meteorological Bureau, 3 Songshan Road, Luzhou, Sichuan, China
| | - Lu Fu
- Laboratory Animal Center, Southwest Medical University, 1 Xianglin Road, Luzhou, Sichuan, China
| | - Xiao-Yan Liu
- Department of Evidence-Based Medicine, Southwest Medical University, 1 Xianglin Road, Luzhou, Sichuan, China
| | - Zhen Qin
- Department of Rheumatology and Immunology, Affiliated Hospital of Southwest Medical University, 25 Taiping Road, Luzhou, Sichuan, China
| | - An-Fang Huang
- Department of Rheumatology and Immunology, Affiliated Hospital of Southwest Medical University, 25 Taiping Road, Luzhou, Sichuan, China
|
21
|
Yuan Q, Zhao WL, Qin B. Big data and variceal rebleeding prediction in cirrhosis patients. Artif Intell Gastroenterol 2023; 4:1-9. [DOI: 10.35712/aig.v4.i1.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2023] [Revised: 02/03/2023] [Accepted: 03/10/2023] [Indexed: 06/08/2023] Open
Abstract
Big data has convincing merits in developing risk stratification strategies for diseases. The 6 “V”s of big data, namely, volume, velocity, variety, veracity, value, and variability, have shown promise for real-world scenarios. Big data can be applied to analyze health data and advance research in preclinical biology, medicine, and especially disease initiation, development, and control. A study design comprises data selection, inclusion and exclusion criteria, standard confirmation and cohort establishment, follow-up strategy, and events of interest. The development and efficiency verification of a prognosis model consists of deciding the data source, taking previous models as references while selecting candidate predictors, assessing model performance, choosing appropriate statistical methods, and model optimization. The model should be able to inform disease development and outcomes, such as predicting variceal rebleeding in patients with cirrhosis. Our work has merits beyond those of other colleagues with respect to cirrhosis patient screening and data source regarding variceal bleeding.
Affiliation(s)
- Quan Yuan
- Department of Gastroenterology, The First Affiliated Hospital of Chongqing Medical University, Chongqing 400042, China
| | - Wen-Long Zhao
- College of Medical Informatics, Chongqing Medical University, Chongqing 400016, China
- Medical Data Science Academy, Chongqing 400016, China
- Chongqing Engineering Research Centre for Clinical Big-data and Drug Evaluation, Chongqing 400016, China
| | - Bo Qin
- Department of Infectious Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing 400042, China
|
22
|
Shi Y, Lin J, Zhu J, Gao J, Liu L, Yin M, Yu C, Liu X, Wang Y, Xu C. Predicting the Recurrence of Common Bile Duct Stones After ERCP Treatment with Automated Machine Learning Algorithms. Dig Dis Sci 2023:10.1007/s10620-023-07949-7. [PMID: 37160541 DOI: 10.1007/s10620-023-07949-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/13/2022] [Accepted: 09/26/2022] [Indexed: 05/11/2023]
Abstract
BACKGROUND Recurrence of common bile duct stones (CBDs) is common after endoscopic retrograde cholangiopancreatography (ERCP). Clinical prediction models for the recurrence of CBDs after ERCP are lacking. AIMS We aim to develop high-performance prediction models for the recurrence of CBDs after ERCP treatment using automated machine learning (AutoML) and to assess the AutoML models versus the traditional regression models. METHODS A total of 473 patients with CBDs undergoing ERCP were recruited in this single-center retrospective cohort study. Samples were randomly divided into a Training Set (65%) and a Validation Set (35%). Three modeling approaches, including fully automated machine learning (Fully automated), semi-automated machine learning (Semi-automated), and traditional regression, were applied to fit prediction models. The models' discrimination, calibration, and clinical benefits were examined. The Shapley additive explanations (SHAP), partial dependence plot (PDP), and SHAP local explanation (SHAPLE) were proposed for the interpretation of the best model. RESULTS The area under the ROC curve (AUROC) of the semi-automated gradient boosting machine (GBM) model was 0.749 in the Validation Set, better than the other fully/semi-automated models and the traditional regression models (highest AUROC = 0.736). The calibration and clinical application of the AutoML models were adequate. Through the SHAP-PDP-SHAPLE pipeline, the roles of key variables of the semi-automated GBM model were visualized. Lastly, the best model was deployed online for clinical practitioners. CONCLUSION The GBM model based on semi-AutoML is an optimal model to predict the recurrence of CBDs after ERCP treatment. In comparison with traditional regressions, AutoML algorithms present significant strengths in modeling, which show promise in future clinical practice.
Affiliation(s)
- Yuqi Shi
- Department of Gastroenterology, The First Affiliated Hospital of Soochow University, Suzhou, 215000, China
- Suzhou Clinical Center of Digestive Diseases, Suzhou, 215000, China
| | - Jiaxi Lin
- Department of Gastroenterology, The First Affiliated Hospital of Soochow University, Suzhou, 215000, China
- Suzhou Clinical Center of Digestive Diseases, Suzhou, 215000, China
| | - Jinzhou Zhu
- Department of Gastroenterology, The First Affiliated Hospital of Soochow University, Suzhou, 215000, China
- Suzhou Clinical Center of Digestive Diseases, Suzhou, 215000, China
| | - Jingwen Gao
- Department of Gastroenterology, The First Affiliated Hospital of Soochow University, Suzhou, 215000, China
- Suzhou Clinical Center of Digestive Diseases, Suzhou, 215000, China
| | - Lu Liu
- Department of Gastroenterology, The First Affiliated Hospital of Soochow University, Suzhou, 215000, China
- Suzhou Clinical Center of Digestive Diseases, Suzhou, 215000, China
| | - Minyue Yin
- Department of Gastroenterology, The First Affiliated Hospital of Soochow University, Suzhou, 215000, China
- Suzhou Clinical Center of Digestive Diseases, Suzhou, 215000, China
| | - Chenyan Yu
- Department of Gastroenterology, The First Affiliated Hospital of Soochow University, Suzhou, 215000, China
- Suzhou Clinical Center of Digestive Diseases, Suzhou, 215000, China
| | - Xiaolin Liu
- Department of Gastroenterology, The First Affiliated Hospital of Soochow University, Suzhou, 215000, China
- Suzhou Clinical Center of Digestive Diseases, Suzhou, 215000, China
| | - Yu Wang
- Department of General Surgery, Jintan Affiliated Hospital of Jiangsu University, Changzhou, 213200, China
| | - Chunfang Xu
- Department of Gastroenterology, The First Affiliated Hospital of Soochow University, Suzhou, 215000, China
- Suzhou Clinical Center of Digestive Diseases, Suzhou, 215000, China
|
23
|
Peng M, Southern DA, Ocampo W, Kaufman J, Hogan DB, Conly J, Baylis BW, Stelfox HT, Ho C, Ghali WA. Exploring data reduction strategies in the analysis of continuous pressure imaging technology. BMC Med Res Methodol 2023; 23:56. [PMID: 36859239 PMCID: PMC9976437 DOI: 10.1186/s12874-023-01875-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2021] [Accepted: 02/21/2023] [Indexed: 03/03/2023] Open
Abstract
BACKGROUND Science is becoming increasingly data intensive as digital innovations bring new capacity for continuous data generation and storage. This progress also brings challenges, as many scientific initiatives are challenged by the sheer volumes of data produced. Here we present a case study of a data-intensive randomized clinical trial assessing the utility of continuous pressure imaging (CPI) for reducing pressure injuries. OBJECTIVE To explore an approach to reducing the amount of CPI data required for analyses to a manageable size without loss of critical information using a nested subset of pressure data. METHODS Data from four enrolled study participants excluded from the analytical phase of the study were used to develop an approach to data reduction. A two-step data strategy was used. First, raw data were sampled at different intervals (5, 30, 60, 120, and 240 s) to identify the optimal measurement frequency. Second, similarity between adjacent frames was evaluated using correlation coefficients to identify position changes of enrolled study participants. Data strategy performance was evaluated through visual inspection using heat maps and time series plots. RESULTS Sampling once every 60 s provided a reasonable representation of changes in interface pressure over time. This approach translated to using only 1.7% of the collected data in analyses. In the second step it was found that 160 frames within 24 h represented the pressure states of study participants. In total, only 480 frames from the 72 h of collected data would be needed for analyses without loss of information. Only ~0.2% of the raw data collected would be required for assessment of the primary trial outcome. CONCLUSIONS Data reduction is an important component of big data analytics. Our two-step strategy markedly reduced the amount of data required for analyses without loss of information. This data reduction strategy, if validated, could be used in other CPI and other settings where large amounts of both temporal and spatial data must be analysed.
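The percentages in this abstract follow from simple frame counting. The sketch below reproduces them under one loudly stated assumption: that the raw CPI data were recorded at one frame per second. That rate is not given in the abstract; it is inferred here because 1/60 is approximately 1.7%. All constant names are illustrative.

```python
# ASSUMPTION (not stated in the abstract): raw frames arrive once per second.
RAW_INTERVAL_S = 1

# Values quoted in the abstract.
SAMPLE_INTERVAL_S = 60   # step 1: keep one frame every 60 s
FRAMES_PER_24H = 160     # step 2: frames that represent the pressure states in 24 h
STUDY_HOURS = 72

# Step 1: fraction of raw frames kept by 60 s sampling.
step1_fraction = RAW_INTERVAL_S / SAMPLE_INTERVAL_S

# Step 2: frames needed over the whole 72 h observation window.
frames_needed = FRAMES_PER_24H * STUDY_HOURS // 24

# Overall fraction of the raw frames that remains after both steps.
raw_frames = STUDY_HOURS * 3600 // RAW_INTERVAL_S
overall_fraction = frames_needed / raw_frames

if __name__ == "__main__":
    print(f"after step 1: {100 * step1_fraction:.1f}% of raw frames")
    print(f"frames needed for 72 h: {frames_needed}")
    print(f"after step 2: {100 * overall_fraction:.2f}% of raw frames")
```

Under the one-frame-per-second assumption, the 1.7%, 480-frame, and roughly 0.2% figures in the abstract are mutually consistent.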
Affiliation(s)
- Mingkai Peng
- Libin Cardiovascular Institute of Alberta, University of Calgary, Calgary, AB, Canada
| | - Danielle A Southern
- O'Brien Institute for Public Health, University of Calgary, Calgary, AB, Canada
| | - Wrechelle Ocampo
- W21C Research and Innovation Centre, Cumming School of Medicine, GD01 Teaching Research & Wellness Building, University of Calgary, 3280 Hospital Drive, Calgary, NW, Canada
| | - Jaime Kaufman
- W21C Research and Innovation Centre, Cumming School of Medicine, GD01 Teaching Research & Wellness Building, University of Calgary, 3280 Hospital Drive, Calgary, NW, Canada
| | - David B Hogan
- O'Brien Institute for Public Health, University of Calgary, Calgary, AB, Canada
- W21C Research and Innovation Centre, Cumming School of Medicine, GD01 Teaching Research & Wellness Building, University of Calgary, 3280 Hospital Drive, Calgary, NW, Canada
- Department of Medicine, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
| | - John Conly
- O'Brien Institute for Public Health, University of Calgary, Calgary, AB, Canada
- W21C Research and Innovation Centre, Cumming School of Medicine, GD01 Teaching Research & Wellness Building, University of Calgary, 3280 Hospital Drive, Calgary, NW, Canada
- Department of Medicine, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Infection Prevention and Control, Alberta Health Services, Calgary, AB, Canada
- Snyder Institute for Chronic Diseases, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Foothills Medical Centre, Special Services Building, Ground Floor, AGW5, Calgary, AB, T2N 2T9, Canada
| | - Barry W Baylis
- O'Brien Institute for Public Health, University of Calgary, Calgary, AB, Canada
- W21C Research and Innovation Centre, Cumming School of Medicine, GD01 Teaching Research & Wellness Building, University of Calgary, 3280 Hospital Drive, Calgary, NW, Canada
- Department of Medicine, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Foothills Medical Centre, Special Services Building, Ground Floor, AGW5, Calgary, AB, T2N 2T9, Canada
| | - Henry T Stelfox
- O'Brien Institute for Public Health, University of Calgary, Calgary, AB, Canada
- Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Department of Critical Care Medicine, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Alberta Health Services, Alberta, Canada
| | - Chester Ho
- Department of Medicine, Division of Physical Medicine & Rehabilitation, University of Alberta, Edmonton, AB, Canada
| | - William A Ghali
- O'Brien Institute for Public Health, University of Calgary, Calgary, AB, Canada
- W21C Research and Innovation Centre, Cumming School of Medicine, GD01 Teaching Research & Wellness Building, University of Calgary, 3280 Hospital Drive, Calgary, NW, Canada
- Department of Medicine, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Division of General Internal Medicine, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
|
24
|
Rajput D, Wang WJ, Chen CC. Evaluation of a decided sample size in machine learning applications. BMC Bioinformatics 2023; 24:48. [PMID: 36788550 PMCID: PMC9926644 DOI: 10.1186/s12859-023-05156-9] [Citation(s) in RCA: 51] [Impact Index Per Article: 51.0] [Received: 10/13/2022] [Accepted: 01/23/2023] [Indexed: 02/16/2023] Open
Abstract
BACKGROUND An appropriate sample size is essential for obtaining a precise and reliable outcome of a study. In machine learning (ML), studies with inadequate samples suffer from overfitting of data and have a lower probability of producing true effects, while increasing the sample size improves the accuracy of prediction but may not cause a significant change beyond a certain sample size. Existing statistical approaches using standardized mean difference, effect size, and statistical power for determining sample size are potentially biased due to miscalculations or lack of experimental details. This study aims to design criteria for evaluating sample size in ML studies. We examined the average and grand effect sizes and the performance of five ML methods using simulated datasets and three real datasets to derive the criteria for sample size. We systematically increased the sample size, starting from 16, by random sampling and examined the impact of sample size on classifiers' performance and both effect sizes. Tenfold cross-validation was used to quantify the accuracy. RESULTS The results demonstrate that the effect sizes and the classification accuracies increase while the variances in effect sizes shrink as samples are added when the datasets have good discriminative power between two classes. By contrast, indeterminate datasets had poor effect sizes and classification accuracies, which did not improve with increasing sample size in either simulated or real datasets. A good dataset exhibited a significant difference in average and grand effect sizes. We derived two criteria based on the above findings to assess a decided sample size by combining the effect size and the ML accuracy. The sample size is considered suitable when it has appropriate effect sizes (≥ 0.5) and ML accuracy (≥ 80%). Beyond an appropriate sample size, adding samples brings little benefit because it does not significantly change the effect size or accuracy, so stopping at that size gives a good cost-benefit ratio. CONCLUSION We believe that these practical criteria can be used as a reference for both authors and editors to evaluate whether the selected sample size is adequate for a study.
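The two criteria stated in this abstract amount to a simple joint threshold check. The following is a minimal sketch of that check only; it does not compute the average or grand effect sizes, whose definitions are not given in the abstract, and the function and parameter names are hypothetical.

```python
def sample_size_is_adequate(
    effect_size: float,
    accuracy: float,
    min_effect_size: float = 0.5,   # threshold quoted in the abstract
    min_accuracy: float = 0.80,     # 80% accuracy, expressed as a fraction
) -> bool:
    """Return True when both criteria from the abstract are satisfied.

    `effect_size` and `accuracy` are assumed to have been computed elsewhere,
    for example from tenfold cross-validation as described in the abstract.
    """
    return effect_size >= min_effect_size and accuracy >= min_accuracy


if __name__ == "__main__":
    # Illustrative inputs only; these are not values from the paper.
    print(sample_size_is_adequate(effect_size=0.62, accuracy=0.85))  # both criteria met
    print(sample_size_is_adequate(effect_size=0.62, accuracy=0.74))  # accuracy too low
```

Keeping the thresholds as keyword arguments makes it easy to test how sensitive a decision is to the chosen cut-offs.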
Affiliation(s)
- Daniyal Rajput
- Institute of Cognitive Neuroscience, National Central University, Zhongda Rd, No. 300, Zhongli District, Taoyuan City, 320317, Taiwan, ROC
- Taiwan International Graduate Program in Interdisciplinary Neuroscience, National Central University and Academia Sinica, Taipei, Taiwan, ROC
| | - Wei-Jen Wang
- Department of Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan, ROC
| | - Chun-Chuan Chen
- Institute of Cognitive Neuroscience, National Central University, Zhongda Rd, No. 300, Zhongli District, Taoyuan City, 320317, Taiwan, ROC
- Department of Biomedical Sciences and Engineering, National Central University, Taoyuan, Taiwan, ROC
|
25
|
Pham TD, Ravi V, Fan C, Luo B, Sun XF. Tensor Decomposition of Largest Convolutional Eigenvalues Reveals Pathologic Predictive Power of RhoB in Rectal Cancer Biopsy. THE AMERICAN JOURNAL OF PATHOLOGY 2023; 193:579-590. [PMID: 36740183 DOI: 10.1016/j.ajpath.2023.01.007] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/10/2022] [Revised: 12/29/2022] [Accepted: 01/06/2023] [Indexed: 02/05/2023]
Abstract
RhoB protein belongs to the Rho GTPase family, which plays an important role in governing cell signaling and tissue morphology. RhoB expression is known to have implications in pathologic processes of diseases. The regulation and communication of this protein, detected by immunohistochemical staining under the microscope, are worth investigating to gain information that may lead to identifying optimal disease treatment options. In particular, the role of RhoB in rectal cancer is not well understood. Here, we report that methods of deep learning-based image analysis and the decomposition of multiway arrays discover the predictive factor of RhoB in two cohorts of patients with rectal cancer having survival times of <5 and >5 years. The analysis results show distinctions between the tensor decomposition factors of the two cohorts.
Affiliation(s)
- Tuan D Pham
- Center for Artificial Intelligence, Prince Mohammad Bin Fahd University, Khobar, Saudi Arabia
| | - Vinayakumar Ravi
- Center for Artificial Intelligence, Prince Mohammad Bin Fahd University, Khobar, Saudi Arabia
| | - Chuanwen Fan
- Department of Clinical and Experimental Medicine, Linkoping University, Linkoping, Sweden
| | - Bin Luo
- Department of Clinical and Experimental Medicine, Linkoping University, Linkoping, Sweden
- Department of Gastrointestinal Surgery, Sichuan Provincial People's Hospital, Chengdu, China
| | - Xiao-Feng Sun
- Department of Clinical and Experimental Medicine, Linkoping University, Linkoping, Sweden
|
26
|
Moradi H, Al-Hourani A, Concilia G, Khoshmanesh F, Nezami FR, Needham S, Baratchi S, Khoshmanesh K. Recent developments in modeling, imaging, and monitoring of cardiovascular diseases using machine learning. Biophys Rev 2023; 15:19-33. [PMID: 36909958 PMCID: PMC9995635 DOI: 10.1007/s12551-022-01040-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 06/20/2022] [Accepted: 12/21/2022] [Indexed: 01/12/2023] Open
Abstract
Cardiovascular diseases are the leading cause of mortality, morbidity, and hospitalization around the world. Recent technological advances have facilitated analyzing, visualizing, and monitoring cardiovascular diseases using emerging computational fluid dynamics, blood flow imaging, and wearable sensing technologies. Yet, computational cost, limited spatiotemporal resolution, and obstacles to thorough data analysis have hindered the utility of such techniques to curb cardiovascular diseases. We herein discuss how leveraging machine learning techniques, and in particular deep learning methods, could overcome these limitations and offer promise for translation. We discuss the remarkable capacity of recently developed machine learning techniques to accelerate flow modeling, to enhance the resolution while reducing the noise and scanning time of current blood flow imaging techniques, and to accurately detect cardiovascular diseases using a plethora of data collected by wearable sensors.
Affiliation(s)
- Hamed Moradi
- Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, Netherlands
| | - Akram Al-Hourani
- School of Engineering, RMIT University, Melbourne, Victoria, Australia
| | | | - Farnaz Khoshmanesh
- School of Allied Health, Human Services & Sport, La Trobe University, Melbourne, Victoria, Australia
| | - Farhad R. Nezami
- Division of Thoracic and Cardiac Surgery, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
| | - Scott Needham
- Leading Technology Group, Melbourne, Victoria, Australia
| | - Sara Baratchi
- School of Health and Biomedical Sciences, RMIT University, Melbourne, Victoria, Australia
| | | |
|
27
|
Data harnessing to nurture the human mind for a tailored approach to the child. Pediatr Res 2023; 93:357-365. [PMID: 36180585 DOI: 10.1038/s41390-022-02320-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2022] [Revised: 07/06/2022] [Accepted: 09/12/2022] [Indexed: 11/08/2022]
Abstract
Big data in pediatrics is an ocean of structured and unstructured data. Big data analysis helps to dive into the ocean of data to filter out information that can guide pediatricians in their decision making, precision diagnosis, and targeted therapy. In addition, big data and its analysis have helped in the surveillance, prevention, and performance of the health system. There has been a considerable amount of work in pediatrics that we have tried to highlight in this review and some of it has been already incorporated into the health system. Work in specialties of pediatrics is still forthcoming with the creation of a common data model and amalgamation of the huge "omics" database. The physicians entrusted with the care of children must be aware of the outcome so that they can play a role to ensure that big data algorithms have a clinically relevant effect in improving the health of their patients. They will apply the outcome of big data and its analysis in patient care through clinical algorithms or with the help of embedded clinical support alerts from the electronic medical records. IMPACT: Big data in pediatrics include structured, unstructured data, waveform data, biological, and social data. Big data analytics has unraveled significant information from these databases. This is changing how pediatricians will look at the body of available evidence and translate it into their clinical practice. Data harnessed so far is implemented in certain fields while in others it is in the process of development to become a clinical adjunct to the physician. Common databases are being prepared for future work. Diagnostic and prediction models when incorporated into the health system will guide the pediatrician to a targeted approach to diagnosis and therapy.
|
28
|
Improving child health through Big Data and data science. Pediatr Res 2023; 93:342-349. [PMID: 35974162 PMCID: PMC9380977 DOI: 10.1038/s41390-022-02264-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 03/07/2022] [Revised: 06/10/2022] [Accepted: 06/28/2022] [Indexed: 12/04/2022]
Abstract
Child health is defined by a complex, dynamic network of genetic, cultural, nutritional, infectious, and environmental determinants at distinct, developmentally determined epochs from preconception to adolescence. This network shapes the future of children, susceptibilities to adult diseases, and individual child health outcomes. Evolution selects characteristics during fetal life, infancy, childhood, and adolescence that adapt to predictable and unpredictable exposures/stresses by creating alternative developmental phenotype trajectories. While child health has improved in the United States and globally over the past 30 years, continued improvement requires access to data that fully represent the complexity of these interactions and to new analytic methods. Big Data and innovative data science methods provide tools to integrate multiple data dimensions for description of best clinical, predictive, and preventive practices, for reducing racial disparities in child health outcomes, for inclusion of patient and family input in medical assessments, and for defining individual disease risk, mechanisms, and therapies. However, leveraging these resources will require new strategies that intentionally address institutional, ethical, regulatory, cultural, technical, and systemic barriers as well as developing partnerships with children and families from diverse backgrounds that acknowledge historical sources of mistrust. We highlight existing pediatric Big Data initiatives and identify areas of future research. IMPACT: Big Data and data science can improve child health. This review highlights the importance for child health of child-specific and life course-based Big Data and data science strategies. This review provides recommendations for future pediatric-specific Big Data and data science research.
|
29
|
Taipalus T, Isomöttönen V, Erkkilä H, Äyrämö S. Data Analytics in Healthcare: A Tertiary Study. SN COMPUTER SCIENCE 2022; 4:87. [PMID: 36532635 PMCID: PMC9734338 DOI: 10.1007/s42979-022-01507-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2021] [Accepted: 11/14/2022] [Indexed: 12/13/2022]
Abstract
The field of healthcare has seen a rapid increase in the applications of data analytics during the last decades. By utilizing different data analytic solutions, healthcare areas such as medical image analysis, disease recognition, outbreak monitoring, and clinical decision support have been automated to various degrees. Consequently, the intersection of healthcare and data analytics has received scientific attention to the point of numerous secondary studies. We analyze studies on healthcare data analytics, and provide a wide overview of the subject. This is a tertiary study, i.e., a systematic review of systematic reviews. We identified 45 systematic secondary studies on data analytics applications in different healthcare sectors, including diagnosis and disease profiling, diabetes, Alzheimer's disease, and sepsis. Machine learning and data mining were the most widely used data analytics techniques in healthcare applications, with a rising trend in popularity. Healthcare data analytics studies often utilize four popular databases in their primary study search, typically select 25-100 primary studies, and the use of research guidelines such as PRISMA is growing. The results may help both data analytics and healthcare researchers towards relevant and timely literature reviews and systematic mappings, and consequently, towards respective empirical studies. In addition, the meta-analysis presents a high-level perspective on prominent data analytics applications in healthcare, indicating the most popular topics in the intersection of data analytics and healthcare, and provides a big picture on a topic that has seen dozens of secondary studies in the last 2 decades.
Affiliation(s)
- Toni Taipalus
- Faculty of Information Technology, University of Jyväskylä, P.O. Box 35, FI-40014 Jyvaskyla, Finland
| | - Ville Isomöttönen
- Faculty of Information Technology, University of Jyväskylä, P.O. Box 35, FI-40014 Jyvaskyla, Finland
| | - Hanna Erkkilä
- Faculty of Information Technology, University of Jyväskylä, P.O. Box 35, FI-40014 Jyvaskyla, Finland
| | - Sami Äyrämö
- Faculty of Information Technology, University of Jyväskylä, P.O. Box 35, FI-40014 Jyvaskyla, Finland
| |
|
30
|
Prediction of COVID-19 diagnosis based on openEHR artefacts. Sci Rep 2022; 12:12549. [PMID: 35869091 PMCID: PMC9306245 DOI: 10.1038/s41598-022-15968-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2021] [Accepted: 07/01/2022] [Indexed: 11/08/2022] Open
Abstract
Nowadays, we are facing the worldwide pandemic caused by COVID-19. The complexity and momentum of monitoring patients infected with this virus call for the usage of agile and scalable data structure methodologies. OpenEHR is a healthcare standard that has attracted a lot of attention in recent years due to its comprehensive and robust architecture. The importance of an open, standardized and adaptable approach to clinical data lies in extracting value to generate useful knowledge that really can help healthcare professionals make an assertive decision. This importance is even more accentuated when facing a pandemic context. Thus, in this study, a system for tracking symptoms and health conditions of suspected or confirmed SARS-CoV-2 patients from a Portuguese hospital was developed using openEHR. All data on the evolutionary status of patients in home care as well as the results of their COVID-19 test were used to train different ML algorithms, with the aim of developing a predictive model capable of identifying COVID-19 infections according to the severity of symptoms identified by patients. The CRISP-DM methodology was used to conduct this research. The results obtained were promising, with the best model achieving an accuracy of 96.25%, a precision of 99.91%, a sensitivity of 92.58%, a specificity of 99.92%, and an AUC of 0.963, using the Decision Tree algorithm and the Split Validation method. Hence, in the future, after further testing, the predictive model could be implemented in clinical decision support systems.
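The accuracy, precision, sensitivity, and specificity quoted in this abstract are all ratios taken from a binary confusion matrix. As a minimal sketch, and not the authors' code, the Python function below computes them from the four counts. The function name and the example counts are hypothetical and chosen only for illustration; they are not data from the study.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Return common binary-classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are the numbers of true positives, false positives,
    true negatives and false negatives. Each denominator is assumed non-zero.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,     # share of all cases classified correctly
        "precision": tp / (tp + fp),       # share of predicted positives that are real
        "sensitivity": tp / (tp + fn),     # share of real positives that are found
        "specificity": tn / (tn + fp),     # share of real negatives that are found
    }


# Hypothetical counts for illustration only (not taken from the study).
metrics = classification_metrics(tp=8, fp=2, tn=9, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting all four values together, as the abstract does, matters because accuracy alone can look high on an imbalanced dataset even when sensitivity is poor.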
|
31
|
Li G, Togo R, Ogawa T, Haseyama M. Compressed gastric image generation based on soft-label dataset distillation for medical data sharing. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2022; 227:107189. [PMID: 36323177 DOI: 10.1016/j.cmpb.2022.107189] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/27/2021] [Revised: 07/07/2022] [Accepted: 10/17/2022] [Indexed: 06/16/2023]
Abstract
BACKGROUND AND OBJECTIVE Sharing of medical data is required to enable the cross-agency flow of healthcare information and construct high-accuracy computer-aided diagnosis systems. However, the large sizes of medical datasets, the massive amount of memory of saved deep convolutional neural network (DCNN) models, and patients' privacy protection are problems that can lead to inefficient medical data sharing. Therefore, this study proposes a novel soft-label dataset distillation method for medical data sharing. METHODS The proposed method distills valid information of medical image data and generates several compressed images with different data distributions for anonymous medical data sharing. Furthermore, our method can extract essential weights of DCNN models to reduce the memory required to save trained models for efficient medical data sharing. RESULTS The proposed method can compress tens of thousands of images into several soft-label images and reduce the size of a trained model to a few hundredths of its original size. The compressed images obtained after distillation have been visually anonymized; therefore, they do not contain the private information of the patients. Furthermore, we can realize high-detection performance with a small number of compressed images. CONCLUSIONS The experimental results show that the proposed method can improve the efficiency and security of medical data sharing.
Affiliation(s)
- Guang Li
- Graduate School of Information Science and Technology, Hokkaido University, N-14, W-9, Kita-Ku, Sapporo, 060-0814, Japan.
| | - Ren Togo
- Faculty of Information Science and Technology, Hokkaido University, N-14, W-9, Kita-Ku, Sapporo, 060-0814, Japan.
| | - Takahiro Ogawa
- Faculty of Information Science and Technology, Hokkaido University, N-14, W-9, Kita-Ku, Sapporo, 060-0814, Japan.
| | - Miki Haseyama
- Faculty of Information Science and Technology, Hokkaido University, N-14, W-9, Kita-Ku, Sapporo, 060-0814, Japan.
| |
|
32
|
Diakou I, Papakonstantinou E, Papageorgiou L, Pierouli K, Dragoumani K, Spandidos DA, Bacopoulou F, Chrousos GP, Goulielmos GΝ, Eliopoulos E, Vlachakis D. Multiple sclerosis and computational biology (Review). Biomed Rep 2022; 17:96. [PMID: 36382258 PMCID: PMC9634047 DOI: 10.3892/br.2022.1579] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2022] [Accepted: 09/27/2022] [Indexed: 12/02/2022] Open
Abstract
Multiple sclerosis (MS) is an autoimmune neurodegenerative disease whose prevalence has increased worldwide. The resultant symptoms may be debilitating and can substantially reduce the quality of life of patients. Computational biology, which involves the use of computational tools to answer biomedical questions, may provide the basis for novel healthcare approaches in the context of MS. The rapid accumulation of health data, and the ever-increasing computational power and evolving technology have helped to modernize and refine MS research. From the discovery of novel biomarkers to the optimization of treatment and a number of quality-of-life enhancements for patients, computational biology methods and tools are shaping the field of MS diagnosis, management and treatment. The final goal in such a complex disease would be personalized medicine, i.e., providing healthcare services that are tailored to the individual patient, in accordance with the particular biology of their disease and the environmental factors to which they are subjected. The present review article summarizes the current knowledge on MS, modern computational biology and the impact of modern computational approaches of MS.
Affiliation(s)
- Io Diakou
- Laboratory of Genetics, Department of Biotechnology, School of Applied Biology and Biotechnology, Agricultural University of Athens, 11855 Athens, Greece
| | - Eleni Papakonstantinou
- Laboratory of Genetics, Department of Biotechnology, School of Applied Biology and Biotechnology, Agricultural University of Athens, 11855 Athens, Greece
| | - Louis Papageorgiou
- Laboratory of Genetics, Department of Biotechnology, School of Applied Biology and Biotechnology, Agricultural University of Athens, 11855 Athens, Greece
| | - Katerina Pierouli
- Laboratory of Genetics, Department of Biotechnology, School of Applied Biology and Biotechnology, Agricultural University of Athens, 11855 Athens, Greece
| | - Konstantina Dragoumani
- Laboratory of Genetics, Department of Biotechnology, School of Applied Biology and Biotechnology, Agricultural University of Athens, 11855 Athens, Greece
| | - Demetrios A. Spandidos
- Laboratory of Clinical Virology, School of Medicine, University of Crete, 71003 Heraklion, Greece
| | - Flora Bacopoulou
- University Research Institute of Maternal and Child Health and Precision Medicine, and UNESCO Chair on Adolescent Health Care, National and Kapodistrian University of Athens, ‘Aghia Sophia’ Children's Hospital, 11527 Athens, Greece
| | - George P. Chrousos
- University Research Institute of Maternal and Child Health and Precision Medicine, and UNESCO Chair on Adolescent Health Care, National and Kapodistrian University of Athens, ‘Aghia Sophia’ Children's Hospital, 11527 Athens, Greece
| | - Georges Ν. Goulielmos
- Section of Molecular Pathology and Human Genetics, Department of Internal Medicine, School of Medicine, University of Crete, 71003 Heraklion, Greece
| | - Elias Eliopoulos
- Laboratory of Genetics, Department of Biotechnology, School of Applied Biology and Biotechnology, Agricultural University of Athens, 11855 Athens, Greece
| | - Dimitrios Vlachakis
- Laboratory of Genetics, Department of Biotechnology, School of Applied Biology and Biotechnology, Agricultural University of Athens, 11855 Athens, Greece
- University Research Institute of Maternal and Child Health and Precision Medicine, and UNESCO Chair on Adolescent Health Care, National and Kapodistrian University of Athens, ‘Aghia Sophia’ Children's Hospital, 11527 Athens, Greece
- Division of Endocrinology and Metabolism, Center of Clinical, Experimental Surgery and Translational Research, Biomedical Research Foundation of The Academy of Athens, 11527 Athens, Greece
| |
|
33
|
Kundu A, Fu R, Grace D, Logie CH, Abramovich A, Baskerville B, Yager C, Schwartz R, Mitsakakis N, Planinac L, Chaiton M. Correlates of wanting to seek help for mental health and substance use concerns by sexual and gender minority young adults during the COVID-19 pandemic: A machine learning analysis. PLoS One 2022; 17:e0277438. [PMID: 36383536 PMCID: PMC9668172 DOI: 10.1371/journal.pone.0277438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2022] [Accepted: 10/26/2022] [Indexed: 11/17/2022] Open
Abstract
The COVID-19 pandemic has worsened the mental health and substance use challenges among many people who are Two Spirit, lesbian, gay, bisexual, transgender, queer, questioning, and intersex (2SLGBTQI+). We aimed to identify the important correlates and their effects on the predicted likelihood of wanting to seek help among 2SLGBTQI+ young adults for mental health or substance use concerns during the pandemic. A cross-sectional survey was conducted in 2020-2021 among 2SLGBTQI+ young adults aged 16-29 living in two Canadian provinces (Ontario and Quebec). Among 1414 participants, 77% (n = 1089) wanted to seek help for their mental health or substance use concerns during the pandemic; of these, 69.8% (n = 760) reported delay in accessing care. We built a random forest (RF) model to predict the status of wanting to seek help, which achieved moderately high performance with an area under the receiver operating characteristic curve (AUC) of 0.85. The top 10 correlates of wanting to seek help included worsening mental health, age, stigma and discrimination, and adverse childhood experiences. The interactions of adequate housing with certain sexual orientations, gender identities and mental health challenges were found to increase the likelihood of wanting to seek help. We built another RF model for predicting risk of delay in accessing care among participants who wanted to seek help (n = 1089). The model identified a similar set of top 10 correlates of delay in accessing care but lacked adequate performance (AUC 0.61). These findings can direct future research and targeted prevention measures to reduce health disparities for 2SLGBTQI+ young adults.
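The abstract compares two models by AUC (0.85 versus 0.61). As a rough illustration of what that number measures, and not a reconstruction of the authors' pipeline, the sketch below computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name and the toy labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: iterable of 0/1 ground-truth values.
    scores: iterable of model scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy example with made-up scores (not data from the study).
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

On this reading, an AUC of 0.5 corresponds to random ranking, which is why the second model's 0.61 is described as inadequate.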
Affiliation(s)
- Anasua Kundu
- Institute of Medical Science, University of Toronto, Toronto, Canada
- Centre for Addiction and Mental Health, Toronto, Canada
- Ontario Tobacco Research Unit, University of Toronto, Toronto, Canada
| | - Rui Fu
- Department of Otolaryngology—Head and Neck Surgery, Sunnybrook Research Institute, University of Toronto, Toronto, Canada
- Dalla Lana School of Public Health, University of Toronto, Toronto, Canada
| | - Daniel Grace
- Dalla Lana School of Public Health, University of Toronto, Toronto, Canada
| | - Carmen H. Logie
- Factor-Inwentash Faculty of Social Work, University of Toronto, Toronto, Canada
- United Nations University Institute for Water, Environment & Health, Hamilton, Canada
| | - Alex Abramovich
- Centre for Addiction and Mental Health, Toronto, Canada
- Dalla Lana School of Public Health, University of Toronto, Toronto, Canada
- Department of Psychiatry, University of Toronto, Toronto, Canada
| | - Bruce Baskerville
- Canadian Institutes of Health Research, Ottawa, Canada
- School of Pharmacy, Faculty of Science, University of Waterloo, Kitchener, Canada
| | | | - Robert Schwartz
- Centre for Addiction and Mental Health, Toronto, Canada
- Ontario Tobacco Research Unit, University of Toronto, Toronto, Canada
- Dalla Lana School of Public Health, University of Toronto, Toronto, Canada
| | - Nicholas Mitsakakis
- Dalla Lana School of Public Health, University of Toronto, Toronto, Canada
- Children’s Hospital of Eastern Ontario Research Institute, Ottawa, Canada
| | - Lynn Planinac
- Ontario Tobacco Research Unit, University of Toronto, Toronto, Canada
| | - Michael Chaiton
- Institute of Medical Science, University of Toronto, Toronto, Canada
- Centre for Addiction and Mental Health, Toronto, Canada
- Ontario Tobacco Research Unit, University of Toronto, Toronto, Canada
- Dalla Lana School of Public Health, University of Toronto, Toronto, Canada
| |
|
34
|
Datta A, Nicolaï B, Vitrac O, Verboven P, Erdogdu F, Marra F, Sarghini F, Koh C. Computer-aided food engineering. NATURE FOOD 2022; 3:894-904. [PMID: 37118206 DOI: 10.1038/s43016-022-00617-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/18/2021] [Accepted: 09/09/2022] [Indexed: 04/30/2023]
Abstract
Computer-aided food engineering (CAFE) can reduce resource use in product, process and equipment development, improve time-to-market performance, and drive high-level innovation in food safety and quality. Yet, CAFE is challenged by the complexity and variability of food composition and structure, by the transformations food undergoes during processing and the limited availability of comprehensive mechanistic frameworks describing those transformations. Here we introduce frameworks to model food processes and predict physiochemical properties that will accelerate CAFE. We review how investments in open access, such as code sharing, and capacity-building through specialized courses could facilitate the use of CAFE in the transformation already underway in digital food systems.
Affiliation(s)
- Ashim Datta
- Department of Biological and Environmental Engineering, Cornell University, Ithaca, NY, USA.
| | - Bart Nicolaï
- Biosystems Department - MeBioS Division, Katholieke Universiteit Leuven, Leuven, Belgium
| | - Olivier Vitrac
- Université Paris-Saclay, INRAE, AgroParisTech, UMR 0782 SayFood, Massy, France
| | - Pieter Verboven
- Biosystems Department - MeBioS Division, Katholieke Universiteit Leuven, Leuven, Belgium
| | - Ferruh Erdogdu
- Department of Food Engineering, Ankara University, Golbasi-Ankara, Turkey
| | - Francesco Marra
- Department of Industrial Engineering, University of Salerno, Fisciano, Italy
| | - Fabrizio Sarghini
- Department of Agricultural Sciences, Agricultural and Biosystems Engineering, University of Naples Federico II, Portici, Italy
| | - Chris Koh
- PepsiCo R&D, PepsiCo, Plano, TX, USA
| |
|
35
|
Jeong JC, Hands I, Kolesar JM, Rao M, Davis B, Dobyns Y, Hurt-Mueller J, Levens J, Gregory J, Williams J, Witt L, Kim EM, Burton C, Elbiheary AA, Chang M, Durbin EB. Local data commons: the sleeping beauty in the community of data commons. BMC Bioinformatics 2022; 23:386. [PMID: 36151511 PMCID: PMC9502580 DOI: 10.1186/s12859-022-04922-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/04/2022] [Accepted: 09/12/2022] [Indexed: 12/03/2022] Open
Abstract
Background Public Data Commons (PDC) have been highlighted in the scientific literature for their capacity to collect and harmonize big data. On the other hand, local data commons (LDC), located within an institution or organization, have been underrepresented in the scientific literature, even though they are a critical part of research infrastructure. Being closest to the sources of data, LDCs provide the ability to collect and maintain the most up-to-date, high-quality data within an organization. As data providers, LDCs face many challenges in both collecting and standardizing data; moreover, as consumers of PDC, they face problems of data harmonization stemming from the monolithic harmonization pipeline designs commonly adopted by many PDCs. Unfortunately, existing guidelines and resources for building and maintaining data commons exclusively focus on PDC and provide very little information on LDC. Results This article focuses on four important observations. First, there are three different types of LDC service models that are defined based on their roles and requirements. These can be used as guidelines for building new LDC or enhancing the services of existing LDC. Second, the seven core services of LDC are discussed, including cohort identification and facilitation of genomic sequencing, the management of molecular reports and associated infrastructure, quality control, data harmonization, data integration, data sharing, and data access control. Third, instead of commonly developed monolithic systems, we propose a new data sharing method for data harmonization that combines both divide-and-conquer and bottom-up approaches. Finally, an end-to-end LDC implementation is introduced with real-world examples. Conclusions Although LDCs are an optimal place to identify and address data quality issues, they have traditionally been relegated to the role of passive data provider for much larger PDC. Indeed, many LDCs limit their functions to only conducting routine data storage and transmission tasks due to a lack of information on how to design, develop, and improve their services using limited resources. We hope that this work will be the first small step in raising awareness among the LDCs of their expanded utility and to publicize to a wider audience the importance of LDC.
Affiliation(s)
- Jong Cheol Jeong
- Division of Biomedical Informatics, College of Medicine, University of Kentucky, Lexington, KY, USA; Cancer Research Informatics Shared Resource Facility, Markey Cancer Center, Lexington, KY, USA
| | - Isaac Hands
- Cancer Research Informatics Shared Resource Facility, Markey Cancer Center, Lexington, KY, USA; Kentucky Cancer Registry, Lexington, KY, USA
| | - Jill M Kolesar
- Department of Pharmacy Practice and Science, College of Pharmacy, University of Kentucky, Lexington, KY, USA
| | - Mahadev Rao
- Department of Pharmacy Practice, Center for Translational Research, Manipal College of Pharmaceutical Sciences, Manipal Academy of Higher Education, Manipal, Karnataka, India
| | - Bront Davis
- Cancer Research Informatics Shared Resource Facility, Markey Cancer Center, Lexington, KY, USA; Kentucky Cancer Registry, Lexington, KY, USA
| | - York Dobyns
- Cancer Research Informatics Shared Resource Facility, Markey Cancer Center, Lexington, KY, USA; Kentucky Cancer Registry, Lexington, KY, USA
| | - Joseph Hurt-Mueller
- Cancer Research Informatics Shared Resource Facility, Markey Cancer Center, Lexington, KY, USA; Kentucky Cancer Registry, Lexington, KY, USA
| | - Justin Levens
- Cancer Research Informatics Shared Resource Facility, Markey Cancer Center, Lexington, KY, USA; Kentucky Cancer Registry, Lexington, KY, USA
| | - Jenny Gregory
- Cancer Research Informatics Shared Resource Facility, Markey Cancer Center, Lexington, KY, USA; Kentucky Cancer Registry, Lexington, KY, USA
| | - John Williams
- Cancer Research Informatics Shared Resource Facility, Markey Cancer Center, Lexington, KY, USA; Kentucky Cancer Registry, Lexington, KY, USA
| | - Lisa Witt
- Cancer Research Informatics Shared Resource Facility, Markey Cancer Center, Lexington, KY, USA; Kentucky Cancer Registry, Lexington, KY, USA
| | - Eun Mi Kim
- Department of Computer Science, Eastern Kentucky University, Richmond, KY, USA
| | - Carlee Burton
- Cancer Research Informatics Shared Resource Facility, Markey Cancer Center, Lexington, KY, USA
| | - Amir A Elbiheary
- Cancer Research Informatics Shared Resource Facility, Markey Cancer Center, Lexington, KY, USA
| | - Mingguang Chang
- Cancer Research Informatics Shared Resource Facility, Markey Cancer Center, Lexington, KY, USA
| | - Eric B Durbin
- Division of Biomedical Informatics, College of Medicine, University of Kentucky, Lexington, KY, USA; Cancer Research Informatics Shared Resource Facility, Markey Cancer Center, Lexington, KY, USA; Kentucky Cancer Registry, Lexington, KY, USA
| |
|
36
|
A Deep Learning Model Incorporating Knowledge Representation Vectors and Its Application in Diabetes Prediction. DISEASE MARKERS 2022; 2022:7593750. [PMID: 35990251 PMCID: PMC9391170 DOI: 10.1155/2022/7593750] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/14/2022] [Revised: 07/24/2022] [Accepted: 07/30/2022] [Indexed: 01/09/2023]
Abstract
Deep learning methods for various disease prediction tasks have become very effective and even surpass human experts. However, the lack of interpretability and medical expertise limits their clinical application. This paper combines knowledge representation learning and deep learning methods to construct a disease prediction model. The model first constructs a relationship graph between each physical indicator and its test value, based on the normal ranges of human physical examination indices. A knowledge representation learning model then encodes the examination indices and their test values. Next, the patient physical examination data is represented as a vector and input into a deep learning model built with a self-attention mechanism and a convolutional neural network to implement disease prediction. The experimental results show that the model, when used in diabetes prediction, yields an accuracy of 97.18% and a recall of 87.55%, which outperforms other machine learning methods (e.g., lasso, ridge, support vector machine, random forest, and XGBoost). Compared with the best-performing random forest method, the recall is increased by 5.34%. Therefore, it can be concluded that the application of medical knowledge to deep learning through knowledge representation learning can be used in diabetes prediction for the purpose of early detection and assisting diagnosis.
|
37
|
Interactive exploration of a global clinical network from a large breast cancer cohort. NPJ Digit Med 2022; 5:113. [PMID: 35948579 PMCID: PMC9365762 DOI: 10.1038/s41746-022-00647-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Accepted: 06/27/2022] [Indexed: 11/08/2022] Open
Abstract
Despite the unprecedented amount of information now available in medical records, health data remain underexploited due to their heterogeneity and complexity. Simple charts and hypothesis-driven statistics can no longer apprehend the content of information-rich clinical data. There is, therefore, a clear need for powerful interactive visualization tools enabling medical practitioners to perceive the patterns and insights gained by state-of-the-art machine learning algorithms. Here, we report an interactive graphical interface for use as the front end of a machine learning causal inference server (MIIC), to facilitate the visualization and comprehension by clinicians of relationships between clinically relevant variables. The widespread use of such tools, facilitating the interactive exploration of datasets, is crucial both for data visualization and for the generation of research hypotheses. We demonstrate the utility of the MIIC interactive interface, by exploring the clinical network of a large cohort of breast cancer patients treated with neoadjuvant chemotherapy (NAC). This example highlights, in particular, the direct and indirect links between post-NAC clinical responses and patient survival. The MIIC interactive graphical interface has the potential to help clinicians identify actionable nodes and edges in clinical networks, thereby ultimately improving the patient care pathway.
Collapse
|
38
|
Kim T, Choi H, Lee SM. Parametric and non-parametric estimation of reference intervals for routine laboratory tests: an analysis of health check-up data for 260 889 young men in the South Korean military. BMJ Open 2022; 12:e062617. [PMID: 35879016 PMCID: PMC9328105 DOI: 10.1136/bmjopen-2022-062617] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2022] [Accepted: 07/04/2022] [Indexed: 11/03/2022] Open
Abstract
OBJECTIVES Determination of reference intervals (RIs) using big data faces several obstacles due to heterogeneity in analysers, period and ethnicity. The present study aimed to establish the RIs for routine complete blood count (CBC) and biochemistry laboratory tests in homogeneous, healthy, male Korean soldiers in their 20s using a large health check-up data set, comparing parametric and non-parametric estimation. DESIGN A multicentre, cross-sectional study. SETTING Seven armed forces hospitals in South Korea. PARTICIPANTS A total of 609 649 men underwent health examination when promoted to corporal between January 2015 and September 2021. 260 889 eligible individuals aged 20-25 were included in the analysis. MAIN OUTCOMES AND MEASURES The RIs were established by parametric and non-parametric methods. In the parametric approach, maximum likelihood estimation was applied to estimate the Box-Cox transformation parameter, and the values at the 2.5th and 97.5th percentiles were recalculated. The non-parametric approach applied Tukey's exclusion test, and the values at the 2.5th and 97.5th percentiles were obtained. Classification by body mass index was also performed. RESULTS The obtained RIs for haematology parameters were comparable between devices. If the values followed a Gaussian distribution, parametric and non-parametric methods were well matched for haematology and biochemical markers. When the values were right-skewed, the upper limits were higher with parametric than with non-parametric methods. Participants with obesity showed higher RIs for CBC, some liver function tests and some lipid profiles than participants without obesity. CONCLUSIONS Using data from healthy, male Korean soldiers in their 20s, we proposed the RIs for CBC and biochemical parameters, comparing parametric and non-parametric estimation. As such approaches based on large data sets become more prevalent, further studies are needed to discriminate eligible individuals and determine RIs in an extrapolated sample.
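The non-parametric route described in this abstract (outlier exclusion followed by the 2.5th and 97.5th percentiles) can be sketched as below. This is a minimal illustration, not the authors' code: the function names are hypothetical, the Tukey fence multiplier of 1.5 is the common convention and is assumed here, and the percentile uses simple linear interpolation. The parametric Box-Cox branch is not shown.

```python
def percentile(sorted_vals, p):
    """Linear-interpolation percentile (p in [0, 100]) of a pre-sorted list."""
    if not sorted_vals:
        raise ValueError("empty data")
    k = (len(sorted_vals) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def tukey_filter(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR]; k=1.5 is an assumption."""
    s = sorted(values)
    q1, q3 = percentile(s, 25), percentile(s, 75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in s if lower <= v <= upper]


def nonparametric_ri(values):
    """Reference interval as the 2.5th and 97.5th percentiles after exclusion."""
    kept = tukey_filter(values)
    return percentile(kept, 2.5), percentile(kept, 97.5)


# Hypothetical measurements: 1..100 plus one gross outlier.
low, high = nonparametric_ri(list(range(1, 101)) + [1000])
print(low, high)
```

With right-skewed data, the fence tends to trim the upper tail, which is consistent with the abstract's observation that non-parametric upper limits came out lower than parametric ones.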
Affiliation(s)
- Taeyun Kim
  - Internal Medicine, The Armed Forces Goyang Hospital, Goyang, Republic of Korea
- Hyunji Choi
  - Laboratory Medicine, Pusan National University Yangsan Hospital, Yangsan, Republic of Korea
  - Research Institute for Convergence of Biomedical Science and Technology, Pusan National University Yangsan Hospital, Yangsan, Gyeongnam, Republic of Korea
- Sun Min Lee
  - Research Institute for Convergence of Biomedical Science and Technology, Pusan National University Yangsan Hospital, Yangsan, Gyeongnam, Republic of Korea
  - Laboratory Medicine, Pusan National University School of Medicine, Busan, Republic of Korea
|
39
|
Comparing Worldwide, National, and Independent Notifications about Adverse Drug Reactions Due to COVID-19 Vaccines. INFORMATION 2022. [DOI: 10.3390/info13070329] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/04/2022] Open
Abstract
The rapid development of effective vaccines against COVID-19 is an extraordinary achievement. However, no medical product can ever be considered risk-free. Several countries have a pharmacovigilance system that detects, assesses, understands, and prevents possible adverse effects of a drug. To benefit from such huge data sources, specialists and researchers need advanced big data analysis tools able to extract value and find valuable insights. This paper defines a general framework for a pharmaceutical data analysis application that provides a predefined (but extensible) set of functions for each data processing step (i.e., data collection, filtering, enriching, analysis, and visualization). As a case study, we present here an analysis of the potential side effects observed following the administration of the COVID-19 vaccines. The experimental evaluation shows that: (i) most adverse events can be classified as non-serious and concern muscle/joint pain, chills and nausea, headache, and fatigue; (ii) the notification rate is higher in the age group 20–39 years and decreases in older age groups and in very young people.
|
40
|
Grosman L, Muller A, Dag I, Goldgeier H, Harush O, Herzlinger G, Nebenhaus K, Valetta F, Yashuv T, Dick N. Artifact3-D: New software for accurate, objective and efficient 3D analysis and documentation of archaeological artifacts. PLoS One 2022; 17:e0268401. [PMID: 35709137 PMCID: PMC9202890 DOI: 10.1371/journal.pone.0268401] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/07/2022] [Accepted: 04/20/2022] [Indexed: 11/19/2022] Open
Abstract
The study of artifacts is fundamental to archaeological research. The features of individual artifacts are recorded, analyzed, and compared within and between contextual assemblages. Here we present and make available for academic use Artifact3-D, a new software package comprising a suite of analysis and documentation procedures for archaeological artifacts. We introduce it here, alongside real archaeological case studies to demonstrate its utility. Artifact3-D equips its users with a range of computational functions for accurate measurements, including orthogonal distances, surface area, volume, CoM, edge angles, asymmetry, and scar attributes. Metrics and figures for each of these measurements are easily exported for the purposes of further analysis and illustration. We test these functions on a range of real archaeological case studies pertaining to tool functionality, technological organization, manufacturing traditions, knapping techniques, and knapper skill. Here we focus on lithic artifacts, but the Artifact3-D software can be used on any artifact type to address the needs of modern archaeology. Computational methods are increasingly becoming entwined in the excavation, documentation, analysis, database creation, and publication of archaeological research. Artifact3-D offers functions to address every stage of this workflow. It equips the user with the requisite toolkit for archaeological research that is accurate, objective, repeatable and efficient. This program will help archaeological research deal with the abundant material found during excavations and will open new horizons in research trajectories.
Affiliation(s)
- Leore Grosman
  - Institute of Archaeology, Mount Scopus, The Hebrew University of Jerusalem, Jerusalem, Israel
- Antoine Muller
  - Institute of Archaeology, Mount Scopus, The Hebrew University of Jerusalem, Jerusalem, Israel
- Itamar Dag
  - Institute of Archaeology, Mount Scopus, The Hebrew University of Jerusalem, Jerusalem, Israel
- Hadas Goldgeier
  - Institute of Archaeology, Mount Scopus, The Hebrew University of Jerusalem, Jerusalem, Israel
- Ortal Harush
  - Institute of Archaeology, Mount Scopus, The Hebrew University of Jerusalem, Jerusalem, Israel
- Gadi Herzlinger
  - Institute of Archaeology, Mount Scopus, The Hebrew University of Jerusalem, Jerusalem, Israel
- Keren Nebenhaus
  - Institute of Archaeology, Mount Scopus, The Hebrew University of Jerusalem, Jerusalem, Israel
- Francesco Valetta
  - Institute of Archaeology, Mount Scopus, The Hebrew University of Jerusalem, Jerusalem, Israel
- Talia Yashuv
  - Institute of Archaeology, Mount Scopus, The Hebrew University of Jerusalem, Jerusalem, Israel
- Nir Dick
  - Institute of Archaeology, Mount Scopus, The Hebrew University of Jerusalem, Jerusalem, Israel
|
41
|
Go S, Wang Q, Wang B, Jiang Y, Bajalovic N, Loke DK. Continual Learning Electrical Conduction in Resistive‐Switching‐Memory Materials. ADVANCED THEORY AND SIMULATIONS 2022. [DOI: 10.1002/adts.202200226] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
Affiliation(s)
- Shao‐Xiang Go
  - Department of Science, Mathematics and Technology, Singapore University of Technology and Design, 487372 Singapore
- Qiang Wang
  - Department of Science, Mathematics and Technology, Singapore University of Technology and Design, 487372 Singapore
- Bo Wang
  - Department of Information Systems Technology and Design, Singapore University of Technology and Design, 487372 Singapore
- Yu Jiang
  - Department of Science, Mathematics and Technology, Singapore University of Technology and Design, 487372 Singapore
- Natasa Bajalovic
  - Department of Science, Mathematics and Technology, Singapore University of Technology and Design, 487372 Singapore
- Desmond K. Loke
  - Department of Science, Mathematics and Technology, Singapore University of Technology and Design, 487372 Singapore
|
42
|
Zenker S, Strech D, Ihrig K, Jahns R, Müller G, Schickhardt C, Schmidt G, Speer R, Winkler E, von Kielmansegg SG, Drepper J. Data protection-compliant broad consent for secondary use of health care data and human biosamples for (bio)medical research: Towards a new German national standard. J Biomed Inform 2022; 131:104096. [PMID: 35643273 DOI: 10.1016/j.jbi.2022.104096] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 10/31/2021] [Revised: 04/05/2022] [Accepted: 05/20/2022] [Indexed: 01/10/2023]
Abstract
BACKGROUND The secondary use of deidentified but not anonymized patient data is a promising approach for enabling precision medicine and learning health care systems. In most national jurisdictions (e.g., in Europe), this type of secondary use requires patient consent. While various ethical, legal, and technical analyses have stressed the opportunities and challenges for different types of consent over the past decade, no country has yet established a national consent standard accepted by the relevant authorities. METHODS A working group of the national Medical Informatics Initiative in Germany conducted a requirements analysis and developed a GDPR-compliant broad consent standard. The development included consensus procedures within the Medical Informatics Initiative, a documented consultation process with all relevant stakeholder groups and authorities, and the ultimate submission for approval via the national data protection authorities. RESULTS This paper presents the broad consent text together with a guidance document on mandatory safeguards for broad consent implementation. The mandatory safeguards comprise i) independent review of individual research projects, ii) organizational measures to protect patients from involuntary disclosure of protected information, and iii) comprehensive information for patients and public transparency. This paper further describes the key issues discussed with the relevant authorities, especially the position on additional or alternative consent approaches such as dynamic consent. DISCUSSION Both the resulting broad consent text and the national consensus process are relevant for similar activities internationally. A key challenge of aligning consent documents with the various stakeholders was explaining and justifying the decision to use broad consent and the decision against using alternative models such as dynamic consent. 
Public transparency for all secondary use projects and their results emerged as a key factor in this justification. While currently largely limited to academic medicine in Germany, the first steps for extending this broad consent approach to wider areas of application, including smaller institutions and medical practices, are currently under consideration.
Affiliation(s)
- Sven Zenker
  - Staff Unit for Scientific & Medical Technology Development & Coordination (MWTek), Commercial Directorate, Institute for Medical Biometry, Informatics & Epidemiology, Department of Anesthesiology and Intensive Care Medicine, University Hospital Bonn, Venusbergcampus 1, 53127 Bonn, Germany
- Daniel Strech
  - QUEST Center, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany
- Kristina Ihrig
  - Department of Medicine, Hematology/Oncology, Goethe University, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany; German Cancer Consortium (DKTK), Partner Site Frankfurt/Mainz, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
- Roland Jahns
  - Interdisciplinary Bank of Biomaterials and Data Würzburg (ibdw), University and University Hospital of Würzburg, Building A8/A9, Straubmühlweg 2a, 97078 Würzburg, Germany
- Gabriele Müller
  - Center for Evidence-Based Healthcare, University Hospital Carl Gustav Carus and Carl Gustav Carus Faculty of Medicine, Technische Universität Dresden, Fetscherstr. 74, 01307 Dresden, Germany
- Christoph Schickhardt
  - Section of Translational Medical Ethics, National Center for Tumor Diseases, German Cancer Research Center, Im Neuenheimer Feld 460, 69120 Heidelberg, Germany
- Georg Schmidt
  - Department of Internal Medicine 1, Klinikum rechts der Isar, Technical University of Munich, Munich, Germany; German Centre for Cardiovascular Research, partner site Munich Heart Alliance, Munich, Germany
- Ronald Speer
  - LIFE - Leipzig Research Center for Civilization Diseases, Medical Faculty, Leipzig University, Philipp-Rosenthal-Straße 27, 04103 Leipzig, Germany
- Eva Winkler
  - Section for Translational Medical Ethics, Dept Medical Oncology, National Center for Tumor Diseases, Heidelberg University Hospital, INF 460, 69121 Heidelberg
- Johannes Drepper
  - TMF - Technology, Methods, and Infrastructure for Networked Medical Research, Charlottenstrasse 42, 10117 Berlin, Germany
|
43
|
Zadeh FA, Ardalani MV, Salehi AR, Jalali Farahani R, Hashemi M, Mohammed AH. An Analysis of New Feature Extraction Methods Based on Machine Learning Methods for Classification Radiological Images. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:3035426. [PMID: 35634075 PMCID: PMC9131703 DOI: 10.1155/2022/3035426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2022] [Revised: 02/02/2022] [Accepted: 03/08/2022] [Indexed: 12/02/2022]
Abstract
The lungs are the main target of COVID-19, which induces inflammatory changes in the lungs that can lead to respiratory insufficiency. A reduced supply of oxygen to human cells is harmful, and multiorgan failure with a high mortality rate may occur in certain circumstances. Radiological pulmonary evaluation is a vital part of therapy for critically ill patients with COVID-19. The evaluation of radiological imagery is a specialized activity that requires a radiologist, so applying artificial intelligence to radiological images is an important topic. Using a deep machine learning technique to identify morphological differences in the lungs of COVID-19-infected patients could yield promising results on digital chest X-ray images. Minor differences in digital images that are not apparent to the human eye may be detected using computer vision algorithms. This paper uses machine learning methods to diagnose COVID-19 on chest X-rays, and the findings are very promising. The dataset includes COVID-19-enhanced chest X-ray images for disease detection. The data were gathered from two publicly accessible datasets. Feature extraction is done using gray level co-occurrence matrix methods. K-nearest neighbor, support vector machine, linear discriminant analysis, naïve Bayes, and convolutional neural network methods are used for the classification of patients. According to the findings, convolutional neural networks, which work on the imaging data with less human involvement, outperform the other, traditional machine learning approaches.
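The gray level co-occurrence matrix (GLCM) mentioned in this abstract counts how often a pixel of gray level i has a neighbour of gray level j at a fixed offset; texture features such as contrast are then derived from those counts. The sketch below is a minimal, dependency-free illustration, not the paper's pipeline: the function names, the single offset (one pixel to the right), and the tiny 3x3 image are assumptions made for the example, and the matrix is neither symmetrized nor normalized.

```python
def glcm(image, dx=1, dy=0, levels=None):
    """Count co-occurrences of gray levels (i, j) at pixel offset (dx, dy)."""
    if levels is None:
        levels = max(max(row) for row in image) + 1
    matrix = [[0] * levels for _ in range(levels)]
    height, width = len(image), len(image[0])
    for y in range(height):
        for x in range(width):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < height and 0 <= x2 < width:
                matrix[image[y][x]][image[y2][x2]] += 1
    return matrix


def contrast(matrix):
    """GLCM contrast: sum of (i - j)^2 weighted by co-occurrence probability."""
    total = sum(sum(row) for row in matrix)
    weighted = sum(
        (i - j) ** 2 * count
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
    )
    return weighted / total


# Hypothetical 3x3 image with gray levels 0..2.
toy_image = [
    [0, 0, 1],
    [1, 2, 2],
    [0, 1, 2],
]
m = glcm(toy_image)
print(m, contrast(m))
```

In a classification setting, features like this contrast value would be computed per image (often for several offsets) and passed to a classifier such as k-nearest neighbor or a support vector machine.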
Affiliation(s)
- Mohammadreza Vazifeh Ardalani
  - Robotics Research Laboratory, Center of Excellence in Experimental Solid Mechanics and Dynamics, School of Mechanical Engineering, Iran University of Science and Technology, Tehran, Iran
- Ali Rezaei Salehi
  - Industrial Engineering Department, Technical and Engineering Faculty, University of Science and Culture, Tehran, Iran
- Mandana Hashemi
  - School of Industrial and Information Engineering, Politecnico di Milano University, Milan, Italy
- Adil Hussein Mohammed
  - Department of Communication and Computer Engineering, Faculty of Engineering, Cihan University-Erbil, Erbil, Kurdistan Region, Iraq
|
44
|
Wan S, Zhao X, Niu Z, Dong L, Wu Y, Gu S, Feng Y, Hua X. Influence of ambient air pollution on successful pregnancy with frozen embryo transfer: A machine learning prediction model. ECOTOXICOLOGY AND ENVIRONMENTAL SAFETY 2022; 236:113444. [PMID: 35367879 DOI: 10.1016/j.ecoenv.2022.113444] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2021] [Revised: 03/18/2022] [Accepted: 03/19/2022] [Indexed: 06/14/2023]
Abstract
Numerous air pollutants have been reported to influence the outcomes of in vitro fertilization (IVF). However, whether air pollution affects implantation in the frozen embryo transfer (FET) process is under debate. We aimed to find the association between ambient air pollution and the implantation potential of FET, and to test the value of adding air pollution data to a random forest model (RFM) predicting intrauterine pregnancy. In a retrospective, 4-year, single-center study, we analyzed 3698 cycles of women living in Shanghai who underwent FET between 2015 and 2018. To estimate patients' individual exposure to air pollution, we computed averages of daily concentrations of six air pollutants (PM2.5, PM10, SO2, CO, NO2, and O3) measured at 9 monitoring stations in Shanghai for the exposure period (one month before FET). Moreover, a predictive model of 15 variables was established using RFM. Air pollutant levels of patients with and without intrauterine pregnancy were compared. Our results indicated that, for the exposure period before FET, NO2 was negatively associated with intrauterine pregnancy (OR: 0.906, CI: 0.816-0.989). AUROC increased from 0.712 to 0.771 when air pollutant features were added. Overall, our findings demonstrate that exposure to NO2 before transfer has an adverse effect on clinical pregnancy, and that using air pollution data in the RFM improves the prediction of intrauterine pregnancy.
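The exposure estimate described in this abstract (an average of daily pollutant concentrations over the month before transfer) might be computed along the following lines. This is only a sketch under stated assumptions: the function name and data layout are hypothetical, "one month" is taken as 30 days, and station readings are averaged per day before averaging across days, none of which the abstract specifies.

```python
from datetime import date, timedelta
from statistics import mean


def exposure_average(daily_station_readings, transfer_date, window_days=30):
    """Mean concentration over the window_days before transfer_date.

    daily_station_readings maps a date to the list of readings from all
    monitoring stations on that day. Stations are averaged per day first,
    then the daily means are averaged over the window.
    """
    start = transfer_date - timedelta(days=window_days)
    daily_means = [
        mean(readings)
        for day, readings in daily_station_readings.items()
        if start <= day < transfer_date and readings
    ]
    if not daily_means:
        raise ValueError("no readings in the exposure window")
    return mean(daily_means)


# Hypothetical NO2 readings; the first day falls outside the 30-day window.
no2 = {
    date(2018, 3, 1): [1000.0],
    date(2018, 3, 20): [10.0, 30.0],
    date(2018, 3, 25): [40.0],
}
print(exposure_average(no2, date(2018, 4, 10)))
```

A value like this, computed per pollutant and per cycle, is the kind of feature that could then be added to the predictive model's input variables.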
Affiliation(s)
- Sheng Wan
  - Shanghai First Maternity and Infant Hospital, School of Medicine, Tongji University, Shanghai, China
- Xiaobo Zhao
  - Shanghai First Maternity and Infant Hospital, School of Medicine, Tongji University, Shanghai, China
- Zhihong Niu
  - Reproductive Medical Center, Obstetrics and Gynecology Department, Ruijin Hospital Affiliated with the Medical School of Shanghai Jiao Tong University, Shanghai, China
- Lingling Dong
  - Shanghai First Maternity and Infant Hospital, School of Medicine, Tongji University, Shanghai, China
- Yuelin Wu
  - Shanghai First Maternity and Infant Hospital, School of Medicine, Tongji University, Shanghai, China
- Shengyi Gu
  - Shanghai First Maternity and Infant Hospital, School of Medicine, Tongji University, Shanghai, China
- Yun Feng
  - Reproductive Medical Center, Obstetrics and Gynecology Department, Ruijin Hospital Affiliated with the Medical School of Shanghai Jiao Tong University, Shanghai, China
- Xiaolin Hua
  - Shanghai First Maternity and Infant Hospital, School of Medicine, Tongji University, Shanghai, China
|
45
|
Development of Elderly Life Quality Database in Thailand with a Correlation Feature Analysis. SUSTAINABILITY 2022. [DOI: 10.3390/su14084468] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/04/2022]
Abstract
Understanding the context of the elderly is very important for determining guidelines that improve their quality of life. One problem in Thailand, in this context, is that each organization involved in caring for the elderly has its own separate data collection, resulting in mismatches that negatively affect government agencies in their monitoring. This study proposes the development of a central database for elderly care and includes a study of factors affecting their quality of life. The proposed system can be used to collect data, manage data, perform data analysis with multiple linear regression, and display results via a web application in visualizations of many forms, such as graphs, charts, and spatial data. In addition, our system would replace paper forms and increase efficiency in work, as well as in storage and processing. In an observational case study, we include 240 elderly people in village areas 5, 6, 7, and 8, in the Makham Tia subdistrict, Muang district, Surat Thani province, Thailand. Data were analyzed with multiple linear regression to predict the level of quality of life by using other indicators in the data gathered. This model uses only 14 of the 39 available factors. Moreover, this model has an accuracy of 86.55%, R-squared = 69.11%, p-value < 2.2 × 10^-16, and Kappa = 0.7994 at 95% confidence. These results can make subsequent data collection more comfortable and faster as the number of questions is reduced, while revealing with good confidence the level of quality of life of the elderly. In addition, the system has a central database that is useful for elderly care organizations in the community, in support of planning and policy setting for elderly care.
|
46
|
Armenta-Medina D, Brambila-Tapia AJL, Miranda-Jiménez S, Rodea-Montero ER. A Web Application for Biomedical Text Mining of Scientific Literature Associated with Coronavirus-Related Syndromes: Coronavirus Finder. Diagnostics (Basel) 2022; 12:887. [PMID: 35453935 PMCID: PMC9028729 DOI: 10.3390/diagnostics12040887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2022] [Revised: 02/10/2022] [Accepted: 02/11/2022] [Indexed: 12/10/2022] Open
Abstract
In this study, a web application was developed that comprises scientific literature associated with the Coronaviridae family, specifically for those viruses that are members of the Genus Betacoronavirus, responsible for emerging diseases with a great impact on human health: Middle East Respiratory Syndrome-Related Coronavirus (MERS-CoV) and Severe Acute Respiratory Syndrome-Related Coronavirus (SARS-CoV, SARS-CoV-2). The information compiled on this webserver is intended to support understanding of the basics of these viruses' infection and the nature of their pathogenesis, enabling the identification of molecular and cellular components that may function as potential targets in the design and development of successful treatments for the diseases associated with the Coronaviridae family. Some of the web application's primary functions are searching for keywords within the scientific literature, natural language processing for the extraction of genes and words, and the generation and visualization of gene networks associated with viral diseases, derived from the analysis of latent semantic space and cosine similarity measures. Interestingly, our gene association analysis reveals drug targets in understudies, and new targets suggested in the scientific literature to treat coronavirus.
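The cosine similarity measure mentioned in this abstract compares two vectors by the angle between them, which is how terms or genes embedded in a latent semantic space are typically scored as related. The sketch below shows only that measure; the function name, the threshold-free ranking, and the three-dimensional vectors with placeholder gene labels are all hypothetical and do not come from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical latent-space vectors for three placeholder gene labels.
vectors = {
    "gene_a": [1.0, 2.0, 0.0],
    "gene_b": [2.0, 4.0, 0.0],
    "gene_c": [0.0, 0.0, 3.0],
}
query = "gene_a"
ranked = sorted(
    (name for name in vectors if name != query),
    key=lambda name: cosine_similarity(vectors[query], vectors[name]),
    reverse=True,
)
print(ranked)
```

Pairs whose similarity exceeds some chosen cutoff could then be drawn as edges of a gene network; the cutoff itself would be a design decision not stated in the abstract.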
Affiliation(s)
- Dagoberto Armenta-Medina
  - Consejo Nacional de Ciencia y Tecnología (CONACyT), Ciudad de México 03940, Mexico
  - Centro de Investigación e Innovación en Tecnologías de la Información y Comunicación (INFOTEC), Aguascalientes 20326, Mexico
- Sabino Miranda-Jiménez
  - Consejo Nacional de Ciencia y Tecnología (CONACyT), Ciudad de México 03940, Mexico
  - Centro de Investigación e Innovación en Tecnologías de la Información y Comunicación (INFOTEC), Aguascalientes 20326, Mexico
|
47
|
Shi X, Nikolic G, Fischaber S, Black M, Rankin D, Epelde G, Beristain A, Alvarez R, Arrue M, Pita Costa J, Grobelnik M, Stopar L, Pajula J, Umer A, Poliwoda P, Wallace J, Carlin P, Pääkkönen J, De Moor B. System Architecture of a European Platform for Health Policy Decision Making: MIDAS. Front Public Health 2022; 10:838438. [PMID: 35433572 PMCID: PMC9008448 DOI: 10.3389/fpubh.2022.838438] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/17/2021] [Accepted: 01/13/2022] [Indexed: 12/01/2022] Open
Abstract
Background Healthcare data is a rich yet underutilized resource due to its disconnected, heterogeneous nature. A means of connecting healthcare data and integrating it with additional open and social data in a secure way can support the monumental challenge policy-makers face in safely accessing all relevant data to assist in managing the health and wellbeing of all. The goal of this study was to develop a novel health data platform within the MIDAS (Meaningful Integration of Data Analytics and Services) project, that harnesses the potential of latent healthcare data in combination with open and social data to support evidence-based health policy decision-making in a privacy-preserving manner. Methods The MIDAS platform was developed in an iterative and collaborative way with close involvement of academia, industry, healthcare staff and policy-makers, to solve tasks including data storage, data harmonization, data analytics and visualizations, and open and social data analytics. The platform has been piloted and tested by health departments in four European countries, each focusing on different region-specific health challenges and related data sources. Results A novel health data platform solving the needs of Public Health decision-makers was successfully implemented within the four pilot regions connecting heterogeneous healthcare datasets and open datasets and turning large amounts of previously isolated data into actionable information allowing for evidence-based health policy-making and risk stratification through the application and visualization of advanced analytics. Conclusions The MIDAS platform delivers a secure, effective and integrated solution to deal with health data, providing support for health policy decision-making, planning of public health activities and the implementation of the Health in All Policies approach. The platform has proven transferable, sustainable and scalable across policies, data and regions.
Affiliation(s)
- Xi Shi
  - Department of Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven, Leuven, Belgium
  - Vlerick Business School, Leuven, Belgium
  - *Correspondence: Xi Shi
- Gorana Nikolic
  - Department of Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven, Leuven, Belgium
- Michaela Black
  - School of Computing, Engineering and Intelligent Systems, Ulster University, Londonderry, United Kingdom
- Debbie Rankin
  - School of Computing, Engineering and Intelligent Systems, Ulster University, Londonderry, United Kingdom
- Gorka Epelde
  - Vicomtech Foundation, Basque Research and Technology Alliance (BRTA), Donostia-San Sebastián, Spain
  - EHealth Group, Biodonostia Health Research Institute, Donostia-San Sebastián, Spain
- Andoni Beristain
  - Vicomtech Foundation, Basque Research and Technology Alliance (BRTA), Donostia-San Sebastián, Spain
  - EHealth Group, Biodonostia Health Research Institute, Donostia-San Sebastián, Spain
- Roberto Alvarez
  - Vicomtech Foundation, Basque Research and Technology Alliance (BRTA), Donostia-San Sebastián, Spain
  - EHealth Group, Biodonostia Health Research Institute, Donostia-San Sebastián, Spain
- Monica Arrue
  - Vicomtech Foundation, Basque Research and Technology Alliance (BRTA), Donostia-San Sebastián, Spain
  - EHealth Group, Biodonostia Health Research Institute, Donostia-San Sebastián, Spain
- Joao Pita Costa
  - Quintelligence, Ljubljana, Slovenia
  - AI Lab, Institute Jozef Stefan, Ljubljana, Slovenia
- Marko Grobelnik
  - Quintelligence, Ljubljana, Slovenia
  - AI Lab, Institute Jozef Stefan, Ljubljana, Slovenia
- Luka Stopar
  - Quintelligence, Ljubljana, Slovenia
  - AI Lab, Institute Jozef Stefan, Ljubljana, Slovenia
- Juha Pajula
  - Data-Driven Solutions, Smart Health, VTT Technical Research Centre of Finland, Tampere, Finland
- Adil Umer
  - Data-Driven Solutions, Smart Health, VTT Technical Research Centre of Finland, Tampere, Finland
- Peter Poliwoda
  - IBM Ireland Lab, Innovation Exchange, International Business Machines Corporation, Dublin, Ireland
- Jonathan Wallace
  - School of Computing, Ulster University, Jordanstown, United Kingdom
- Paul Carlin
  - Faculty of Wellbeing, Education and Language Studies, Open University, Belfast, United Kingdom
- Jarmo Pääkkönen
  - Centre for Health and Technology, University of Oulu, Oulu, Finland
- Bart De Moor
  - Department of Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven, Leuven, Belgium
|
48
|
Valenzuela W, Balsiger F, Wiest R, Scheidegger O. Medical-Blocks: A Platform for Exploration, Management, Analysis, and Sharing of Data in Biomedical Research. JMIR Form Res 2022; 6:e32287. [PMID: 35232718 PMCID: PMC9039815 DOI: 10.2196/32287] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/20/2021] [Revised: 02/04/2022] [Accepted: 02/28/2022] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND Biomedical research requires healthcare institutions to provide sensitive clinical data to leverage data science and artificial intelligence technologies. However, providing healthcare data to researchers simply and securely proves to be challenging for healthcare institutions. OBJECTIVE We describe and introduce Medical-Blocks, a platform for data exploration, data management, data analysis, and data sharing in biomedical research. METHODS The specification requirements for Medical-Blocks included: i) connection to data sources of healthcare institutions with an interface for data exploration, ii) management of data in an internal file storage system, iii) data analysis through visualization and classification of data, and iv) data sharing via a file hosting service for collaboration. Medical-Blocks should be simple to use via a web-based user interface and extensible with new functionalities through a modular design based on microservices ("blocks"). The scalability of the platform should be ensured by containerization. Security and legal regulations were considered during the development. RESULTS Medical-Blocks is a web application that runs in the cloud or as a local instance at a healthcare institution. Local instances of Medical-Blocks access data sources such as electronic health records and picture archiving and communication systems (PACS) at healthcare institutions. Researchers and clinicians can explore, manage, and analyze the available data through Medical-Blocks. The data analysis involves classification of data for metadata extraction and the formation of cohorts. In collaborations, metadata (e.g., number of patients per cohort) and/or the data itself can be shared with other researchers and clinicians through Medical-Blocks, locally or via a cloud instance. CONCLUSIONS Medical-Blocks facilitates biomedical research by providing a centralized platform to interact with medical data in collaborative research projects. The access to and management of medical data is simplified. Data can be swiftly analyzed to form cohorts for research and be shared among researchers. The modularity of Medical-Blocks makes the platform feasible for biomedical research where heterogeneous medical data is needed.
Affiliation(s)
- Waldo Valenzuela
- Institute for Diagnostic and Interventional Neuroradiology, Inselspital, Bern University Hospital, University of Bern, Freiburgstrasse 18, Bern, CH
- Fabian Balsiger
- Support Center for Advanced Neuroimaging (SCAN), Institute for Diagnostic and Interventional Neuroradiology, Inselspital, Bern University Hospital, University of Bern, Bern, CH
- Roland Wiest
- Support Center for Advanced Neuroimaging (SCAN), Institute for Diagnostic and Interventional Neuroradiology, Inselspital, Bern University Hospital, University of Bern, Bern, CH
- Olivier Scheidegger
- Support Center for Advanced Neuroimaging (SCAN), Institute for Diagnostic and Interventional Neuroradiology, Inselspital, Bern University Hospital, University of Bern, Bern, CH
- Department of Neurology, Inselspital, Bern University Hospital, University of Bern, Bern, CH
|
49
|
John Cremin C, Dash S, Huang X. Big Data: Historic Advances and Emerging Trends in Biomedical Research. CURRENT RESEARCH IN BIOTECHNOLOGY 2022. [DOI: 10.1016/j.crbiot.2022.02.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
|
50
|
Ahne A, Fagherazzi G, Tannier X, Czernichow T, Orchard F. Improving Diabetes-Related Biomedical Literature Exploration in the Clinical Decision-making Process via Interactive Classification and Topic Discovery: Methodology Development Study. J Med Internet Res 2022; 24:e27434. [PMID: 35040795 PMCID: PMC8808347 DOI: 10.2196/27434] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2021] [Revised: 04/06/2021] [Accepted: 11/10/2021] [Indexed: 11/30/2022]
Abstract
BACKGROUND The amount of available textual health data, such as scientific and biomedical literature, is constantly growing, making it increasingly challenging for health professionals to properly summarize those data and practice evidence-based clinical decision making. Moreover, the exploration of unstructured health text data is challenging for professionals without computer science knowledge due to limited time, resources, and skills. Current tools to explore text data lack ease of use, require high computational effort, and have difficulty incorporating domain knowledge and focusing on topics of interest. OBJECTIVE We developed a methodology to explore and target topics of interest via an interactive user interface for health professionals with limited computer science knowledge. We aimed to reach near state-of-the-art performance while reducing memory consumption, increasing scalability, and minimizing user interaction effort to improve the clinical decision-making process. The performance was evaluated on diabetes-related abstracts from PubMed. METHODS The methodology consists of 4 parts: (1) a novel interpretable hierarchical clustering of documents where each node is defined by headwords (words that best represent the documents in the node), (2) an efficient classification system to target topics, (3) minimized user interaction effort through active learning, and (4) a visual user interface. We evaluated our approach on 50,911 diabetes-related abstracts providing a hierarchical Medical Subject Headings (MeSH) structure, a unique identifier for a topic. Hierarchical clustering performance was compared against the implementation in the machine learning library scikit-learn.
On a subset of 2000 randomly chosen diabetes abstracts, our active learning strategy was compared against 3 other strategies: random selection of training instances, uncertainty sampling that chooses instances about which the model is most uncertain, and an expected gradient length strategy based on convolutional neural networks (CNNs). RESULTS For the hierarchical clustering performance, we achieved an F1 score of 0.73 compared to 0.76 achieved by scikit-learn. Concerning active learning performance, after 200 training samples chosen by each strategy, the weighted F1 score over all MeSH codes was a satisfactory 0.62 using our approach, 0.61 using the uncertainty strategy, 0.63 using the CNN, and 0.45 using the random strategy. Moreover, our methodology showed constantly low memory use as the number of documents increased. CONCLUSIONS We proposed an easy-to-use tool that lets health professionals with limited computer science knowledge combine their domain knowledge with topic exploration and target specific topics of interest while improving transparency. Furthermore, our approach is memory efficient and highly parallelizable, making it interesting for large Big Data sets. This approach can be used by health professionals to gain deep insights into biomedical literature to ultimately improve the evidence-based clinical decision-making process.
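The uncertainty sampling baseline mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not say which uncertainty measure was used, so the least-confidence score here is an assumption, and the function names and the probability values are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence


def least_confidence(probs: Sequence[float]) -> float:
    """Uncertainty score: 1 minus the probability of the most likely class.

    Assumption: the abstract does not name its uncertainty measure;
    least confidence is one common choice.
    """
    return 1.0 - max(probs)


def select_most_uncertain(pool_probs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return indices of the k unlabeled instances the model is least sure about."""
    scores = [least_confidence(p) for p in pool_probs]
    # Rank pool indices from most to least uncertain.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Hypothetical predicted class probabilities for four unlabeled abstracts.
pool = [
    [0.95, 0.03, 0.02],  # confident prediction
    [0.40, 0.35, 0.25],  # fairly uncertain
    [0.70, 0.20, 0.10],
    [0.34, 0.33, 0.33],  # nearly uniform, most uncertain
]
chosen = select_most_uncertain(pool, k=2)
print(chosen)
```

In an active learning loop, the selected instances would be labeled by the user, added to the training set, and the classifier retrained before the next selection round.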
Affiliation(s)
- Adrian Ahne
- Exposome and Heredity team, Center of Epidemiology and Population Health, Hospital Gustave Roussy, Inserm, Paris-Saclay University, Villejuif, France
- Epiconcept Company, Paris, France
- Guy Fagherazzi
- Deep Digital Phenotyping Research Unit, Department of Population Health, Luxembourg Institute of Health, Luxembourg, Luxembourg
- Xavier Tannier
- Laboratoire d'Informatique Medicale et d'Ingenierie des Connaissances pour la e-Sante, Limics, Inserm, University Sorbonne Paris Nord, Sorbonne University, Paris, France
|