1
Baum L, Johns M, Müller A, Abu Attieh H, Prasser F. HERALD: A domain-specific query language for longitudinal health data analytics. Int J Med Inform 2024;192:105646. [PMID: 39393126] [DOI: 10.1016/j.ijmedinf.2024.105646]
Abstract
BACKGROUND: Large-scale health data has significant potential for research and innovation, especially longitudinal data, which offers insights into prevention, disease progression, and treatment effects. Yet analyzing this type of data is complex, as data points are documented repeatedly along the timeline. As a consequence, extracting cross-sectional tabular data suitable for statistical analysis and machine learning can be challenging for medical researchers and data scientists alike, and existing tools lack a balance between ease of use and comprehensiveness.
OBJECTIVE: This paper introduces HERALD, a novel domain-specific query language designed to support the transformation of longitudinal health data into cross-sectional tables. We describe the basic concepts, the query syntax, a graphical user interface for constructing and executing HERALD queries, and an integration into Informatics for Integrating Biology and the Bedside (i2b2).
METHODS: The syntax of HERALD mimics natural language and supports different query types for selection, aggregation, analysis of relationships, and searching for data points based on filter expressions and temporal constraints. Using a hierarchical concept model, queries are executed individually on each patient's data while constructing tabular output. HERALD is closed, meaning that queries both process and generate data points; queries can refer to data points produced by previous queries, providing a simple but powerful nesting mechanism.
RESULTS: The open-source implementation consists of a HERALD query parser, an execution engine, and a web-based user interface for query construction and statistical analysis. It can be deployed as a standalone component or integrated as a plugin into self-service data analytics environments such as i2b2. HERALD can be a valuable tool for data scientists and machine learning experts, as it simplifies the transformation of longitudinal health data into tables and data matrices.
CONCLUSION: The construction of cross-sectional tables from longitudinal data can be supported through dedicated query languages that strike a reasonable balance between language complexity and transformation capabilities.
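The core idea described in the abstract, aggregating each patient's longitudinal records into a single cross-sectional row, can be sketched in a few lines of Python. This is a toy illustration of the concept only; the function names and the "last of"/"max of" query spellings are hypothetical and do not reflect HERALD's actual syntax or execution engine:

```python
from collections import defaultdict

# Toy longitudinal records: one row per (patient, concept, time, value).
records = [
    ("p1", "glucose", 1, 90), ("p1", "glucose", 5, 140),
    ("p2", "glucose", 2, 100),
    ("p1", "weight", 1, 80),
]

def to_cross_sectional(records, aggregations):
    """Aggregate each patient's time series into one row per patient,
    mimicking a 'LAST OF glucose' / 'MAX OF glucose' style of query."""
    per_patient = defaultdict(lambda: defaultdict(list))
    for patient, concept, time, value in records:
        per_patient[patient][concept].append((time, value))
    table = {}
    for patient, concepts in per_patient.items():
        row = {}
        for name, (concept, agg) in aggregations.items():
            points = sorted(concepts.get(concept, []))  # order along the timeline
            if points:
                row[name] = agg([v for _, v in points])
        table[patient] = row
    return table

queries = {
    "last_glucose": ("glucose", lambda vs: vs[-1]),  # most recent value
    "max_glucose": ("glucose", max),
}
print(to_cross_sectional(records, queries))
# {'p1': {'last_glucose': 140, 'max_glucose': 140}, 'p2': {'last_glucose': 100, 'max_glucose': 100}}
```

Each output row is itself a set of named data points, which is the property the abstract calls "closed": the result of one query could feed a subsequent one.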
Affiliation(s)
- Lena Baum
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Center of Health Data Science, Berlin, Germany
- Marco Johns
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Center of Health Data Science, Berlin, Germany
- Armin Müller
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Center of Health Data Science, Berlin, Germany
- Hammam Abu Attieh
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Center of Health Data Science, Berlin, Germany
- Fabian Prasser
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Center of Health Data Science, Berlin, Germany
2
Alkhatib R, Gaede KI. Data Management in Biobanking: Strategies, Challenges, and Future Directions. BioTech 2024;13:34. [PMID: 39311336] [PMCID: PMC11417763] [DOI: 10.3390/biotech13030034]
Abstract
Biobanking plays a pivotal role in biomedical research by providing standardized processing, precise storage, and management of biological sample collections along with their associated data. Effective data management is a prerequisite for ensuring the integrity, quality, and accessibility of these resources. This review surveys the current landscape of data management in biobanking, discussing key challenges, existing strategies, and potential future directions. We explore multiple aspects of data management, including data collection, storage, curation, sharing, and ethical considerations. By examining evolving technologies and methodologies in biobanking, we aim to provide insights into addressing the complexities and maximizing the utility of biobank data for research and clinical applications.
Affiliation(s)
- Ramez Alkhatib
- Biomaterial Bank Nord, Research Center Borstel Leibniz Lung Center, Parkallee 35, 23845 Borstel, Germany
- German Centre for Lung Research (DZL), Airway Research Centre North (ARCN), 22927 Großhansdorf, Germany
- Karoline I. Gaede
- Biomaterial Bank Nord, Research Center Borstel Leibniz Lung Center, Parkallee 35, 23845 Borstel, Germany
- German Centre for Lung Research (DZL), Airway Research Centre North (ARCN), 22927 Großhansdorf, Germany
- PopGen 2.0 Biobanking Network (P2N), University Hospital Schleswig-Holstein, Campus Kiel, Kiel University, 24105 Kiel, Germany
3
Hallaj S, Chuter BG, Lieu AC, Singh P, Kalpathy-Cramer J, Xu BY, Christopher M, Zangwill LM, Weinreb RN, Baxter SL. Federated Learning in Glaucoma: A Comprehensive Review and Future Perspectives. Ophthalmol Glaucoma 2024:S2589-4196(24)00143-1. [PMID: 39214457] [DOI: 10.1016/j.ogla.2024.08.004]
Abstract
CLINICAL RELEVANCE: Glaucoma is a complex eye condition with varied morphological and clinical presentations, making diagnosis and management challenging. The lack of a consensus definition for glaucoma or glaucomatous optic neuropathy further complicates the development of universal diagnostic tools. Developing robust artificial intelligence (AI) models for glaucoma screening is essential for early detection and treatment but faces significant obstacles: effective deep learning algorithms require large, well-curated datasets from diverse patient populations and imaging protocols, yet creating centralized data repositories is hindered by concerns over data sharing, patient privacy, regulatory compliance, and intellectual property. Federated learning (FL) offers a potential solution by enabling data to remain locally hosted while facilitating distributed model training across multiple sites.
METHODS: A comprehensive literature review was conducted on the application of FL to training AI models for glaucoma screening. Publications from 1950 to 2024 were searched in databases such as PubMed and IEEE Xplore using keywords including "glaucoma," "federated learning," "artificial intelligence," "deep learning," "machine learning," "distributed learning," "privacy-preserving," "data sharing," "medical imaging," and "ophthalmology." Articles were included if they discussed the use of FL in glaucoma-related AI tasks or addressed data sharing and privacy challenges in ophthalmic AI development.
RESULTS: FL enables collaborative model development without centralizing sensitive patient data, addressing privacy and regulatory concerns. Studies show that FL can improve model performance and generalizability by leveraging diverse datasets while maintaining data security. FL models have achieved accuracy comparable or superior to models trained on centralized data, demonstrating effectiveness in real-world clinical settings.
CONCLUSIONS: Federated learning presents a promising strategy for overcoming current obstacles in developing AI models for glaucoma screening. By balancing the need for extensive, diverse training data with the imperative to protect patient privacy and comply with regulations, FL facilitates collaborative model training without compromising data security. This approach offers a pathway toward more accurate and generalizable AI solutions for glaucoma detection and management.
FINANCIAL DISCLOSURE(S): Proprietary or commercial disclosure may be found after the references in the Footnotes and Disclosures at the end of this article.
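The aggregation step at the heart of many FL schemes can be illustrated with a minimal FedAvg-style sketch: each site trains locally and shares only model weights, which the coordinator averages, weighted by local sample counts. This is a generic illustration with toy numbers, not the implementation of any specific system the review covers:

```python
# FedAvg-style aggregation: only weights and cohort sizes leave each site;
# the raw patient data never does.
def federated_average(site_weights, site_sizes):
    total = sum(site_sizes)
    n_params = len(site_weights[0])
    return [
        sum(w[i] * n for w, n in zip(site_weights, site_sizes)) / total
        for i in range(n_params)
    ]

# Three hypothetical hospitals with different cohort sizes contribute updates.
local = [[0.2, 1.0], [0.4, 0.8], [0.3, 0.9]]
sizes = [100, 300, 600]
print(federated_average(local, sizes))  # ≈ [0.32, 0.88]
```

In a real deployment this loop runs for many rounds, and the averaged model is redistributed to the sites for the next round of local training.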
Affiliation(s)
- Shahin Hallaj
- Division of Ophthalmology Informatics and Data Science, Hamilton Glaucoma Center, Shiley Eye Institute, Viterbi Family Department of Ophthalmology, University of California, San Diego, La Jolla, California; Division of Biomedical Informatics, Department of Medicine, University of California San Diego, La Jolla, California
- Benton G Chuter
- Division of Ophthalmology Informatics and Data Science, Hamilton Glaucoma Center, Shiley Eye Institute, Viterbi Family Department of Ophthalmology, University of California, San Diego, La Jolla, California; Division of Biomedical Informatics, Department of Medicine, University of California San Diego, La Jolla, California
- Alexander C Lieu
- Division of Ophthalmology Informatics and Data Science, Hamilton Glaucoma Center, Shiley Eye Institute, Viterbi Family Department of Ophthalmology, University of California, San Diego, La Jolla, California; Division of Biomedical Informatics, Department of Medicine, University of California San Diego, La Jolla, California
- Praveer Singh
- Division of Artificial Medical Intelligence, Department of Ophthalmology, University of Colorado School of Medicine, Aurora, Colorado
- Jayashree Kalpathy-Cramer
- Division of Artificial Medical Intelligence, Department of Ophthalmology, University of Colorado School of Medicine, Aurora, Colorado
- Benjamin Y Xu
- Roski Eye Institute, Keck School of Medicine, University of Southern California, Los Angeles, California
- Mark Christopher
- Division of Ophthalmology Informatics and Data Science, Hamilton Glaucoma Center, Shiley Eye Institute, Viterbi Family Department of Ophthalmology, University of California, San Diego, La Jolla, California
- Linda M Zangwill
- Division of Ophthalmology Informatics and Data Science, Hamilton Glaucoma Center, Shiley Eye Institute, Viterbi Family Department of Ophthalmology, University of California, San Diego, La Jolla, California
- Robert N Weinreb
- Division of Ophthalmology Informatics and Data Science, Hamilton Glaucoma Center, Shiley Eye Institute, Viterbi Family Department of Ophthalmology, University of California, San Diego, La Jolla, California
- Sally L Baxter
- Division of Ophthalmology Informatics and Data Science, Hamilton Glaucoma Center, Shiley Eye Institute, Viterbi Family Department of Ophthalmology, University of California, San Diego, La Jolla, California; Division of Biomedical Informatics, Department of Medicine, University of California San Diego, La Jolla, California
4
Giacobbe DR, Marelli C, Guastavino S, Mora S, Rosso N, Signori A, Campi C, Giacomini M, Bassetti M. Explainable and Interpretable Machine Learning for Antimicrobial Stewardship: Opportunities and Challenges. Clin Ther 2024;46:474-480. [PMID: 38519371] [DOI: 10.1016/j.clinthera.2024.02.010]
Abstract
There is growing interest in exploiting the advances in artificial intelligence and machine learning (ML) for improving and monitoring antimicrobial prescriptions in line with antimicrobial stewardship principles. Against this background, the concepts of interpretability and explainability are becoming increasingly essential to understanding how ML algorithms could predict antimicrobial resistance or recommend specific therapeutic agents, to avoid unintended biases related to the "black box" nature of complex models. In this commentary, we review and discuss some relevant topics on the use of ML algorithms for antimicrobial stewardship interventions, highlighting opportunities and challenges, with particular attention paid to interpretability and explainability of employed models. As in other fields of medicine, the exponential growth of artificial intelligence and ML indicates the potential for improving the efficacy of antimicrobial stewardship interventions, at least in part by reducing time-consuming tasks for overwhelmed health care personnel. Improving our knowledge about how complex ML models work could help to achieve crucial advances in promoting the appropriate use of antimicrobials, as well as in preventing antimicrobial resistance selection and dissemination.
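One widely used model-agnostic way to probe what a "black box" model relies on is permutation importance: shuffle one feature and measure how much accuracy drops. The sketch below is a stdlib-only illustration with a hypothetical toy "resistance" classifier; it is one generic interpretability technique, not the specific methods this commentary reviews:

```python
import random

def permutation_importance(predict, X, y, feature, trials=50, seed=0):
    """Mean accuracy drop when one feature column is shuffled.
    A large drop means the model depends on that feature."""
    rng = random.Random(seed)
    def accuracy(rows):
        return sum(predict(r) == t for r, t in zip(rows, y)) / len(y)
    base = accuracy(X)
    drops = []
    for _ in range(trials):
        col = [row[feature] for row in X]
        rng.shuffle(col)  # break the feature's link to the outcome
        shuffled = [row[:feature] + [v] + row[feature + 1:] for row, v in zip(X, col)]
        drops.append(base - accuracy(shuffled))
    return sum(drops) / trials

# Hypothetical classifier that only looks at feature 0 (e.g., prior exposure).
predict = lambda row: row[0] > 0.5
X = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.1, 0.9]]
y = [True, False, True, False]
print(permutation_importance(predict, X, y, feature=0))  # clearly positive
print(permutation_importance(predict, X, y, feature=1))  # 0.0: feature is ignored
```

The zero importance for the ignored feature is exactly the kind of sanity check such techniques offer when auditing stewardship models for unintended biases.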
Affiliation(s)
- Daniele Roberto Giacobbe
- Department of Health Sciences, University of Genoa, Genoa, Italy; UO Clinica Malattie Infettive, Istituto di Ricovero e Cura a Carattere Scientifico Ospedale Policlinico San Martino, Genoa, Italy
- Cristina Marelli
- UO Clinica Malattie Infettive, Istituto di Ricovero e Cura a Carattere Scientifico Ospedale Policlinico San Martino, Genoa, Italy
- Sara Mora
- UO Information and Communication Technologies, Istituto di Ricovero e Cura a Carattere Scientifico Ospedale Policlinico San Martino, Genoa, Italy
- Nicola Rosso
- UO Information and Communication Technologies, Istituto di Ricovero e Cura a Carattere Scientifico Ospedale Policlinico San Martino, Genoa, Italy
- Alessio Signori
- Section of Biostatistics, Department of Health Sciences, University of Genoa, Genoa, Italy
- Cristina Campi
- Department of Mathematics, University of Genoa, Genoa, Italy; Life Science Computational Laboratory, Istituto di Ricovero e Cura a Carattere Scientifico Ospedale Policlinico San Martino, Genoa, Italy
- Mauro Giacomini
- Department of Informatics, Bioengineering, Robotics and System Engineering, University of Genoa, Genoa, Italy
- Matteo Bassetti
- Department of Health Sciences, University of Genoa, Genoa, Italy; UO Clinica Malattie Infettive, Istituto di Ricovero e Cura a Carattere Scientifico Ospedale Policlinico San Martino, Genoa, Italy
5
Wündisch E, Hufnagl P, Brunecker P, Meier Zu Ummeln S, Träger S, Kopp M, Prasser F, Weber J. Development of a Trusted Third Party at a Large University Hospital: Design and Implementation Study. JMIR Med Inform 2024;12:e53075. [PMID: 38632712] [DOI: 10.2196/53075]
Abstract
Background: Pseudonymization has become a best practice for securely managing the identities of patients and study participants in medical research projects and data sharing initiatives. It allows many research processes to operate without directly identifying data while still supporting advanced processing activities, such as data linkage. Often, pseudonymization and related functionalities are bundled in dedicated technical and organizational units known as trusted third parties (TTPs). However, pseudonymization can significantly increase the complexity of data management and research workflows, necessitating adequate tool support. Common tasks of TTPs include supporting the secure registration and pseudonymization of patient and sample identities as well as managing consent.
Objective: Despite the challenges involved, little has been published about successful architectures and functional tools for implementing TTPs at large university hospitals. This paper aims to fill this gap by describing the software architecture and tool set developed and deployed as part of a TTP established at Charité - Universitätsmedizin Berlin.
Methods: The infrastructure for the TTP was designed to provide a modular structure while keeping maintenance requirements low. Basic functionalities were realized with the free MOSAIC tools. However, supporting common study processes requires implementing workflows that span different basic services, such as patient registration, followed by pseudonym generation and concluded by consent collection. To achieve this, an integration layer was developed to provide a unified Representational State Transfer (REST) application programming interface (API) as a basis for more complex workflows. Based on this API, a unified graphical user interface was also implemented, providing an integrated view of the information objects and workflows supported by the TTP. The API was implemented using Java and Spring Boot, and the graphical user interface in PHP and Laravel. Both services use a shared Keycloak instance as a unified management system for roles and rights.
Results: By the end of 2022, the TTP had supported more than 10 research projects since its launch in December 2019. Within these projects, more than 3000 identities were stored, more than 30,000 pseudonyms were generated, and more than 1500 consent forms were submitted. In total, more than 150 people regularly work with the software platform. Implementing the integration layer and the unified user interface, together with comprehensive roles and rights management, significantly reduced the effort of operating the TTP, as personnel of the supported research projects can use many functionalities independently.
Conclusions: With the architecture and components described, we created a user-friendly and compliant environment for supporting research projects. We believe that these insights into the design and implementation of our TTP can help other institutions set up corresponding structures efficiently and effectively.
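The essential property of pseudonymization — stable within a project, unlinkable across projects — can be sketched with a generic keyed hash. This is an illustration of the principle only: the TTP described here builds on the MOSAIC tools, and the key and function names below are hypothetical, not its API:

```python
import hmac
import hashlib
import secrets

# The TTP alone holds the secret key and the identity↔pseudonym mapping;
# researchers only ever see the derived pseudonyms.
SECRET_KEY = secrets.token_bytes(32)  # hypothetical project-wide secret

def pseudonymize(identity: str, project: str) -> str:
    """Derive a stable, project-specific pseudonym: the same patient gets
    the same pseudonym within a project, but pseudonyms from different
    projects cannot be linked without the key."""
    msg = f"{project}:{identity}".encode()
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()[:16]

p1 = pseudonymize("patient-0042", "study-A")
assert p1 == pseudonymize("patient-0042", "study-A")  # deterministic per project
assert p1 != pseudonymize("patient-0042", "study-B")  # unlinkable across projects
```

Real deployments add a stored mapping table (to support re-identification under governance) and consent management on top of this basic derivation step.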
Affiliation(s)
- Eric Wündisch
- Core Unit THS, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Peter Hufnagl
- Digital Pathology, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Peter Brunecker
- Core Unit Research IT, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Sophie Meier Zu Ummeln
- Core Unit THS, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Sarah Träger
- Core Unit THS, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Marcus Kopp
- Core Unit THS, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Fabian Prasser
- Medical Informatics Group, Center of Health Data Science, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Joachim Weber
- Core Unit THS, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Center for Stroke Research Berlin, Charité - Universitätsmedizin Berlin, Berlin, Germany
- German Centre for Cardiovascular Research (DZHK), Berlin, Germany
6
Giacobbe DR, Zhang Y, de la Fuente J. Explainable artificial intelligence and machine learning: novel approaches to face infectious diseases challenges. Ann Med 2023;55:2286336. [PMID: 38010090] [PMCID: PMC10836268] [DOI: 10.1080/07853890.2023.2286336]
Abstract
Artificial intelligence (AI) and machine learning (ML) are revolutionizing human activities in various fields, with medicine and infectious diseases not being exempt from their rapid and exponential growth. Furthermore, the field of explainable AI and ML has gained particular relevance and is attracting increasing interest. Infectious diseases have already started to benefit from explainable AI/ML models. For example, they have been employed or proposed to better understand complex models aimed at improving the diagnosis and management of coronavirus disease 2019, in the field of antimicrobial resistance prediction, and in quantum vaccine algorithms. Although some issues concerning the dichotomy between explainability and interpretability still require careful attention, an in-depth understanding of how complex AI/ML models arrive at their predictions or recommendations is becoming increasingly essential to properly face the growing challenges of infectious diseases in the present century.
Affiliation(s)
- Daniele Roberto Giacobbe
- Department of Health Sciences (DISSAL), University of Genoa, Genoa, Italy
- Clinica Malattie Infettive, IRCCS Ospedale Policlinico San Martino, Italy
- Yudong Zhang
- School of Computing and Mathematical Sciences, University of Leicester, Leicester, UK
- School of Computer Science and Engineering, Southeast University, Nanjing, China
- José de la Fuente
- SaBio (Health and Biotechnology), Instituto de Investigación en Recursos Cinegéticos IREC-CSIC-UCLM-JCCM, Ciudad Real, Spain
- Department of Veterinary Pathobiology, Center for Veterinary Health Sciences, Oklahoma State University, Stillwater, OK, USA
7
Zhang T, Hei R, Huang Y, Shao J, Zhang M, Feng K, Qian W, Li S, Jin F, Chen Y. Construction and experimental validation of a necroptosis-related lncRNA signature as a prognostic model and immune-landscape predictor for lung adenocarcinoma. Am J Cancer Res 2023;13:4418-4433. [PMID: 37818057] [PMCID: PMC10560937]
Abstract
Necroptosis is a newly characterized form of cell death. Although much has been learned about long non-coding RNAs since the discovery that they can affect the proliferation of lung adenocarcinoma, the role of necroptosis-related long non-coding RNAs (NRlncRNAs) in lung adenocarcinoma (LUAD) remains enigmatic. This study aims to explore novel biomarkers and therapeutic targets for LUAD. LUAD data were downloaded from The Cancer Genome Atlas, and necroptosis-related genes were retrieved from the published literature. Co-expression analysis, univariate Cox analysis, and least absolute shrinkage and selection operator regression were used to identify necroptosis-related prognostic long non-coding RNAs. A comprehensive evaluation of tumor immunity for necroptosis-related features was performed, and we identified a 9-NRlncRNA signature. Kaplan-Meier and Cox regression analyses confirmed that the signature was an independent predictor of LUAD outcome in both the test and training sets (all P < 0.05). The areas under the time-dependent receiver operating characteristic (ROC) curves (AUCs) for 1-, 2-, and 3-year overall survival were 0.754, 0.746, and 0.720, respectively. GSEA showed that the 9 NRlncRNAs were associated with multiple malignancy-associated and immunoregulatory pathways. Based on this model, we found that immune status and the level of response to chemotherapy and targeted therapy differed significantly between the low-risk and high-risk groups. qRT-PCR assays revealed that the 9 NRlncRNAs are involved in the regulation of tumor cell proliferation and may affect the expression of the immune checkpoints programmed cell death 1 (PD1) and CD28. Our results indicate that this novel 9-NRlncRNA signature (AL031600.2, LINC01281, AP001178.1, AL157823.2, LINC01290, MED4-AS1, AC026355.2, AL606489.1, FAM83A-AS1) can predict the prognosis of LUAD and is associated with the immune response, providing new insights into the pathogenesis of LUAD and the development of therapies.
Affiliation(s)
- Tongtong Zhang
- Department of Pulmonary Critical Care Medicine, The Second Affiliated Hospital of The Air Force Military Medical University, Xinsi Road 569, Xi’an 710038, Shaanxi, PR China
- Ruoxuan Hei
- Department of Clinical Diagnose, The Second Affiliated Hospital of The Air Force Military Medical University, Xinsi Road 569, Xi’an 710038, Shaanxi, PR China
- Yue Huang
- Department of Pulmonary Critical Care Medicine, The 1st Affiliated Hospital of Shenzhen University, Shenzhen 518035, Guangdong, PR China
- Jingjin Shao
- Department of Pulmonary Critical Care Medicine, The 1st Affiliated Hospital of Shenzhen University, Shenzhen 518035, Guangdong, PR China
- Min Zhang
- Department of Pulmonary Critical Care Medicine, The 1st Affiliated Hospital of Shenzhen University, Shenzhen 518035, Guangdong, PR China
- Kai Feng
- Department of Pulmonary Critical Care Medicine, The Second Affiliated Hospital of The Air Force Military Medical University, Xinsi Road 569, Xi’an 710038, Shaanxi, PR China
- Weishen Qian
- Department of Pulmonary Critical Care Medicine, The Second Affiliated Hospital of The Air Force Military Medical University, Xinsi Road 569, Xi’an 710038, Shaanxi, PR China
- Simin Li
- Department of Clinical Diagnose, The Second Affiliated Hospital of The Air Force Military Medical University, Xinsi Road 569, Xi’an 710038, Shaanxi, PR China
- Faguang Jin
- Department of Pulmonary Critical Care Medicine, The Second Affiliated Hospital of The Air Force Military Medical University, Xinsi Road 569, Xi’an 710038, Shaanxi, PR China
- Yanwei Chen
- Department of Pulmonary Critical Care Medicine, The 1st Affiliated Hospital of Shenzhen University, Shenzhen 518035, Guangdong, PR China
- Department of Pulmonary Critical Care Medicine, The Second Affiliated Hospital of The Air Force Military Medical University, Xinsi Road 569, Xi’an 710038, Shaanxi, PR China
8
Pieters M, Kruger IM, Kruger HS, Breet Y, Moss SJ, van Oort A, Bester P, Ricci C. Strategies of Modelling Incident Outcomes Using Cox Regression to Estimate the Population Attributable Risk. Int J Environ Res Public Health 2023;20:6417. [PMID: 37510649] [PMCID: PMC10379285] [DOI: 10.3390/ijerph20146417]
Abstract
When the Cox model is applied, recommendations about the choice of the time metric and the model's structure are often disregarded, along with the proportional hazards assumption. Moreover, most published studies fail to frame the real impact of a risk factor in the target population. Our aim was to show how modelling strategies affect the assumptions of the Cox model, and how these strategies affect the population attributable risk (PAR). Our work is based on data collected in the North-West Province, one of the two PURE study centres in South Africa. The Cox model was used to estimate the hazard ratio (HR) of all-cause mortality in relation to smoking, alcohol use, physical inactivity, and hypertension. Firstly, we used a Cox model with time to event as the underlying time variable. Secondly, we used a Cox model with age to event as the underlying time variable. Finally, the second model was implemented with age classes and sex as strata variables. Mutually adjusted models were also investigated. A statistical test of the multiplicative interaction term between the exposures and the log-transformed time-to-event metric was used to assess the proportional hazards assumption. Model fit was assessed by means of the Akaike Information Criterion (AIC). Models with age as the underlying time variable and with age and sex as strata variables showed better validity of the proportional hazards assumption and better fit. The PAR for a specific modifiable risk factor can be estimated more accurately in mutually adjusted models, allowing better public health decisions. This is not necessarily true when correlated modifiable risk factors are considered.
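Once an HR has been estimated, the PAR is commonly obtained from Levin's formula, with the HR standing in for the relative risk. A minimal sketch, using illustrative numbers (30% exposure prevalence, HR = 2.0) rather than actual PURE study estimates:

```python
def population_attributable_risk(prevalence: float, hazard_ratio: float) -> float:
    """Levin's formula, PAR = p(HR - 1) / (1 + p(HR - 1)),
    treating the hazard ratio as an approximation of the relative risk."""
    excess = prevalence * (hazard_ratio - 1)
    return excess / (1 + excess)

# Hypothetical example: 30% of the population exposed, HR of 2.0.
par = population_attributable_risk(0.30, 2.0)
print(round(par, 3))  # 0.231: about 23% of events attributable to the exposure
```

Because the formula is driven by the HR, any bias introduced by a poorly specified time metric or a violated proportional hazards assumption propagates directly into the PAR, which is the paper's central point.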
Affiliation(s)
- Marlien Pieters
- Centre of Excellence for Nutrition, Faculty of Health Sciences, North-West University, Potchefstroom 2520, South Africa
- SAMRC Extramural Unit for Hypertension and Cardiovascular Disease, Faculty of Health Sciences, North-West University, Potchefstroom 2520, South Africa
- Iolanthe M Kruger
- Africa Unit for Transdisciplinary Health Research, Faculty of Health Sciences, North-West University, Potchefstroom 2520, South Africa
- Herculina S Kruger
- Centre of Excellence for Nutrition, Faculty of Health Sciences, North-West University, Potchefstroom 2520, South Africa
- SAMRC Extramural Unit for Hypertension and Cardiovascular Disease, Faculty of Health Sciences, North-West University, Potchefstroom 2520, South Africa
- Yolandi Breet
- Africa Unit for Transdisciplinary Health Research, Faculty of Health Sciences, North-West University, Potchefstroom 2520, South Africa
- Centre of Excellence for Hypertension in Africa Research Team, Faculty of Health Sciences, North-West University, Potchefstroom 2520, South Africa
- Sarah J Moss
- Physical Activity, Sport and Recreation Research Focus Area, Faculty of Health Sciences, North-West University, Potchefstroom 2520, South Africa
- Andries van Oort
- Physical Activity, Sport and Recreation Research Focus Area, Faculty of Health Sciences, North-West University, Potchefstroom 2520, South Africa
- Petra Bester
- Africa Unit for Transdisciplinary Health Research, Faculty of Health Sciences, North-West University, Potchefstroom 2520, South Africa
- Cristian Ricci
- Africa Unit for Transdisciplinary Health Research, Faculty of Health Sciences, North-West University, Potchefstroom 2520, South Africa
9
Gaudio HA, Padmanabhan V, Landis WP, Silva LEV, Slovis J, Starr J, Weeks MK, Widmann NJ, Forti RM, Laurent GH, Ranieri NR, Mi F, Degani RE, Hallowell T, Delso N, Calkins H, Dobrzynski C, Haddad S, Kao SH, Hwang M, Shi L, Baker WB, Tsui F, Morgan RW, Kilbaugh TJ, Ko TS. A Template for Translational Bioinformatics: Facilitating Multimodal Data Analyses in Preclinical Models of Neurological Injury. bioRxiv 2023:2023.07.17.547582. [PMID: 37503137] [PMCID: PMC10370067] [DOI: 10.1101/2023.07.17.547582]
Abstract
Background: Pediatric neurological injury and disease constitute a critical public health issue, due to increasing rates of survival from primary injuries (e.g., cardiac arrest, traumatic brain injury) and a lack of monitoring technologies and therapeutics for treating secondary neurological injury. Translational, preclinical research facilitates the development of solutions to this growing problem but is hindered by the lack of data frameworks and standards for the management, processing, and analysis of multimodal data sets.
Methods: Here, we present a generalizable data framework, implemented for large-animal research at the Children's Hospital of Philadelphia, that addresses this technological gap. The framework culminates in an interactive dashboard for exploratory analysis and filtered data set download.
Results: Compared with existing clinical and preclinical data management solutions, the presented framework accommodates heterogeneous data types (single measures, repeated measures, time series, and imaging), integrates data sets across various experimental models, and facilitates dynamic visualization of integrated data sets. We present a use case of this framework for developing a predictive model for intra-arrest prediction of cardiopulmonary resuscitation outcome.
Conclusions: The described preclinical data framework may serve as a template to aid data management efforts in other translational research labs that generate heterogeneous data sets and require a dynamic platform that can evolve alongside their research.
10
Validating functional redundancy with mixed generative adversarial networks. Knowl Based Syst 2023. [DOI: 10.1016/j.knosys.2023.110342] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/05/2023]
11
Wang Y, He S, Wang Y. AI-Assisted Dynamic Modelling for Data Management in a Distributed System. INTERNATIONAL JOURNAL OF INFORMATION SYSTEMS AND SUPPLY CHAIN MANAGEMENT 2022. [DOI: 10.4018/ijisscm.313623] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]
Abstract
There are many interdependent computers in distributed networks. In such schemes, total ownership costs comprise facilities such as computers and controls; hardware purchases; and running expenses such as wages and electricity charges. Power consumption accounts for a large part of operating expenses, and this high share is directly linked to inadequate energy planning. An AI-assisted dynamic modelling for data management (AI-DM) framework is proposed. This research suggests a multi-objective method for planning multi-criteria software solutions for distributed systems, using the fuzzy TOPSIS tool as a comprehensive guide to multi-criteria management. The execution results demonstrate that this strategy can trade off requirements by weight.
Affiliation(s)
- Yingjun Wang
- Guangdong Provincial Communication Group Co., Ltd., China
- Yiran Wang
- Zhongwei Road and Bridge Equipment Jiangsu Co., Ltd., China
12
Kimble M, Allers S, Campbell K, Chen C, Jackson LM, King BL, Silverbrand S, York G, Beard K. medna-metadata: an open-source data management system for tracking environmental DNA samples and metadata. Bioinformatics 2022; 38:4589-4597. [PMID: 35960154 PMCID: PMC9524998 DOI: 10.1093/bioinformatics/btac556] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/09/2022] [Revised: 07/23/2022] [Accepted: 08/09/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION Environmental DNA (eDNA), as a rapidly expanding research field, stands to benefit from shared resources including sampling protocols, study designs, discovered sequences, and taxonomic assignments to sequences. High-quality community-shareable eDNA resources rely heavily on comprehensive metadata documentation that captures the complex workflows covering field sampling, molecular biology lab work, and bioinformatic analyses. Few sources document database development for comprehensive eDNA metadata and these workflows, and no open-source software exists for this purpose. RESULTS We present medna-metadata, an open-source, modular system that aligns with the Findable, Accessible, Interoperable, and Reusable (FAIR) guiding principles supporting scholarly data reuse, and that implements a standardized metadata collection structure encapsulating critical aspects of field data collection, wet lab processing, and bioinformatic analysis. Medna-metadata is showcased with metabarcoding data from the Gulf of Maine (Polinski et al., 2019). AVAILABILITY AND IMPLEMENTATION The source code of the medna-metadata web application is hosted on GitHub (https://github.com/Maine-eDNA/medna-metadata). Medna-metadata is a docker-compose installable package. Documentation can be found at https://medna-metadata.readthedocs.io/en/latest/?badge=latest. The application is implemented in Python, PostgreSQL and PostGIS, RabbitMQ, and NGINX, with all major browsers supported. A demo can be found at https://demo.metadata.maine-edna.org/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- M Kimble
- School of Computing and Information Science, University of Maine, Orono, ME 04469, USA
- S Allers
- Department of Molecular and Biomedical Sciences, University of Maine, Orono, ME 04469, USA
- K Campbell
- School of Computing and Information Science, University of Maine, Orono, ME 04469, USA
- C Chen
- School of Computing and Information Science, University of Maine, Orono, ME 04469, USA
- L M Jackson
- Advanced Research Computing, Security and Information Management, University of Maine, Orono, ME 04469, USA
- Maine EPSCoR, University of Maine, Orono, ME 04469, USA
- B L King
- Department of Molecular and Biomedical Sciences, University of Maine, Orono, ME 04469, USA
- S Silverbrand
- School of Marine Sciences, University of Maine, Orono, ME 04469, USA
- G York
- Environmental DNA Laboratory, Coordinated Operating Research Entities, University of Maine, Orono, ME 04469, USA
- K Beard
- School of Computing and Information Science, University of Maine, Orono, ME 04469, USA
13
Gurugubelli VS, Fang H, Shikany JM, Balkus SV, Rumbut J, Ngo H, Wang H, Allison JJ, Steffen LM. A review of harmonization methods for studying dietary patterns. SMART HEALTH (AMSTERDAM, NETHERLANDS) 2022; 23:100263. [PMID: 35252528 PMCID: PMC8896407 DOI: 10.1016/j.smhl.2021.100263] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 04/11/2023]
Abstract
Data harmonization is the process by which each variable from different research studies is standardized to similar units, resulting in comparable datasets. These data may be integrated for more powerful and accurate examination and prediction of outcomes for use in intelligent and smart electronic health software programs and systems. Prospective harmonization is performed when researchers create guidelines for gathering and managing the data before data collection begins. In contrast, retrospective harmonization is performed by pooling previously collected data from various studies, using expert domain knowledge to identify and translate variables. In nutritional epidemiology, dietary data harmonization is often necessary to construct the nutrient and food databases needed to answer complex research questions and develop effective public health policy. In this paper, we review methods for effective data harmonization, including developing a harmonization plan, identifying which common standards already exist for harmonization, and defining the variables needed to harmonize datasets. Currently, several large-scale studies maintain harmonized nutrient databases, especially in Europe, and steps have been proposed to inform the retrospective harmonization process. As an example, data harmonization methods are applied to several U.S. longitudinal diet datasets. Based on our review, considerations for future dietary data harmonization include user agreements for sharing private data among participating studies, defining variables and data dictionaries that accurately map variables among studies, and the use of secure data storage servers to maintain privacy. These considerations establish the necessary components of harmonized data for smart health applications, which can promote healthier eating and provide greater insights into the effect of dietary patterns on health.
Affiliation(s)
- Hua Fang
- University of Massachusetts Dartmouth, 285 Old Westport Rd, North Dartmouth, 02747, Massachusetts, USA
- Department of Quantitative Health Sciences, University of Massachusetts Medical School, 55 N Lake Ave, Worcester, 01655, Massachusetts, USA
- Corresponding author. Tel.: +0-508-910-6411
- James M Shikany
- Division of Preventive Medicine, University of Alabama at Birmingham, 1720 University Blvd, Birmingham, 35294, Alabama, USA
- Salvador V Balkus
- University of Massachusetts Dartmouth, 285 Old Westport Rd, North Dartmouth, 02747, Massachusetts, USA
- Joshua Rumbut
- University of Massachusetts Dartmouth, 285 Old Westport Rd, North Dartmouth, 02747, Massachusetts, USA
- Department of Quantitative Health Sciences, University of Massachusetts Medical School, 55 N Lake Ave, Worcester, 01655, Massachusetts, USA
- Hieu Ngo
- University of Massachusetts Dartmouth, 285 Old Westport Rd, North Dartmouth, 02747, Massachusetts, USA
- Honggang Wang
- University of Massachusetts Dartmouth, 285 Old Westport Rd, North Dartmouth, 02747, Massachusetts, USA
- Jeroan J Allison
- Department of Quantitative Health Sciences, University of Massachusetts Medical School, 55 N Lake Ave, Worcester, 01655, Massachusetts, USA
- Lyn M. Steffen
- Division of Epidemiology and Community Health, School of Public Health, University of Minnesota, Minneapolis, 55455, Minnesota, USA
14
Valenzuela W, Balsiger F, Wiest R, Scheidegger O. Medical-Blocks: A Platform for Exploration, Management, Analysis, and Sharing of Data in Biomedical Research. JMIR Form Res 2022; 6:e32287. [PMID: 35232718 PMCID: PMC9039815 DOI: 10.2196/32287] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/20/2021] [Revised: 02/04/2022] [Accepted: 02/28/2022] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND Biomedical research requires healthcare institutions to provide sensitive clinical data to leverage data science and artificial intelligence technologies. However, providing healthcare data to researchers in a simple and secure manner proves challenging for healthcare institutions. OBJECTIVE We describe and introduce Medical-Blocks, a platform for data exploration, data management, data analysis, and data sharing in biomedical research. METHODS The specification requirements for Medical-Blocks included: i) connection to data sources of healthcare institutions with an interface for data exploration, ii) management of data in an internal file storage system, iii) data analysis through visualization and classification of data, and iv) data sharing via a file hosting service for collaboration. Medical-Blocks should be simple to use via a web-based user interface and extensible with new functionalities through a modular design based on microservices ("blocks"). The scalability of the platform should be ensured by containerization. Security and legal regulations were considered during the development. RESULTS Medical-Blocks is a web application that runs in the cloud or as a local instance at a healthcare institution. Local instances of Medical-Blocks access data sources such as electronic health records and picture archiving and communication systems (PACS) at healthcare institutions. Researchers and clinicians can explore, manage, and analyze the available data through Medical-Blocks. The data analysis involves classification of data for metadata extraction and the formation of cohorts. In collaborations, metadata (e.g., number of patients per cohort) and/or the data itself can be shared through Medical-Blocks locally or via a cloud instance to other researchers and clinicians. CONCLUSIONS Medical-Blocks facilitates biomedical research by providing a centralized platform to interact with medical data in collaborative research projects. Access to and management of medical data is simplified. Data can be swiftly analyzed to form cohorts for research and shared among researchers. The modularity of Medical-Blocks makes the platform feasible for biomedical research where heterogeneous medical data are needed.
Affiliation(s)
- Waldo Valenzuela
- Institute for Diagnostic and Interventional Neuroradiology, Inselspital, Bern University Hospital, University of Bern, Freiburgstrasse 18, Bern, CH
- Fabian Balsiger
- Support Center for Advanced Neuroimaging (SCAN), Institute for Diagnostic and Interventional Neuroradiology, Inselspital, Bern University Hospital, University of Bern, Bern, CH
- Roland Wiest
- Support Center for Advanced Neuroimaging (SCAN), Institute for Diagnostic and Interventional Neuroradiology, Inselspital, Bern University Hospital, University of Bern, Bern, CH
- Olivier Scheidegger
- Support Center for Advanced Neuroimaging (SCAN), Institute for Diagnostic and Interventional Neuroradiology, Inselspital, Bern University Hospital, University of Bern, Bern, CH
- Department of Neurology, Inselspital, Bern University Hospital, University of Bern, Bern, CH
15
John Cremin C, Dash S, Huang X. Big Data: Historic Advances and Emerging Trends in Biomedical Research. CURRENT RESEARCH IN BIOTECHNOLOGY 2022. [DOI: 10.1016/j.crbiot.2022.02.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022] Open
16
Abstract
With increasing digitization of healthcare, real-world data (RWD) are available in greater quantity and scope than ever before. Since the 2016 United States 21st Century Cures Act, innovations in the RWD life cycle have taken tremendous strides forward, largely driven by demand for regulatory-grade real-world evidence from the biopharmaceutical sector. However, use cases for RWD continue to grow in number, moving beyond drug development, to population health and direct clinical applications pertinent to payors, providers, and health systems. Effective RWD utilization requires disparate data sources to be turned into high-quality datasets. To harness the potential of RWD for emerging use cases, providers and organizations must accelerate life cycle improvements that support this process. We build on examples obtained from the academic literature and author experience of data curation practices across a diverse range of sectors to describe a standardized RWD life cycle containing key steps in production of useful data for analysis and insights. We delineate best practices that will add value to current data pipelines. Seven themes are highlighted that ensure sustainability and scalability for RWD life cycles: data standards adherence, tailored quality assurance, data entry incentivization, deploying natural language processing, data platform solutions, RWD governance, and ensuring equity and representation in data.
17
Ahmed Z. Precision medicine with multi-omics strategies, deep phenotyping, and predictive analysis. PROGRESS IN MOLECULAR BIOLOGY AND TRANSLATIONAL SCIENCE 2022; 190:101-125. [DOI: 10.1016/bs.pmbts.2022.02.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
18
Data protection, data management, and data sharing: Stakeholder perspectives on the protection of personal health information in South Africa. PLoS One 2021; 16:e0260341. [PMID: 34928950 PMCID: PMC8687565 DOI: 10.1371/journal.pone.0260341] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/16/2021] [Accepted: 11/08/2021] [Indexed: 11/19/2022] Open
Abstract
The Protection of Personal Information Act (POPIA) 2013 came into force in South Africa on 1 July 2020. It seeks to strengthen the processing of personal information, including health information. While POPIA is to be welcomed, there are concerns about the impact it will have on the processing of health information. To ensure that the National Health Laboratory Service (NHLS) is compliant with these new strict processing requirements, and that compliance does not negatively impact its current screening, treatment, surveillance and research mandate, it was decided to consider the development of an NHLS POPIA Code of Conduct for Personal Health. As part of the process of developing such a Code, and to better understand the challenges faced in the processing of personal health information in South Africa, 19 semi-structured interviews with stakeholders were conducted between June and September 2020. Overall, respondents welcomed the introduction of POPIA. However, they felt that there are tensions between the strengthening of data protection and the use of personal information for individual patient care, treatment programmes, and research. Respondents reported a need to rethink the management of personal health information in South Africa and identified five issues needing to be addressed at a national and an institutional level: an understanding of the importance of personal information; an understanding of POPIA and data protection; improving data quality; improving transparency in data use; and improving accountability in data use. The application of POPIA to the processing of personal health information is challenging, complex, and likely costly. However, personal health information must be appropriately managed to ensure that the privacy of the data subject is protected, but equally that it is used as a resource in the individual's and the wider public interest.
19
The Role of Big Data in Aging and Older People’s Health Research: A Systematic Review and Ecological Framework. SUSTAINABILITY 2021. [DOI: 10.3390/su132111587] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/05/2023]
Abstract
Big data has been prominent in studying aging and older people’s health. It has promoted modeling and analyses in biological and geriatric research (like cellular senescence), developed health management platforms, and supported decision-making in public healthcare and social security. However, current studies are still limited within a single subject, rather than flourished as interdisciplinary research in the context of big data. The research perspectives have not changed, nor has big data brought itself out of the role as a modeling tool. When embedding big data as a data product, analysis tool, and resolution service into different spatial, temporal, and organizational scales of aging processes, it would present as a connection, integration, and interaction simultaneously in conducting interdisciplinary research. Therefore, this paper attempts to propose an ecological framework for big data based on aging and older people’s health research. Following the scoping process of PRISMA, 35 studies were reviewed to validate our ecological framework. Although restricted by issues like digital divides and privacy security, we encourage researchers to capture various elements and their interactions in the human-environment system from a macro and dynamic perspective rather than simply pursuing accuracy.
20
Kouper I, Tucker KL, Tharp K, van Booven ME, Clark A. Active Curation of Large Longitudinal Surveys: A Case Study. JOURNAL OF ESCIENCE LIBRARIANSHIP 2021. [DOI: 10.7191/jeslib.2021.1210] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022] Open
Abstract
In this paper we take an in-depth look at the curation of a large longitudinal survey and activities and procedures involved in moving the data from its generation to the state that is needed to conduct scientific analysis. Using a case study approach, we describe how large surveys generate a range of data assets that require many decisions well before the data is considered for analysis and publication. We use the notion of active curation to describe activities and decisions about the data objects that are “live,” i.e., when they are still being collected and processed for the later stages of the data lifecycle. Our efforts illustrate a gap in the existing discussions on curation. On one hand, there is an acknowledged need for active or upstream curation as an engagement of curators close to the point of data creation. On the other hand, the recommendations on how to do that are scattered across multiple domain-oriented data efforts.
In describing the complexities of active curation of survey data and providing general recommendations we aim to draw attention to the practices of active curation, stimulate the development of interoperable tools, standards, and techniques needed at the initial stages of research projects, and encourage collaborations between libraries and other academic units.
21
Pavel A, del Giudice G, Federico A, Di Lieto A, Kinaret PAS, Serra A, Greco D. Integrated network analysis reveals new genes suggesting COVID-19 chronic effects and treatment. Brief Bioinform 2021; 22:1430-1441. [PMID: 33569598 PMCID: PMC7929418 DOI: 10.1093/bib/bbaa417] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2020] [Revised: 11/13/2020] [Accepted: 12/19/2020] [Indexed: 01/08/2023] Open
Abstract
The COVID-19 disease led to an unprecedented health emergency, still ongoing worldwide. Given the lack of a vaccine or a clear therapeutic strategy to counteract the infection as well as its secondary effects, there is currently a pressing need to generate new insights into the SARS-CoV-2 induced host response. Biomedical data can help to investigate new aspects of the COVID-19 pathogenesis, but source heterogeneity represents a major drawback and limitation. In this work, we applied data integration methods to develop a Unified Knowledge Space (UKS) and used it to identify a new set of genes associated with SARS-CoV-2 host response, both in vitro and in vivo. Functional analysis of these genes reveals possible long-term systemic effects of the infection, such as vascular remodelling and fibrosis. Finally, we identified a set of potentially relevant drugs targeting proteins involved in multiple steps of the host response to the virus.
Affiliation(s)
- Alisa Pavel
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Giusy del Giudice
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Antonio Federico
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Antonio Di Lieto
- Department of Forensic Psychiatry, Aarhus University, Aarhus, Denmark
- Pia A S Kinaret
- Institute of Biotechnology, University of Helsinki, Helsinki, Finland
- Angela Serra
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Dario Greco
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Institute of Biotechnology, University of Helsinki, Helsinki, Finland
22
Defining the big social data paradigm through a systematic literature review approach. JOURNAL OF KNOWLEDGE MANAGEMENT 2021. [DOI: 10.1108/jkm-10-2020-0801] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/01/2023]
Abstract
Purpose
This study aims to investigate the Big Social Data (BSD) paradigm, which still lacks a clear and shared definition, and causes a lack of clarity and understanding about its beneficial opportunities for practitioners. In the knowledge management (KM) domain, a clear characterization of the BSD paradigm can lead to more effective and efficient KM strategies, processes and systems that leverage a huge amount of structured and unstructured data sources.
Design/methodology/approach
The study adopts a systematic literature review (SLR) methodology based on a mixed analysis approach (unsupervised machine learning and human-based) applied to 199 research articles on BSD topics extracted from Scopus and Web of Science. In particular, machine learning processing has been implemented by using topic extraction and hierarchical clustering techniques.
Findings
The paper provides a threefold contribution: a conceptualization and consensual definition of the BSD paradigm through the identification of four key conceptual pillars (i.e. sources, properties, technology and value exploitation); a characterization of the taxonomy of BSD data types that extends previous works on this topic; and a research agenda for future research studies on BSD and its applications from a KM perspective.
Research limitations/implications
The main limits of the research lie in the list of articles considered for the literature review, which could be enlarged by considering further sources (in addition to Scopus and Web of Science), further languages (in addition to English) and/or further years (the review considers papers published until 2018). Research implications concern the development of a research agenda organized along five thematic issues, which can feed future research to deepen the paradigm of BSD and explore linkages with the KM field.
Practical implications
Practical implications concern the usage of the proposed definition of BSD to purposefully design applications and services based on BSD in knowledge-intensive domains to generate value for citizens, individuals, companies and territories.
Originality/value
The original contribution concerns the definition of the big social data paradigm built through an SLR that combines machine learning processing and human-based processing. Moreover, the research agenda deriving from the study contributes to investigating the BSD paradigm in the wider domain of KM.
23
24
Khan IH, Javaid M. Big Data Applications in Medical Field: A Literature Review. JOURNAL OF INDUSTRIAL INTEGRATION AND MANAGEMENT 2020. [DOI: 10.1142/s242486222030001x] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/21/2022]
Abstract
Digital imaging and medical reporting have acquired an essential role in healthcare, but the main challenge is the storage of a high volume of patient data. Although newer technologies have already been introduced in the medical sciences to reduce record sizes, Big Data provides advancements by storing a large amount of data to improve the efficiency and quality of patient treatment with better care. It provides intelligent automation capabilities that reduce errors compared with manual inputs. Large numbers of research papers on big data in the medical field are studied and analyzed for their impacts, benefits, and applications. Big data has great potential to support the digitalization of all medical and clinical records and then save the entire data regarding the medical history of an individual or a group. This paper discusses big data usage for various industries and sectors. Finally, 12 significant applications of big data in the medical field are identified and studied with a brief description. This technology can be gainfully used to extract useful information from the available data by analyzing and managing it through a combination of hardware and software. With technological advancement, big data provides health-related information for millions of patients, covering life issues such as lab test reporting, clinical narratives, demographics, prescriptions, medical diagnoses, and related documentation. Thus, Big Data is essential in developing better yet efficient healthcare analysis and storage services. The demand for big data applications is increasing due to its capability of handling and analyzing massive data. Not only in the future but even now, Big Data is proving itself as an axiom of storing, developing, analyzing, and providing overall health information to physicians.
Affiliation(s)
- Ibrahim Haleem Khan
- School of Engineering Sciences and Technology, Jamia Hamdard, New Delhi, India
- Mohd Javaid
- Department of Mechanical Engineering, Jamia Millia Islamia, New Delhi, India
25
Jahangiri L, Akiva G, Lakhia S, Turkyilmaz I. Understanding the complexities of digital dentistry integration in high-volume dental institutions. Br Dent J 2020; 229:166-168. [PMID: 32811935 DOI: 10.1038/s41415-020-1928-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
The purpose of this article is to detail the primary challenges faced by large dental institutions as they incorporate digital dentistry into their mainstream workflow. Integration of digital technology is easier in private practices, which have smaller patient volumes and require fewer trained staff. Additionally, in private practices, scanning, designing and milling frequently occur in a single location, which does not require external digital data transfer. However, large dental institutions must overcome several barriers uniquely generated by their large-scale operation. Numerous individuals must be comprehensively and efficiently trained to operate the advanced technologies. The digital software must seamlessly integrate with existing software, and an internal infrastructure capable of handling massive data inputs must be established. High-volume production in large dental institutions requires the involvement of external laboratories to meet demand. This outsourcing presents a new challenge of safe digital data transfer in accordance with patient privacy and protection regulations set forth by governing agencies. It is vital for large dental institutions to recognise the unique challenges thrust upon them as they attempt to incorporate a digital workflow. With proper forethought and planning, an appropriate infrastructure may be established, allowing for a smooth and safe transition to the digital era.
Affiliation(s)
- Leila Jahangiri
- Clinical Professor and Chair, New York University College of Dentistry, Department of Prosthodontics, New York, USA
- Guy Akiva
- Director, Information Technology Infrastructure and Systems Support, New York University College of Dentistry, New York, USA
- Samantha Lakhia
- Third-year dental student, New York University College of Dentistry, New York, USA
- Ilser Turkyilmaz
- Clinical Associate Professor, New York University College of Dentistry, Department of Prosthodontics, New York, USA
26
Borda A, Gray K, Fu Y. Research data management in health and biomedical citizen science: practices and prospects. JAMIA Open 2020; 3:113-125. [PMID: 32607493 PMCID: PMC7309241 DOI: 10.1093/jamiaopen/ooz052] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2019] [Revised: 07/09/2019] [Accepted: 09/30/2019] [Indexed: 12/25/2022] Open
Abstract
Background Public engagement in health and biomedical research is being influenced by the paradigm of citizen science. However, conventional health and biomedical research relies on sophisticated research data management tools and methods. Considering these, what contribution can citizen science make in this field of research? How can it follow research protocols and produce reliable results? Objective The aim of this article is to analyze research data management practices in existing biomedical citizen science studies, so as to provide insights for members of the public and of the research community considering this approach to research. Methods A scoping review was conducted on this topic to determine the data management characteristics of health and biomedical citizen science research. From this review and related web searching, we chose five online platforms and a specific research project associated with each to understand their research data management approaches and enablers. Results Health and biomedical citizen science platforms and projects are diverse in terms of types of work with data and data management activities that in themselves may have scientific merit. However, consistent approaches in the use of research data management models or practices seem lacking, or at least are not prevalent in the review. Conclusions There is potential for important data collection and analysis activities to be opaque or irreproducible in health and biomedical citizen science initiatives without the implementation of a research data management model that is transparent and accessible to team members and to external audiences. This situation might be improved with participatory development of standards that can be applied to diverse projects and platforms, across the research data life cycle.
Affiliation(s)
- Ann Borda
- Health and Biomedical Informatics Centre, Melbourne Medical School, The University of Melbourne, Melbourne, Australia
- Kathleen Gray
- Health and Biomedical Informatics Centre, Melbourne Medical School, The University of Melbourne, Melbourne, Australia
- Yuqing Fu
- Health and Biomedical Informatics Centre, Melbourne Medical School, The University of Melbourne, Melbourne, Australia
27
Egli A, Schrenzel J, Greub G. Digital microbiology. Clin Microbiol Infect 2020; 26:1324-1331. [PMID: 32603804] [PMCID: PMC7320868] [DOI: 10.1016/j.cmi.2020.06.023]
Abstract
BACKGROUND Digitalization and artificial intelligence have an important impact on the way microbiology laboratories will work in the near future. Opportunities and challenges lie ahead in digitalizing microbiological workflows. Making efficient use of big data, machine learning, and artificial intelligence in clinical microbiology requires a profound understanding of data handling aspects. OBJECTIVE This review article summarizes the most important concepts of digital microbiology. The article gives microbiologists, clinicians and data scientists a viewpoint and practical examples along the diagnostic process. SOURCES We used peer-reviewed literature identified by a PubMed search for digitalization, machine learning, artificial intelligence and microbiology. CONTENT We describe the opportunities and challenges of digitalization in microbiological diagnostic processes with various examples. In this context, we also cover key aspects of data structure and interoperability, as well as legal aspects. Finally, we outline the way forward for applications in a modern microbiology laboratory. IMPLICATIONS We predict that digitalization and the use of machine learning will have a profound impact on the daily routine of laboratory staff. The most important steps along the analytical process should be identified where digital technologies can be applied to provide a benefit. The education of all staff involved should be adapted to prepare for the advances in digital microbiology.
Affiliation(s)
- A Egli
- Clinical Bacteriology and Mycology, University Hospital Basel, Basel, Switzerland; Applied Microbiology Research, Department of Biomedicine, University of Basel, Basel, Switzerland
- J Schrenzel
- Laboratory of Bacteriology, University Hospitals of Geneva, Geneva, Switzerland
- G Greub
- Institute of Medical Microbiology, University Hospital Lausanne, Lausanne, Switzerland
28
Mallappallil M, Sabu J, Gruessner A, Salifu M. A review of big data and medical research. SAGE Open Med 2020; 8:2050312120934839. [PMID: 32637104] [PMCID: PMC7323266] [DOI: 10.1177/2050312120934839]
Abstract
Universally, the volume of data has increased since the 1980s, with the collection rate doubling every 40 months. "Big data" is a term that was introduced in the 1990s to describe data sets too large to be used with common software. Medicine is a major field predicted to increase its use of big data by 2025. Big data in medicine may be used by commercial, academic, government, and public sectors. It includes biologic, biometric, and electronic health data. Examples of biologic data include biobanks; biometric data may include individual wellness data from devices; electronic health data include the medical record; and other data include demographics and images. Big data has also contributed to changes in research methodology. Changes in the clinical research paradigm have been fueled by large-scale biological data harvesting (biobanks), which is developed, analyzed, and managed by cheaper computing technology (big data), supported by greater flexibility in study design (real-world data) and by the relationships between industry, government regulators, and academics. Cultural changes, along with easy access to information via the Internet, facilitate participation by more people. Current needs demand quick answers, which may be supplied by big data, biobanks, and more flexible study designs. Big data can reveal health patterns and promises to provide solutions that have previously been out of society's grasp; however, the murkiness of international laws, questions of data ownership, public ignorance, and privacy and security concerns are slowing the progress that could otherwise be achieved by the use of big data. The goal of this descriptive review is to create awareness of the ramifications of big data and to show readers that this trend is positive and will likely lead to better clinical solutions, but caution must be exercised to reduce harm.
Affiliation(s)
- Jacob Sabu
- State University of New York at Downstate, Brooklyn, NY, USA
- Moro Salifu
- State University of New York at Downstate, Brooklyn, NY, USA
29
Ercole A, Brinck V, George P, Hicks R, Huijben J, Jarrett M, Vassar M, Wilson L. Guidelines for Data Acquisition, Quality and Curation for Observational Research Designs (DAQCORD). J Clin Transl Sci 2020; 4:354-359. [PMID: 33244417] [PMCID: PMC7681114] [DOI: 10.1017/cts.2020.24]
Abstract
BACKGROUND High-quality data are critical to the entire scientific enterprise, yet the complexity and effort involved in data curation are vastly under-appreciated. This is especially true for large observational, clinical studies because of the amount of multimodal data that is captured and the opportunity for addressing numerous research questions through analysis, either alone or in combination with other data sets. However, a lack of detail concerning data curation methods can leave unresolved questions about the robustness of the data, its utility for addressing specific research questions or hypotheses, and how to interpret the results. We aimed to develop a framework for the design, documentation and reporting of data curation methods in order to advance the scientific rigour, reproducibility and analysis of the data. METHODS Forty-six experts participated in a modified Delphi process to reach consensus on indicators of data curation that could be used in the design and reporting of studies. RESULTS We identified 46 indicators that are applicable to the design, training/testing, run time and post-collection phases of studies. CONCLUSION The Data Acquisition, Quality and Curation for Observational Research Designs (DAQCORD) Guidelines are the first comprehensive set of data quality indicators for large observational studies. They were developed around the needs of neuroscience projects, but we believe they are relevant and generalisable, in whole or in part, to other fields of health research, and also to smaller observational studies and preclinical research. The DAQCORD Guidelines provide a framework for achieving high-quality data, a cornerstone of health research.
Affiliation(s)
- Ari Ercole
- Department of Medicine, Division of Anaesthesia, University of Cambridge, Cambridge, UK
- Pradeep George
- International Neuroinformatics Coordinating Facility, Karolinska Institutet, Stockholm, Sweden
- Jilske Huijben
- Department of Public Health, Center for Medical Decision Sciences, Erasmus MC, Rotterdam, The Netherlands
- Mary Vassar
- Department of Neurological Surgery, University of California, San Francisco, CA, USA
- Lindsay Wilson
- Division of Psychology, University of Stirling, Stirling, UK
30
Ahmed Z, Mohamed K, Zeeshan S, Dong X. Artificial intelligence with multi-functional machine learning platform development for better healthcare and precision medicine. Database (Oxford) 2020; 2020:baaa010. [PMID: 32185396] [PMCID: PMC7078068] [DOI: 10.1093/database/baaa010]
Abstract
Precision medicine is one of the recent and powerful developments in medical care, which has the potential to improve the traditional symptom-driven practice of medicine, allowing earlier interventions using advanced diagnostics and tailoring better, more economical personalized treatments. Identifying the best pathway to personalized and population medicine involves the ability to analyze comprehensive patient information together with broader aspects to monitor and distinguish between sick and relatively healthy people, which will lead to a better understanding of biological indicators that can signal shifts in health. While the complexities of disease at the individual level have made it difficult to utilize healthcare information in clinical decision-making, some of the existing constraints have been greatly minimized by technological advancements. To implement effective precision medicine with enhanced ability to positively impact patient outcomes and provide real-time decision support, it is important to harness the power of electronic health records by integrating disparate data sources and discovering patient-specific patterns of disease progression. Useful analytic tools, technologies, databases, and approaches are required to augment networking and interoperability of clinical, laboratory and public health systems, and to address, with an effective balance, ethical and social issues related to the privacy and protection of healthcare data. Developing multifunctional machine learning platforms for clinical data extraction, aggregation, management and analysis can support clinicians by efficiently stratifying subjects to understand specific scenarios and optimize decision-making. Implementation of artificial intelligence in healthcare is a compelling vision that has the potential to lead to significant improvements toward the goals of providing real-time, better personalized and population medicine at lower costs. In this study, we focused on analyzing and discussing various published artificial intelligence and machine learning solutions, approaches and perspectives, aiming to advance academic solutions and pave the way for a new data-centric era of discovery in healthcare.
Affiliation(s)
- Zeeshan Ahmed
- Institute for Health, Health Care Policy and Aging Research, Rutgers, The State University of New Jersey, 112 Paterson Street, New Brunswick, NJ, USA
- Department of Medicine, Rutgers Robert Wood Johnson Medical School, Rutgers Biomedical and Health Sciences, 125 Paterson Street, New Brunswick, NJ, USA
- Department of Genetics and Genome Sciences, School of Medicine, University of Connecticut Health Center, 263 Farmington Ave., Farmington, CT, USA
- Institute for Systems Genomics, University of Connecticut, 67 North Eagleville Road, Storrs, CT, USA
- Khalid Mohamed
- Department of Genetics and Genome Sciences, School of Medicine, University of Connecticut Health Center, 263 Farmington Ave., Farmington, CT, USA
- Saman Zeeshan
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA
- XinQi Dong
- Institute for Health, Health Care Policy and Aging Research, Rutgers, The State University of New Jersey, 112 Paterson Street, New Brunswick, NJ, USA
- Department of Medicine, Rutgers Robert Wood Johnson Medical School, Rutgers Biomedical and Health Sciences, 125 Paterson Street, New Brunswick, NJ, USA
31
Li F, Wang Y, Li C, Marquez-Lago TT, Leier A, Rawlings ND, Haffari G, Revote J, Akutsu T, Chou KC, Purcell AW, Pike RN, Webb GI, Ian Smith A, Lithgow T, Daly RJ, Whisstock JC, Song J. Twenty years of bioinformatics research for protease-specific substrate and cleavage site prediction: a comprehensive revisit and benchmarking of existing methods. Brief Bioinform 2019; 20:2150-2166. [PMID: 30184176] [PMCID: PMC6954447] [DOI: 10.1093/bib/bby077]
Abstract
The roles of proteolytic cleavage have been intensively investigated and discussed during the past two decades. This irreversible chemical process has been frequently reported to influence a number of crucial biological processes, such as the cell cycle, protein regulation and inflammation. A number of advanced studies have been published aiming to decipher the mechanisms of proteolytic cleavage. Given its significance and the large number of functionally enriched substrates targeted by specific proteases, many computational approaches have been established for accurate prediction of protease-specific substrates and their cleavage sites. Consequently, there is an urgent need to systematically assess the state-of-the-art computational approaches for protease-specific cleavage site prediction to further advance the existing methodologies and to improve the prediction performance. With this goal in mind, in this article, we carefully evaluated a total of 19 computational methods (including 8 scoring function-based methods and 11 machine learning-based methods) in terms of their underlying algorithm, calculated features, performance evaluation and software usability. Then, extensive independent tests were performed to assess the robustness and scalability of the reviewed methods using our carefully prepared independent test data sets with 3641 cleavage sites (specific to 10 proteases). The comparative experimental results demonstrate that PROSPERous is the most accurate generic method for predicting eight protease-specific cleavage sites, while GPS-CCD and LabCaS outperformed other predictors for calpain-specific cleavage sites. Based on our review, we then outlined some potential ways to improve the prediction performance and ease the computational burden by applying ensemble learning, deep learning, positive unlabeled learning and parallel and distributed computing techniques. We anticipate that our study will serve as a practical and useful guide for interested readers to further advance next-generation bioinformatics tools for protease-specific cleavage site prediction.
Affiliation(s)
- Fuyi Li
- Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Yanan Wang
- Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200240, China
- Chen Li
- Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, Zürich 8093, Switzerland
- Tatiana T Marquez-Lago
- Department of Genetics and Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
- André Leier
- Department of Genetics and Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
- Neil D Rawlings
- EMBL European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Gholamreza Haffari
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- Jerico Revote
- Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan
- Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA 02478, USA
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Anthony W Purcell
- Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Robert N Pike
- La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC 3086, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
- Geoffrey I Webb
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- A Ian Smith
- Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
- Trevor Lithgow
- Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC 3800, Australia
- Roger J Daly
- Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- James C Whisstock
- Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
- Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
32
McKenzie KA, Hunt SL, Hulshof G, Mudaranthakam DP, Meyer K, Vidoni ED, Burns JM, Mahnken JD. A semi-automated pipeline for fulfillment of resource requests from a longitudinal Alzheimer's disease registry. JAMIA Open 2019; 2:516-520. [PMID: 32025648] [PMCID: PMC6993996] [DOI: 10.1093/jamiaopen/ooz032]
Abstract
Objective Managing registries with continual data collection poses challenges, such as following reproducible research protocols and guaranteeing data accessibility. The University of Kansas (KU) Alzheimer’s Disease Center (ADC) maintains one such registry: Curated Clinical Cohort Phenotypes and Observations (C3PO). We created an automated and reproducible process by which investigators have access to C3PO data. Materials and Methods Data were input into Research Electronic Data Capture (REDCap). Monthly, data belonging to the Uniform Data Set (UDS), that is, data also collected at other ADCs, were uploaded to the National Alzheimer’s Coordinating Center (NACC). Quarterly, NACC cleaned, curated, and returned the UDS to the KU Data Management and Statistics (DMS) Core, where it was stored in C3PO along with other quarterly curated site-specific data. Investigators seeking to use C3PO submitted a research proposal and requested variables via the publicly accessible and searchable data dictionary. The DMS Core used this variable list and an automated SAS program to create a subset of C3PO. Results C3PO contained 1913 variables stored in 15 datasets. From 2017 to 2018, 38 data requests were completed for several KU departments and other research institutions. Completing data requests became more efficient; C3PO subsets were produced in under 10 seconds. Discussion The data management strategy outlined above facilitated reproducible research practices, which are fundamental to the future of research because they allow replication and verification to occur. Conclusion We created a transparent, automated, and efficient process for extracting subsets of data from a registry whose data changed daily.
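The subsetting step this abstract describes, taking an investigator's requested variable list and pulling only those columns out of a multi-dataset registry, can be sketched briefly. The paper's pipeline used an automated SAS program; the pandas version below is only an illustrative analogue, and the dataset names, column names, and `subject_id` key are invented for the example, not taken from C3PO.

```python
import pandas as pd

# A toy registry stored as multiple datasets (the real C3PO holds 15).
registry = {
    "demographics": pd.DataFrame({
        "subject_id": [1, 2, 3],
        "age": [72, 68, 75],
        "sex": ["F", "M", "F"],
    }),
    "cognition": pd.DataFrame({
        "subject_id": [1, 2, 3],
        "mmse_score": [28, 24, 30],
        "cdr_global": [0.0, 0.5, 0.0],
    }),
}

def fulfill_request(registry, requested_vars):
    """Merge only the requested variables into one analysis-ready table."""
    out = None
    for name, df in registry.items():
        cols = [c for c in df.columns if c in requested_vars]
        if not cols:
            continue  # this dataset holds none of the requested variables
        part = df[["subject_id"] + cols]
        out = part if out is None else out.merge(part, on="subject_id")
    return out

subset = fulfill_request(registry, {"age", "mmse_score"})
```

Because the fulfillment logic is a single generic function driven by the request list, every data request is handled identically and reproducibly, which is the property the abstract emphasizes.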
Affiliation(s)
- Katelyn A McKenzie
- Department of Biostatistics, University of Kansas Medical Center, Kansas City, Kansas, USA
- Suzanne L Hunt
- Department of Biostatistics, University of Kansas Medical Center, Kansas City, Kansas, USA; University of Kansas Alzheimer's Disease Center, Fairway, Kansas, USA
- Genevieve Hulshof
- Department of Biostatistics, University of Kansas Medical Center, Kansas City, Kansas, USA
- Dinesh Pal Mudaranthakam
- Department of Biostatistics, University of Kansas Medical Center, Kansas City, Kansas, USA; University of Kansas Alzheimer's Disease Center, Fairway, Kansas, USA
- Kayla Meyer
- University of Kansas Alzheimer's Disease Center, Fairway, Kansas, USA
- Eric D Vidoni
- University of Kansas Alzheimer's Disease Center, Fairway, Kansas, USA; Department of Neurology, University of Kansas Medical Center, Kansas City, Kansas, USA
- Jeffrey M Burns
- University of Kansas Alzheimer's Disease Center, Fairway, Kansas, USA; Department of Neurology, University of Kansas Medical Center, Kansas City, Kansas, USA
- Jonathan D Mahnken
- Department of Biostatistics, University of Kansas Medical Center, Kansas City, Kansas, USA; University of Kansas Alzheimer's Disease Center, Fairway, Kansas, USA
33
Cai W, Lesnik KL, Wade MJ, Heidrich ES, Wang Y, Liu H. Incorporating microbial community data with machine learning techniques to predict feed substrates in microbial fuel cells. Biosens Bioelectron 2019; 133:64-71. [PMID: 30909014] [DOI: 10.1016/j.bios.2019.03.021]
Abstract
The complicated interactions that occur in mixed-species biotechnologies, including biosensors, hinder chemical detection specificity. This lack of specificity limits the applications in which biosensors may be deployed, such as those where an unknown feed substrate must be determined. The application of genomic data and well-developed data mining technologies can overcome these limitations and advance engineering development. In the present study, 69 samples with three different substrate types (acetate, carbohydrates and wastewater) collected from various laboratory environments were evaluated to determine the ability to identify feed substrates from the resultant microbial communities. Six machine learning algorithms with four different input variables were trained and evaluated on their ability to predict feed substrate from genomic datasets. The highest accuracies of 93 ± 6% and 92 ± 5% were obtained using NNET trained on datasets classified at the phylum and family taxonomic levels, respectively. These accuracies corresponded to kappa values of 0.87 ± 0.10 and 0.86 ± 0.09, respectively. Four of the six algorithms maintained accuracies above 80% and kappa values higher than 0.66. The sequencing method (Roche 454 or Illumina) did not affect the accuracy of any algorithm except SVM at the phylum level. All algorithms trained on NMDS-compressed datasets obtained accuracies over 80%, while models trained on PCoA-compressed datasets presented a 10-30% reduction in accuracy. These results suggest that incorporating microbial community data with machine learning algorithms can be used for the prediction of feed substrate and for the potential improvement of microbial fuel cell (MFC)-based biosensor signal specificity, providing a new use of machine learning techniques that has substantial practical applications in biotechnological fields.
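The workflow this abstract evaluates, training a classifier on taxon abundance profiles and scoring it with accuracy and Cohen's kappa, can be sketched in a few lines. The study trained six algorithms (including a neural network, "NNET") on 69 real samples; the sketch below uses a scikit-learn MLP on synthetic relative-abundance data, so the taxa counts, class signatures, and resulting scores are illustrative only, not the paper's results.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score

rng = np.random.default_rng(0)

# Synthetic relative abundances for 30 phylum-level taxa per sample,
# with each substrate class given its own community "signature".
n_per_class, n_taxa = 40, 30
classes = ["acetate", "carbohydrates", "wastewater"]
X, y = [], []
for i, label in enumerate(classes):
    center = rng.random(n_taxa) + np.eye(len(classes), n_taxa)[i] * 3
    samples = rng.random((n_per_class, n_taxa)) + center
    X.append(samples / samples.sum(axis=1, keepdims=True))  # normalize rows
    y += [label] * n_per_class
X = np.vstack(X)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)
clf = MLPClassifier(hidden_layer_sizes=(16,), solver="lbfgs",
                    max_iter=2000, random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)
acc = accuracy_score(y_te, pred)
kappa = cohen_kappa_score(y_te, pred)  # chance-corrected agreement
```

Reporting kappa alongside accuracy, as the study does, guards against the inflation that raw accuracy can show on imbalanced class distributions.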
Affiliation(s)
- Wenfang Cai
- Department of Environmental Science and Engineering, Xi'an Jiaotong University, Xi'an 710049, China; Department of Biological and Ecological Engineering, Oregon State University, Corvallis, OR 97331, USA
- Keaton Larson Lesnik
- Department of Biological and Ecological Engineering, Oregon State University, Corvallis, OR 97331, USA
- Matthew J Wade
- School of Engineering, Newcastle University, Newcastle upon Tyne NE1 7RU, UK; Department of Mathematics & Statistics, McMaster University, Hamilton, Canada L8S 4K1
- Yunhai Wang
- Department of Environmental Science and Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- Hong Liu
- Department of Biological and Ecological Engineering, Oregon State University, Corvallis, OR 97331, USA
34
Read KB. Adapting data management education to support clinical research projects in an academic medical center. J Med Libr Assoc 2019; 107:89-97. [PMID: 30598653] [PMCID: PMC6300223] [DOI: 10.5195/jmla.2019.580]
Abstract
Background Librarians and researchers alike have long identified research data management (RDM) training as a need in biomedical research. Despite the wealth of libraries offering RDM education to their communities, clinical research is an area that has not been targeted. Clinical RDM (CRDM) is seen by its community as an essential part of the research process for which established guidelines exist, yet educational initiatives in this area remain unknown. Case Presentation Leveraging my academic library's experience supporting CRDM through informationist grants and REDCap training in our medical center, I developed a 1.5-hour CRDM workshop. This workshop was designed to use established CRDM guidelines in clinical research and to address common questions asked by our community through the library's existing data support program. The workshop was offered to the entire medical center 4 times between November 2017 and July 2018. This case study describes the development, implementation, and evaluation of this workshop. Conclusions The 4 workshops were well attended and well received by the medical center community, with 99% stating that they would recommend the class to others and 98% stating that they would use what they learned in their work. Attendees also articulated how they would implement the main competencies they learned from the workshop into their work. For the library, the effort to support CRDM has led to the coordination of a larger institutional collaborative training series to educate researchers on best practices with data, as well as the formation of institution-wide policy groups to address researcher challenges with CRDM, data transfer, and data sharing.
Affiliation(s)
- Kevin B Read
- Data Services Librarian and Data Discovery Lead, NYU Health Sciences Library, New York University School of Medicine, 577 First Avenue, New York, NY 10016
35
Ahmed Z, Kim M, Liang BT. MAV-clic: management, analysis, and visualization of clinical data. JAMIA Open 2018; 2:23-28. [PMID: 31984341] [PMCID: PMC6951942] [DOI: 10.1093/jamiaopen/ooy052]
Abstract
Objectives Develop a multifunctional analytics platform for efficient management and analysis of healthcare data. Materials and Methods Management, Analysis, and Visualization of Clinical Data (MAV-clic) is a Health Insurance Portability and Accountability Act of 1996 (HIPAA)-compliant framework based on the Butterfly Model. MAV-clic extracts, cleanses, and encrypts data then restructures and aggregates data in a deidentified format. A graphical user interface allows query, analysis, and visualization of clinical data. Results MAV-clic manages healthcare data for over 800 000 subjects at UConn Health. Three analytic capabilities of MAV-clic include: creating cohorts based on specific criteria; performing measurement analysis of subjects with a specific diagnosis and medication; and calculating measure outcomes of subjects over time. Discussion MAV-clic supports clinicians and healthcare analysts by efficiently stratifying subjects to understand specific scenarios and optimize decision making. Conclusion MAV-clic is founded on the scientific premise that to improve the quality and transition of healthcare, integrative platforms are necessary to analyze heterogeneous clinical, epidemiological, metabolomics, proteomics, and genomics data for precision medicine.
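One step in the pipeline this abstract outlines, restructuring records into a deidentified format before they are queried, can be illustrated with a common technique: replacing direct identifiers with a salted one-way hash so the same patient maps to a stable pseudonym. This is only a generic sketch of that idea; the field names, the salt handling, and the hashing choice are assumptions for illustration, not MAV-clic's actual implementation.

```python
import hashlib

SALT = b"site-secret-salt"  # in practice, a securely stored secret, never hard-coded

def deidentify(record):
    """Replace the direct identifier (here 'mrn') with a salted one-way hash."""
    pseudo_id = hashlib.sha256(SALT + record["mrn"].encode()).hexdigest()[:16]
    out = {k: v for k, v in record.items() if k != "mrn"}
    out["pseudo_id"] = pseudo_id
    return out

rec = {"mrn": "000123", "diagnosis": "I10", "year": 2018}
clean = deidentify(rec)
```

Because the hash is deterministic for a given salt, deidentified records from different extracts can still be aggregated per subject, which is what makes cohort queries over deidentified data possible.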
Affiliation(s)
- Zeeshan Ahmed
- Department of Genetics and Genome Sciences, Institute for Systems Genomics, School of Medicine, University of Connecticut Health Center, Farmington, Connecticut, USA
- Minjung Kim
- The Pat and Jim Calhoun Cardiology Center, School of Medicine, University of Connecticut Health Center, Farmington, Connecticut, USA
- Bruce T Liang
- Ray Neag Distinguished Professor of Cardiovascular Biology and Medicine, Director Pat and Jim Calhoun Cardiology Center, Dean UConn School of Medicine, University of Connecticut Health Center, Farmington, Connecticut, USA