1
|
Yoon H, Schwedt TJ, Chong CD, Olatunde O, Wu T. Harmonizing Healthy Cohorts to Support Multicenter Studies on Migraine Classification using Brain MRI Data. medRxiv 2023:2023.06.26.23291909. [PMID: 37425905 PMCID: PMC10327280 DOI: 10.1101/2023.06.26.23291909] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/11/2023]
Abstract
Multicenter and multi-scanner imaging studies might be needed to provide sample sizes large enough for developing accurate predictive models. However, multicenter studies likely include confounding factors due to subtle differences in research participant characteristics, MRI scanners, and imaging acquisition protocols, and so might not yield generalizable machine learning models; that is, models developed using one dataset may not be applicable to a different dataset. The generalizability of classification models is key for multi-scanner and multicenter studies, and for providing reproducible results. This study developed a data harmonization strategy to identify healthy controls with similar (homogeneous) characteristics from multicenter studies, in order to validate the generalization of machine-learning techniques for classifying individual migraine patients and healthy controls using brain MRI data. The Maximum Mean Discrepancy (MMD) was used to compare the two datasets represented in Geodesic Flow Kernel (GFK) space, capturing the data variabilities for identifying a "healthy core". A set of homogeneous healthy controls can assist in overcoming some of the unwanted heterogeneity and allow for the development of classification models that have high accuracy when applied to new datasets. Extensive experimental results demonstrate the utility of a healthy core. One dataset consists of 120 individuals (66 with migraine and 54 healthy controls) and another dataset consists of 76 individuals (34 with migraine and 42 healthy controls). A homogeneous dataset derived from a cohort of healthy controls improves the accuracy of classification models by about 25% for both episodic and chronic migraineurs.
Affiliation(s)
- Hyunsoo Yoon
  - Yonsei University; Department of Industrial Engineering
- Todd J. Schwedt
  - Mayo Clinic; Department of Neurology
  - ASU-Mayo Center for Innovative Imaging
- Catherine D. Chong
  - Mayo Clinic; Department of Neurology
  - ASU-Mayo Center for Innovative Imaging
- Oyekanmi Olatunde
  - Binghamton University; Department of Systems Science and Industrial Engineering
- Teresa Wu
  - ASU-Mayo Center for Innovative Imaging
  - Arizona State University; School of Computing and Augmented Intelligence
|
2
|
Eysenbach G, Ulrich H, Bergh B, Schreiweis B. Functional Requirements for Medical Data Integration into Knowledge Management Environments: Requirements Elicitation Approach Based on Systematic Literature Analysis. J Med Internet Res 2023; 25:e41344. [PMID: 36757764 PMCID: PMC9951079 DOI: 10.2196/41344] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/22/2022] [Revised: 10/24/2022] [Accepted: 11/17/2022] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND In patient care, data are historically generated and stored in heterogeneous databases that are domain specific and often noninteroperable or isolated. As the amount of health data increases, the number of isolated data silos is also expected to grow, limiting the accessibility of the collected data. Medical informatics is developing ways to move from siloed data to a more harmonized arrangement in information architectures. This paradigm shift will allow future research to integrate medical data at various levels and from various sources. Currently, comprehensive requirements engineering is working on data integration projects in both patient care- and research-oriented contexts, and it is significantly contributing to the success of such projects. In addition to various stakeholder-based methods, document-based requirement elicitation is a valid method for improving the scope and quality of requirements. OBJECTIVE Our main objective was to provide a general catalog of functional requirements for integrating medical data into knowledge management environments. We aimed to identify where integration projects intersect to derive consistent and representative functional requirements from the literature. On the basis of these findings, we identified which functional requirements for data integration exist in the literature and thus provide a general catalog of requirements. METHODS This work began by conducting a literature-based requirement elicitation based on a broad requirement engineering approach. Thus, in the first step, we performed a web-based systematic literature review to identify published articles that dealt with the requirements for medical data integration. We identified and analyzed the available literature by applying the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. In the second step, we screened the results for functional requirements using the requirements engineering method of document analysis and derived the requirements into a uniform requirement syntax. Finally, we classified the elicited requirements into a category scheme that represents the data life cycle.
RESULTS Our 2-step requirements elicitation approach yielded 821 articles, of which 61 (7.4%) were included in the requirement elicitation process. There, we identified 220 requirements, which were covered by 314 references. We assigned the requirements to different data life cycle categories as follows: 25% (55/220) to data acquisition, 35.9% (79/220) to data processing, 12.7% (28/220) to data storage, 9.1% (20/220) to data analysis, 6.4% (14/220) to metadata management, 2.3% (5/220) to data lineage, 3.2% (7/220) to data traceability, and 5.5% (12/220) to data security. CONCLUSIONS The aim of this study was to present a cross-section of functional data integration-related requirements defined in the literature by other researchers. The aim was achieved with 220 distinct requirements from 61 publications. We concluded that scientific publications are, in principle, a reliable source of information for functional requirements with respect to medical data integration. Finally, we provide a broad catalog to support other scientists in the requirement elicitation phase.
Affiliation(s)
- G Eysenbach
  - Institute for Medical Informatics and Statistics, Kiel University and University Hospital Schleswig-Holstein, Kiel, Germany
- Hannes Ulrich
  - Institute for Medical Informatics and Statistics, Kiel University and University Hospital Schleswig-Holstein, Kiel, Germany
- Björn Bergh
  - Institute for Medical Informatics and Statistics, Kiel University and University Hospital Schleswig-Holstein, Kiel, Germany
- Björn Schreiweis
  - Institute for Medical Informatics and Statistics, Kiel University and University Hospital Schleswig-Holstein, Kiel, Germany
|
3
|
Urbanowicz RJ, Holmes JH, Appleby D, Narasimhan V, Durborow S, Al-Naamani N, Fernando M, Kawut SM. A Semi-Automated Term Harmonization Pipeline Applied to Pulmonary Arterial Hypertension Clinical Trials. Methods Inf Med 2022; 61:3-10. [PMID: 34820791 PMCID: PMC9978994 DOI: 10.1055/s-0041-1739361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Abstract
OBJECTIVE Data harmonization is essential to integrate individual participant data from multiple sites, time periods, and trials for meta-analysis. The process of mapping terms and phrases to an ontology is complicated by typographic errors, abbreviations, truncation, and plurality. We sought to harmonize medical history (MH) and adverse events (AE) term records across 21 randomized clinical trials in pulmonary arterial hypertension and chronic thromboembolic pulmonary hypertension. METHODS We developed and applied a semi-automated harmonization pipeline for use with domain-expert annotators to resolve ambiguous term mappings using exact and fuzzy matching. We summarized MH and AE term mapping success, including map quality measures, and imputation of a generalizing term hierarchy as defined by the applied Medical Dictionary for Regulatory Activities (MedDRA) ontology standard. RESULTS Over 99.6% of both MH (N = 37,105) and AE (N = 58,170) records were successfully mapped to MedDRA low-level terms. Automated exact matching accounted for 74.9% of MH and 85.5% of AE mappings. Term recommendations from fuzzy matching in the pipeline facilitated annotator mapping of the remaining 24.9% of MH and 13.8% of AE records. Imputation of the generalized MedDRA term hierarchy was unambiguous in 85.2% of high-level terms, 99.4% of high-level group terms, and 99.5% of system organ class in MH, and 75% of high-level terms, 98.3% of high-level group terms, and 98.4% of system organ class in AE. CONCLUSION This pipeline dramatically reduced the burden of manual annotation for MH and AE term harmonization and could be adapted to other data integration efforts.
Affiliation(s)
- Ryan J. Urbanowicz
  - Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States
- John H. Holmes
  - Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States
- Dina Appleby
  - Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States
- Vanamala Narasimhan
  - Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States
- Stephen Durborow
  - Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States
- Nadine Al-Naamani
  - Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States
- Melissa Fernando
  - Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States
- Steven M. Kawut
  - Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States
|
4
|
Gurugubelli VS, Fang H, Shikany JM, Balkus SV, Rumbut J, Ngo H, Wang H, Allison JJ, Steffen LM. A review of harmonization methods for studying dietary patterns. Smart Health (Amst) 2022; 23:100263. [PMID: 35252528 PMCID: PMC8896407 DOI: 10.1016/j.smhl.2021.100263] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 04/11/2023]
Abstract
Data harmonization is the process by which variables from different research studies are standardized to similar units, resulting in comparable datasets. These data may be integrated for more powerful and accurate examination and prediction of outcomes for use in intelligent and smart electronic health software programs and systems. Prospective harmonization is performed when researchers create guidelines for gathering and managing the data before data collection begins. In contrast, retrospective harmonization is performed by pooling previously collected data from various studies using expert domain knowledge to identify and translate variables. In nutritional epidemiology, dietary data harmonization is often necessary to construct the nutrient and food databases needed to answer complex research questions and develop effective public health policy. In this paper, we review methods for effective data harmonization, including developing a harmonization plan, identifying which common standards already exist for harmonization, and defining the variables needed to harmonize datasets. Currently, several large-scale studies maintain harmonized nutrient databases, especially in Europe, and steps have been proposed to inform the retrospective harmonization process. As an example, data harmonization methods are applied to several U.S. longitudinal diet datasets. Based on our review, considerations for future dietary data harmonization include user agreements for sharing private data among participating studies, defining variables and data dictionaries that accurately map variables among studies, and the use of secure data storage servers to maintain privacy. These considerations establish necessary components of harmonized data for smart health applications, which can promote healthier eating and provide greater insights into the effect of dietary patterns on health.
Affiliation(s)
- Hua Fang
  - University of Massachusetts Dartmouth, 285 Old Westport Rd, North Dartmouth, 02747, Massachusetts, USA
  - Department of Quantitative Health Sciences, University of Massachusetts Medical School, 55 N Lake Ave, Worcester, 01655, Massachusetts, USA
  - Corresponding author. Tel.: +0-508-910-6411
- James M Shikany
  - Division of Preventive Medicine, University of Alabama at Birmingham, 1720 University Blvd, Birmingham, 35294, Alabama, USA
- Salvador V Balkus
  - University of Massachusetts Dartmouth, 285 Old Westport Rd, North Dartmouth, 02747, Massachusetts, USA
- Joshua Rumbut
  - University of Massachusetts Dartmouth, 285 Old Westport Rd, North Dartmouth, 02747, Massachusetts, USA
  - Department of Quantitative Health Sciences, University of Massachusetts Medical School, 55 N Lake Ave, Worcester, 01655, Massachusetts, USA
- Hieu Ngo
  - University of Massachusetts Dartmouth, 285 Old Westport Rd, North Dartmouth, 02747, Massachusetts, USA
- Honggang Wang
  - University of Massachusetts Dartmouth, 285 Old Westport Rd, North Dartmouth, 02747, Massachusetts, USA
- Jeroan J Allison
  - Department of Quantitative Health Sciences, University of Massachusetts Medical School, 55 N Lake Ave, Worcester, 01655, Massachusetts, USA
- Lyn M. Steffen
  - Division of Epidemiology and Community Health, School of Public Health, University of Minnesota, Minneapolis, 55455, Minnesota, USA
|
5
|
Nassiri F, Wang JZ, Au K, Barnholtz-Sloan J, Jenkinson MD, Drummond K, Zhou Y, Snyder JM, Brastianos P, Santarius T, Suppiah S, Poisson L, Gaillard F, Rosenthal M, Kaufmann T, Tsang D, Aldape K, Zadeh G. Consensus core clinical data elements for meningiomas. Neuro Oncol 2021; 24:683-693. [PMID: 34791428 DOI: 10.1093/neuonc/noab259] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/25/2022] Open
Abstract
BACKGROUND With increasing molecular analyses of meningiomas, there is a need to harmonize language used to capture clinical data across centers to ensure that molecular alterations are appropriately linked to clinical variables of interest. Here the International Consortium on Meningiomas presents a set of core and supplemental meningioma-specific Common Data Elements (CDEs) to facilitate comparative and pooled analyses. METHODS The generation of CDEs followed a four-phase process similar to other National Institute of Neurological Disorders and Stroke (NINDS) CDE projects: discovery, internal validation, external validation, and distribution. RESULTS The CDEs were organized into patient- and tumor-level modules. In total, 17 core CDEs (10 patient-level and 7 tumour-level) as well as 14 supplemental CDEs (7 patient-level and 7 tumour-level) were defined and described. These CDEs are now made publicly available for dissemination and adoption. CONCLUSIONS CDEs provide a framework for discussion in the neuro-oncology community that will facilitate data sharing for collaborative research projects and aid in developing a common language for comparative and pooled analyses. The meningioma-specific CDEs presented here are intended to be dynamic parameters that evolve with time, and the Consortium welcomes international feedback for further refinement and implementation of these CDEs.
Affiliation(s)
- Farshad Nassiri
  - MacFeeters Hamilton Neuro-Oncology Program, Princess Margaret Cancer Centre, University Health Network and University of Toronto, ON, Canada
  - Division of Neurosurgery, Department of Surgery, University of Toronto, Toronto, ON, Canada
  - Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Justin Z Wang
  - MacFeeters Hamilton Neuro-Oncology Program, Princess Margaret Cancer Centre, University Health Network and University of Toronto, ON, Canada
  - Division of Neurosurgery, Department of Surgery, University of Toronto, Toronto, ON, Canada
  - Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Karolyn Au
  - Division of Neurosurgery, Department of Surgery, University of Alberta, AB, Canada
- Jill Barnholtz-Sloan
  - Case Comprehensive Cancer Center, Case Western Reserve University, Cleveland, OH, United States
- Michael D Jenkinson
  - Department of Neurosurgery, University of Liverpool, England, United Kingdom
- Kate Drummond
  - Department of Neurosurgery, The Royal Melbourne Hospital, Melbourne, Australia
- Yueren Zhou
  - Henry Ford Health System, Detroit, MI, United States
- Priscilla Brastianos
  - Dana Farber/Harvard Cancer Center, Massachusetts General Hospital, Boston, MA, United States
- Thomas Santarius
  - Department of Neurosurgery, Cambridge University Hospitals, Cambridge, United Kingdom
- Suganth Suppiah
  - MacFeeters Hamilton Neuro-Oncology Program, Princess Margaret Cancer Centre, University Health Network and University of Toronto, ON, Canada
  - Division of Neurosurgery, Department of Surgery, University of Toronto, Toronto, ON, Canada
  - Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Laila Poisson
  - Henry Ford Health System, Detroit, MI, United States
- Francesco Gaillard
  - Department of Radiology, The Royal Melbourne Hospital, Melbourne, Australia
- Mark Rosenthal
  - Department of Medical Oncology, Peter MacCallum Cancer Centre, Melbourne, Australia
- Timothy Kaufmann
  - Department of Radiology, The Mayo Clinic, Rochester, MN, United States
- Derek Tsang
  - Division of Neurosurgery, Department of Surgery, University of Toronto, Toronto, ON, Canada
- Kenneth Aldape
  - National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
- Gelareh Zadeh
  - MacFeeters Hamilton Neuro-Oncology Program, Princess Margaret Cancer Centre, University Health Network and University of Toronto, ON, Canada
  - Division of Neurosurgery, Department of Surgery, University of Toronto, Toronto, ON, Canada
  - Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
|
6
|
Parimbelli E, Wilk S, Cornet R, Sniatala P, Sniatala K, Glaser SLC, Fraterman I, Boekhout AH, Ottaviano M, Peleg M. A review of AI and Data Science support for cancer management. Artif Intell Med 2021; 117:102111. [PMID: 34127240 DOI: 10.1016/j.artmed.2021.102111] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 08/07/2020] [Revised: 12/23/2020] [Accepted: 05/11/2021] [Indexed: 02/09/2023]
Abstract
INTRODUCTION Thanks to improvement of care, cancer has become a chronic condition. But due to the toxicity of treatment, the importance of supporting the quality of life (QoL) of cancer patients increases. Monitoring and managing QoL relies on data collected by the patient in his/her home environment, its integration, and its analysis, which supports personalization of cancer management recommendations. We review the state-of-the-art of computerized systems that employ AI and Data Science methods to monitor the health status and provide support to cancer patients managed at home. OBJECTIVE Our main objective is to analyze the literature to identify open research challenges that a novel decision support system for cancer patients and clinicians will need to address, point to potential solutions, and provide a list of established best-practices to adopt. METHODS We designed a review study, in compliance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, analyzing studies retrieved from PubMed related to monitoring cancer patients in their home environments via sensors and self-reporting: what data is collected, what are the techniques used to collect data, semantically integrate it, infer the patient's state from it and deliver coaching/behavior change interventions. RESULTS Starting from an initial corpus of 819 unique articles, a total of 180 papers were considered in the full-text analysis and 109 were finally included in the review. Our findings are organized and presented in four main sub-topics consisting of data collection, data integration, predictive modeling and patient coaching. 
CONCLUSION Development of modern decision support systems for cancer needs to utilize best practices like the use of validated electronic questionnaires for quality-of-life assessment, adoption of appropriate information modeling standards supplemented by terminologies/ontologies, adherence to FAIR data principles, external validation, stratification of patients in subgroups for better predictive modeling, and adoption of formal behavior change theories. Open research challenges include supporting emotional and social dimensions of well-being, including PROs in predictive modeling, and providing better customization of behavioral interventions for the specific population of cancer patients.
Affiliation(s)
- S Wilk
  - Poznan University of Technology, Poland
- R Cornet
  - Amsterdam University Medical Centre, the Netherlands
- S L C Glaser
  - Amsterdam University Medical Centre, the Netherlands
- I Fraterman
  - Netherlands Cancer Institute, Amsterdam, the Netherlands
- A H Boekhout
  - Netherlands Cancer Institute, Amsterdam, the Netherlands
|
7
|
|
8
|
Rance B, Canuel V, Countouris H, Laurent-Puig P, Burgun A. Integrating Heterogeneous Biomedical Data for Cancer Research: the CARPEM infrastructure. Appl Clin Inform 2016; 7:260-74. [PMID: 27437039 PMCID: PMC4941838 DOI: 10.4338/aci-2015-09-ra-0125] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 10/02/2015] [Accepted: 02/07/2016] [Indexed: 01/19/2023] Open
Abstract
Cancer research involves numerous disciplines. The multiplicity of data sources and their heterogeneous nature render the integration and the exploration of the data more and more complex. Translational research platforms are a promising way to assist scientists in these tasks. In this article, we identify a set of scientific and technical principles needed to build a translational research platform compatible with ethical requirements, data protection and data-integration problems. We describe the solution adopted by the CARPEM cancer research program to design and deploy a platform able to integrate retrospective, prospective, and day-to-day care data. We designed a three-layer architecture composed of a data collection layer, a data integration layer and a data access layer. We leverage a set of open-source resources including i2b2 and tranSMART.
Affiliation(s)
- Bastien Rance
  - University Hospital Georges Pompidou, Paris, France; INSERM UMR_S 1138, CRC, Paris, France
- Hector Countouris
  - University Hospital Georges Pompidou, Paris, France; INSERM UMR_S 1138, CRC, Paris, France
- Pierre Laurent-Puig
  - University Hospital Georges Pompidou, Paris, France; Université Paris Sorbonne Cité, Inserm UMR-S 1147, Paris, France
- Anita Burgun
  - University Hospital Georges Pompidou, Paris, France; INSERM UMR_S 1138, CRC, Paris, France
|
9
|
Winter A, Hilgers RD, Hofestädt R, Hübner U, Knaup-Gregori P, Ose C, Schmoor C, Timmer A, Wege D. Good Medicine and Good Healthcare Demand Good Information (Systems). Methods Inf Med 2015; 54:385-7. [PMID: 26395286 DOI: 10.3414/me15-05-1001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
The demand for evidence-based health informatics and benchmarking of 'good' information systems in health care gives an opportunity to continue reporting on recent papers in the German journal GMS Medical Informatics, Biometry and Epidemiology (MIBE) here. The publications in focus deal with a comparison of benchmarking initiatives in German-speaking countries, use of communication standards in telemonitoring scenarios, the estimation of national cancer incidence rates, and modifications of parametric tests. Furthermore, papers in this issue of MIM are introduced that were originally presented at the Annual Conference of the German Society of Medical Informatics, Biometry and Epidemiology. They likewise deal with evidence and evaluation of 'good' information systems, but also with data harmonization, surveillance in obstetrics, adaptive designs and parametrical testing in statistical analysis, patient registries, and signal processing.
Affiliation(s)
- A Winter
  - Prof. Dr. Alfred Winter, Leipzig University, Institute for Medical Informatics, Statistics and Epidemiology, Haertelstr. 16-18, 04107 Leipzig, Germany
|