1. Saha E, Rathore P. Discovering hidden patterns among medicines prescribed to patients using Association Rule Mining Technique. International Journal of Healthcare Management 2022. [DOI: 10.1080/20479700.2022.2099335] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
Affiliation(s)
- Esha Saha
- Institute of Management Technology Hyderabad, Hyderabad, India
2. Campbell EA, Bass EJ, Masino AJ. Temporal condition pattern mining in large, sparse electronic health record data: A case study in characterizing pediatric asthma. J Am Med Inform Assoc 2021; 27:558-566. [PMID: 32049282 PMCID: PMC7075539 DOI: 10.1093/jamia/ocaa005] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/19/2019] [Revised: 12/20/2019] [Accepted: 01/12/2020] [Indexed: 12/26/2022] [Open access]
Abstract
Objective: This study introduces a temporal condition pattern mining methodology to address the sparse nature of coded condition concept utilization in electronic health record data. As a validation study, we applied this method to reveal condition patterns surrounding an initial diagnosis of pediatric asthma.
Materials and Methods: The SPADE (Sequential PAttern Discovery using Equivalence classes) algorithm was used to identify common temporal condition patterns surrounding the initial diagnosis of pediatric asthma in a study population of 71 824 patients from the Children’s Hospital of Philadelphia. SPADE was applied to a dataset with diagnoses coded using International Classification of Diseases (ICD) concepts and separately to a dataset with the ICD codes mapped to their corresponding expanded diagnostic clusters (EDCs). Common temporal condition patterns surrounding the initial diagnosis of pediatric asthma ascertained by SPADE from both the ICD and EDC datasets were compared.
Results: SPADE identified 36 unique diagnoses in the mapped EDC dataset, whereas only 19 were recognized in the ICD dataset. Temporal trends in condition diagnoses ascertained from the EDC data were not discoverable in the ICD dataset.
Discussion: Mining frequent temporal condition patterns from large electronic health record datasets may reveal previously unknown associations between diagnoses that could inform future research into causation or other relationships. Mapping sparsely coded medical concepts into homogenous groups was essential to discovering potentially useful information from our dataset.
Conclusions: We expect that the presented methodology is applicable to the study of diagnostic trajectories for other clinical conditions and can be extended to study temporal patterns of other coded medical concepts such as medications and procedures.
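To illustrate why grouping sparse codes can matter for temporal pattern mining, here is a minimal Python sketch. It is not SPADE and not the authors' pipeline: it naively counts ordered pairs of codes per patient, and the function name `frequent_ordered_pairs`, the toy codes (`W1`, `A1`, ...), and the `cluster` mapping are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_ordered_pairs(patient_sequences, min_support):
    """Return {(earlier, later): support} for ordered code pairs that occur
    in at least `min_support` (a fraction) of patients.

    Each patient is a time-ordered list of visits; each visit is a set of codes.
    This is brute-force pair counting, a simplified stand-in for SPADE.
    """
    counts = Counter()
    for visits in patient_sequences:
        seen = set()
        for i, j in combinations(range(len(visits)), 2):  # i occurs before j
            for earlier in visits[i]:
                for later in visits[j]:
                    seen.add((earlier, later))
        counts.update(seen)  # a patient contributes at most once per pattern
    n = len(patient_sequences)
    return {p: c / n for p, c in counts.items() if c / n >= min_support}


# Hypothetical fine-grained codes: every patient has a different code pair.
raw = [
    [{"W1"}, {"A1"}],
    [{"W2"}, {"A2"}],
    [{"W1"}, {"A2"}],
    [{"X1"}, {"A1"}],
]

# Hypothetical mapping of fine-grained codes to broader clusters.
cluster = {"W1": "wheeze", "W2": "wheeze", "A1": "asthma", "A2": "asthma", "X1": "other"}
mapped = [[{cluster[code] for code in visit} for visit in seq] for seq in raw]

raw_patterns = frequent_ordered_pairs(raw, min_support=0.5)
mapped_patterns = frequent_ordered_pairs(mapped, min_support=0.5)
print(raw_patterns)     # {}
print(mapped_patterns)  # {('wheeze', 'asthma'): 0.75}
```

In this toy data no ordered pair reaches 50% support at the fine-grained level, while the clustered data surfaces "wheeze before asthma" in three of four patients, which mirrors the abstract's observation in spirit only.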
Affiliation(s)
- Elizabeth A Campbell
- Department of Information Science, College of Computing & Informatics, Drexel University, Philadelphia, Pennsylvania, USA; Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
- Ellen J Bass
- Department of Information Science, College of Computing & Informatics, Drexel University, Philadelphia, Pennsylvania, USA; Department of Health Systems and Sciences Research, College of Nursing & Health Professions, Philadelphia, Pennsylvania, USA
- Aaron J Masino
- Department of Information Science, College of Computing & Informatics, Drexel University, Philadelphia, Pennsylvania, USA; Department of Anesthesiology and Critical Care, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA
3. Rahman N, Wang DD, Ng SHX, Ramachandran S, Sridharan S, Khoo A, Tan CS, Goh WP, Tan XQ. Processing of Electronic Medical Records for Health Services Research in an Academic Medical Center: Methods and Validation. JMIR Med Inform 2018; 6:e10933. [PMID: 30578188 PMCID: PMC6320424 DOI: 10.2196/10933] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 05/02/2018] [Revised: 10/09/2018] [Accepted: 10/10/2018] [Indexed: 01/08/2023] [Open access]
Abstract
Background: Electronic medical records (EMRs) contain a wealth of information that can support data-driven decision making in health care policy design and service planning. Although research using EMRs has become increasingly prevalent, challenges such as coding inconsistency, data validity, and lack of suitable measures in important domains still hinder the progress.
Objective: The objective of this study was to design a structured way to process records in administrative EMR systems for health services research and assess validity in selected areas.
Methods: On the basis of a local hospital EMR system in Singapore, we developed a structured framework for EMR data processing, including standardization and phenotyping of diagnosis codes, construction of cohort with multilevel views, and generation of variables and proxy measures to supplement primary data. Disease complexity was estimated by Charlson Comorbidity Index (CCI) and Polypharmacy Score (PPS), whereas socioeconomic status (SES) was estimated by housing type. Validity of modified diagnosis codes and derived measures were investigated.
Results: Visit-level (N=7,778,761) and patient-level records (n=549,109) were generated. The International Classification of Diseases, Tenth Revision, Australian Modification (ICD-10-AM) codes were standardized to the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) with a mapping rate of 87.1%. In all, 97.4% of the ICD-9-CM codes were phenotyped successfully using Clinical Classification Software by Agency for Healthcare Research and Quality. Diagnosis codes that underwent modification (truncation or zero addition) in standardization and phenotyping procedures had the modification validated by physicians, with validity rates of more than 90%. Disease complexity measures (CCI and PPS) and SES were found to be valid and robust after a correlation analysis and a multivariate regression analysis. CCI and PPS were correlated with each other and positively correlated with health care utilization measures. Larger housing type was associated with lower government subsidies received, suggesting association with higher SES. Profile of constructed cohorts showed differences in disease prevalence, disease complexity, and health care utilization in those aged above 65 years and those aged 65 years or younger.
Conclusions: The framework proposed in this study would be useful for other researchers working with EMR data for health services research. Further analyses would be needed to better understand differences observed in the cohorts.
Affiliation(s)
- Nabilah Rahman
- Centre for Health Services and Policy Research, Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Singapore
- Debby D Wang
- Centre for Health Services and Policy Research, Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Singapore
- Sheryl Hui-Xian Ng
- Centre for Health Services and Policy Research, Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Singapore
- Sravan Ramachandran
- Centre for Health Services and Policy Research, Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Singapore
- Srinath Sridharan
- Centre for Health Services and Policy Research, Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Singapore
- Astrid Khoo
- Regional Health System Planning Office, National University Health System, Singapore, Singapore
- Chuen Seng Tan
- Centre for Health Services and Policy Research, Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Singapore; Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Singapore
- Wei-Ping Goh
- University Medicine Cluster, National University Hospital, Singapore, Singapore
- Xin Quan Tan
- Regional Health System Planning Office, National University Health System, Singapore, Singapore; Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Singapore
4. Leinonen MK, Hansen SA, Skare GB, Skaaret IB, Silva M, Johannesen TB, Nygård M. Low proportion of unreported cervical treatments in the cancer registry of Norway between 1998 and 2013. Acta Oncol 2018; 57:1663-1670. [PMID: 30169991 DOI: 10.1080/0284186x.2018.1497296] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 10/28/2022]
Abstract
BACKGROUND: Accurate information about treatment is needed to evaluate cervical cancer prevention efforts. We studied completeness and validity of reporting cervical treatments in the Cancer Registry of Norway (CRN).
MATERIAL AND METHODS: We identified 47,423 (92%) high-grade cervical dysplasia patients with and 3983 (8%) without recorded treatment in the CRN in 1998-2013. We linked the latter group to the nationwide registry of hospital discharges in 1998-2015. Of patients still without treatment records, we randomly selected 375 for review of their medical history. Factors predicting incomplete treatment records were assessed by multiple imputation and logistic regression.
RESULTS: Registry linkage revealed that 10% (401/3983) of patients received treatment, usually conization, within one year of their initial high-grade dysplasia diagnosis. Of those, 11% (n = 44) were missing due to unreporting and 89% (n = 357) due to misclassification at the CRN. Of all cases in medical review, patients under active surveillance contributed almost 60% (223/375). Other reasons for being without recorded treatment were uncertain dysplasia diagnosis, invasive cancer or death. Coding error occurred in 19% (73/375) of randomly selected cases. CRN undercounted receipt of treatment by 38% (n = 1526) among patients without recorded treatment, which translates into 97% overall completeness of treatment data. Incomplete treatment records were particularly associated with public laboratories, patients aged 40-54 years, and the latest study years.
CONCLUSIONS: CRN holds accurate information on cervical treatments. Completeness and particularly validity can be further improved through the establishment of new internal routines and regular linkage to hospital discharges.
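The percentages reported in this abstract can be re-derived from its counts. The Python sketch below does so; the abstract does not state how overall completeness was computed, so the formula used here (recorded treatments divided by recorded plus estimated unrecorded treatments) is an assumption that happens to reproduce the reported 97%. Variable names are illustrative.

```python
# Counts taken from the abstract.
recorded_treatment = 47423   # patients with recorded treatment in the CRN
without_record = 3983        # patients without recorded treatment
linked_treated = 401         # treated according to hospital-discharge linkage
undercounted = 1526          # estimated treated patients lacking a CRN record

# Shares that follow directly from the counts.
share_without_record = without_record / (recorded_treatment + without_record)
share_linked = linked_treated / without_record
share_undercounted = undercounted / without_record

# Assumed definition of overall completeness (not stated in the abstract).
completeness = recorded_treatment / (recorded_treatment + undercounted)

print(f"{share_without_record:.0%} {share_linked:.0%} "
      f"{share_undercounted:.0%} {completeness:.0%}")
# prints: 8% 10% 38% 97%
```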
Affiliation(s)
- Svenn A. Hansen
- Department of Health Management and Health Economics, University of Oslo, Oslo, Norway
- Monica Silva
- Department of Registration, Cancer Registry of Norway, Oslo, Norway
- Mari Nygård
- Department of Research, Cancer Registry of Norway, Oslo, Norway
5. Idri A, Benhar H, Fernández-Alemán JL, Kadi I. A systematic map of medical data preprocessing in knowledge discovery. Computer Methods and Programs in Biomedicine 2018; 162:69-85. [PMID: 29903496 DOI: 10.1016/j.cmpb.2018.05.007] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 11/08/2017] [Revised: 04/25/2018] [Accepted: 05/03/2018] [Indexed: 06/08/2023]
Abstract
BACKGROUND AND OBJECTIVE: Data mining (DM) has, over the last decade, received increased attention in the medical domain and has been widely used to analyze medical datasets in order to extract useful knowledge and previously unknown patterns. However, historical medical data can often comprise inconsistent, noisy, imbalanced, missing and high dimensional data. These challenges lead to a serious bias in predictive modeling and reduce the performance of DM techniques. Data preprocessing is, therefore, an essential step in knowledge discovery as regards improving the quality of data and making it appropriate and suitable for DM techniques. The objective of this paper is to review the use of preprocessing techniques in clinical datasets.
METHODS: We performed a systematic map of studies regarding the application of data preprocessing to healthcare and published between January 2000 and December 2017. A search string was determined on the basis of the mapping questions and the PICO categories. The search string was then applied in digital databases covering the fields of computer science and medical informatics in order to identify relevant studies. The studies were initially selected by reading their titles, abstracts and keywords. Those that were selected at that stage were then reviewed using a set of inclusion and exclusion criteria in order to eliminate any that were not relevant. This process resulted in 126 primary studies.
RESULTS: Selected studies were analyzed and classified according to their publication years and channels, research type, empirical type and contribution type. The findings of this mapping study revealed that researchers have paid a considerable amount of attention to preprocessing in medical DM in the last decade. A significant number of the selected studies used data reduction and cleaning preprocessing tasks. Moreover, the disciplines in which preprocessing has received most attention are: cardiology, endocrinology and oncology.
CONCLUSIONS: Researchers should develop and implement standards for an effective integration of multiple medical data types. Moreover, we identified the need to perform literature reviews.
Affiliation(s)
- A Idri
- Software Project Management Research Team, ENSIAS, University Mohammed V of Rabat, Morocco.
- H Benhar
- Software Project Management Research Team, ENSIAS, University Mohammed V of Rabat, Morocco.
- J L Fernández-Alemán
- Department of Informatics and Systems, Faculty of Computer Science, University of Murcia, Spain.
- I Kadi
- Software Project Management Research Team, ENSIAS, University Mohammed V of Rabat, Morocco.
6. Wunsch G, Gourbin C. Mortality, morbidity and health in developed societies: a review of data sources. Genus 2018; 74:2. [PMID: 29398718 PMCID: PMC5787574 DOI: 10.1186/s41118-018-0027-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 09/11/2017] [Accepted: 01/11/2018] [Indexed: 12/26/2022] [Open access]
Abstract
The purpose of this paper is to review the major sources of data on mortality, morbidity and health in Europe and in other developed regions in order to examine their potential for analysing mortality and morbidity levels and trends. The review is primarily focused on routinely collected information covering a whole country. No attempt is made to draw up an inventory of sources by country; the paper deals instead with the pros and cons of each source for mortality and morbidity studies in demography. While each source considered separately can already yield useful, though partial, results, record linkage among data sources can significantly improve the analysis. Record linkage can also lead to the detection of possible causal associations that could eventually be confirmed. More generally, Big Data can reveal changing mortality and morbidity trends and patterns that could lead to preventive measures being taken rather than more costly curative ones.
Affiliation(s)
- Guillaume Wunsch
- Centre for Demographic Research, Catholic University of Louvain, Place Montesquieu 1/L2.08.03, B-1348 Louvain-la-Neuve, Belgium
- Catherine Gourbin
- Centre for Demographic Research, Catholic University of Louvain, Place Montesquieu 1/L2.08.03, B-1348 Louvain-la-Neuve, Belgium
7. Chen J, Wei W, Guo C, Tang L, Sun L. Textual analysis and visualization of research trends in data mining for electronic health records. Health Policy and Technology 2017. [DOI: 10.1016/j.hlpt.2017.10.003] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 01/18/2023]