1
|
Röchner P, Rothlauf F. Using machine learning to link electronic health records in cancer registries: On the tradeoff between linkage quality and manual effort. Int J Med Inform 2024; 185:105387. [PMID: 38428200 DOI: 10.1016/j.ijmedinf.2024.105387] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Revised: 10/05/2023] [Accepted: 02/20/2024] [Indexed: 03/03/2024]
Abstract
BACKGROUND Cancer registries link a large number of electronic health records reported by medical institutions to already registered records of the matching individual and tumor. Records are automatically linked using deterministic and probabilistic approaches; machine learning is rarely used. Records that cannot be matched automatically with sufficient accuracy are typically processed manually. For application, it is important to know how well record linkage approaches match real-world records and how much manual effort is required to achieve the desired linkage quality. We study the task of linking reported records to the matching registered tumor in cancer registries. METHODS We compare the tradeoff between linkage quality and manual effort of five machine learning methods (logistic regression, random forest, gradient boosting, neural network, and a stacked method) to a deterministic baseline. The record linkage methods are compared in a two-class setting (no-match/match) and a three-class setting (no-match/undecided/match). A cancer registry collected and linked the dataset consisting of categorical variables matching 145,755 reported records with 33,289 registered tumors. RESULTS In the two-class setting, the gradient boosting, neural network, and stacked models have higher accuracy and F1 score (accuracy: 0.968-0.978, F1 score: 0.983-0.988) than the deterministic baseline (accuracy: 0.964, F1 score: 0.980) when the same records are manually processed (0.89% of all records). In the three-class setting, these three machine learning methods can automatically process all reported records and still have higher accuracy and F1 score than the deterministic baseline. The linkage quality of the machine learning methods studied, except for the neural network, increases as the number of manually processed records increases.
CONCLUSION Machine learning methods can significantly improve linkage quality and reduce the manual effort required by medical coders to match tumor records in cancer registries compared to a deterministic baseline. Our results help cancer registries estimate how linkage quality increases as more records are manually processed.
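The three-class setting described above (no-match/undecided/match) can be illustrated with two score thresholds, where pairs falling between them are routed to manual review. This is a minimal sketch only: the threshold values, the function names `classify` and `manual_share`, and the example scores are assumptions made for illustration and are not taken from the paper.

```python
from typing import List


def classify(score: float, lower: float, upper: float) -> str:
    """Map a model's match probability to no-match / undecided / match."""
    if not 0.0 <= lower <= upper <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= lower <= upper <= 1")
    if score >= upper:
        return "match"
    if score < lower:
        return "no-match"
    return "undecided"  # left for manual processing


def manual_share(scores: List[float], lower: float, upper: float) -> float:
    """Fraction of record pairs that end up in the undecided class."""
    if not scores:
        return 0.0
    undecided = sum(1 for s in scores if classify(s, lower, upper) == "undecided")
    return undecided / len(scores)


# Made-up match probabilities for six record pairs (illustrative only).
scores = [0.02, 0.10, 0.45, 0.55, 0.91, 0.99]

# A wide undecided band sends two of the six pairs to manual review.
print(manual_share(scores, lower=0.3, upper=0.7))

# Equal thresholds collapse the scheme to the two-class setting.
print(manual_share(scores, lower=0.5, upper=0.5))
```

Widening or narrowing the band between the two thresholds is one simple way to trade manual effort against the quality of the automatic decisions.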
|
2
|
Kirilov N. Comparison of WebSocket and Hypertext Transfer Protocol for Transfer of Electronic Health Records. Stud Health Technol Inform 2024; 313:124-128. [PMID: 38682516 DOI: 10.3233/shti240023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/01/2024]
Abstract
BACKGROUND Electronic health records (EHRs) emerged as a digital record of the data generated in healthcare. OBJECTIVES In this paper, the transfer times of EHRs using the Hypertext Transfer Protocol (HTTP) and WebSocket in both a local network and a wide area network (WAN) are compared. METHODS A Python web application to serve Fast Healthcare Interoperability Resources (FHIR) records is created, and the transfer times of the EHRs over both HTTP and WebSocket connections are measured. 45,000 test Patient resources in transfers of 20, 50, 100 and 200 resources per Bundle are used. RESULTS WebSocket showed much better transfer times for large amounts of data. These were 18 s shorter in the local network and 342 s shorter in the WAN for the transfer with 20 resources per Bundle. CONCLUSION RESTful APIs are a convenient way to implement EHR servers; on the other hand, HTTP becomes a bottleneck when transferring large amounts of data. WebSocket shows better transfer times and thus its superiority in such situations. The problem can be addressed by developing a new communication protocol or by using network tunneling to handle large data transfers of EHRs.
|
3
|
Kim JW, Choi H, Lim HJ, Oh M, Ahn JJ. Evaluating Linkage Quality of Population-Based Administrative Data for Health Service Research. J Korean Med Sci 2024; 39:e127. [PMID: 38622936 PMCID: PMC11018984 DOI: 10.3346/jkms.2024.39.e127] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Accepted: 03/11/2024] [Indexed: 04/17/2024] Open
Abstract
BACKGROUND To overcome the limitations of relying on data from a single institution, many researchers have studied data linkage methodologies. Data linkage includes errors owing to legal issues surrounding personal information and technical issues related to data processing. Linkage errors affect selection bias, and external and internal validity. Therefore, quality verification for each connection method with adherence to personal information protection is an important issue. This study evaluated the linkage quality of linked data and analyzed the potential bias resulting from linkage errors. METHODS This study analyzed claims data submitted to the Health Insurance Review and Assessment Service (HIRA DATA). The linkage errors of the two deterministic linkage methods were evaluated based on the use of the match key. The first deterministic linkage uses a unique identification number, and the second deterministic linkage uses the name, gender, and date of birth as a set of partial identifiers. The linkage error included in this deterministic linkage method was compared with the absolute standardized difference (ASD) of Cohen's according to the baseline characteristics, and the linkage quality was evaluated through the following indicators: linked rate, false match rate, missed match rate, positive predictive value, sensitivity, specificity, and F1-score. RESULTS For the deterministic linkage method that used the name, gender, and date of birth as a set of partial identifiers, the true match rate was 83.5 and the missed match rate was 16.5. Although there was bias in some characteristics of the data, most of the ASD values were less than 0.1, with no case greater than 0.5. Therefore, it is difficult to determine whether linked data constructed with deterministic linkages have substantial differences. 
CONCLUSION This study confirms the possibility of building health and medical data at the national level as the first data linkage quality verification study using big data from the HIRA. Analyzing the quality of linkages is crucial for comprehending linkage errors and generating reliable analytical outcomes. Linkers should increase the reliability of linked data by providing linkage error-related information to researchers. The results of this study will serve as reference data to increase the reliability of multicenter data linkage studies.
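The quality indicators listed in the methods above can be computed from a 2x2 table of linkage decisions against a reference standard. The sketch below is illustrative: the abstract does not give formulas, so the definitions used here (for example, false match rate as the share of links that are wrong) are common conventions assumed for this example, and the counts are invented.

```python
from typing import Dict


def linkage_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Linkage quality indicators from true/false positive/negative pair counts.

    Assumed definitions (conventions vary between studies):
      sensitivity       = TP / (TP + FN)
      missed match rate = FN / (TP + FN)
      PPV               = TP / (TP + FP)
      false match rate  = FP / (TP + FP)
      specificity       = TN / (TN + FP)
      F1                = 2 * TP / (2 * TP + FP + FN)
    """
    true_pairs = tp + fn      # pairs that really belong together
    links = tp + fp           # pairs the method declared as links
    non_pairs = tn + fp       # pairs that really do not belong together
    if min(true_pairs, links, non_pairs) == 0:
        raise ValueError("each margin of the 2x2 table must be non-zero")
    return {
        "sensitivity": tp / true_pairs,
        "missed_match_rate": fn / true_pairs,
        "ppv": tp / links,
        "false_match_rate": fp / links,
        "specificity": tn / non_pairs,
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Invented counts, not taken from the study.
metrics = linkage_metrics(tp=80, fp=5, fn=20, tn=895)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```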
|
4
|
Lloyd LK, Nicholson C, Strange G, Celermajer DS. The burdensome logistics of data linkage in Australia - the example of a national registry for congenital heart disease. Aust Health Rev 2024; 48:8-15. [PMID: 38118279 DOI: 10.1071/ah23185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Accepted: 11/21/2023] [Indexed: 12/22/2023]
Abstract
Objective Data linkage is a very powerful research tool in epidemiology; however, establishing it can be a lengthy and intensive process. This paper reports on the complex landscape of conducting data linkage projects in Australia. Methods We reviewed the processes, required documentation, and applications required to conduct multi-jurisdictional data linkage across Australia in 2023. Results Obtaining the necessary approvals to conduct linkage will likely take nearly 2 years (estimated 730 days, including 605 days from initial submission to obtaining all ethical approvals and an estimated further 125 days for the issuance of unexpected additionally required approvals). Ethical review for linkage projects ranged from 51 to 128 days from submission to ethical approval, and applications consisted of 9-25 documents. Conclusions Major obstacles to conducting multi-jurisdictional data linkage included the complexity of the process and substantial time and financial costs. The process was characterised by inefficiencies at several levels, reduplication, and a lack of any key accountabilities for timely performance of processes. Data linkage is an invaluable resource for epidemiological research. Further streamlining, established accountability, and greater collaboration between jurisdictions are needed to ensure data linkage is both accessible and feasible for researchers.
|
5
|
Silverwood RJ, Rajah N, Calderwood L, De Stavola BL, Harron K, Ploubidis GB. Examining the quality and population representativeness of linked survey and administrative data: guidance and illustration using linked 1958 National Child Development Study and Hospital Episode Statistics data. Int J Popul Data Sci 2024; 9:2137. [PMID: 38425790 PMCID: PMC10901060 DOI: 10.23889/ijpds.v9i1.2137] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/02/2024] Open
Abstract
Introduction Recent years have seen an increase in linkages between survey and administrative data. It is important to evaluate the quality of such data linkages to discern the likely reliability of ensuing research. Evaluation of linkage quality and bias can be conducted using different approaches, but many of these are not possible when there is a separation of processes for linkage and analysis to help preserve privacy, as is typically the case in the UK (and elsewhere). Objectives We aimed to describe a suite of generalisable methods to evaluate linkage quality and population representativeness of linked survey and administrative data which remain tractable when users of the linked data are not party to the linkage process itself. We emphasise issues particular to longitudinal survey data throughout. Methods Our proposed approaches cover several areas: i) Linkage rates, ii) Selection into response, linkage consent and successful linkage, iii) Linkage quality, and iv) Linked data population representativeness. We illustrate these methods using a recent linkage between the 1958 National Child Development Study (NCDS; a cohort following an initial 17,415 people born in Great Britain in a single week of 1958) and Hospital Episode Statistics (HES) databases (containing important information regarding admissions, accident and emergency attendances and outpatient appointments at NHS hospitals in England). Results Our illustrative analyses suggest that the linkage quality of the NCDS-HES data is high and that the linked sample maintains an excellent level of population representativeness with respect to the single dimension we assessed. Conclusions Through this work we hope to encourage providers and users of linked data resources to undertake and publish thorough evaluations. We further hope that providing illustrative analyses using linked NCDS-HES data will improve the quality and transparency of research using this particular linked data resource.
|
6
|
Kamat G, Shan M, Gutman R. Bayesian record linkage with variables in one file. Stat Med 2023; 42:4931-4951. [PMID: 37652076 DOI: 10.1002/sim.9894] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2022] [Revised: 06/12/2023] [Accepted: 08/21/2023] [Indexed: 09/02/2023]
Abstract
In many healthcare and social science applications, information about units is dispersed across multiple data files. Linking records across files is necessary to estimate the associations of interest. Common record linkage algorithms only rely on similarities between linking variables that appear in all the files. Moreover, analysis of linked files often ignores errors that may arise from incorrect or missed links. Bayesian record linking methods allow for natural propagation of linkage error, by jointly sampling the linkage structure and the model parameters. We extend an existing Bayesian record linkage method to integrate associations between variables exclusive to each file being linked. We show analytically, and using simulations, that the proposed method can improve the linking process, and can result in accurate inferences. We apply the method to link Meals on Wheels recipients to Medicare enrollment records.
|
7
|
Prindle J, Suthar H, Putnam-Hornstein E. An open-source probabilistic record linkage process for records with family-level information: Simulation study and applied analysis. PLoS One 2023; 18:e0291581. [PMID: 37862306 PMCID: PMC10588881 DOI: 10.1371/journal.pone.0291581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Accepted: 08/31/2023] [Indexed: 10/22/2023] Open
Abstract
Research with administrative records involves the challenge of limited information in any single data source to answer policy-related questions. Record linkage provides researchers with a tool to supplement administrative datasets with other information about the same people when identified in separate sources as matched pairs. Several solutions are available for undertaking record linkage, producing linkage keys for merging data sources for positively matched pairs of records. In the current manuscript, we demonstrate a new application of the Python RecordLinkage package to family-based record linkages with machine learning algorithms for probability scoring, which we call probabilistic record linkage for families (PRLF). First, a simulation of administrative records identifies PRLF accuracy with variations in match and data degradation percentages. Accuracy is largely influenced by degradation (e.g., missing data fields, mismatched values) compared to the percentage of simulated matches. Second, an application of data linkage is presented to compare regression model estimate performance across three record linkage solutions (PRLF, ChoiceMaker, and Link Plus). Our findings indicate that all three solutions, when optimized, provide similar results for researchers. Strengths of our process, such as the use of ensemble methods, to improve match accuracy are discussed. We then identify caveats of record linkage in the context of administrative data.
|
8
|
Garcia KKS, de Miranda CB, de Sousa FNEF. Procedures for health data linkage: applications in health surveillance. Epidemiologia e Serviços de Saúde 2022; 31:e20211272. [PMID: 36287481 PMCID: PMC9887966 DOI: 10.1590/s2237-96222022000300004] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/16/2022] [Accepted: 07/08/2022] [Indexed: 12/23/2022] Open
Abstract
OBJECTIVE To present a standardized methodology for linking different public health databases. METHODS This was a methodological review article specifically describing data processing procedures for deterministic linkage between structured databases. It instructs on how to treat data, select linkage keys, and link databases, using two databases simulated in R software. RESULTS The commands used for deterministic linkage of the inner_join type were presented. The linkage process resulted in a database with 40,108 pairs using only the "Name" key. Adding the second key, "Name of mother", the result dropped to 112 pairs. By adding the third key, "Date of birth", only two pairs were identified. CONCLUSION Database linkage and its analysis are valid and valuable tools for health services in supporting health surveillance actions.
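The effect reported above, where each added linkage key shrinks the set of linked pairs, can be reproduced on a toy scale. The article works in R with an inner_join; the sketch below is a plain Python analogue written for illustration, and the helper names (`normalize`, `inner_join`), field names, and records are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]


def normalize(value: str) -> str:
    """Basic data treatment: collapse whitespace and ignore letter case."""
    return " ".join(value.split()).upper()


def inner_join(left: List[Record], right: List[Record],
               keys: Sequence[str]) -> List[Tuple[Record, Record]]:
    """Deterministic linkage: keep pairs that agree exactly on every key."""
    index: Dict[Tuple[str, ...], List[Record]] = {}
    for rec in right:
        index.setdefault(tuple(normalize(rec[k]) for k in keys), []).append(rec)
    pairs = []
    for rec in left:
        for match in index.get(tuple(normalize(rec[k]) for k in keys), []):
            pairs.append((rec, match))
    return pairs


# Invented records; spelling and spacing differences are deliberate.
db_a = [
    {"name": "Ana Silva", "mother": "Maria Silva", "dob": "1990-01-01"},
    {"name": "Ana Silva", "mother": "Rita Costa", "dob": "1985-05-05"},
    {"name": "João Lima", "mother": "Clara Lima", "dob": "1979-03-09"},
]
db_b = [
    {"name": "ANA SILVA ", "mother": "MARIA SILVA", "dob": "1990-01-01"},
    {"name": "ana silva", "mother": "Maria Silva", "dob": "1991-02-02"},
    {"name": "João Lima", "mother": "Clara Lima", "dob": "1979-03-09"},
]

# Add one key at a time and count the linked pairs.
counts = [
    len(inner_join(db_a, db_b, ["name"])),
    len(inner_join(db_a, db_b, ["name", "mother"])),
    len(inner_join(db_a, db_b, ["name", "mother", "dob"])),
]
print(counts)  # [5, 3, 2]
```

With a single common key, homonyms produce many spurious pairs; each additional key removes pairs that disagree on it.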
|
9
|
Heng Y, Armknecht F, Chen Y, Schnell R. On the effectiveness of graph matching attacks against privacy-preserving record linkage. PLoS One 2022; 17:e0267893. [PMID: 36137086 PMCID: PMC9499274 DOI: 10.1371/journal.pone.0267893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2021] [Accepted: 04/19/2022] [Indexed: 11/19/2022] Open
Abstract
Linking several databases containing information on the same person is an essential step of many data workflows. Due to the potential sensitivity of the data, the identity of the persons should be kept private. Privacy-Preserving Record-Linkage (PPRL) techniques have been developed to link persons despite errors in the identifiers used to link the databases without violating their privacy. The basic approach is to use encoded quasi-identifiers instead of plain quasi-identifiers for making the linkage decision. Ideally, the encoded quasi-identifiers should prevent re-identification but still allow for a good linkage quality. While several PPRL techniques have been proposed so far, Bloom filter-based PPRL schemes (BF-PPRL) are among the most popular due to their scalability. However, a recently proposed attack on BF-PPRL based on graph similarities seems to allow individuals’ re-identification from encoded quasi-identifiers. Therefore, the graph matching attack is widely considered a serious threat to many PPRL-approaches and leads to the situation that BF-PPRL schemes are rejected as being insecure. In this work, we argue that this view is not fully justified. We show by experiments that the success of graph matching attacks requires a high overlap between encoded and plain records used for the attack. As soon as this condition is not fulfilled, the success rate sharply decreases and renders the attacks hardly effective. This necessary condition does severely limit the applicability of these attacks in practice and also allows for simple but effective countermeasures.
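To make the idea of encoded quasi-identifiers concrete, the following is a generic sketch of a Bloom filter encoding of a name's character bigrams, compared with the Dice coefficient. It is not the scheme or the attack evaluated in the paper: the filter size, the number of hash functions, the seeded SHA-256 hashing, and the function names are all assumptions chosen for illustration.

```python
import hashlib
from typing import Set


def bigrams(value: str) -> Set[str]:
    """Split a padded, lower-cased string into character bigrams."""
    padded = f"_{value.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def encode(value: str, size: int = 1024, num_hashes: int = 4) -> Set[int]:
    """Return the positions set to 1 in a Bloom filter of `size` bits.

    Each bigram is hashed `num_hashes` times with a different seed prefix.
    """
    bits: Set[int] = set()
    for gram in bigrams(value):
        for seed in range(num_hashes):
            digest = hashlib.sha256(f"{seed}:{gram}".encode("utf-8")).digest()
            bits.add(int.from_bytes(digest[:8], "big") % size)
    return bits


def dice(a: Set[int], b: Set[int]) -> float:
    """Dice coefficient of two bit-position sets (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


same = dice(encode("smith"), encode("smith"))
close = dice(encode("smith"), encode("smyth"))   # one typo, most bigrams shared
far = dice(encode("smith"), encode("jones"))     # no bigrams shared
print(same, close, far)
```

Because similar strings share bigrams, their encodings share set bits, which is what allows linkage despite errors in the identifiers.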
|
10
|
Libuy N, Harron K, Gilbert R, Caulton R, Cameron E, Blackburn R. Linking education and hospital data in England: linkage process and quality. Int J Popul Data Sci 2021; 6:1671. [PMID: 34568585 PMCID: PMC8445153 DOI: 10.23889/ijpds.v6i1.1671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2022] Open
Abstract
INTRODUCTION Linkage of administrative data for universal state education and National Health Service (NHS) hospital care would enable research into the inter-relationships between education and health for all children in England. OBJECTIVES We aim to describe the linkage process and evaluate the quality of linkage of four one-year birth cohorts within the National Pupil Database (NPD) and Hospital Episode Statistics (HES). METHODS We used multi-step deterministic linkage algorithms to link longitudinal records from state schools to the chronology of records in the NHS Personal Demographics Service (PDS; linkage stage 1), and HES (linkage stage 2). We calculated linkage rates and compared pupil characteristics in linked and unlinked samples for each stage of linkage and each cohort (1990/91, 1996/97, 1999/00, and 2004/05). RESULTS Of the 2,287,671 pupil records, 2,174,601 (95%) linked to HES. Linkage rates improved over time (92% in 1990/91 to 99% in 2004/05). Ethnic minority pupils and those living in more deprived areas were less likely to be matched to hospital records, but differences in pupil characteristics between linked and unlinked samples were moderate to small. CONCLUSION We linked nearly all pupils to at least one hospital record. The high coverage of the linkage represents a unique opportunity for wide-scale analyses across the domains of health and education. However, missed links disproportionately affected ethnic minorities or those living in the poorest neighbourhoods: selection bias could be mitigated by increasing the quality and completeness of identifiers recorded in administrative data or the application of statistical methods that account for missed links. 
HIGHLIGHTS Longitudinal administrative records for all children attending state school and acute hospital services in England have been used for research for more than two decades, but lack of a shared unique identifier has limited scope for linkage between these databases. We applied multi-step deterministic linkage algorithms to 4 one-year cohorts of children born 1 September-31 August in 1990/91, 1996/97, 1999/00 and 2004/05. In stage 1, full names, date of birth, and postcode histories from education data in the National Pupil Database were linked to the NHS Personal Demographic Service. In stage 2, NHS number, postcode, date of birth and sex were linked to hospital records in Hospital Episode Statistics. Between 92% and 99% of school pupils linked to at least one hospital record. Ethnic minority pupils and pupils who were living in the most deprived areas were least likely to link. Ethnic minority pupils were less likely than white children to link at the first step in both algorithms. Bias due to linkage errors could lead to an underestimate of the health needs in disadvantaged groups. Improved data quality, more sensitive linkage algorithms, and/or statistical methods that account for missed links in analyses, should be considered to reduce linkage bias.
|
11
|
Aflaki K, Park AL, Nelson C, Luo W, Ray JG. Identifying maternal deaths with the use of hospital data versus death certificates: a retrospective population-based study. CMAJ Open 2021; 9:E539-E547. [PMID: 34021011 PMCID: PMC8177910 DOI: 10.9778/cmajo.20200201] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022] Open
Abstract
BACKGROUND Accurate identification of maternal deaths is paramount for audit and policy purposes. Our aim was to determine the accuracy and completeness of data on maternal deaths in hospital and those recorded on a death certificate, and the level of agreement between the 2 data sources. METHODS We conducted a retrospective population-based study using data for Ontario, Canada, from Apr. 1, 2002, to Dec. 31, 2015. We used Canadian Institute for Health Information (CIHI) databases to identify deaths during inpatient, emergency department and same-day surgery encounters. We captured Vital Statistics deaths in the Office of the Registrar General, Deaths (ORGD) data set. Deaths were considered within 42 days and within 365 days after a pregnancy outcome (live birth, miscarriage, ectopic pregnancy or induced abortion) for all multiple and singleton pregnancies. We calculated agreement statistics and 95% confidence intervals (CIs). RESULTS Among 1 679 455 live births and stillbirths, 398 pregnancy-related deaths in the ORGD data set were mapped to a birth in CIHI databases, and 77 (16.2%) were not. Among 2 039 849 recognized pregnancies, 534 pregnancy-related deaths in the ORGD data set were linked to CIHI records, and 68 (11.3%) were not. Among live births and stillbirths, after pregnancy-related deaths in the ORGD data set not matched to a maternal death in the CIHI databases were removed, concordance measures between CIHI and ORGD records for maternal death within 42 days after delivery included a κ value of 0.87 (95% CI 0.82-0.91) and positive percent agreement of 0.88 (95% CI 0.83-0.94). The corresponding measures were similar for maternal death within 42 days after the end of a recognized pregnancy. When unlinked pregnancy-related deaths in the ORGD data set were retained, agreement measures declined for death within 42 days after a live birth or stillbirth (κ = 0.68, 95% CI 0.62-0.74). 
For maternal death within 365 days after a live birth or stillbirth, or after the end of a recognized pregnancy, the concordance statistics were generally favourable when unlinked pregnancy-related deaths in the ORGD data set were removed but declined substantially when they were retained. INTERPRETATION Maternal mortality cannot be ascertained solely with the use of hospital data, including beyond 42 days after the end of pregnancy. To improve linkage, we propose including health insurance numbers on provincial and territorial medical death certificates.
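The κ values reported above measure agreement between two sources beyond what chance alone would produce. As a reminder of how Cohen's κ is obtained from a 2x2 agreement table, here is a minimal sketch; the function name and the counts are invented for illustration and are unrelated to the study's data.

```python
def cohens_kappa(both_yes: int, only_a: int, only_b: int, both_no: int) -> float:
    """Cohen's kappa for two binary sources A and B.

    both_yes: recorded as a death in A and in B
    only_a:   recorded in A only
    only_b:   recorded in B only
    both_no:  recorded in neither
    """
    total = both_yes + only_a + only_b + both_no
    if total == 0:
        raise ValueError("the table is empty")
    observed = (both_yes + both_no) / total
    # Chance agreement from the marginal proportions of each source.
    a_yes = (both_yes + only_a) / total
    b_yes = (both_yes + only_b) / total
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Invented counts: 90% raw agreement, about 50% expected by chance.
kappa = cohens_kappa(both_yes=40, only_a=5, only_b=5, both_no=50)
print(round(kappa, 3))  # 0.798
```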
|
12
|
Chen Y, Wen H, Griffin R, Roach MJ, Kelly ML. Linking Individual Data From the Spinal Cord Injury Model Systems Center and Local Trauma Registry: Development and Validation of Probabilistic Matching Algorithm. Top Spinal Cord Inj Rehabil 2021; 26:221-231. [PMID: 33536727 PMCID: PMC7831288 DOI: 10.46292/sci20-00015] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/19/2022]
Abstract
BACKGROUND Linking records from the National Spinal Cord Injury Model Systems (SCIMS) database to the National Trauma Data Bank (NTDB) provides a unique opportunity to study early variables in predicting long-term outcomes after traumatic spinal cord injury (SCI). The public use data sets of SCIMS and NTDB are stripped of protected health information, including dates and zip code. OBJECTIVES To develop and validate a probabilistic algorithm linking data from an SCIMS center and its affiliated trauma registry. METHOD Data on SCI admissions 2011-2018 were retrieved from an SCIMS center (n = 302) and trauma registry (n = 723), of which 202 records had the same medical record number. The SCIMS records were divided equally into two data sets for algorithm development and validation, respectively. We used a two-step approach: blocking and weight generation for linking variables (race, insurance, height, and weight). RESULTS In the development set, 257 SCIMS-trauma pairs shared the same sex, age, and injury year across 129 clusters, of which 91 records were true-match. The probabilistic algorithm identified 65 of the 91 true-match records (sensitivity, 71.4%) with a positive predictive value (PPV) of 80.2%. The algorithm was validated over 282 SCIMS-trauma pairs across 127 clusters and had a sensitivity of 73.7% and PPV of 81.1%. Post hoc analysis shows the addition of injury date and zip code improved the specificity from 57.9% to 94.7%. CONCLUSION We demonstrate the feasibility of probabilistic linkage between SCIMS and trauma records, which needs further refinement and validation. Gaining access to injury date and zip code would improve record linkage significantly.
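The two-step approach described above (blocking, then weights for the linking variables) can be sketched with Fellegi-Sunter style agreement weights. The abstract does not state how its weights were generated, so this technique is swapped in for illustration; the m/u probabilities, the exact-equality comparison, the field names, and the records are all hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Record = Dict[str, object]

# Blocking variables: only records agreeing on all of these are compared.
BLOCK_KEYS = ("sex", "age", "injury_year")

# Hypothetical (m, u) probabilities per linking variable:
# m = P(agree | true match), u = P(agree | non-match).
MU = {
    "race": (0.95, 0.40),
    "insurance": (0.85, 0.25),
    "height": (0.80, 0.10),
    "weight": (0.75, 0.05),
}


def field_weight(field: str, agrees: bool) -> float:
    """Log2 likelihood-ratio weight for agreement or disagreement."""
    m, u = MU[field]
    return math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))


def candidate_pairs(file_a: List[Record],
                    file_b: List[Record]) -> Iterator[Tuple[Record, Record]]:
    """Step 1: blocking. Yield only pairs that share the blocking key."""
    blocks = defaultdict(list)
    for rec in file_b:
        blocks[tuple(rec[k] for k in BLOCK_KEYS)].append(rec)
    for rec in file_a:
        for other in blocks.get(tuple(rec[k] for k in BLOCK_KEYS), []):
            yield rec, other


def pair_score(a: Record, b: Record) -> float:
    """Step 2: sum the per-variable weights for one candidate pair."""
    return sum(field_weight(f, a[f] == b[f]) for f in MU)


# Invented records for illustration.
scims = [{"id": "s1", "sex": "M", "age": 34, "injury_year": 2015,
          "race": "white", "insurance": "private", "height": 180, "weight": 82}]
trauma = [
    {"id": "t1", "sex": "M", "age": 34, "injury_year": 2015,
     "race": "white", "insurance": "private", "height": 180, "weight": 82},
    {"id": "t2", "sex": "M", "age": 34, "injury_year": 2015,
     "race": "white", "insurance": "medicaid", "height": 170, "weight": 95},
    {"id": "t3", "sex": "F", "age": 34, "injury_year": 2015,
     "race": "white", "insurance": "private", "height": 180, "weight": 82},
]

scored = sorted(((pair_score(a, b), a["id"], b["id"])
                 for a, b in candidate_pairs(scims, trauma)), reverse=True)
for score, a_id, b_id in scored:
    print(a_id, b_id, round(score, 2))
```

Record t3 never reaches scoring because it falls in a different block; within the block, the pair with the highest total weight is the strongest match candidate, and a cut-off on that score would decide which pairs are accepted.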
|
13
|
Jewell A, Broadbent M, Hayes RD, Gilbert R, Stewart R, Downs J. Impact of matching error on linked mortality outcome in a data linkage of secondary mental health data with Hospital Episode Statistics (HES) and mortality records in South East London: a cross-sectional study. BMJ Open 2020; 10:e035884. [PMID: 32641360 PMCID: PMC7342822 DOI: 10.1136/bmjopen-2019-035884] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/17/2022] Open
Abstract
OBJECTIVES Linkage of electronic health records (EHRs) to Hospital Episode Statistics (HES)-Office for National Statistics (ONS) mortality data has provided compelling evidence for lower life expectancy in people with severe mental illness. However, linkage error may underestimate these estimates. Using a clinical sample (n=265 300) of individuals accessing mental health services, we examined potential biases introduced through missed matching and examined the impact on the association between clinical disorders and mortality. SETTING The South London and Maudsley NHS Foundation Trust (SLaM) is a secondary mental healthcare provider in London. A deidentified version of SLaM's EHR was available via the Clinical Record Interactive Search system linked to HES-ONS mortality records. PARTICIPANTS Records from SLaM for patients active between January 2006 and December 2016. OUTCOME MEASURES Two sources of death data were available for SLaM participants: accurate and contemporaneous date of death via local batch tracing (gold standard) and date of death via linked HES-ONS mortality data. The effect of linkage error on mortality estimates was evaluated by comparing sociodemographic and clinical risk factor analyses using gold standard death data against HES-ONS mortality records. RESULTS Of the total sample, 93.74% were successfully matched to HES-ONS records. We found a number of statistically significant administrative, sociodemographic and clinical differences between matched and unmatched records. Of note, schizophrenia diagnosis showed a significant association with higher mortality using gold standard data (OR 1.08; 95% CI 1.01 to 1.15; p=0.02) but not in HES-ONS data (OR 1.05; 95% CI 0.98 to 1.13; p=0.16). Otherwise, little change was found in the strength of associated risk factors and mortality after accounting for missed matching bias. 
CONCLUSIONS Despite significant clinical and sociodemographic differences between matched and unmatched records, changes in mortality estimates were minimal. However, researchers and policy analysts using HES-ONS linked resources should be aware that administrative linkage processes can introduce error.
|
14
|
Tapuria A, Kalra D, Curcin V. Feasibility of Using EN 13606 Clinical Archetypes for Defining Computable Phenotypes. Stud Health Technol Inform 2020; 270:228-232. [PMID: 32570380 DOI: 10.3233/shti200156] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/11/2023]
Abstract
INTRODUCTION Computable phenotypes are gaining importance as a structured and reproducible method of using electronic health data to identify people with certain clinical conditions. A formal standard is not available for defining and formally representing phenotyping algorithms. In this paper, we have tried to build a formal representation of such a phenotyping algorithm. METHODS We used the EN 13606 EHR standard for building clinical archetypes to represent the computable phenotyping algorithm for 'diagnosis of cardiac failure'. As part of this work, we created a set of new clinical archetypes for defining 'cardiac failure diagnosis'. The EN 13606 editor called Object Dictionary Client, which was developed in-house by University College London, was used. We evaluated the ability of EN 13606 to provide clinical archetypes to define EHR phenotyping algorithms using the predefined desiderata for the purpose [Mo et al]. RESULTS EN 13606 archetypes could represent phenotype components grouped and nested based on their logical meaning. It was possible to build the EHR phenotyping algorithm with the clinical elements and their interrelationships along with hierarchical structure and temporal criteria. But the specific mathematical calculations and temporal relations involved in the algorithm were difficult to incorporate. These will need to be coded and integrated within the clinical information system. These archetypes can be mapped for comparison with the openEHR models. Binding to external clinical terminology is fully supported. However, it does not satisfy all the desiderata defined by Mo et al. A possible way forward could be an approach using phenotype ontologies and its architectural representation integrated with ISO interoperability. CONCLUSION The EN 13606 archetypes can be used to define the phenotype algorithm that basically identifies patients by a set of clinical characteristics in their records.
Phenotype representations defined in EN 13606 do not satisfy all the desiderata proposed by Mo et al. and thus currently have a limited ability to define computable phenotyping algorithms. Further work is required to make the EN 13606 standard fully support the objective.
|
15
|
Lindoerfer D, Mansmann U, Reinhardt I. Incorporation of Multiple Sources into IT - and Data Protection Concepts: Lessons Learned from the FARKOR Project. Stud Health Technol Inform 2020; 270:262-266. [PMID: 32570387 DOI: 10.3233/shti200163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
The IT and data protection concept of the FAmiliäres Risiko für das KOloRektale Karzinom (FARKOR) project is presented. FARKOR is a risk-adapted screening project in Bavaria, Germany, focusing on young adults with familial colorectal cancer (CRC). For each participant, data from different sources have to be integrated: treatment records centrally administered by the resident doctors' association (KVB), data from health insurance companies (HIC), and patient-reported lifestyle data. Patient privacy rights must be observed. Record linkage is performed by a central independent trust center. Data are decrypted, integrated and analyzed in a secure part of the scientific evaluation center with no connection to the internet (SECSP). The presented concept guarantees participants' privacy through different identifiers, separation of responsibilities, data pseudonymization, public-private key encryption of medical data and encrypted data transfer.
|
16
|
Nechuta S, Mukhopadhyay S, Krishnaswami S, Golladay M, McPheeters M. Record Linkage Approaches Using Prescription Drug Monitoring Program and Mortality Data for Public Health Analyses and Epidemiologic Studies. Epidemiology 2020; 31:22-31. [PMID: 31592867 PMCID: PMC6889900 DOI: 10.1097/ede.0000000000001110] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 03/04/2019] [Accepted: 09/25/2019] [Indexed: 11/25/2022]
Abstract
BACKGROUND The use of Prescription Drug Monitoring Program (PDMP) data has greatly increased in recent years as these data have accumulated as part of the response to the opioid epidemic in the United States. We evaluated the accuracy of record linkage approaches using the Controlled Substance Monitoring Database (Tennessee's [TN] PDMP, 2012-2016) and mortality data on all drug overdose decedents in Tennessee (2013-2016). METHODS We compared total, missed, and false positive (FP) matches (with manual verification of all FPs) across approaches that included a variety of data cleaning and matching methods (probabilistic/fuzzy vs. deterministic) for patient and death linkages, and prescription history. We evaluated the influence of linkage approaches on key prescription measures used in public health analyses. We evaluated characteristics (e.g., age, education, sex) of missed matches and incorrect matches to consider potential bias. RESULTS The most accurate probabilistic/fuzzy matching approach identified 4,714 overdose deaths (vs. the deterministic approach, n = 4,572), with a low FP linkage error (<1%) and high correct match proportion (95% vs. 92% and ~90% for probabilistic approaches not using comprehensive data cleaning). Estimation of all prescription measures improved (vs. deterministic approach). For example, frequency (%) of decedents filling an oxycodone prescription in the last 60 days (n = 1,371 [32%] vs. n = 1,443 [33%]). Missed overdose decedents were more likely to be younger, male, nonwhite, and of higher education. CONCLUSION Implications of study findings include underreporting, prescribing and outcome misclassification, and reduced generalizability to population risk groups, information of importance to epidemiologists and researchers using PDMP data.
|
17
|
Fraser C, Muller-Pebody B, Blackburn R, Gray J, Oddie SJ, Gilbert RE, Harron K. Linking surveillance and clinical data for evaluating trends in bloodstream infection rates in neonatal units in England. PLoS One 2019; 14:e0226040. [PMID: 31830076 PMCID: PMC6907823 DOI: 10.1371/journal.pone.0226040] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/28/2019] [Accepted: 11/19/2019] [Indexed: 11/19/2022] Open
Abstract
OBJECTIVE To evaluate variation in trends in bloodstream infection (BSI) rates in neonatal units (NNUs) in England according to the data sources and linkage methods used. METHODS We used deterministic and probabilistic methods to link clinical records from 112 NNUs in the National Neonatal Research Database (NNRD) to national laboratory infection surveillance data from Public Health England. We calculated the proportion of babies in NNRD (aged <1 year and admitted between 2010-2017) with a BSI caused by clearly pathogenic organisms between two days after admission and two days after discharge. We used Poisson regression to determine trends in the proportion of babies with BSI based on i) deterministic and probabilistic linkage of NNRD and surveillance data (primary measure), ii) deterministic linkage of NNRD-surveillance data, iii) NNRD records alone, and iv) linked NNRD-surveillance data augmented with clinical records of laboratory-confirmed BSI in NNRD. RESULTS Using deterministic and probabilistic linkage, 5,629 of 349,740 babies admitted to a NNU in NNRD linked with 6,660 BSI episodes accounting for 38% of 17,388 BSI records aged <1 year in surveillance data. The proportion of babies with BSI due to clearly pathogenic organisms during their NNU admission was 1.0% using deterministic plus probabilistic linkage (primary measure), compared to 1.0% using deterministic linkage alone, 0.6% using NNRD records alone, and 1.2% using linkage augmented with clinical records of BSI in NNRD. Equivalent proportions for babies born before 32 weeks of gestation were 5.0%, 4.8%, 2.9% and 5.9%. The proportion of babies who linked to a BSI decreased by 7.5% each year (95% confidence interval [CI]: -14.3%, -0.1%) using deterministic and probabilistic linkage but was stable using clinical records of BSI or deterministic linkage alone. 
CONCLUSION Linkage that combines BSI records from national laboratory surveillance and clinical NNU data sources, and use of probabilistic methods, substantially improved ascertainment of BSI and estimates of BSI trends over time, compared with single data sources.
|
18
|
Brennan JM, Wruck L, Pencina MJ, Clare RM, Lopes RD, Alexander JH, O'Brien S, Krucoff M, Rao SV, Wang TY, Curtis LH, Newby LK, Granger CB, Patel M, Mahaffey K, Ross JS, Normand SL, Eloff BC, Caños DA, Lokhnygina YV, Roe MT, Califf RM, Marinac-Dabic D, Peterson ED. Claims-based cardiovascular outcome identification for clinical research: Results from 7 large randomized cardiovascular clinical trials. Am Heart J 2019; 218:110-122. [PMID: 31726314 DOI: 10.1016/j.ahj.2019.09.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 09/05/2019] [Accepted: 09/05/2019] [Indexed: 12/14/2022]
Abstract
BACKGROUND Medicare insurance claims may provide an efficient means to ascertain follow-up of older participants in clinical research. We sought to determine the accuracy and completeness of claims- versus site-based follow-up with clinical event committee (+CEC) adjudication of cardiovascular outcomes. METHODS We performed a retrospective study using linked Medicare and Duke Database of Clinical Trials data. Medicare claims were linked to clinical data from 7 randomized cardiovascular clinical trials. Of 52,476 trial participants, linking resulted in 5,839 (of 10,497 linkage-eligible) Medicare-linked trial participants with fee-for-service A and B coverage. Death, myocardial infarction (MI), stroke, and revascularization incidences were compared using Medicare inpatient claims only, site-reported events (+CEC) only, or a combination of the 2. Randomized treatment effects were compared as a function of whether claims-based, site-based (+CEC), or a combined system was used for event detection. RESULTS Among the 5,839 study participants, the annual event rates were similar between claims- and site-based (+CEC) follow-up: death (overall rate 5.2% vs 5.2%; adjusted κ 0.99), MI (2.2% vs 2.3%; adjusted κ 0.96), stroke (0.7% vs 0.7%; adjusted κ 0.99), and any revascularization (7.4% vs 7.9%; adjusted κ 0.95). Of events detected by claims yet not reported by CEC, a minority were reported by sites but negatively adjudicated by CEC (39% of MIs and 18% of strokes). Differences in individual case concordance led to higher event rates when claims- and site-based (+CEC) systems were combined. Randomized treatment effects were similar among the 3 approaches for each outcome of interest. CONCLUSIONS Claims- versus site-based (+CEC) follow-up identified similar overall cardiovascular event rates despite meaningful differences in the events detected. 
Randomized treatment effects were similar using the 2 methods, suggesting claims data could be used to support clinical research leveraging routinely collected data. This approach may lead to more effective evidence generation, synthesis, and appraisal of medical products and inform the strategic approaches toward the National Evaluation System for Health Technology.
|
19
|
Abstract
Linked data are increasingly being used for epidemiological research, to enhance primary research, and in planning, monitoring and evaluating public policy and services. Linkage error (missed links between records that relate to the same person or false links between unrelated records) can manifest in many ways: as missing data, measurement error and misclassification, unrepresentative sampling, or as a special combination of these that is specific to analysis of linked data: the merging and splitting of people that can occur when two hospital admission records are counted as one person admitted twice if linked and two people admitted once if not. Through these mechanisms, linkage error can ultimately lead to information bias and selection bias; so identifying relevant mechanisms is key in quantitative bias analysis. In this article we introduce five key concepts and a study classification system for identifying which mechanisms are relevant to any given analysis. We provide examples and discuss options for estimating parameters for bias analysis. This conceptual framework provides the 'links' between linkage error, information bias and selection bias, and lays the groundwork for quantitative bias analysis for linkage error.
|
20
|
Delmestri A, Prieto-Alhambra D. CPRD GOLD and linked ONS mortality records: Reconciling guidelines. Int J Med Inform 2019; 136:104038. [PMID: 32078979 DOI: 10.1016/j.ijmedinf.2019.104038] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 07/16/2019] [Revised: 10/19/2019] [Accepted: 11/26/2019] [Indexed: 12/17/2022]
Abstract
BACKGROUND The Clinical Practice Research Datalink (CPRD) GOLD is an extremely influential U.K. primary care dataset for epidemiological research, with far more published papers based on its data than any other U.K. primary care dataset. The Office for National Statistics (ONS) death data for England can be linked to GOLD at the patient level and are considered the gold standard on mortality. GOLD, which also holds death data, has recently been assessed against the ONS linked dataset, and the accuracy of its dates of death has been deemed sufficient for the majority of observational studies. However, there is a lack of guidance on how to manage the challenges that arise when ONS mortality and GOLD datasets are linked, including linkage coverage period, linkage correctness likelihood, linkage regional limitations and data discrepancy. OBJECTIVES To provide reconciling guidelines on how to make maximum and at the same time trustworthy use of mortality information coming from both GOLD and ONS linked datasets, with the aim of improving the quality, reproducibility, transparency and comparison of clinical research. METHOD AND RESULTS We have developed recommendations on how to manage mortality data coming from both GOLD and linked ONS, taking into account linkage coverage period, linkage correctness likelihood, linkage regional limitations and data discrepancies between these two datasets. We have also implemented these guidelines in an SQL algorithm for researchers to use. CONCLUSION We have provided detailed guidelines on the reconciliation of mortality data between GOLD and ONS linked death datasets, taking into account both their strengths and limitations. The consistent application of these guidelines, made practical by an SQL algorithm, has the potential to improve clinical research quality, reproducibility, transparency and comparison.
|
21
|
Norris KC, Duru OK, Alicic RZ, Daratha KB, Nicholas SB, McPherson SM, Bell DS, Shen JI, Jones CR, Moin T, Waterman AD, Neumiller JJ, Vargas RB, Bui AAT, Mangione CM, Tuttle KR. Rationale and design of a multicenter Chronic Kidney Disease (CKD) and at-risk for CKD electronic health records-based registry: CURE-CKD. BMC Nephrol 2019; 20:416. [PMID: 31747918 PMCID: PMC6868861 DOI: 10.1186/s12882-019-1558-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 09/05/2019] [Accepted: 09/12/2019] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND Chronic kidney disease (CKD) is a global public health problem, exhibiting sharp increases in incidence, prevalence, and attributable morbidity and mortality. There is a critical need to better understand the demographics, clinical characteristics, and key risk factors for CKD, and to develop platforms for testing novel interventions to improve modifiable risk factors, particularly for CKD patients with a rapid decline in kidney function. METHODS We describe a novel collaboration between two large healthcare systems (Providence St. Joseph Health and University of California, Los Angeles Health) supported by leadership from both institutions, which was created to develop harmonized cohorts of patients with CKD or those at increased risk for CKD (hypertension/HTN, diabetes/DM, pre-diabetes) from electronic health record data. RESULTS The combined repository of candidate records included more than 3.3 million patients with at least a single qualifying measure for CKD and/or at-risk for CKD. The CURE-CKD registry includes over 2.6 million patients with and/or at-risk for CKD identified by stricter guideline-based criteria using a combination of administrative encounter codes, physical examinations, laboratory values and medication use. Notably, data on race/ethnicity and geography will, in part, enable robust analyses to study traditionally disadvantaged or marginalized patients not typically included in clinical trials. DISCUSSION The CURE-CKD project is a unique multidisciplinary collaboration between nephrologists, endocrinologists, primary care physicians with health services research skills, health economists, and those with expertise in statistics, bioinformatics and machine learning. The CURE-CKD registry uses curated observations from real-world settings across two large healthcare systems and has great potential to provide important contributions for healthcare and for improving clinical outcomes in patients with and at-risk for CKD.
|
22
|
Choudhary P, de Portu S, Arrieta A, Castañeda J, Campbell FM. Use of sensor-integrated pump therapy to reduce hypoglycaemia in people with Type 1 diabetes: a real-world study in the UK. Diabet Med 2019; 36:1100-1108. [PMID: 31134668 DOI: 10.1111/dme.14043] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Accepted: 05/25/2019] [Indexed: 01/04/2023]
Abstract
AIMS To assess the efficacy of insulin pumps with automated insulin suspension systems in a real-world setting. METHODS We analysed anonymized data uploaded to CareLink™ by people (n=920) with Type 1 diabetes using the MiniMed Paradigm Veo system and the MiniMed 640G system (Medtronic International Trading Sàrl, Tolochenaz, Switzerland) with SmartGuard technology, with or without automated insulin suspension enabled, between February 2016 and June 2018. Users with ≥15 days of sensor data and ≥70% sensor-wear time were classified as sensor-augmented pump alone, sensor-integrated pump with low glucose suspend enabled or sensor-integrated pump with predictive low glucose management enabled. RESULTS The median (25th-75th percentile) system use was 161 (58-348) days. The median time spent with sensor glucose values ≤3 mmol/l was 0.8 (0.3-1.7)% in the sensor-augmented pump group, 0.3 (0.1-0.7)% in the sensor-integrated pump with low glucose suspend group, and 0.3 (0.1-0.5)% in the sensor-integrated pump with predictive low glucose management group. In individuals switching from sensor-augmented pump to sensor-integrated pump with low glucose suspend (n=31), there were significant reductions in the monthly rate of hypoglycaemic events <3 mmol/l (rate ratio 0.63, 95% CI 0.45-0.89; P=0.009) and in the percentage of time with glucose values ≤3 mmol/l [sensor-augmented pump: 0.63% (95% CI 0.34-1.29), sensor-integrated pump with low glucose suspend: 0.33% (95% CI 0.16-0.64); P=0.001]. The monthly rate of hypoglycaemic events decreased further in individuals (n=139) switching from sensor-integrated pump with low glucose suspend to sensor-integrated pump with predictive low glucose management [rate ratio 0.82 (95% CI 0.69-0.98); P<0.0274]. Similar results were seen for events <3.9 mmol/l. There was no difference in median time spent in target glucose range.
CONCLUSION Real-world UK data show that increasing automation of insulin suspension reduces hypoglycaemia exposure in people with Type 1 diabetes.
|
23
|
Langner I, Ohlmeier C, Zeeb H, Haug U, Riedel O. Individual mortality information in the German Pharmacoepidemiological Research Database (GePaRD): a validation study using a record linkage with a large cancer registry. BMJ Open 2019; 9:e028223. [PMID: 31270118 PMCID: PMC6609119 DOI: 10.1136/bmjopen-2018-028223] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 11/30/2022] Open
Abstract
OBJECTIVE Claims data need to be validated to assess their use for epidemiological research. This study aimed to examine the validity of mortality information in the German Pharmacoepidemiological Research Database (GePaRD). DESIGN Validation study, secondary data, medical claims. SETTING Claims data of two German nationwide acting statutory health insurance providers (SHIs) contributing data for GePaRD; record linkage with epidemiological cancer registry providing individual official mortality information. PARTICIPANTS All women insured with the two SHIs whose insurance coverage ended in the period 2006-2013 and who were residents of North Rhine-Westphalia. MEASURES Descriptive statistics were used to analyse the performance of the linkage procedure. Further, we calculated measures of agreement between the official and the GePaRD-based vital status and assessed differences between the official and the GePaRD-based date of death. RESULTS Of the 256 111 women of the linkage sample, 25 528 were classified as 'deceased' in GePaRD and the others as 'alive'. Compared with the official data, the GePaRD-based vital status showed a sensitivity of 95.9% and a specificity of 99.4%. The negative predictive value was 99.6% and the positive predictive value 94.3%. The date of death agreed in 96.3% between both data sources. CONCLUSIONS The vital status recorded in GePaRD was of high accuracy and discrepancies between dates of death in GePaRD and official dates were rare. This underlines the potential of the database for conducting large cohort studies with mortality as the endpoint.
|
24
|
Slotwiner DJ, Tarakji KG, Al-Khatib SM, Passman RS, Saxon LA, Peters NS, McCall D, Turakhia MP, Schaeffer J, Mendenhall GS, Hindricks G, Narayan SM, Davenport EE, Marrouche NF. Transparent sharing of digital health data: A call to action. Heart Rhythm 2019; 16:e95-e106. [PMID: 31077802 DOI: 10.1016/j.hrthm.2019.04.042] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 04/22/2019] [Indexed: 11/18/2022]
|
25
|
Lavery JA, Lipitz-Snyderman A, Li DG, Bach PB, Panageas KS. Identifying Cancer-Directed Surgeries in Medicare Claims: A Validation Study Using SEER-Medicare Data. JCO Clin Cancer Inform 2019; 3:1-24. [PMID: 30715928 PMCID: PMC6648680 DOI: 10.1200/cci.18.00093] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Accepted: 10/19/2018] [Indexed: 02/06/2023] Open
Abstract
PURPOSE Medicare claims provide a rich data source for large-scale quality assessment because data are available for all beneficiaries nationally. For cancer surgery, the absence of information regarding site of cancer and date of diagnosis on an administrative claim necessitates testing to ensure accurate quality assessment and public reporting. METHODS Using the SEER Medicare-linked database as the gold standard, we developed and tested an approach to identify cancer-directed surgeries from Medicare fee-for-service claims alone. Our analysis evaluated two questions: (1) Can we identify a large percentage of patients who underwent a cancer-directed surgery using only Medicare claims? (2) Of all patients identified as having undergone a cancer-directed surgery, what percentage had cancer? We evaluated this approach for 17 primary cancer sites. RESULTS The number of Medicare beneficiaries diagnosed with their first cancer during the years 2011 to 2013 and who underwent cancer-directed surgery ranged from 45 patients (bones and joints) to 20,163 patients (breast). The percentage of cancer-directed surgeries identified using Medicare claims alone ranged from 62% (skin melanoma) to 94% (prostate). For all but three cancer sites (skin melanoma, thyroid, and urinary bladder), more than 80% of cancer-directed surgeries were identified using our approach. Of all surgeries identified, more than 90% were for patients with cancer. CONCLUSION Identifying patients who underwent a cancer-directed surgery from Medicare claims is feasible for many cancer sites, although careful consideration needs to be given to the validity of each site. Our findings support the use of Medicare claims for large-scale quality assessment of cancer surgery by disease site.
|