1. Röchner P, Rothlauf F. Using machine learning to link electronic health records in cancer registries: On the tradeoff between linkage quality and manual effort. Int J Med Inform 2024; 185:105387. [PMID: 38428200 DOI: 10.1016/j.ijmedinf.2024.105387] [Received: 03/24/2023] [Revised: 10/05/2023] [Accepted: 02/20/2024] [Indexed: 03/03/2024]
Abstract
BACKGROUND Cancer registries link a large number of electronic health records reported by medical institutions to already registered records of the matching individual and tumor. Records are automatically linked using deterministic and probabilistic approaches; machine learning is rarely used. Records that cannot be matched automatically with sufficient accuracy are typically processed manually. For application, it is important to know how well record linkage approaches match real-world records and how much manual effort is required to achieve the desired linkage quality. We study the task of linking reported records to the matching registered tumor in cancer registries. METHODS We compare the tradeoff between linkage quality and manual effort of five machine learning methods (logistic regression, random forest, gradient boosting, neural network, and a stacked method) to a deterministic baseline. The record linkage methods are compared in a two-class setting (no-match/match) and a three-class setting (no-match/undecided/match). A cancer registry collected and linked the dataset consisting of categorical variables matching 145,755 reported records with 33,289 registered tumors. RESULTS In the two-class setting, the gradient boosting, neural network, and stacked models have higher accuracy and F1 score (accuracy: 0.968-0.978, F1 score: 0.983-0.988) than the deterministic baseline (accuracy: 0.964, F1 score: 0.980) when the same records are manually processed (0.89% of all records). In the three-class setting, these three machine learning methods can automatically process all reported records and still have higher accuracy and F1 score than the deterministic baseline. The linkage quality of the machine learning methods studied, except for the neural network, increases as the number of manually processed records increases.
CONCLUSION Machine learning methods can significantly improve linkage quality and reduce the manual effort required by medical coders to match tumor records in cancer registries compared to a deterministic baseline. Our results help cancer registries estimate how linkage quality increases as more records are manually processed.
Affiliation(s)
- Philipp Röchner
- Cancer Registry, Institute for Digital Health Data Rhineland-Palatinate, Große Bleiche 46, Mainz, 55116, Germany; Information Systems and Business Administration, Johannes Gutenberg University, Jakob-Welder-Weg 9, Mainz, 55128, Germany.
- Franz Rothlauf
- Information Systems and Business Administration, Johannes Gutenberg University, Jakob-Welder-Weg 9, Mainz, 55128, Germany
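
The three-class setting described in the abstract of entry 1 can be illustrated with a small sketch. It assumes that a model outputs a match probability per record pair and that two thresholds (`lower` and `upper`, hypothetical values not taken from the paper) split pairs into no-match, undecided, and match. Undecided pairs are the ones routed to manual review, so their share stands in for the manual effort.

```python
from collections import Counter


def classify(prob, lower=0.2, upper=0.8):
    """Map a match probability to one of three classes.

    The default thresholds are illustrative assumptions, not study values.
    """
    if prob >= upper:
        return "match"
    if prob <= lower:
        return "no-match"
    return "undecided"


def manual_effort(probs, lower=0.2, upper=0.8):
    """Share of record pairs left undecided, i.e. routed to manual review."""
    if not probs:
        return 0.0
    labels = Counter(classify(p, lower, upper) for p in probs)
    return labels["undecided"] / len(probs)


# Hypothetical model scores for six candidate record pairs.
scores = [0.05, 0.15, 0.45, 0.60, 0.90, 0.99]

# Three-class setting: pairs between the thresholds go to manual review.
effort_wide = manual_effort(scores, lower=0.2, upper=0.8)

# Two-class setting: equal thresholds leave no undecided pairs.
effort_two_class = manual_effort(scores, lower=0.5, upper=0.5)
```

Narrowing the gap between the two thresholds lowers the manual effort but forces automatic decisions on more borderline pairs, which is the tradeoff the study quantifies.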

2. Kirilov N. Comparison of WebSocket and Hypertext Transfer Protocol for Transfer of Electronic Health Records. Stud Health Technol Inform 2024; 313:124-128. [PMID: 38682516 DOI: 10.3233/shti240023] [Indexed: 05/01/2024]
Abstract
BACKGROUND Electronic health records (EHRs) emerged as a digital record of the data generated in healthcare. OBJECTIVES This paper compares the transfer times of EHRs using the Hypertext Transfer Protocol (HTTP) and WebSocket in both a local network and a wide area network (WAN). METHODS A Python web application serving Fast Healthcare Interoperability Resources (FHIR) records was created, and the transfer times of the EHRs over both HTTP and WebSocket connections were measured. A total of 45,000 test Patient resources were transferred in Bundles of 20, 50, 100 and 200 resources. RESULTS WebSocket showed much shorter transfer times for large amounts of data: 18 s shorter in the local network and 342 s shorter in the WAN for the transfer with 20 resources per Bundle. CONCLUSION RESTful APIs are a convenient way to implement EHR servers; on the other hand, HTTP becomes a bottleneck when transferring large amounts of data. WebSocket shows better transfer times and is thus superior in such situations. The problem can be addressed by developing a new communication protocol or by using network tunneling to handle large data transfer of EHRs.
Affiliation(s)
- Nikola Kirilov
- Institute of Medical Informatics, Heidelberg University Hospital, Germany
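
To put the differences reported in entry 2 in perspective, the arithmetic below derives the number of Bundle transfers for each Bundle size and the average time saved per transfer in the 20-resource case. It assumes that the 45,000 resources divide evenly into Bundles and that the saving is spread uniformly over all transfers; neither assumption is stated in the abstract.

```python
TOTAL_RESOURCES = 45_000
BUNDLE_SIZES = [20, 50, 100, 200]

# Number of Bundle transfers needed for each Bundle size.
transfers = {size: TOTAL_RESOURCES // size for size in BUNDLE_SIZES}

# Reported WebSocket advantage for the 20-resource case, in seconds.
saved_seconds = {"local": 18, "wan": 342}

# Average saving per transfer in milliseconds (uniform spread assumed).
saved_ms_per_transfer = {
    network: 1000 * seconds / transfers[20]
    for network, seconds in saved_seconds.items()
}

for network, ms in saved_ms_per_transfer.items():
    print(f"{network}: about {ms:.0f} ms saved per Bundle transfer")
```

Under these assumptions the per-transfer saving is small in the local network and much larger in the WAN, which is consistent with a per-request overhead that scales with network latency.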

3. Kim JW, Choi H, Lim HJ, Oh M, Ahn JJ. Evaluating Linkage Quality of Population-Based Administrative Data for Health Service Research. J Korean Med Sci 2024; 39:e127. [PMID: 38622936 PMCID: PMC11018984 DOI: 10.3346/jkms.2024.39.e127] [Received: 08/31/2023] [Accepted: 03/11/2024] [Indexed: 04/17/2024] [Open access]
Abstract
BACKGROUND To overcome the limitations of relying on data from a single institution, many researchers have studied data linkage methodologies. Data linkage includes errors owing to legal issues surrounding personal information and technical issues related to data processing. Linkage errors affect selection bias as well as external and internal validity. Therefore, quality verification for each linkage method with adherence to personal information protection is an important issue. This study evaluated the linkage quality of linked data and analyzed the potential bias resulting from linkage errors. METHODS This study analyzed claims data submitted to the Health Insurance Review and Assessment Service (HIRA DATA). The linkage errors of two deterministic linkage methods were evaluated based on the use of the match key. The first deterministic linkage uses a unique identification number, and the second deterministic linkage uses the name, gender, and date of birth as a set of partial identifiers. The linkage error of this deterministic linkage method was assessed using Cohen's absolute standardized difference (ASD) across the baseline characteristics, and the linkage quality was evaluated through the following indicators: linked rate, false match rate, missed match rate, positive predictive value, sensitivity, specificity, and F1-score. RESULTS For the deterministic linkage method that used the name, gender, and date of birth as a set of partial identifiers, the true match rate was 83.5% and the missed match rate was 16.5%. Although there was bias in some characteristics of the data, most of the ASD values were less than 0.1, with no case greater than 0.5. Therefore, it is difficult to determine whether linked data constructed with deterministic linkages have substantial differences.
CONCLUSION This study confirms the possibility of building health and medical data at the national level as the first data linkage quality verification study using big data from the HIRA. Analyzing the quality of linkages is crucial for comprehending linkage errors and generating reliable analytical outcomes. Linkers should increase the reliability of linked data by providing linkage error-related information to researchers. The results of this study will serve as reference data to increase the reliability of multicenter data linkage studies.
Affiliation(s)
- Ji-Woo Kim
- Big Data Linkage Division, Health Insurance Review & Assessment Service, Wonju, Korea
- Hyojung Choi
- Digital Medical Technology Listing Division, Health Insurance Review & Assessment Service, Wonju, Korea
- Hyun Jeung Lim
- DRG Administration Division, Health Insurance Review & Assessment Service, Wonju, Korea
- Miae Oh
- Center for Research on Big Data Information, Korea Institute for Health and Social Affairs, Sejong, Korea
- Jae Joon Ahn
- Division of Data Science, Yonsei University, Wonju, Korea.

4. Lloyd LK, Nicholson C, Strange G, Celermajer DS. The burdensome logistics of data linkage in Australia - the example of a national registry for congenital heart disease. Aust Health Rev 2024; 48:8-15. [PMID: 38118279 DOI: 10.1071/ah23185] [Received: 09/13/2023] [Accepted: 11/21/2023] [Indexed: 12/22/2023]
Abstract
Objective Data linkage is a very powerful research tool in epidemiology; however, establishing it can be a lengthy and intensive process. This paper reports on the complex landscape of conducting data linkage projects in Australia. Methods We reviewed the processes, documentation, and applications required to conduct multi-jurisdictional data linkage across Australia in 2023. Results Obtaining the necessary approvals to conduct linkage will likely take nearly 2 years (estimated 730 days, including 605 days from initial submission to obtaining all ethical approvals and an estimated further 125 days for the issuance of unexpected additionally required approvals). Ethical review for linkage projects ranged from 51 to 128 days from submission to ethical approval, and applications consisted of 9-25 documents. Conclusions Major obstacles to conducting multi-jurisdictional data linkage included the complexity of the process, and substantial time and financial costs. The process was characterised by inefficiencies at several levels, reduplication, and a lack of any key accountabilities for timely performance of processes. Data linkage is an invaluable resource for epidemiological research. Further streamlining, established accountability, and greater collaboration between jurisdictions are needed to ensure data linkage is both accessible and feasible to researchers.
Affiliation(s)
- Larissa K Lloyd
- Clinical Research Group, Heart Research Institute, Sydney, NSW, Australia; and Cardiology Department, Royal Prince Alfred Hospital, Level 6, Building 75, Missenden Road, Camperdown, Sydney, NSW 2050, Australia; and Faculty of Medicine, The University of Sydney, Sydney, NSW, Australia
- Calum Nicholson
- Clinical Research Group, Heart Research Institute, Sydney, NSW, Australia; and Cardiology Department, Royal Prince Alfred Hospital, Level 6, Building 75, Missenden Road, Camperdown, Sydney, NSW 2050, Australia; and Faculty of Medicine, The University of Sydney, Sydney, NSW, Australia
- Geoff Strange
- Clinical Research Group, Heart Research Institute, Sydney, NSW, Australia; and Cardiology Department, Royal Prince Alfred Hospital, Level 6, Building 75, Missenden Road, Camperdown, Sydney, NSW 2050, Australia; and Faculty of Medicine, The University of Sydney, Sydney, NSW, Australia
- David S Celermajer
- Clinical Research Group, Heart Research Institute, Sydney, NSW, Australia; and Cardiology Department, Royal Prince Alfred Hospital, Level 6, Building 75, Missenden Road, Camperdown, Sydney, NSW 2050, Australia; and Faculty of Medicine, The University of Sydney, Sydney, NSW, Australia

5. Silverwood RJ, Rajah N, Calderwood L, De Stavola BL, Harron K, Ploubidis GB. Examining the quality and population representativeness of linked survey and administrative data: guidance and illustration using linked 1958 National Child Development Study and Hospital Episode Statistics data. Int J Popul Data Sci 2024; 9:2137. [PMID: 38425790 PMCID: PMC10901060 DOI: 10.23889/ijpds.v9i1.2137] [Indexed: 03/02/2024] [Open access]
Abstract
Introduction Recent years have seen an increase in linkages between survey and administrative data. It is important to evaluate the quality of such data linkages to discern the likely reliability of ensuing research. Evaluation of linkage quality and bias can be conducted using different approaches, but many of these are not possible when there is a separation of processes for linkage and analysis to help preserve privacy, as is typically the case in the UK (and elsewhere). Objectives We aimed to describe a suite of generalisable methods to evaluate linkage quality and population representativeness of linked survey and administrative data which remain tractable when users of the linked data are not party to the linkage process itself. We emphasise issues particular to longitudinal survey data throughout. Methods Our proposed approaches cover several areas: i) Linkage rates, ii) Selection into response, linkage consent and successful linkage, iii) Linkage quality, and iv) Linked data population representativeness. We illustrate these methods using a recent linkage between the 1958 National Child Development Study (NCDS; a cohort following an initial 17,415 people born in Great Britain in a single week of 1958) and Hospital Episode Statistics (HES) databases (containing important information regarding admissions, accident and emergency attendances and outpatient appointments at NHS hospitals in England). Results Our illustrative analyses suggest that the linkage quality of the NCDS-HES data is high and that the linked sample maintains an excellent level of population representativeness with respect to the single dimension we assessed. Conclusions Through this work we hope to encourage providers and users of linked data resources to undertake and publish thorough evaluations. We further hope that providing illustrative analyses using linked NCDS-HES data will improve the quality and transparency of research using this particular linked data resource.
Affiliation(s)
- Richard J. Silverwood
- Centre for Longitudinal Studies, UCL Social Research Institute, 20 Bedford Way, London WC1H 0AL
- Nasir Rajah
- Centre for Longitudinal Studies, UCL Social Research Institute, 20 Bedford Way, London WC1H 0AL
- Lisa Calderwood
- Centre for Longitudinal Studies, UCL Social Research Institute, 20 Bedford Way, London WC1H 0AL
- Bianca L. De Stavola
- Population, Policy & Practice Research and Teaching Department, UCL Great Ormond Street Institute of Child Health, 30 Guilford Street, London WC1N 1EH
- Katie Harron
- Population, Policy & Practice Research and Teaching Department, UCL Great Ormond Street Institute of Child Health, 30 Guilford Street, London WC1N 1EH
- George B. Ploubidis
- Centre for Longitudinal Studies, UCL Social Research Institute, 20 Bedford Way, London WC1H 0AL

6. Kamat G, Shan M, Gutman R. Bayesian record linkage with variables in one file. Stat Med 2023; 42:4931-4951. [PMID: 37652076 DOI: 10.1002/sim.9894] [Received: 09/13/2022] [Revised: 06/12/2023] [Accepted: 08/21/2023] [Indexed: 09/02/2023]
Abstract
In many healthcare and social science applications, information about units is dispersed across multiple data files. Linking records across files is necessary to estimate the associations of interest. Common record linkage algorithms only rely on similarities between linking variables that appear in all the files. Moreover, analysis of linked files often ignores errors that may arise from incorrect or missed links. Bayesian record linking methods allow for natural propagation of linkage error, by jointly sampling the linkage structure and the model parameters. We extend an existing Bayesian record linkage method to integrate associations between variables exclusive to each file being linked. We show analytically, and using simulations, that the proposed method can improve the linking process, and can result in accurate inferences. We apply the method to link Meals on Wheels recipients to Medicare enrollment records.
Affiliation(s)
- Gauri Kamat
- Department of Biostatistics, Brown University, Providence, Rhode Island, USA
- Roee Gutman
- Department of Biostatistics, Brown University, Providence, Rhode Island, USA

7. Prindle J, Suthar H, Putnam-Hornstein E. An open-source probabilistic record linkage process for records with family-level information: Simulation study and applied analysis. PLoS One 2023; 18:e0291581. [PMID: 37862306 PMCID: PMC10588881 DOI: 10.1371/journal.pone.0291581] [Received: 06/20/2023] [Accepted: 08/31/2023] [Indexed: 10/22/2023] [Open access]
Abstract
Research with administrative records involves the challenge of limited information in any single data source to answer policy-related questions. Record linkage provides researchers with a tool to supplement administrative datasets with other information about the same people when identified in separate sources as matched pairs. Several solutions are available for undertaking record linkage, producing linkage keys for merging data sources for positively matched pairs of records. In the current manuscript, we demonstrate a new application of the Python RecordLinkage package to family-based record linkages with machine learning algorithms for probability scoring, which we call probabilistic record linkage for families (PRLF). First, a simulation of administrative records identifies PRLF accuracy with variations in match and data degradation percentages. Accuracy is largely influenced by degradation (e.g., missing data fields, mismatched values) compared to the percentage of simulated matches. Second, an application of data linkage is presented to compare regression model estimate performance across three record linkage solutions (PRLF, ChoiceMaker, and Link Plus). Our findings indicate that all three solutions, when optimized, provide similar results for researchers. Strengths of our process, such as the use of ensemble methods, to improve match accuracy are discussed. We then identify caveats of record linkage in the context of administrative data.
Affiliation(s)
- John Prindle
- Suzanne Dworak-Peck School of Social Work, University of Southern California, Los Angeles, Los Angeles, California, United States of America
- Himal Suthar
- Keck School of Medicine, University of Southern California, Los Angeles, Los Angeles, California, United States of America
- Emily Putnam-Hornstein
- School of Social Work, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America

8. Garcia KKS, de Miranda CB, de Sousa FNEF. Procedures for health data linkage: applications in health surveillance. Epidemiol Serv Saude 2022; 31:e20211272. [PMID: 36287481 PMCID: PMC9887966 DOI: 10.1590/s2237-96222022000300004] [Received: 02/16/2022] [Accepted: 07/08/2022] [Indexed: 12/23/2022] [Open access]
Abstract
OBJECTIVE To present a standardized methodology for linking different public health databases. METHODS This was a methodological review article specifically describing data processing procedures for deterministic linkage between structured databases. It instructs on how to treat data, select linkage keys, and link databases, using two databases simulated in R software. RESULTS The commands used for the deterministic linkage of the inner_join type were presented. The linkage process resulted in a database with 40,108 pairs using only the "Name" key. Adding the second key, "Name of mother", the result dropped to 112 pairs. By adding the third key, "Date of birth", only two pairs were identified. CONCLUSION Database linkage and its analysis are valid and valuable tools for health services in supporting health surveillance actions.
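
The effect described in entry 8, where each added linkage key shrinks the set of linked pairs, can be reproduced with a minimal sketch of an exact-match inner join. The article works in R with `inner_join`; the version below is a plain Python stand-in, and the function name `deterministic_link`, the field names, and the toy records are all invented for illustration.

```python
def deterministic_link(left, right, keys):
    """Return all (left, right) record pairs that agree exactly on every key."""
    index = {}
    for rec in right:
        index.setdefault(tuple(rec[k] for k in keys), []).append(rec)
    return [
        (a, b)
        for a in left
        for b in index.get(tuple(a[k] for k in keys), [])
    ]


# Hypothetical toy records; names and dates are invented.
db_a = [
    {"name": "ana silva", "mother": "maria silva", "dob": "1990-01-01"},
    {"name": "ana silva", "mother": "rosa silva", "dob": "1985-05-05"},
    {"name": "joao lima", "mother": "clara lima", "dob": "1979-03-12"},
]
db_b = [
    {"name": "ana silva", "mother": "maria silva", "dob": "1990-01-01"},
    {"name": "ana silva", "mother": "maria silva", "dob": "1992-07-30"},
    {"name": "ana silva", "mother": "rosa silva", "dob": "1985-05-05"},
]

# Count linked pairs as keys are added one at a time.
pairs_by_keys = {
    keys: len(deterministic_link(db_a, db_b, keys))
    for keys in [("name",), ("name", "mother"), ("name", "mother", "dob")]
}
```

With a common name as the only key, every record sharing that name pairs with every other, which is why the pair count falls so sharply once more discriminating keys are added.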

9. Heng Y, Armknecht F, Chen Y, Schnell R. On the effectiveness of graph matching attacks against privacy-preserving record linkage. PLoS One 2022; 17:e0267893. [PMID: 36137086 PMCID: PMC9499274 DOI: 10.1371/journal.pone.0267893] [Received: 11/12/2021] [Accepted: 04/19/2022] [Indexed: 11/19/2022] [Open access]
Abstract
Linking several databases containing information on the same person is an essential step of many data workflows. Due to the potential sensitivity of the data, the identity of the persons should be kept private. Privacy-Preserving Record-Linkage (PPRL) techniques have been developed to link persons despite errors in the identifiers used to link the databases without violating their privacy. The basic approach is to use encoded quasi-identifiers instead of plain quasi-identifiers for making the linkage decision. Ideally, the encoded quasi-identifiers should prevent re-identification but still allow for a good linkage quality. While several PPRL techniques have been proposed so far, Bloom filter-based PPRL schemes (BF-PPRL) are among the most popular due to their scalability. However, a recently proposed attack on BF-PPRL based on graph similarities seems to allow individuals’ re-identification from encoded quasi-identifiers. Therefore, the graph matching attack is widely considered a serious threat to many PPRL-approaches and leads to the situation that BF-PPRL schemes are rejected as being insecure. In this work, we argue that this view is not fully justified. We show by experiments that the success of graph matching attacks requires a high overlap between encoded and plain records used for the attack. As soon as this condition is not fulfilled, the success rate sharply decreases and renders the attacks hardly effective. This necessary condition does severely limit the applicability of these attacks in practice and also allows for simple but effective countermeasures.
Affiliation(s)
- Youzhe Heng
- School of Business Informatics and Mathematics, University of Mannheim, Mannheim, Baden-Württemberg, Germany
- Frederik Armknecht
- School of Business Informatics and Mathematics, University of Mannheim, Mannheim, Baden-Württemberg, Germany
- Yanling Chen
- Research Methodology Group, University of Duisburg-Essen, Duisburg, Nordrhein-Westfalen, Germany
- Rainer Schnell
- Research Methodology Group, University of Duisburg-Essen, Duisburg, Nordrhein-Westfalen, Germany
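
The encoding idea behind the Bloom filter-based PPRL schemes discussed in entry 9 can be sketched as follows. This is a simplified illustration of a commonly described construction (character bigrams hashed into a bit array and compared with the Dice coefficient), not the scheme or the attack evaluated in the paper. The filter length, the number of hash functions, and the `secret` value are arbitrary assumptions, and a keyed prefix on SHA-256 stands in for a proper keyed hash.

```python
import hashlib


def bigrams(value):
    """Split a string into padded character bigrams."""
    padded = f"_{value.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value, length=1000, num_hashes=10, secret="demo-key"):
    """Encode a quasi-identifier as the set of bit positions set in a Bloom filter."""
    bits = set()
    for gram in bigrams(value):
        for i in range(num_hashes):
            digest = hashlib.sha256(f"{secret}|{i}|{gram}".encode()).hexdigest()
            bits.add(int(digest, 16) % length)
    return bits


def dice(bits_a, bits_b):
    """Dice coefficient between two encoded identifiers."""
    if not bits_a and not bits_b:
        return 1.0
    return 2 * len(bits_a & bits_b) / (len(bits_a) + len(bits_b))


# Similar spellings share bigrams, so their encodings overlap more.
similar = dice(bloom_encode("smith"), bloom_encode("smyth"))
different = dice(bloom_encode("smith"), bloom_encode("jones"))
```

Because similarity between encodings mirrors similarity between the plain values, linkage tolerates typing errors; the same property is what similarity-based attacks try to exploit.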

10. Libuy N, Harron K, Gilbert R, Caulton R, Cameron E, Blackburn R. Linking education and hospital data in England: linkage process and quality. Int J Popul Data Sci 2021; 6:1671. [PMID: 34568585 PMCID: PMC8445153 DOI: 10.23889/ijpds.v6i1.1671] [Indexed: 10/28/2022] [Open access]
Abstract
INTRODUCTION Linkage of administrative data for universal state education and National Health Service (NHS) hospital care would enable research into the inter-relationships between education and health for all children in England. OBJECTIVES We aim to describe the linkage process and evaluate the quality of linkage of four one-year birth cohorts within the National Pupil Database (NPD) and Hospital Episode Statistics (HES). METHODS We used multi-step deterministic linkage algorithms to link longitudinal records from state schools to the chronology of records in the NHS Personal Demographics Service (PDS; linkage stage 1), and HES (linkage stage 2). We calculated linkage rates and compared pupil characteristics in linked and unlinked samples for each stage of linkage and each cohort (1990/91, 1996/97, 1999/00, and 2004/05). RESULTS Of the 2,287,671 pupil records, 2,174,601 (95%) linked to HES. Linkage rates improved over time (92% in 1990/91 to 99% in 2004/05). Ethnic minority pupils and those living in more deprived areas were less likely to be matched to hospital records, but differences in pupil characteristics between linked and unlinked samples were moderate to small. CONCLUSION We linked nearly all pupils to at least one hospital record. The high coverage of the linkage represents a unique opportunity for wide-scale analyses across the domains of health and education. However, missed links disproportionately affected ethnic minorities or those living in the poorest neighbourhoods: selection bias could be mitigated by increasing the quality and completeness of identifiers recorded in administrative data or the application of statistical methods that account for missed links. 
HIGHLIGHTS Longitudinal administrative records for all children attending state school and acute hospital services in England have been used for research for more than two decades, but lack of a shared unique identifier has limited scope for linkage between these databases. We applied multi-step deterministic linkage algorithms to 4 one-year cohorts of children born 1 September-31 August in 1990/91, 1996/97, 1999/00 and 2004/05. In stage 1, full names, date of birth, and postcode histories from education data in the National Pupil Database were linked to the NHS Personal Demographic Service. In stage 2, NHS number, postcode, date of birth and sex were linked to hospital records in Hospital Episode Statistics. Between 92% and 99% of school pupils linked to at least one hospital record. Ethnic minority pupils and pupils who were living in the most deprived areas were least likely to link. Ethnic minority pupils were less likely than white children to link at the first step in both algorithms. Bias due to linkage errors could lead to an underestimate of the health needs in disadvantaged groups. Improved data quality, more sensitive linkage algorithms, and/or statistical methods that account for missed links in analyses, should be considered to reduce linkage bias.
Affiliation(s)
- Nicolás Libuy
- Institute of Health Informatics, University College London, London, NW1 2DA, UK
- Katie Harron
- Institute of Health Informatics, University College London, London, NW1 2DA, UK
- UCL Great Ormond Street Institute of Child Health, University College London, London, WC1N 1EH, UK
- Ruth Gilbert
- Institute of Health Informatics, University College London, London, NW1 2DA, UK
- UCL Great Ormond Street Institute of Child Health, University College London, London, WC1N 1EH, UK
- Ruth Blackburn
- Institute of Health Informatics, University College London, London, NW1 2DA, UK

11.
Abstract
BACKGROUND Accurate identification of maternal deaths is paramount for audit and policy purposes. Our aim was to determine the accuracy and completeness of data on maternal deaths in hospital and those recorded on a death certificate, and the level of agreement between the 2 data sources. METHODS We conducted a retrospective population-based study using data for Ontario, Canada, from Apr. 1, 2002, to Dec. 31, 2015. We used Canadian Institute for Health Information (CIHI) databases to identify deaths during inpatient, emergency department and same-day surgery encounters. We captured Vital Statistics deaths in the Office of the Registrar General, Deaths (ORGD) data set. Deaths were considered within 42 days and within 365 days after a pregnancy outcome (live birth, miscarriage, ectopic pregnancy or induced abortion) for all multiple and singleton pregnancies. We calculated agreement statistics and 95% confidence intervals (CIs). RESULTS Among 1 679 455 live births and stillbirths, 398 pregnancy-related deaths in the ORGD data set were mapped to a birth in CIHI databases, and 77 (16.2%) were not. Among 2 039 849 recognized pregnancies, 534 pregnancy-related deaths in the ORGD data set were linked to CIHI records, and 68 (11.3%) were not. Among live births and stillbirths, after pregnancy-related deaths in the ORGD data set not matched to a maternal death in the CIHI databases were removed, concordance measures between CIHI and ORGD records for maternal death within 42 days after delivery included a κ value of 0.87 (95% CI 0.82-0.91) and positive percent agreement of 0.88 (95% CI 0.83-0.94). The corresponding measures were similar for maternal death within 42 days after the end of a recognized pregnancy. When unlinked pregnancy-related deaths in the ORGD data set were retained, agreement measures declined for death within 42 days after a live birth or stillbirth (κ = 0.68, 95% CI 0.62-0.74). 
For maternal death within 365 days after a live birth or stillbirth, or after the end of a recognized pregnancy, the concordance statistics were generally favourable when unlinked pregnancy-related deaths in the ORGD data set were removed but declined substantially when they were retained. INTERPRETATION Maternal mortality cannot be ascertained solely with the use of hospital data, including beyond 42 days after the end of pregnancy. To improve linkage, we propose including health insurance numbers on provincial and territorial medical death certificates.
Affiliation(s)
- Kayvan Aflaki
- Institute of Medical Science (Aflaki), University of Toronto; ICES Central (Park), Toronto, Ont.; Maternal, Child and Youth Health Division (Nelson, Luo), Centre for Surveillance and Applied Research, Public Health Agency of Canada, Ottawa, Ont.; Departments of Medicine (Ray) and Obstetrics and Gynecology (Ray), St. Michael's Hospital, Toronto, Ont
- Alison L Park
- Institute of Medical Science (Aflaki), University of Toronto; ICES Central (Park), Toronto, Ont.; Maternal, Child and Youth Health Division (Nelson, Luo), Centre for Surveillance and Applied Research, Public Health Agency of Canada, Ottawa, Ont.; Departments of Medicine (Ray) and Obstetrics and Gynecology (Ray), St. Michael's Hospital, Toronto, Ont
- Chantal Nelson
- Institute of Medical Science (Aflaki), University of Toronto; ICES Central (Park), Toronto, Ont.; Maternal, Child and Youth Health Division (Nelson, Luo), Centre for Surveillance and Applied Research, Public Health Agency of Canada, Ottawa, Ont.; Departments of Medicine (Ray) and Obstetrics and Gynecology (Ray), St. Michael's Hospital, Toronto, Ont
- Wei Luo
- Institute of Medical Science (Aflaki), University of Toronto; ICES Central (Park), Toronto, Ont.; Maternal, Child and Youth Health Division (Nelson, Luo), Centre for Surveillance and Applied Research, Public Health Agency of Canada, Ottawa, Ont.; Departments of Medicine (Ray) and Obstetrics and Gynecology (Ray), St. Michael's Hospital, Toronto, Ont
- Joel G Ray
- Institute of Medical Science (Aflaki), University of Toronto; ICES Central (Park), Toronto, Ont.; Maternal, Child and Youth Health Division (Nelson, Luo), Centre for Surveillance and Applied Research, Public Health Agency of Canada, Ottawa, Ont.; Departments of Medicine (Ray) and Obstetrics and Gynecology (Ray), St. Michael's Hospital, Toronto, Ont.
| |
Collapse
|
12
|
Chen Y, Wen H, Griffin R, Roach MJ, Kelly ML. Linking Individual Data From the Spinal Cord Injury Model Systems Center and Local Trauma Registry: Development and Validation of Probabilistic Matching Algorithm. Top Spinal Cord Inj Rehabil 2021; 26:221-231. [PMID: 33536727 PMCID: PMC7831288 DOI: 10.46292/sci20-00015] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/19/2022]
Abstract
BACKGROUND Linking records from the National Spinal Cord Injury Model Systems (SCIMS) database to the National Trauma Data Bank (NTDB) provides a unique opportunity to study early variables in predicting long-term outcomes after traumatic spinal cord injury (SCI). The public use data sets of SCIMS and NTDB are stripped of protected health information, including dates and zip code. OBJECTIVES To develop and validate a probabilistic algorithm linking data from an SCIMS center and its affiliated trauma registry. METHOD Data on SCI admissions 2011-2018 were retrieved from an SCIMS center (n = 302) and trauma registry (n = 723), of which 202 records had the same medical record number. The SCIMS records were divided equally into two data sets for algorithm development and validation, respectively. We used a two-step approach: blocking and weight generation for linking variables (race, insurance, height, and weight). RESULTS In the development set, 257 SCIMS-trauma pairs shared the same sex, age, and injury year across 129 clusters, of which 91 records were true-match. The probabilistic algorithm identified 65 of the 91 true-match records (sensitivity, 71.4%) with a positive predictive value (PPV) of 80.2%. The algorithm was validated over 282 SCIMS-trauma pairs across 127 clusters and had a sensitivity of 73.7% and PPV of 81.1%. Post hoc analysis shows the addition of injury date and zip code improved the specificity from 57.9% to 94.7%. CONCLUSION We demonstrate the feasibility of probabilistic linkage between SCIMS and trauma records, which needs further refinement and validation. Gaining access to injury date and zip code would improve record linkage significantly.
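The development-set figures reported above can be checked with a short calculation. The sketch below recomputes sensitivity from the stated counts (65 of 91 true matches identified). The number of declared links (81) is not given in the abstract; it is an assumption inferred here from the reported PPV of 80.2%.

```python
def sensitivity(true_links_found: int, true_matches: int) -> float:
    """Share of true matches that the algorithm identified."""
    return true_links_found / true_matches


def ppv(true_links_found: int, links_declared: int) -> float:
    """Share of declared links that are true matches."""
    return true_links_found / links_declared


# Counts stated in the abstract (development set).
TRUE_MATCHES = 91
TRUE_LINKS_FOUND = 65

# Assumption: not stated in the abstract. 81 declared links is the
# count consistent with 65 true links and a PPV of 80.2%.
LINKS_DECLARED = 81

print(f"sensitivity: {sensitivity(TRUE_LINKS_FOUND, TRUE_MATCHES):.1%}")
print(f"PPV: {ppv(TRUE_LINKS_FOUND, LINKS_DECLARED):.1%}")
```

Under that assumption, roughly 16 of the declared links would be false matches.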
Affiliation(s)
- Yuying Chen
- Department of Physical Medicine and Rehabilitation, University of Alabama at Birmingham, Birmingham, Alabama
- Huacong Wen
- Department of Physical Medicine and Rehabilitation, University of Alabama at Birmingham, Birmingham, Alabama
- Russel Griffin
- Department of Epidemiology, School of Public Health, University of Alabama at Birmingham, Birmingham, Alabama
- Mary Joan Roach
- Department of Physical Medicine and Rehabilitation, Case Western Reserve University School of Medicine, Cleveland, Ohio
- Center for Health Research & Policy, MetroHealth Medical System, Cleveland, Ohio
- Michael L. Kelly
- Department of Neurosurgery, Case Western Reserve University School of Medicine, MetroHealth Medical Center, Cleveland, Ohio
|
13
|
Jewell A, Broadbent M, Hayes RD, Gilbert R, Stewart R, Downs J. Impact of matching error on linked mortality outcome in a data linkage of secondary mental health data with Hospital Episode Statistics (HES) and mortality records in South East London: a cross-sectional study. BMJ Open 2020; 10:e035884. [PMID: 32641360 PMCID: PMC7342822 DOI: 10.1136/bmjopen-2019-035884] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/17/2022] Open
Abstract
OBJECTIVES Linkage of electronic health records (EHRs) to Hospital Episode Statistics (HES)-Office for National Statistics (ONS) mortality data has provided compelling evidence for lower life expectancy in people with severe mental illness. However, linkage error may underestimate these estimates. Using a clinical sample (n=265 300) of individuals accessing mental health services, we examined potential biases introduced through missed matching and examined the impact on the association between clinical disorders and mortality. SETTING The South London and Maudsley NHS Foundation Trust (SLaM) is a secondary mental healthcare provider in London. A deidentified version of SLaM's EHR was available via the Clinical Record Interactive Search system linked to HES-ONS mortality records. PARTICIPANTS Records from SLaM for patients active between January 2006 and December 2016. OUTCOME MEASURES Two sources of death data were available for SLaM participants: accurate and contemporaneous date of death via local batch tracing (gold standard) and date of death via linked HES-ONS mortality data. The effect of linkage error on mortality estimates was evaluated by comparing sociodemographic and clinical risk factor analyses using gold standard death data against HES-ONS mortality records. RESULTS Of the total sample, 93.74% were successfully matched to HES-ONS records. We found a number of statistically significant administrative, sociodemographic and clinical differences between matched and unmatched records. Of note, schizophrenia diagnosis showed a significant association with higher mortality using gold standard data (OR 1.08; 95% CI 1.01 to 1.15; p=0.02) but not in HES-ONS data (OR 1.05; 95% CI 0.98 to 1.13; p=0.16). Otherwise, little change was found in the strength of associated risk factors and mortality after accounting for missed matching bias. 
CONCLUSIONS Despite significant clinical and sociodemographic differences between matched and unmatched records, changes in mortality estimates were minimal. However, researchers and policy analysts using HES-ONS linked resources should be aware that administrative linkage processes can introduce error.
Affiliation(s)
- Amelia Jewell
- South London and Maudsley NHS Foundation Trust, London, UK
- Richard D Hayes
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Ruth Gilbert
- Centre for Paediatric Epidemiology and Biostatistics, UCL Institute of Child Health, London, UK
- Robert Stewart
- South London and Maudsley NHS Foundation Trust, London, UK
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Johnny Downs
- South London and Maudsley NHS Foundation Trust, London, UK
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
|
14
|
Abstract
INTRODUCTION Computable phenotypes are gaining importance as a structured and reproducible method of using electronic health data to identify people with certain clinical conditions. No formal standard is available for defining and formally representing phenotyping algorithms. In this paper, we attempt to build a formal representation of such a phenotyping algorithm. METHODS We used the EN 13606 EHR standard for building clinical archetypes to represent the computable phenotyping algorithm for 'diagnosis of cardiac failure'. As part of this work, we created a set of new clinical archetypes for defining 'cardiac failure diagnosis'. We used the EN 13606 editor Object Dictionary Client, which was developed in-house by University College London. We evaluated the ability of EN 13606 to provide clinical archetypes to define EHR phenotyping algorithms using the predefined desiderata for the purpose [Mo et al]. RESULTS EN 13606 archetypes could represent phenotype components grouped and nested based on their logical meaning. It was possible to build the EHR phenotyping algorithm with the clinical elements and their interrelationships along with hierarchical structure and temporal criteria. However, the specific mathematical calculations and temporal relations involved in the algorithm were difficult to incorporate. These will need to be coded and integrated within the clinical information system. These archetypes can be mapped for comparison with the openEHR models. Binding to external clinical terminology is fully supported. However, EN 13606 does not satisfy all the desiderata defined by Mo et al. A possible way forward could be an approach using phenotype ontologies and their architectural representation integrated with ISO interoperability. CONCLUSION The EN 13606 archetypes can be used to define a phenotype algorithm that identifies patients by a set of clinical characteristics in their records.
Phenotype representations defined in EN 13606 do not satisfy all the desiderata proposed by Mo et al. and thus currently have a limited ability to define computable phenotyping algorithms. Further work is required for the EN 13606 standard to fully support this objective.
|
15
|
Lindoerfer D, Mansmann U, Reinhardt I. Incorporation of Multiple Sources into IT - and Data Protection Concepts: Lessons Learned from the FARKOR Project. Stud Health Technol Inform 2020; 270:262-266. [PMID: 32570387 DOI: 10.3233/shti200163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
The IT and data protection concept of the FAmiliäres Risiko für das KOloRektale Karzinom (FARKOR) project is presented. FARKOR is a risk-adapted screening project in Bavaria, Germany, focusing on young adults with familial colorectal cancer (CRC). For each participant, data from different sources have to be integrated: treatment records centrally administered by the resident doctors' association (KVB), data from health insurance companies (HIC), and patient-reported lifestyle data. Patient privacy rights must be observed. Record linkage is performed by a central independent trust center. Data are decrypted, integrated and analyzed in a secure part of the scientific evaluation center with no connection to the internet (SECSP). The presented concept guarantees participants' privacy through different identifiers, separation of responsibilities, data pseudonymization, public-private key encryption of medical data and encrypted data transfer.
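The abstract does not say how pseudonyms are generated. As a minimal sketch of one common approach, the example below derives source-specific pseudonyms with a keyed hash (HMAC-SHA256). The keys, the participant identifier, and the function name `pseudonym` are all hypothetical and are not taken from the FARKOR concept.

```python
import hashlib
import hmac


def pseudonym(identifier: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from an identifier using a keyed hash."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical keys: one per data source, held only by the trust center.
KEY_TREATMENT = b"example-key-treatment-records"
KEY_INSURANCE = b"example-key-insurance-data"

participant = "participant-0001"  # hypothetical identifier

# The same participant gets a different identifier in each source, so the
# sources cannot be joined without the party that holds the keys.
print(pseudonym(participant, KEY_TREATMENT))
print(pseudonym(participant, KEY_INSURANCE))
```

In such a design, only the party holding the keys can map the per-source pseudonyms back to one participant, which is one way to realise the separation of responsibilities the abstract describes.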
Affiliation(s)
- Doris Lindoerfer
- Institute for Medical Information Processing, Biometry and Epidemiology, Ludwig-Maximilians-Universität München, Munich, Germany
- Ulrich Mansmann
- Institute for Medical Information Processing, Biometry and Epidemiology, Ludwig-Maximilians-Universität München, Munich, Germany
- Isabel Reinhardt
- Institute for Medical Information Processing, Biometry and Epidemiology, Ludwig-Maximilians-Universität München, Munich, Germany
- Medical Center of the University of Munich
|
16
|
Nechuta S, Mukhopadhyay S, Krishnaswami S, Golladay M, McPheeters M. Record Linkage Approaches Using Prescription Drug Monitoring Program and Mortality Data for Public Health Analyses and Epidemiologic Studies. Epidemiology 2020; 31:22-31. [PMID: 31592867 PMCID: PMC6889900 DOI: 10.1097/ede.0000000000001110] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 03/04/2019] [Accepted: 09/25/2019] [Indexed: 11/25/2022]
Abstract
BACKGROUND The use of Prescription Drug Monitoring Program (PDMP) data has greatly increased in recent years as these data have accumulated as part of the response to the opioid epidemic in the United States. We evaluated the accuracy of record linkage approaches using the Controlled Substance Monitoring Database (Tennessee's [TN] PDMP, 2012-2016) and mortality data on all drug overdose decedents in Tennessee (2013-2016). METHODS We compared total, missed, and false positive (FP) matches (with manual verification of all FPs) across approaches that included a variety of data cleaning and matching methods (probabilistic/fuzzy vs. deterministic) for patient and death linkages, and prescription history. We evaluated the influence of linkage approaches on key prescription measures used in public health analyses. We evaluated characteristics (e.g., age, education, sex) of missed matches and incorrect matches to consider potential bias. RESULTS The most accurate probabilistic/fuzzy matching approach identified 4,714 overdose deaths (vs. the deterministic approach, n = 4,572), with a low FP linkage error (<1%) and high correct match proportion (95% vs. 92% and ~90% for probabilistic approaches not using comprehensive data cleaning). Estimation of all prescription measures improved (vs. deterministic approach). One example is the frequency (%) of decedents filling an oxycodone prescription in the last 60 days (n = 1,371 [32%] vs. n = 1,443 [33%]). Missed overdose decedents were more likely to be younger, male, nonwhite, and of higher education. CONCLUSION Implications of study findings include underreporting, prescribing and outcome misclassification, and reduced generalizability to population risk groups, information of importance to epidemiologists and researchers using PDMP data.
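The contrast between deterministic and fuzzy matching, and the role of data cleaning, can be illustrated with a small sketch. This is not the study's method: the records, the cleaning rules, the use of `difflib.SequenceMatcher`, and the 0.85 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def clean(name: str) -> str:
    """Illustrative cleaning: lowercase, drop punctuation, collapse whitespace."""
    return " ".join(name.lower().replace("-", " ").replace(".", "").split())


def deterministic_match(a: dict, b: dict) -> bool:
    """Exact agreement on the raw name and date of birth."""
    return a["name"] == b["name"] and a["dob"] == b["dob"]


def fuzzy_match(a: dict, b: dict, threshold: float = 0.85) -> bool:
    """Exact date of birth plus approximate agreement on the cleaned name."""
    if a["dob"] != b["dob"]:
        return False
    score = SequenceMatcher(None, clean(a["name"]), clean(b["name"])).ratio()
    return score >= threshold


# Hypothetical records that refer to the same person but differ in
# formatting and contain one spelling variation.
pdmp_record = {"name": "Mary-Ann  Smith", "dob": "1980-02-14"}
death_record = {"name": "mary ann smyth", "dob": "1980-02-14"}

print(deterministic_match(pdmp_record, death_record))  # False
print(fuzzy_match(pdmp_record, death_record))          # True
```

The deterministic rule misses this pair, which is the kind of missed match that lowers the count of linked decedents. A looser threshold would recover more such pairs at the cost of more false positive links.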
Affiliation(s)
- Sarah Nechuta
- Tennessee Department of Health, Office of Informatics and Analytics, 710 James Robertson Parkway, Nashville, TN
- Sutapa Mukhopadhyay
- Tennessee Department of Health, Office of Informatics and Analytics, 710 James Robertson Parkway, Nashville, TN
- Shanthi Krishnaswami
- Tennessee Department of Health, Office of Informatics and Analytics, 710 James Robertson Parkway, Nashville, TN
- Molly Golladay
- Tennessee Department of Health, Office of Informatics and Analytics, 710 James Robertson Parkway, Nashville, TN
- Melissa McPheeters
- Tennessee Department of Health, Office of Informatics and Analytics, 710 James Robertson Parkway, Nashville, TN
|
17
|
Fraser C, Muller-Pebody B, Blackburn R, Gray J, Oddie SJ, Gilbert RE, Harron K. Linking surveillance and clinical data for evaluating trends in bloodstream infection rates in neonatal units in England. PLoS One 2019; 14:e0226040. [PMID: 31830076 PMCID: PMC6907823 DOI: 10.1371/journal.pone.0226040] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/28/2019] [Accepted: 11/19/2019] [Indexed: 11/19/2022] Open
Abstract
OBJECTIVE To evaluate variation in trends in bloodstream infection (BSI) rates in neonatal units (NNUs) in England according to the data sources and linkage methods used. METHODS We used deterministic and probabilistic methods to link clinical records from 112 NNUs in the National Neonatal Research Database (NNRD) to national laboratory infection surveillance data from Public Health England. We calculated the proportion of babies in NNRD (aged <1 year and admitted between 2010-2017) with a BSI caused by clearly pathogenic organisms between two days after admission and two days after discharge. We used Poisson regression to determine trends in the proportion of babies with BSI based on i) deterministic and probabilistic linkage of NNRD and surveillance data (primary measure), ii) deterministic linkage of NNRD-surveillance data, iii) NNRD records alone, and iv) linked NNRD-surveillance data augmented with clinical records of laboratory-confirmed BSI in NNRD. RESULTS Using deterministic and probabilistic linkage, 5,629 of 349,740 babies admitted to a NNU in NNRD linked with 6,660 BSI episodes accounting for 38% of 17,388 BSI records aged <1 year in surveillance data. The proportion of babies with BSI due to clearly pathogenic organisms during their NNU admission was 1.0% using deterministic plus probabilistic linkage (primary measure), compared to 1.0% using deterministic linkage alone, 0.6% using NNRD records alone, and 1.2% using linkage augmented with clinical records of BSI in NNRD. Equivalent proportions for babies born before 32 weeks of gestation were 5.0%, 4.8%, 2.9% and 5.9%. The proportion of babies who linked to a BSI decreased by 7.5% each year (95% confidence interval [CI]: -14.3%, -0.1%) using deterministic and probabilistic linkage but was stable using clinical records of BSI or deterministic linkage alone. 
CONCLUSION Linkage that combines BSI records from national laboratory surveillance and clinical NNU data sources, and use of probabilistic methods, substantially improved ascertainment of BSI and estimates of BSI trends over time, compared with single data sources.
Affiliation(s)
- Caroline Fraser
- UCL Great Ormond Street Institute of Child Health, University College London, London, United Kingdom
- Ruth Blackburn
- Institute of Health Informatics, University College London, London, United Kingdom
- Jim Gray
- Microbiology, Birmingham Women’s & Children’s Hospitals, Birmingham, United Kingdom
- Sam J. Oddie
- Bradford Neonatology, Bradford Royal Infirmary, Bradford, United Kingdom
- Centre for Reviews and Dissemination, University of York, York, United Kingdom
- Ruth E. Gilbert
- UCL Great Ormond Street Institute of Child Health, University College London, London, United Kingdom
- Katie Harron
- UCL Great Ormond Street Institute of Child Health, University College London, London, United Kingdom
|
18
|
Brennan JM, Wruck L, Pencina MJ, Clare RM, Lopes RD, Alexander JH, O'Brien S, Krucoff M, Rao SV, Wang TY, Curtis LH, Newby LK, Granger CB, Patel M, Mahaffey K, Ross JS, Normand SL, Eloff BC, Caños DA, Lokhnygina YV, Roe MT, Califf RM, Marinac-Dabic D, Peterson ED. Claims-based cardiovascular outcome identification for clinical research: Results from 7 large randomized cardiovascular clinical trials. Am Heart J 2019; 218:110-122. [PMID: 31726314 DOI: 10.1016/j.ahj.2019.09.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 09/05/2019] [Accepted: 09/05/2019] [Indexed: 12/14/2022]
Abstract
BACKGROUND Medicare insurance claims may provide an efficient means to ascertain follow-up of older participants in clinical research. We sought to determine the accuracy and completeness of claims- versus site-based follow-up with clinical event committee (+CEC) adjudication of cardiovascular outcomes. METHODS We performed a retrospective study using linked Medicare and Duke Database of Clinical Trials data. Medicare claims were linked to clinical data from 7 randomized cardiovascular clinical trials. Of 52,476 trial participants, linking resulted in 5,839 (of 10,497 linkage-eligible) Medicare-linked trial participants with fee-for-service A and B coverage. Death, myocardial infarction (MI), stroke, and revascularization incidences were compared using Medicare inpatient claims only, site-reported events (+CEC) only, or a combination of the 2. Randomized treatment effects were compared as a function of whether claims-based, site-based (+CEC), or a combined system was used for event detection. RESULTS Among the 5,839 study participants, the annual event rates were similar between claims- and site-based (+CEC) follow-up: death (overall rate 5.2% vs 5.2%; adjusted κ 0.99), MI (2.2% vs 2.3%; adjusted κ 0.96), stroke (0.7% vs 0.7%; adjusted κ 0.99), and any revascularization (7.4% vs 7.9%; adjusted κ 0.95). Of events detected by claims yet not reported by CEC, a minority were reported by sites but negatively adjudicated by CEC (39% of MIs and 18% of strokes). Differences in individual case concordance led to higher event rates when claims- and site-based (+CEC) systems were combined. Randomized treatment effects were similar among the 3 approaches for each outcome of interest. CONCLUSIONS Claims- versus site-based (+CEC) follow-up identified similar overall cardiovascular event rates despite meaningful differences in the events detected. 
Randomized treatment effects were similar using the 2 methods, suggesting claims data could be used to support clinical research leveraging routinely collected data. This approach may lead to more effective evidence generation, synthesis, and appraisal of medical products and inform the strategic approaches toward the National Evaluation System for Health Technology.
Affiliation(s)
- Lisa Wruck
- Duke University School of Medicine, Durham, NC
- Sunil V Rao
- Duke University School of Medicine, Durham, NC
- Benjamin C Eloff
- Center for Devices and Radiological Health, Food and Drug Administration, Silver Spring, MD
- Daniel A Caños
- Center for Clinical Standards and Quality, Centers for Medicare & Medicaid Services, Baltimore, MD
- Danica Marinac-Dabic
- Center for Devices and Radiological Health, Food and Drug Administration, Silver Spring, MD
|
19
|
Abstract
Linked data are increasingly being used for epidemiological research, to enhance primary research, and in planning, monitoring and evaluating public policy and services. Linkage error (missed links between records that relate to the same person or false links between unrelated records) can manifest in many ways: as missing data, measurement error and misclassification, unrepresentative sampling, or as a special combination of these that is specific to analysis of linked data: the merging and splitting of people that can occur when two hospital admission records are counted as one person admitted twice if linked and two people admitted once if not. Through these mechanisms, linkage error can ultimately lead to information bias and selection bias; so identifying relevant mechanisms is key in quantitative bias analysis. In this article we introduce five key concepts and a study classification system for identifying which mechanisms are relevant to any given analysis. We provide examples and discuss options for estimating parameters for bias analysis. This conceptual framework provides the 'links' between linkage error, information bias and selection bias, and lays the groundwork for quantitative bias analysis for linkage error.
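The merging and splitting mechanism described above can be made concrete with a toy example. The record labels and person identifiers below are hypothetical; the sketch only shows how the same two admission records yield different person counts and readmission counts depending on whether they are linked.

```python
from collections import Counter


def admissions_per_person(records: list, person_id_of: dict) -> Counter:
    """Count admissions per person under a given record-to-person assignment."""
    return Counter(person_id_of[record] for record in records)


records = ["admission_A", "admission_B"]  # hypothetical hospital admission records

# If the two records are linked, they count as one person admitted twice.
linked = {"admission_A": "person_1", "admission_B": "person_1"}
# If the link is missing, they count as two people admitted once each.
not_linked = {"admission_A": "person_1", "admission_B": "person_2"}

for label, assignment in [("linked", linked), ("not linked", not_linked)]:
    counts = admissions_per_person(records, assignment)
    readmitted = sum(1 for n in counts.values() if n > 1)
    print(f"{label}: {len(counts)} person(s), {readmitted} with a readmission")
```

A false link and a missed link therefore shift both the denominator (people) and the outcome (readmissions), which is why this mechanism is treated separately from ordinary missing data or misclassification.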
Affiliation(s)
- James C Doidge
- Intensive Care National Audit and Research Centre, London, UK
- UCL Great Ormond Street Institute of Child Health, University College London, London, UK
- Katie L Harron
- UCL Great Ormond Street Institute of Child Health, University College London, London, UK
|
20
|
Delmestri A, Prieto-Alhambra D. CPRD GOLD and linked ONS mortality records: Reconciling guidelines. Int J Med Inform 2019; 136:104038. [PMID: 32078979 DOI: 10.1016/j.ijmedinf.2019.104038] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 07/16/2019] [Revised: 10/19/2019] [Accepted: 11/26/2019] [Indexed: 12/17/2022]
Abstract
BACKGROUND The Clinical Practice Research Datalink (CPRD) GOLD is an extremely influential U.K. primary care dataset for epidemiological research, with far more published papers based on its data than any other U.K. primary care dataset. The Office for National Statistics (ONS) death data for England can be linked to GOLD at the patient level and are considered the gold standard on mortality. GOLD, which also holds death data, has recently been assessed against the linked ONS dataset, and the accuracy of its dates of death has been deemed sufficient for the majority of observational studies. However, there is a lack of guidance on how to manage the challenges that arise when ONS mortality and GOLD datasets are linked, including linkage coverage period, linkage correctness likelihood, linkage regional limitations and data discrepancy. OBJECTIVES To provide reconciling guidelines on how to make maximum and at the same time trustworthy use of mortality information coming from both GOLD and ONS linked datasets, with the aim of improving the quality, reproducibility, transparency and comparison of clinical research. METHOD AND RESULTS We have developed recommendations on how to manage mortality data coming from both GOLD and linked ONS, taking into account linkage coverage period, linkage correctness likelihood, linkage regional limitations and data discrepancies between these two datasets. We have also implemented these guidelines in an SQL algorithm for researchers to use. CONCLUSION We have provided detailed guidelines on the reconciliation of mortality data between GOLD and ONS linked death datasets, taking into account both their strengths and limitations. The consistent application of these guidelines, made practical by an SQL algorithm, has the potential to improve clinical research quality, reproducibility, transparency and comparison.
Affiliation(s)
- Antonella Delmestri
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, UK
- Daniel Prieto-Alhambra
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, UK
|
21
|
Norris KC, Duru OK, Alicic RZ, Daratha KB, Nicholas SB, McPherson SM, Bell DS, Shen JI, Jones CR, Moin T, Waterman AD, Neumiller JJ, Vargas RB, Bui AAT, Mangione CM, Tuttle KR. Rationale and design of a multicenter Chronic Kidney Disease (CKD) and at-risk for CKD electronic health records-based registry: CURE-CKD. BMC Nephrol 2019; 20:416. [PMID: 31747918 PMCID: PMC6868861 DOI: 10.1186/s12882-019-1558-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 09/05/2019] [Accepted: 09/12/2019] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND Chronic kidney disease (CKD) is a global public health problem, exhibiting sharp increases in incidence, prevalence, and attributable morbidity and mortality. There is a critical need to better understand the demographics, clinical characteristics, and key risk factors for CKD, and to develop platforms for testing novel interventions to improve modifiable risk factors, particularly for CKD patients with a rapid decline in kidney function. METHODS We describe a novel collaboration between two large healthcare systems (Providence St. Joseph Health and University of California, Los Angeles Health), supported by leadership from both institutions, which was created to develop harmonized cohorts of patients with CKD or those at increased risk for CKD (hypertension/HTN, diabetes/DM, pre-diabetes) from electronic health record data. RESULTS The combined repository of candidate records included more than 3.3 million patients with at least a single qualifying measure for CKD and/or at-risk for CKD. The CURE-CKD registry includes over 2.6 million patients with and/or at-risk for CKD identified by stricter guideline-based criteria using a combination of administrative encounter codes, physical examinations, laboratory values and medication use. Notably, data on race/ethnicity and geography will, in part, enable robust analyses to study traditionally disadvantaged or marginalized patients not typically included in clinical trials. DISCUSSION The CURE-CKD project is a unique multidisciplinary collaboration between nephrologists, endocrinologists, primary care physicians with health services research skills, health economists, and those with expertise in statistics, bio-informatics and machine learning. The CURE-CKD registry uses curated observations from real-world settings across two large healthcare systems and has great potential to provide important contributions for healthcare and for improving clinical outcomes in patients with and at-risk for CKD.
Affiliation(s)
- Keith C Norris
- David Geffen School of Medicine at University of California, Los Angeles, CA, 90095, USA
- UCLA Department of Medicine, Division of General Internal Medicine, 1100 Glendon Ave. Suite 900, Los Angeles, CA, 90024, USA
- O Kenrik Duru
- David Geffen School of Medicine at University of California, Los Angeles, CA, 90095, USA
- Radica Z Alicic
- Providence St. Joseph Health, Providence Medical Research Center, Spokane, Washington, USA
- University of Washington School of Medicine, Seattle, Washington, USA
- Kenn B Daratha
- Providence St. Joseph Health, Providence Medical Research Center, Spokane, Washington, USA
- Susanne B Nicholas
- David Geffen School of Medicine at University of California, Los Angeles, CA, 90095, USA
- Sterling M McPherson
- Providence St. Joseph Health, Providence Medical Research Center, Spokane, Washington, USA
- Washington State University Elson S. Floyd College of Medicine, Spokane, Washington, USA
- Douglas S Bell
- David Geffen School of Medicine at University of California, Los Angeles, CA, 90095, USA
- Jenny I Shen
- David Geffen School of Medicine at University of California, Los Angeles, CA, 90095, USA
- Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center, Torrance, CA, USA
- Cami R Jones
- Providence St. Joseph Health, Providence Medical Research Center, Spokane, Washington, USA
- Tannaz Moin
- David Geffen School of Medicine at University of California, Los Angeles, CA, 90095, USA
- VA Greater Los Angeles, Los Angeles, USA
- Amy D Waterman
- David Geffen School of Medicine at University of California, Los Angeles, CA, 90095, USA
- Joshua J Neumiller
- Washington State University College of Pharmacy and Pharmaceutical Sciences, Spokane, USA
- Roberto B Vargas
- Charles R. Drew University of Medicine and Science, Los Angeles, USA
- RAND Corporation, Santa Monica, CA, USA
- Alex A T Bui
- David Geffen School of Medicine at University of California, Los Angeles, CA, 90095, USA
- Carol M Mangione
- David Geffen School of Medicine at University of California, Los Angeles, CA, 90095, USA
- Katherine R Tuttle
- Providence St. Joseph Health, Providence Medical Research Center, Spokane, Washington, USA
- University of Washington School of Medicine, Seattle, Washington, USA
|
22
|
Choudhary P, de Portu S, Arrieta A, Castañeda J, Campbell FM. Use of sensor-integrated pump therapy to reduce hypoglycaemia in people with Type 1 diabetes: a real-world study in the UK. Diabet Med 2019; 36:1100-1108. [PMID: 31134668 DOI: 10.1111/dme.14043] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Accepted: 05/25/2019] [Indexed: 01/04/2023]
Abstract
AIMS To assess the efficacy of insulin pumps with automated insulin suspension systems in a real-world setting. METHODS We analysed anonymized data uploaded to CareLink™ by people (n=920) with Type 1 diabetes using the MiniMed Paradigm Veo system and the MiniMed 640G system (Medtronic International Trading Sàrl, Tolochenaz, Switzerland) with SmartGuard technology, with or without automated insulin suspension enabled, between February 2016 and June 2018. Users with ≥15 days of sensor data and ≥70% sensor-wear time were classified as sensor-augmented pump alone, sensor-integrated pump with low glucose suspend enabled or sensor-integrated pump with predictive low glucose management enabled. RESULTS The median (25th-75th percentile) system use was 161 (58-348) days. The median time spent with sensor glucose values ≤3 mmol/l was 0.8 (0.3-1.7)% in the sensor-augmented pump group, 0.3 (0.1-0.7)% in the sensor-integrated pump with low glucose suspend group, and 0.3 (0.1-0.5)% in the sensor-integrated pump with predictive low glucose management group. In individuals switching from sensor-augmented pump to sensor-integrated pump with low glucose suspend (n=31), there were significant reductions in the monthly rate of hypoglycaemic events <3 mmol/l (rate ratio 0.63, 95% CI 0.45-0.89; P=0.009) and in the percentage of time with glucose values ≤3 mmol/l [sensor-augmented pump: 0.63% (95% CI 0.34-1.29), sensor-integrated pump with low glucose suspend: 0.33% (95% CI 0.16-0.64); P=0.001]. The monthly rate of hypoglycaemic events decreased further in individuals (n=139) switching from sensor-integrated pump with low glucose suspend to sensor-integrated pump with predictive low glucose management [rate ratio 0.82 (95% CI 0.69-0.98); P<0.0274]. Similar results were seen for events <3.9 mmol/l. There was no difference in median time spent in target glucose range.
CONCLUSION Real-world UK data show that increasing automation of insulin suspension reduces hypoglycaemia exposure in people with Type 1 diabetes.
Affiliation(s)
- P Choudhary
- King's College London, School of Life Course Sciences, London, UK
- S de Portu
- Medtronic International Trading Sàrl, Tolochenaz, Switzerland
- A Arrieta
- Medtronic, Bakken Research Centre, Maastricht, The Netherlands
- J Castañeda
- Medtronic, Bakken Research Centre, Maastricht, The Netherlands
|
23
|
Langner I, Ohlmeier C, Zeeb H, Haug U, Riedel O. Individual mortality information in the German Pharmacoepidemiological Research Database (GePaRD): a validation study using a record linkage with a large cancer registry. BMJ Open 2019; 9:e028223. [PMID: 31270118 PMCID: PMC6609119 DOI: 10.1136/bmjopen-2018-028223] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 11/30/2022]
Abstract
OBJECTIVE Claims data need to be validated to assess their use for epidemiological research. This study aimed to examine the validity of mortality information in the German Pharmacoepidemiological Research Database (GePaRD). DESIGN Validation study, secondary data, medical claims. SETTING Claims data of two German nationwide acting statutory health insurance providers (SHIs) contributing data for GePaRD; record linkage with epidemiological cancer registry providing individual official mortality information. PARTICIPANTS All women insured with the two SHIs whose insurance coverage ended in the period 2006-2013 and who were residents of North Rhine Westphalia. MEASURES Descriptive statistics were used to analyse the performance of the linkage procedure. Further, we calculated measures of agreement between the official and the GePaRD-based vital status and assessed differences between the official and the GePaRD-based date of death. RESULTS Of the 256 111 women of the linkage sample, 25 528 were classified as 'deceased' in GePaRD and the others as 'alive'. Compared with the official data, the GePaRD-based vital status showed a sensitivity of 95.9% and a specificity of 99.4%. The negative predictive value was 99.6% and the positive predictive value 94.3%. The date of death agreed in 96.3% between both data sources. CONCLUSIONS The vital status recorded in GePaRD was of high accuracy and discrepancies between dates of death in GePaRD and official dates were rare. This underlines the potential of the database for conducting large cohort studies with mortality as the endpoint.
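The agreement measures reported in this abstract (sensitivity, specificity, positive and negative predictive value) all derive from a 2x2 table of database vital status against official vital status. The following is a minimal sketch of that arithmetic; the function name and the counts are hypothetical and chosen only for illustration, they are not the study's data.

```python
def validation_metrics(tp, fp, fn, tn):
    """Agreement measures for a 2x2 table (database status vs. reference status).

    tp: deceased in both sources      fp: deceased in database only
    fn: deceased in reference only    tn: alive in both sources
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true deaths the database captures
        "specificity": tn / (tn + fp),  # share of truly alive persons coded as alive
        "ppv": tp / (tp + fp),          # share of database deaths that are real
        "npv": tn / (tn + fn),          # share of database 'alive' that are real
    }


# Hypothetical counts for illustration only (NOT the GePaRD figures).
metrics = validation_metrics(tp=90, fp=10, fn=5, tn=895)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because the predictive values depend on the proportion of deceased persons in the sample, they would shift with a different population even if sensitivity and specificity stayed the same.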
Affiliation(s)
- Ingo Langner
- Clinical Epidemiology, Leibniz Institute for Prevention Research and Epidemiology – BIPS, Bremen, Germany
- Hajo Zeeb
- Clinical Epidemiology, Leibniz Institute for Prevention Research and Epidemiology – BIPS, Bremen, Germany
- High-Profile Research Area Health Sciences, University of Bremen, Bremen, Germany
- Ulrike Haug
- Clinical Epidemiology, Leibniz Institute for Prevention Research and Epidemiology – BIPS, Bremen, Germany
- High-Profile Research Area Health Sciences, University of Bremen, Bremen, Germany
- Oliver Riedel
- Clinical Epidemiology, Leibniz Institute for Prevention Research and Epidemiology – BIPS, Bremen, Germany
|
24
|
Slotwiner DJ, Tarakji KG, Al-Khatib SM, Passman RS, Saxon LA, Peters NS, McCall D, Turakhia MP, Schaeffer J, Mendenhall GS, Hindricks G, Narayan SM, Davenport EE, Marrouche NF. Transparent sharing of digital health data: A call to action. Heart Rhythm 2019; 16:e95-e106. [PMID: 31077802 DOI: 10.1016/j.hrthm.2019.04.042] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 04/22/2019] [Indexed: 11/18/2022]
Affiliation(s)
- David J Slotwiner
- NewYork-Presbyterian Queens, Cardiology Division, Weill Cornell Medical College, New York, New York.
- Rod S Passman
- Northwestern University Feinberg School of Medicine, Chicago, Illinois
- Leslie A Saxon
- Center for Body Computing, Keck School of Medicine, University of Southern California, Los Angeles, California
- Debbe McCall
- Functioning as the lay volunteer/patient representative
|
25
|
Lavery JA, Lipitz-Snyderman A, Li DG, Bach PB, Panageas KS. Identifying Cancer-Directed Surgeries in Medicare Claims: A Validation Study Using SEER-Medicare Data. JCO Clin Cancer Inform 2019; 3:1-24. [PMID: 30715928 PMCID: PMC6648680 DOI: 10.1200/cci.18.00093] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Accepted: 10/19/2018] [Indexed: 02/06/2023]
Abstract
PURPOSE Medicare claims provide a rich data source for large-scale quality assessment because data are available for all beneficiaries nationally. For cancer surgery, the absence of information regarding site of cancer and date of diagnosis on an administrative claim necessitates testing to ensure accurate quality assessment and public reporting. METHODS Using the SEER Medicare-linked database as the gold standard, we developed and tested an approach to identify cancer-directed surgeries from Medicare fee-for-service claims alone. Our analysis evaluated two questions: (1) Can we identify a large percentage of patients who underwent a cancer-directed surgery using only Medicare claims? (2) Of all patients identified as having undergone a cancer-directed surgery, what percentage had cancer? We evaluated this approach for 17 primary cancer sites. RESULTS The number of Medicare beneficiaries diagnosed with their first cancer during the years 2011 to 2013 and who underwent cancer-directed surgery ranged from 45 patients (bones and joints) to 20,163 patients (breast). The percentage of cancer-directed surgeries identified using Medicare claims alone ranged from 62% (skin melanoma) to 94% (prostate). For all but three cancer sites (skin melanoma, thyroid, and urinary bladder), more than 80% of cancer-directed surgeries were identified using our approach. Of all surgeries identified, more than 90% were for patients with cancer. CONCLUSION Identifying patients who underwent a cancer-directed surgery from Medicare claims is feasible for many cancer sites, although careful consideration needs to be given to the validity of each site. Our findings support the use of Medicare claims for large-scale quality assessment of cancer surgery by disease site.
Affiliation(s)
- Diane G. Li
- Memorial Sloan Kettering Cancer Center, New York, NY
- Peter B. Bach
- Memorial Sloan Kettering Cancer Center, New York, NY
|
26
|
Rentsch CT, Harron K, Urassa M, Todd J, Reniers G, Zaba B. Impact of linkage quality on inferences drawn from analyses using data with high rates of linkage errors in rural Tanzania. BMC Med Res Methodol 2018; 18:165. [PMID: 30526518 PMCID: PMC6288858 DOI: 10.1186/s12874-018-0632-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 06/08/2018] [Accepted: 11/30/2018] [Indexed: 11/10/2022]
Abstract
BACKGROUND Studies based on high-quality linked data in developed countries show that even minor linkage errors, which occur when records of two different individuals are erroneously linked or when records belonging to the same individual are not linked, can impact bias and precision of subsequent analyses. We evaluated the impact of linkage quality on inferences drawn from analyses using data with substantial linkage errors in rural Tanzania. METHODS Semi-automatic point-of-contact interactive record linkage was used to establish gold standard links between community-based HIV surveillance data and medical records at clinics serving the surveillance population. Automated probabilistic record linkage was used to create analytic datasets at minimum, low, medium, and high match score thresholds. Cox proportional hazards regression models were used to compare HIV care registration rates by testing modality (sero-survey vs. clinic) in each analytic dataset. We assessed linkage quality using three approaches: quantifying linkage errors, comparing characteristics between linked and unlinked data, and evaluating bias and precision of regression estimates. RESULTS Between 2014 and 2017, 405 individuals with gold standard links were newly diagnosed with HIV in sero-surveys (n = 263) and clinics (n = 142). Automated probabilistic linkage correctly identified 233 individuals (positive predictive value [PPV] = 65%) at the low threshold and 95 individuals (PPV = 90%) at the high threshold. Significant differences were found between linked and unlinked records in primary exposure and outcome variables and for adjusting covariates at every threshold. As expected, differences attenuated with increasing threshold. Testing modality was significantly associated with time to registration in the gold standard data (adjusted hazard ratio [HR] 4.98 for clinic-based testing, 95% confidence interval [CI] 3.34, 7.42). 
Increasing false matches weakened the association (HR 2.76 at minimum match score threshold, 95% CI 1.73, 4.41). Increasing missed matches (i.e., increasing match score threshold and positive predictive value of the linkage algorithm) was strongly correlated with a reduction in the precision of the coefficient estimate (R² = 0.97; p = 0.03). CONCLUSIONS Similar to studies with more negligible levels of linkage errors, false matches in this setting reduced the magnitude of the association; missed matches reduced precision. Adjusting for these biases could provide more robust results using data with considerable linkage errors.
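The tradeoff described in this abstract, where a higher match score threshold raises the positive predictive value but misses more true links, can be illustrated with a small sketch. The candidate pairs, scores, and function name below are hypothetical and serve only to show the mechanism; they do not reproduce the study's linkage algorithm or data.

```python
def linkage_quality(pairs, threshold):
    """Return (ppv, sensitivity) when accepting candidate pairs scoring >= threshold.

    pairs: list of (match_score, is_true_link) tuples, where is_true_link
    comes from a gold standard.
    """
    accepted = [is_true for score, is_true in pairs if score >= threshold]
    true_accepted = sum(accepted)
    total_true = sum(is_true for _, is_true in pairs)
    ppv = true_accepted / len(accepted) if accepted else float("nan")
    sensitivity = true_accepted / total_true if total_true else float("nan")
    return ppv, sensitivity


# Hypothetical candidate pairs: (match score, gold-standard link?)
candidate_pairs = [
    (0.95, True), (0.90, True), (0.80, True), (0.75, False),
    (0.60, True), (0.55, False), (0.40, False), (0.30, True),
]

for threshold in (0.50, 0.85):
    ppv, sensitivity = linkage_quality(candidate_pairs, threshold)
    print(f"threshold={threshold:.2f}  PPV={ppv:.2f}  sensitivity={sensitivity:.2f}")
```

In this toy data, raising the threshold from 0.50 to 0.85 removes both false matches but also drops two of the four true links that the lower threshold had accepted, which is the pattern of fewer false matches and more missed matches that the abstract relates to bias and precision.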
Affiliation(s)
- Christopher T. Rentsch
- Department of Population Health, Faculty of Epidemiology and Population Health, London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT UK
- Mark Urassa
- The TAZAMA Project, National Institute for Medical Research, Mwanza, Tanzania
- Jim Todd
- Department of Population Health, Faculty of Epidemiology and Population Health, London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT UK
- The TAZAMA Project, National Institute for Medical Research, Mwanza, Tanzania
- Georges Reniers
- Department of Population Health, Faculty of Epidemiology and Population Health, London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT UK
- MRC/Wits Rural Public Health and Health Transitions Research Unit (Agincourt), School of Public Health, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Basia Zaba
- Department of Population Health, Faculty of Epidemiology and Population Health, London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT UK
|
27
|
Yan S, Kwan YH, Tan CS, Thumboo J, Low LL. A systematic review of the clinical application of data-driven population segmentation analysis. BMC Med Res Methodol 2018; 18:121. [PMID: 30390641 PMCID: PMC6215625 DOI: 10.1186/s12874-018-0584-9] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 02/27/2018] [Accepted: 10/19/2018] [Indexed: 12/14/2022]
Abstract
BACKGROUND Data-driven population segmentation analysis utilizes data analytics to divide a heterogeneous population into parsimonious and relatively homogenous groups with similar healthcare characteristics. It is a promising patient-centric analysis that enables effective integrated healthcare interventions specific for each segment. Although widely applied, there is no systematic review on the clinical application of data-driven population segmentation analysis. METHODS We carried out a systematic literature search using PubMed, Embase and Web of Science following PRISMA criteria. We included English peer-reviewed articles that applied data-driven population segmentation analysis on empirical health data. We summarized the clinical settings in which segmentation analysis was applied, compared and contrasted strengths, limitations, and practical considerations of different segmentation methods, and assessed the segmentation outcome of all included studies. The studies were assessed by two independent reviewers. RESULTS We retrieved 14,514 articles and included 216 articles. Data-driven population segmentation analysis was widely used in different clinical contexts. 163 studies examined the general population while 53 focused on specific population with certain diseases or conditions, including psychological, oncological, respiratory, cardiovascular, and gastrointestinal conditions. Variables used for segmentation in the studies are heterogeneous. Most studies (n = 170) utilized secondary data in community settings (n = 185). The most common segmentation method was latent class/profile/transition/growth analysis (n = 96) followed by K-means cluster analysis (n = 60) and hierarchical analysis (n = 50), each having its advantages, disadvantages, and practical considerations. 
We also identified key criteria to evaluate a segmentation framework: internal validity, external validity, identifiability/interpretability, substantiality, stability, actionability/accessibility, and parsimony. CONCLUSIONS Data-driven population segmentation has been widely applied and holds great potential in managing population health. The evaluations of segmentation outcome require the interplay of data analytics and subject matter expertise. The optimal framework for segmentation requires further research.
Affiliation(s)
- Shi Yan
- Duke-NUS Medical School, 8 College Road, Singapore, 169857 Singapore
- Yu Heng Kwan
- Program in Health Services and Systems Research, Duke-NUS Medical School, 8 College Road, Singapore, 169857 Singapore
- Chuen Seng Tan
- Saw Swee Hock School of Public Health, National University of Singapore, 12 Science Drive 2, Singapore, 117549 Singapore
- Julian Thumboo
- Rheumatology and Immunology, Singapore General Hospital, 16 College Road, Block 6 Level 9, Singapore, 169854 Singapore
- Lian Leng Low
- Family Medicine and Continuing Care, Singapore General Hospital, Outram Road, Bowyer Block, Block A, Level 2, Singapore, 169608 Singapore
|
28
|
Gilsenan A, Harding A, Kellier-Steele N, Harris D, Midkiff K, Andrews E. The Forteo Patient Registry linkage to multiple state cancer registries: study design and results from the first 8 years. Osteoporos Int 2018; 29:2335-2343. [PMID: 29978254 PMCID: PMC6154045 DOI: 10.1007/s00198-018-4604-8] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 03/02/2018] [Accepted: 06/13/2018] [Indexed: 11/05/2022]
Abstract
UNLABELLED The Forteo Patient Registry (FPR) aims to estimate the incidence of osteosarcoma in US patients treated with teriparatide. Enrollment began in 2009 and will continue through 2019, with linkage planned through 2024. To date, no incident cases of osteosarcoma have been identified among patients registered in the FPR. INTRODUCTION The Forteo Patient Registry (FPR) was established in 2009 to estimate the incidence of osteosarcoma in US patients treated with teriparatide. The objective of this paper is to describe study methods, challenges encountered, and progress to date. METHODS The FPR is a prospective US registry designed to link data from participants annually with state cancer registries. Patient enrollment is planned for 10 years (2009-2019) and annual linkage with US state cancer registries for 15 years (2010-2024). All US state cancer registries and DC were invited to participate. Patients are recruited using pre-enrollment materials included in teriparatide device packaging, kits, and brochures distributed by health-care providers; a toll-free number; and a study website. A linkage algorithm is used to match data from enrolled participants with cancer registry data. RESULTS For the eighth annual linkage in 2017, information necessary for linkage with 63,270 patients in the FPR was submitted to each of the 42 participating registries. These patients contributed approximately 242,782 person-years of follow-up. A total of 5268 adult osteosarcoma cases diagnosed since January 1, 2009, were available for linkage from participating state cancer registries. To date, no incident cases of osteosarcoma have been identified among patients registered in the FPR. CONCLUSIONS Based on the estimated 242,782 person-years of observation as of the eighth annual linkage and projecting current enrollment rate to study end in 2024, it is anticipated that the completed study will be able to detect a fourfold increase in the risk of osteosarcoma if one exists.
Affiliation(s)
- A Gilsenan
- RTI Health Solutions, 200 Park Offices Drive, P.O. Box 12194, Research Triangle Park, NC, 27709, USA.
- A Harding
- RTI Health Solutions, 200 Park Offices Drive, P.O. Box 12194, Research Triangle Park, NC, 27709, USA
- N Kellier-Steele
- Eli Lilly and Company, Lilly Corporate Center, Indianapolis, IN, 46285, USA
- D Harris
- RTI Health Solutions, 200 Park Offices Drive, P.O. Box 12194, Research Triangle Park, NC, 27709, USA
- K Midkiff
- RTI Health Solutions, 200 Park Offices Drive, P.O. Box 12194, Research Triangle Park, NC, 27709, USA
- E Andrews
- RTI Health Solutions, 200 Park Offices Drive, P.O. Box 12194, Research Triangle Park, NC, 27709, USA
|
29
|
Martin P, Cortina-Borja M, Newburn M, Harper G, Gibson R, Dodwell M, Dattani N, Macfarlane A. Timing of singleton births by onset of labour and mode of birth in NHS maternity units in England, 2005-2014: A study of linked birth registration, birth notification, and hospital episode data. PLoS One 2018; 13:e0198183. [PMID: 29902220 PMCID: PMC6002087 DOI: 10.1371/journal.pone.0198183] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 11/20/2017] [Accepted: 05/11/2018] [Indexed: 11/18/2022]
Abstract
BACKGROUND Maternity care has to be available 24 hours a day, seven days a week. It is known that obstetric intervention can influence the time of birth, but no previous analysis at a national level in England has yet investigated in detail the ways in which the day and time of birth varies by onset of labour and mode of giving birth. METHOD We linked data from birth registration, birth notification, and Maternity Hospital Episode Statistics and analysed 5,093,615 singleton births in NHS maternity units in England from 2005 to 2014. We used descriptive statistics and negative binomial regression models with harmonic terms to establish how patterns of timing of birth vary by onset of labour, mode of giving birth and gestational age. RESULTS The timing of birth by time of day and day of the week varies considerably by onset of labour and mode of birth. Spontaneous births after spontaneous onset are more likely to occur between midnight and 6am than at other times of day, and are also slightly more likely on weekdays than at weekends and on public holidays. Elective caesarean births are concentrated onto weekday mornings. Births after induced labours are more likely to occur at hours around midnight on Tuesdays to Saturdays and on days before a public holiday period, than on Sundays, Mondays and during or just after a public holiday. CONCLUSION The timing of births varies by onset of labour and mode of birth and these patterns have implications for midwifery and medical staffing. Further research is needed to understand the processes behind these findings.
Affiliation(s)
- Peter Martin
- Centre for Maternal and Child Health Research, School of Health Sciences, City, University of London, London, United Kingdom
- Mario Cortina-Borja
- Population, Policy and Practice Programme, Great Ormond Street Institute of Child Health, University College London, London, United Kingdom
- Mary Newburn
- Centre for Maternal and Child Health Research, School of Health Sciences, City, University of London, London, United Kingdom
- Gill Harper
- Centre for Maternal and Child Health Research, School of Health Sciences, City, University of London, London, United Kingdom
- Rod Gibson
- Rod Gibson Associates Ltd., Wotton-under-Edge, United Kingdom
- Miranda Dodwell
- Centre for Maternal and Child Health Research, School of Health Sciences, City, University of London, London, United Kingdom
- Nirupa Dattani
- Centre for Maternal and Child Health Research, School of Health Sciences, City, University of London, London, United Kingdom
- Alison Macfarlane
- Centre for Maternal and Child Health Research, School of Health Sciences, City, University of London, London, United Kingdom
|
30
|
de Paula AA, Pires DF, Filho PA, de Lemos KRV, Barçante E, Pacheco AG. A comparison of accuracy and computational feasibility of two record linkage algorithms in retrieving vital status information from HIV/AIDS patients registered in Brazilian public databases. Int J Med Inform 2018; 114:45-51. [PMID: 29673602 DOI: 10.1016/j.ijmedinf.2018.03.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/09/2017] [Revised: 03/19/2018] [Accepted: 03/19/2018] [Indexed: 11/19/2022]
Abstract
BACKGROUND AND OBJECTIVE While cross-referencing information from people living with HIV/AIDS (PLWHA) to the official mortality database is a critical step in monitoring the HIV/AIDS epidemic in Brazil, the accuracy of the linkage routine may compromise the validity of the final database, leading to biased epidemiological estimates. We compared the accuracy and the total runtime of two linkage algorithms applied to retrieve vital status information from PLWHA in Brazilian public databases. METHODS Nominally identified records from PLWHA were obtained from three distinct government databases. Linkage routines included an algorithm written in Python (PLA) and Reclink software (RlS), a probabilistic linkage tool widely used in Brazil. Records from PLWHA known to be alive were added to those from patients reported as deceased. The records were then searched for in the mortality system. Scenarios in which 5% and 50% of patients were actually dead were simulated, considering both complete cases and 20% missing maternal names. RESULTS When complete information was available, both algorithms had comparable accuracies. In the scenario of 20% missing maternal names, PLA and RlS had sensitivities of 94.5% and 94.6% (p > 0.5), respectively; after manual review, PLA sensitivity increased to 98.4% (96.6-100.0), exceeding that of RlS (p < 0.01). PLA had a higher positive predictive value in the 5% death proportion scenario. Manual review was intrinsically required by RlS for up to 14% of records for people actually dead, whereas the corresponding proportion ranged from 1.5% to 2% for PLA. The lack of manual inspection did not alter PLA sensitivity when complete information was available. When incomplete data were available, PLA sensitivity increased from 94.5% to 98.4%, thus exceeding that of RlS (94.6%, p < 0.05). RlS required considerably less processing time than PLA. CONCLUSION Both linkage algorithms had comparable accuracy in retrieving vital status data from PLWHA. RlS had a considerably shorter runtime but intrinsically required manual review of a burdensome proportion of the matched records. PLA, on the other hand, took considerably longer to run but spared manual review at no expense of accuracy.
Affiliation(s)
- Pedro Alves Filho
- Rio de Janeiro State Health Secretariat, Rua México, 128, Rio de Janeiro, Brazil.
- Eduardo Barçante
- DataUERJ/UERJ, Rua São Francisco Xavier, 524, Rio de Janeiro, Brazil.
|
31
|
|
32
|
Abstract
Summary
Objectives:
In 2002 a decision was reached to set up a nation-wide electronic health record system in Finland. The legal framework of actors with the necessary mandate was approved in the parliament in December 2006. A set of standards and norms has been selected that all health care actors need to follow. Functional specifications of the services were completed in 2006. Setting up the centralized health IT services begins in 2007. Centralization of patient record data allows the reorganization of health service providers to take place at local and regional levels according to need. The services allow users to access patient records securely from anywhere, provided that they have the right to access private patient data.
Methods:
The functionality of the services and the necessary infrastructure has been agreed to in projects and working groups involving users, experts, key stakeholders and vendors.
Results:
The legal framework was approved in the parliament in December 2006. The functional specifications of the centralized health IT services were finalized in 2006.
Conclusions:
The implementation of the services will start in 2007.
Affiliation(s)
- N Saranummi
- VTT Technical Research Centre of Finland, Pervasive Health Technologies, P.O. Box 1300, 33101 Tampere, Finland.
|
33
|
Abstract
Summary
Objective:
It was the objective of this study to assess the impact of applying various record linkage methods to one of the most important outcome measures in oncological epidemiology, namely survival rates.
Methods:
To assess the life status of patients, incidence data published by the Cancer Registry of Tyrol were analyzed with three routinely used methods of record linkage for incidence and mortality data. Of these methods, two were deterministic and the third a probabilistic method developed by the Cancer Registry of Tyrol. We studied the impact of record linkage methods on a simple measure (mortality rate) and a more complex measure (relative survival rate). The analysis was based on the published incidence data for Tyrol for the years 1992 to 1996. Results of deterministic record linkage methods were simulated.
Results:
The error rates for simple mortality rate and relative survival rate are considerable. For the first deterministic record linkage method, relative differences in mortality rate range from 11.9% to 14.8% (men) and 24.5% to 28.2% (women) and relative differences in relative five-year survival from 11.4% to 16.3% (men) and from 19.3% to 26.4% (women). For the second deterministic record linkage method, relative differences in mortality rate range from 4.8% to 5.9% (men) and from 4.9% to 7.4% (women), while relative differences in relative five-year survival range from 5.1% to 7.0% (men) and from 4.4% to 6.1% (women).
Conclusions:
Our study shows that in order to calculate valid mortality and survival rates a probabilistic method of record linkage must be applied.
Affiliation(s)
- W Oberaigner
- Cancer Registry of Tyrol, Department of Clinical Epidemiology of the Tyrolian State Hospitals Ltd., Anichstrasse 35, Innsbruck, Austria.
|
34
|
Maojo V, Crespo J, de la Calle G, Barreiro J, Garcia-Remesal M. Using Web Services for Linking Genomic Data to Medical Information Systems. Methods Inf Med 2018; 46:484-92. [PMID: 17694245 DOI: 10.1160/me9056] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 11/05/2022]
Abstract
Summary
Objectives:
To develop a new perspective for biomedical information systems, regarding the introduction of ideas, methods and tools related to the new scenario of genomic medicine.
Methods:
Technological aspects related to the analysis and integration of heterogeneous clinical and genomic data include mapping clinical and genetic concepts, potential future standards or the development of integrated biomedical ontologies. In this clinicomics scenario, we describe the use of Web services technologies to improve access to and integrate different information sources. We give a concrete example of the use of Web services technologies: the Onto Fusion project.
Results:
Web services provide new biomedical informatics (BMI) approaches related to genomic medicine. Customized workflows will aid research tasks by linking heterogeneous Web services. Two significant examples of these European Commission-funded efforts are the INFOBIOMED Network of Excellence and the Advancing Clinico-Genomic Trials on Cancer (ACGT) integrated project.
Conclusions:
Supplying medical researchers and practitioners with omics data and biologists with clinical datasets can help to develop genomic medicine. BMI is contributing by providing the informatics methods and technological infrastructure needed for these collaborative efforts.
Affiliation(s)
- V Maojo
- Biomedical Informatics Group, Artificial Intelligence Lab, Universidad Politécnica de Madrid, Boadilla del Monte, 28660 Madrid, Spain.
|
35
|
Kimura M, Nakayasu K, Ohshima Y, Fujita N, Nakashima N, Jozaki H, Numano T, Shimizu T, Shimomura M, Sasaki F, Fujiki T, Nakashima T, Toyoda K, Hoshi H, Sakusabe T, Naito Y, Kawaguchi K, Watanabe H, Tani S. SS-MIX: A Ministry Project to Promote Standardized Healthcare Information Exchange. Methods Inf Med 2018; 50:131-9. [PMID: 21206962 DOI: 10.3414/me10-01-0015] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Received: 02/17/2010] [Accepted: 08/29/2010] [Indexed: 11/09/2022]
Abstract
Summary
Objectives: To promote healthcare information exchange between providers and to allow hospital information systems (HIS) to export information in standardized formats (HL7 and DICOM) in an environment of widespread legacy systems, which can only export data in proprietary formats.
Methods: Through the Shizuoka prefecture EMR project in 2004–2005, followed by the ministry’s SS-MIX project, many software products have been provided, which consist of 1) a standardized storage to receive HL7 v2.5 messages of patient demographics, prescription orders, laboratory results, and disease diagnoses in ICD-10, 2) a referral letter creation system, 3) a formatted document creation system, 4) a progress note/nursing record system, and 5) an archive/viewer to incorporate incoming healthcare data CDs and allow users to view them on an HIS terminal. Meanwhile, other useful applications have been produced, such as adverse event reporting and clinical information retrieval. To achieve the above-mentioned objectives, these software products were created and distributed; any hospital can use them, provided that its HIS can export the above information to the standardized storage in HL7 v2.5 format.
Results: The standardized storage has been installed in 20 hospitals in Japan, and some of the applications are in use. As major HIS vendors have been shipping HIS with an HL7 export function since 2007, the HIS of 594 hospitals in Japan had become capable of exporting data in HL7 v2.5 format as of March 2010.
Conclusions: Given the high CPOE installation rate (85% in 400+ bed hospitals), with most systems only capable of exporting data in proprietary formats, the prefecture and ministry projects were effective in promoting healthcare information exchange between providers. The standardized storage became an infrastructure for many useful applications, and many hospitals started using them. Ministry designation of the proposed healthcare standards was effective in allowing vendors to conform their products and users to install them.
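To make the HL7 v2.5 export step concrete, the following is a minimal sketch of how a receiving storage might read patient demographics from a pipe-delimited HL7 v2 message. The sample message, the `parse_pid` helper, and every identifier in it are hypothetical and are not taken from SS-MIX; the sketch assumes only the general v2 encoding (segments separated by carriage returns, fields by `|`, components by `^`) and the usual PID field positions.

```python
# Hypothetical HL7 v2.5 message (not from SS-MIX); segments end with "\r".
SAMPLE_MESSAGE = "\r".join([
    r"MSH|^~\&|HIS|HOSP|STORAGE|HOSP|201003011200||ADT^A08|MSG0001|P|2.5",
    "PID|1||12345^^^HOSP^MR||YAMADA^TARO||19700101|M",
])


def parse_pid(message: str) -> dict:
    """Extract the HL7 version and basic demographics from one message."""
    segments = {}
    for line in message.split("\r"):
        fields = line.split("|")
        segments[fields[0]] = fields  # key each segment by its 3-letter ID

    msh, pid = segments["MSH"], segments["PID"]
    family, given = pid[5].split("^")[:2]  # PID-5: family^given
    return {
        # In MSH the field separator itself counts as MSH-1,
        # so MSH-12 (version) sits at split index 11.
        "version": msh[11],
        "patient_id": pid[3].split("^")[0],  # PID-3, first component
        "family_name": family,
        "given_name": given,
        "birth_date": pid[7],  # PID-7, YYYYMMDD
        "sex": pid[8],         # PID-8
    }


if __name__ == "__main__":
    print(parse_pid(SAMPLE_MESSAGE))
```

A production parser would also need to handle repetitions (`~`), escape sequences, and optional segments, which this sketch ignores.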
Affiliation(s)
- M Kimura
- Hamamatsu University, Hamamatsu, Japan
|
36
|
Kilburn LS, Aresu M, Banerji J, Barrett-Lee P, Ellis P, Bliss JM. Can routine data be used to support cancer clinical trials? A historical baseline on which to build: retrospective linkage of data from the TACT (CRUK 01/001) breast cancer trial and the National Cancer Data Repository. Trials 2017; 18:561. [PMID: 29179731 PMCID: PMC5702960 DOI: 10.1186/s13063-017-2308-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 10/11/2016] [Accepted: 11/27/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Randomised clinical trials (RCTs) are the gold standard for evaluating new cancer treatments. They are, however, expensive to conduct, particularly where long-term follow-up of participants is required. Tracking participants via routine datasets could provide a cost-effective alternative for ascertaining follow-up information required to evaluate disease outcomes. This project explores the potential for routine data to inform cancer trials, using the historical National Cancer Data Repository (NCDR) for English NHS sites and, for validation, mature data available from the TACT trial. METHODS Datasets were matched using patients' NHS number, date of birth (dob) and name/initials. Demographics, clinical characteristics and outcomes were assessed for agreement and completeness. Overall survival was compared between NCDR and TACT. RESULTS A total of 3151 patients underwent linkage, 3047 (96.7%) of whom had matched records. Extensive cleaning was required for some registry data fields, e.g. cause of death, whilst others had large amounts of missing data, e.g. tumour size (22.1%). Other data had high levels of matching, such as dob (99.6%) and date of death (89.6%). There was no evidence of differential survival rates (8-year survival: TACT = 75% (95% CI 73, 76); NCDR = 76% (95% CI 74, 77)). CONCLUSIONS Data quality and completeness require improvement before routine data could be used for RCTs. Introduction of new routine datasets, including COSD, is welcomed although reporting of disease-recurrence events remains a concern. Prospective validation of such datasets is required before RCTs can confidently switch patient follow-up to utilise routinely collected NHS-based data. TACT TRIAL REGISTRATION Clinicaltrials.gov NCT00033683, registered on 9 April 2002; ISRCTN79718493, registered on 1 July 2001.
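The matching step described in METHODS can be illustrated with a small deterministic-linkage sketch. The abstract does not give the exact matching rules, so the rule used here (exact agreement on NHS number, date of birth and first initial) is an assumption, and the field names, the `link` helper and the toy records are all hypothetical. Only the two totals at the end (3047 matched of 3151) come from the abstract.

```python
def match_key(record: dict) -> tuple:
    """Build a comparison key; this normalisation rule is an assumption."""
    nhs = record["nhs_number"].replace(" ", "")
    initial = record["name"].strip()[:1].upper()
    return (nhs, record["dob"], initial)


def link(trial_records: list, registry_records: list):
    """Return (matched id pairs, unmatched trial ids)."""
    # If two registry records share a key, the later one wins in this sketch.
    index = {match_key(r): r for r in registry_records}
    matched, unmatched = [], []
    for rec in trial_records:
        hit = index.get(match_key(rec))
        if hit is not None:
            matched.append((rec["id"], hit["id"]))
        else:
            unmatched.append(rec["id"])
    return matched, unmatched


# Toy, entirely made-up records.
trial = [
    {"id": "T1", "nhs_number": "943 476 5919", "dob": "1950-02-03", "name": "A Smith"},
    {"id": "T2", "nhs_number": "9434765870", "dob": "1961-07-19", "name": "B Jones"},
]
registry = [
    {"id": "R9", "nhs_number": "9434765919", "dob": "1950-02-03", "name": "Alice Smith"},
]

matched, unmatched = link(trial, registry)

# Match rate reported in the abstract: 3047 of 3151 patients.
reported_rate = round(100 * 3047 / 3151, 1)
```

Exact-key matching like this is fast but brittle; a single mistyped digit in the NHS number leaves the record unmatched, which is one reason linkage studies report the proportion of unmatched records.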
Affiliation(s)
- Lucy Suzanne Kilburn
- ICR Clinical Trials and Statistics Unit (ICR-CTSU), Division of Clinical Studies, The Institute of Cancer Research, Sir Richard Doll Building, Cotswold Road, SM2 5NG London, UK
- Maria Aresu
- Epidemiology and Biostatistics, School of Public Health, Imperial College London, London, UK
- Jane Banerji
- ICR Clinical Trials and Statistics Unit (ICR-CTSU), Division of Clinical Studies, The Institute of Cancer Research, Sir Richard Doll Building, Cotswold Road, SM2 5NG London, UK
- Paul Ellis
- Guy’s Hospital, Kings Health Partners AHSC, London, UK
- Judith Margaret Bliss
- ICR Clinical Trials and Statistics Unit (ICR-CTSU), Division of Clinical Studies, The Institute of Cancer Research, Sir Richard Doll Building, Cotswold Road, SM2 5NG London, UK
|
37
|
Elysee G, Herrin J, Horwitz LI. An observational study of the relationship between meaningful use-based electronic health information exchange, interoperability, and medication reconciliation capabilities. Medicine (Baltimore) 2017; 96:e8274. [PMID: 29019898 PMCID: PMC5662321 DOI: 10.1097/md.0000000000008274] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/25/2022] Open
Abstract
Stagnation in hospitals' adoption of data integration functionalities coupled with reduction in the number of operational health information exchanges could become a significant impediment to hospitals' adoption of 3 critical capabilities: electronic health information exchange, interoperability, and medication reconciliation, in which electronic systems are used to assist with resolving medication discrepancies and improving patient safety. Against this backdrop, we assessed the relationships between the 3 capabilities. We conducted an observational study applying the partial least squares-structural equation modeling technique to 27 variables obtained from the 2013 American Hospital Association annual survey Information Technology (IT) supplement, which describes health IT capabilities. We included 1330 hospitals. In confirmatory factor analysis, out of the 27 variables, 15 achieved loading values greater than 0.548 at P < .001, and as such were validated as the building blocks of the 3 capabilities. Subsequent path analysis showed a significant, positive, and cyclic relationship between the capabilities, in that decreases in the hospitals' adoption of one would lead to decreases in the adoption of the others. These results show that capability for high quality medication reconciliation may be impeded by lagging adoption of interoperability and health information exchange capabilities. Policies focused on improving one or more of these capabilities may have ancillary benefits.
Affiliation(s)
- Gerald Elysee
- Health Information Technology Programs, Department of Computer Technology, Benjamin Franklin Institute of Technology, Boston, MA
- Jeph Herrin
- Section of Cardiology, Department of Internal Medicine, Yale School of Medicine, New Haven, CT
- Leora I. Horwitz
- Division of Healthcare Delivery Science, Department of Population Health, NYU School of Medicine, Center for Healthcare Innovation and Delivery Science, NYU Langone Health, New York, NY, USA
|
38
|
Abstract
With increasing availability of large datasets derived from administrative and other sources, there is an increasing demand for the successful linking of these to provide rich sources of data for further analysis. Variation in the quality of identifiers used to carry out linkage means that existing approaches are often based upon 'probabilistic' models, which are based on a number of assumptions, and can make heavy computational demands. In this paper, we suggest a new approach to classifying record pairs in linkage, based upon weights (scores) derived using a scaling algorithm. The proposed method does not rely on training data, is computationally fast, requires only moderate amounts of storage and has intuitive appeal. Copyright © 2017 John Wiley & Sons, Ltd.
Affiliation(s)
- Harvey Goldstein
- University of Bristol, Bristol, U.K.
- University College London, London, U.K.
- Katie Harron
- London School of Hygiene and Tropical Medicine, London, U.K.
|
39
|
Saugo M, Mastrangelo G, Blengio G, Righetto G. [Extending traceability of malignant testicular tumours using hospital discharge records: an experience in Veneto Region (Northern Italy)]. Epidemiol Prev 2017; 41:184-186. [PMID: 28929714 DOI: 10.19191/ep17.3-4.p184.051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
OBJECTIVES Validation of codes of hospital discharge records (SDO) for identification of new cases of malignant testicular tumour in the Veneto Region (Northern Italy). DESIGN Record linkage between the regional archive of SDO and the archive of the Veneto Tumour Registry (VTR). SETTING AND PARTICIPANTS Extraction of cases from the SDO source with ICD-9-CM code 186 for diagnosis and codes 62.3-62.4 for surgical procedure, and from the VTR database using ICD-O-3 code C62 for site and codes 9060-9062, 9064-9066, 9070, 9071, 9080-9083, 9085, 9100, 9101 for morphology, with 5th digit behaviour code equal to "/3". Comparison of the two sources in a classification table using VTR data as gold standard. MAIN OUTCOME MEASURES Positive predictive value and sensitivity of SDO, with 95% confidence interval (95%CI) based on the binomial distribution. RESULTS From 2006 to 2008, in areas covered by the registry, SDO and VTR identified, respectively, 221 and 216 cases of testicular cancer. The SDO procedure showed a sensitivity of 92% (95%CI 87%-95%) and a positive predictive value of 90% (95%CI 85%-93%). CONCLUSIONS The SDO procedure can be considered an acceptable proxy for testis cancer incidence, thus allowing a wider spatiotemporal observation of the epidemiological trends.
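The two outcome measures can be reproduced approximately from the reported case counts. The abstract does not state how many cases were found by both sources, so the value of 198 true positives below is an assumption chosen only because it is consistent with the rounded figures; the exact (Clopper-Pearson) interval is likewise an assumption about what "based on binomial distribution" means. The helper names are hypothetical.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple:
    """Exact binomial confidence interval, found by bisection."""

    def bisect(below_root) -> float:
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if below_root(mid):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit: P(X >= k | p) = alpha/2, increasing in p.
    lower = 0.0 if k == 0 else bisect(
        lambda p: 1 - binom_cdf(k - 1, n, p) < alpha / 2
    )
    # Upper limit: P(X <= k | p) = alpha/2, decreasing in p.
    upper = 1.0 if k == n else bisect(
        lambda p: binom_cdf(k, n, p) > alpha / 2
    )
    return lower, upper


SDO_CASES = 221       # reported in the abstract
VTR_CASES = 216       # reported in the abstract (gold standard)
TRUE_POSITIVES = 198  # ASSUMPTION: not reported in the abstract

sensitivity = TRUE_POSITIVES / VTR_CASES
ppv = TRUE_POSITIVES / SDO_CASES
sens_ci = clopper_pearson(TRUE_POSITIVES, VTR_CASES)
ppv_ci = clopper_pearson(TRUE_POSITIVES, SDO_CASES)
```

Sensitivity uses the registry cases as the denominator, while the positive predictive value uses the discharge-record cases, which is why the two measures differ even though they share a numerator.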
Affiliation(s)
- Mario Saugo
- formerly Sistema epidemiologico regionale Veneto, Padova
- Giuseppe Mastrangelo
- UOC Medicina del lavoro, Dipartimento di scienze cardiologiche toraciche e vascolari, Università di Padova
- Gianstefano Blengio
- formerly Centro tematico di epidemiologia ambientale, Dipartimento di prevenzione, Azienda ULSS 22, Bussolengo (VR)
|
40
|
Culbertson A, Goel S, Madden MB, Safaeinili N, Jackson KL, Carton T, Waitman R, Liu M, Krishnamurthy A, Hall L, Cappella N, Visweswaran S, Becich MJ, Applegate R, Bernstam E, Rothman R, Matheny M, Lipori G, Bian J, Hogan W, Bell D, Martin A, Grannis S, Klann J, Sutphen R, O'Hara AB, Kho A. The Building Blocks of Interoperability. A Multisite Analysis of Patient Demographic Attributes Available for Matching. Appl Clin Inform 2017; 8:322-336. [PMID: 28378025 PMCID: PMC6241737 DOI: 10.4338/aci-2016-11-ra-0196] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 11/16/2016] [Accepted: 01/21/2017] [Indexed: 11/23/2022] Open
Abstract
BACKGROUND Patient matching is a key barrier to achieving interoperability. Patient demographic elements must be consistently collected over time and region to be valuable elements for patient matching. OBJECTIVES We sought to determine what patient demographic attributes are collected at multiple institutions in the United States and see how their availability changes over time and across clinical sites. METHODS We compiled a list of 36 demographic elements that stakeholders previously identified as essential patient demographic attributes that should be collected for the purpose of linking patient records. We studied a convenience sample of 9 health care systems from geographically distinct sites around the country. We identified changes in the availability of individual patient demographic attributes over time and across clinical sites. RESULTS Several attributes were consistently available over the study period (2005-2014), including last name (99.96%), first name (99.95%), date of birth (98.82%), gender/sex (99.73%), postal code (94.71%), and full street address (94.65%). Other attributes changed significantly from 2005 to 2014: Social security number (SSN) availability declined from 83.3% to 50.44% (p<0.0001). Email address availability increased from 8.94% to 54% (p<0.0001). Work phone number availability increased from 20.61% to 52.33% (p<0.0001). CONCLUSIONS Overall, first name, last name, date of birth, gender/sex and address were widely collected across institutional sites and over time. Availability of emerging attributes such as email and phone numbers is increasing while SSN use is declining. Understanding the relative availability of patient attributes can inform strategies for optimal matching in healthcare.
Affiliation(s)
- Adam Culbertson
- Adam Culbertson, 4300 Wilson Blvd., Suite 250, Arlington, VA 22203,
|
41
|
Marx MM, Dulas FM, Schumacher KM. [Improving the visibility of rare diseases in health care systems by specific routine coding]. Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz 2017; 60:532-536. [PMID: 28349172 DOI: 10.1007/s00103-017-2534-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/26/2022]
Abstract
The evaluation of healthcare providers' routine data is an important basis for the analysis, planning and evaluation of measures in public health. The representation of rare diseases in the classifications that are used to record health data is not adequate. Coding rare diseases in a specific way is a challenge all around the world. There is still no general international solution for the routine coding of rare diseases. The double coding of rare diseases with ICD-10 codes and Orphacodes is a short-term and low-cost alternative solution. Furthermore, this double coding enables international comparability. The specific encoding of rare diseases through this double coding can improve their capture for statistical analysis and thus their visibility in healthcare systems. Nevertheless, the provision of a new classification is not enough to gather valid data. Some measures have already been adopted in Germany (and at the European level) in order to support the implementation of this double coding. Subsequently it would be possible to adopt more specific public health measures, based on better data, in order to provide better care to the more than four million people in Germany affected by rare diseases.
Affiliation(s)
- Magdalena María Marx
- Medizinische Klassifikationen, Deutsches Institut für Medizinische Dokumentation und Information, Waisenhausgasse 36-38a, 50676, Köln, Deutschland
- Franzisca Marie Dulas
- Medizinische Klassifikationen, Deutsches Institut für Medizinische Dokumentation und Information, Waisenhausgasse 36-38a, 50676, Köln, Deutschland
- Katja Maria Schumacher
- Medizinische Klassifikationen, Deutsches Institut für Medizinische Dokumentation und Information, Waisenhausgasse 36-38a, 50676, Köln, Deutschland
|
42
|
St. Sauver JL, Carr AB, Yawn BP, Grossardt BR, Bock-Goodner CM, Klein LL, Pankratz JJ, Finney Rutten LJ, Rocca WA. Linking medical and dental health record data: a partnership with the Rochester Epidemiology Project. BMJ Open 2017; 7:e012528. [PMID: 28360234 PMCID: PMC5372048 DOI: 10.1136/bmjopen-2016-012528] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 02/04/2023] Open
Abstract
PURPOSE The purpose of this project was to expand the Rochester Epidemiology Project (REP) medical records linkage infrastructure to include data from oral healthcare providers. The goal of this linkage is to facilitate research studies examining the role of oral health in overall health and quality of life. PARTICIPANTS Eight dental practices joined the REP between 2011 and 2015. The REP study team has linked oral healthcare information with medical record information from local healthcare providers for 31 750 participants who have resided in Olmsted County, Minnesota. Overall, 17 718 (56%) participants are women, 14 318 (45%) are 40 years of age or older and 26 090 (82%) are white. FINDINGS TO DATE A first study using this new information was recently completed. This resource was used to determine whether the 2007 guidelines from the American Heart Association affected prescription rates of antibiotics to patients with moderate-risk cardiac conditions prior to dental procedures. The REP infrastructure was used to identify a series of patients diagnosed with moderate-risk cardiac conditions by the local healthcare providers (n=1351), and to abstract antibiotic prescriptions from dental records both pre-2007 and post-2007. Antibiotic prescriptions prior to dental procedures declined from 62% to 7% following the change in guidelines. FUTURE PLANS Dental data from participating practitioners will be updated on an annual basis, and new dental data will be linked to patient medical records. In addition, we will continue to invite new dental practices to participate in the REP. Finally, we will continue to use this research infrastructure to investigate associations between oral and medical health, and will present findings at conferences and in the scientific literature.
Affiliation(s)
- Jennifer L St. Sauver
- Division of Epidemiology, Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Robert D. and Patricia E. Kern Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, Minnesota, USA
- Alan B Carr
- Department of Dental Specialties, Mayo Clinic, Rochester, Minnesota, USA
- Barbara P Yawn
- Division of Epidemiology, Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Department of Research, Olmsted Medical Center, Rochester, Minnesota, USA
- Brandon R Grossardt
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Lori L Klein
- Division of Epidemiology, Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Joshua J Pankratz
- Department of Information Technology, Mayo Clinic, Rochester, Minnesota, USA
- Lila J Finney Rutten
- Division of Epidemiology, Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Robert D. and Patricia E. Kern Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, Minnesota, USA
- Walter A Rocca
- Division of Epidemiology, Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Department of Neurology, Mayo Clinic, Rochester, Minnesota, USA
|
43
|
Bohn J, Eddings W, Schneeweiss S. Conducting Privacy-Preserving Multivariable Propensity Score Analysis When Patient Covariate Information Is Stored in Separate Locations. Am J Epidemiol 2017; 185:501-510. [PMID: 28399565 DOI: 10.1093/aje/kww155] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/27/2015] [Accepted: 03/24/2016] [Indexed: 11/13/2022] Open
Abstract
Distributed networks of health-care data sources are increasingly being utilized to conduct pharmacoepidemiologic database studies. Such networks may contain data that are not physically pooled but instead are distributed horizontally (separate patients within each data source) or vertically (separate measures within each data source) in order to preserve patient privacy. While multivariable methods for the analysis of horizontally distributed data are frequently employed, few practical approaches have been put forth to deal with vertically distributed health-care databases. In this paper, we propose 2 propensity score-based approaches to vertically distributed data analysis and test their performance using 5 example studies. We found that these approaches produced point estimates close to what could be achieved without partitioning. We further found a performance benefit (i.e., lower mean squared error) for sequentially passing a propensity score through each data domain (called the "sequential approach") as compared with fitting separate domain-specific propensity scores (called the "parallel approach"). These results were validated in a small simulation study. This proof-of-concept study suggests a new multivariable analysis approach to vertically distributed health-care databases that is practical, preserves patient privacy, and warrants further investigation for use in clinical research applications that rely on health-care databases.
Affiliation(s)
- Justin Bohn
- Department of Education and Psychology, Free University Berlin, Germany
- Wesley Eddings
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, MA, USA
- Harvard Medical School, Boston, MA, USA
- Sebastian Schneeweiss
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, MA, USA
- Harvard Medical School, Boston, MA, USA
|
44
|
Pettus DC, Vanderveen T, Canfield RL, Schad R. Reliable and Scalable Infusion System Integration with the Electronic Medical Record. Biomed Instrum Technol 2017; 51:120-129. [PMID: 28296444 DOI: 10.2345/0899-8205-51.2.120] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 06/06/2023]
|
45
|
Ni MY, Li TK, Hui RWH, McDowell I, Leung GM. Requesting a unique personal identifier or providing a souvenir incentive did not affect overall consent to health record linkage: evidence from an RCT nested within a cohort. J Clin Epidemiol 2017; 84:142-149. [PMID: 28115256 DOI: 10.1016/j.jclinepi.2017.01.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/04/2016] [Revised: 12/15/2016] [Accepted: 01/13/2017] [Indexed: 11/15/2022]
Abstract
OBJECTIVE It is unclear if unique personal identifiers should be requested from participants for health record linkage: this permits high-quality data linkage but at the potential cost of lower consent rates due to privacy concerns. STUDY DESIGN AND SETTING Drawing from a sampling frame based on the FAMILY Cohort, using a 2 × 2 factorial design, we randomly assigned 1,200 participants to (1) request for Hong Kong Identity Card number (HKID) or no request and (2) receiving a souvenir incentive (valued at USD4) or no incentive. The primary outcome was consent to health record linkage. We also investigated associations between demographics, health status, and postal reminders with consent. RESULTS Overall, we received signed consent forms from 33.3% (95% confidence interval [CI] 30.6-36.0%) of respondents. We did not find an overall effect of requesting HKID (-4.3%, 95% CI -9.8% to 1.2%) or offering souvenir incentives (2.4%, 95% CI -3.1% to 7.9%) on consent to linkage. In subgroup analyses, requesting HKID significantly reduced consent among adults aged 18-44 years (odds ratio [OR] 0.53, 95% CI 0.30-0.94, compared to no request). Souvenir incentives increased consent among women (OR 1.55, 95% CI 1.13-2.11, compared to no souvenirs). CONCLUSIONS Requesting a unique personal identifier or providing a souvenir incentive did not affect overall consent to health record linkage.
Affiliation(s)
- Michael Y Ni
- School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong, 7 Sassoon Road, Hong Kong, China
- Tom K Li
- School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong, 7 Sassoon Road, Hong Kong, China
- Rex W H Hui
- Li Ka Shing Faculty of Medicine, The University of Hong Kong, 7 Sassoon Road, Hong Kong, China
- Ian McDowell
- School of Epidemiology, Public Health and Preventive Medicine, University of Ottawa, 600 Peter Morand Crescent, Ottawa, Canada
- Gabriel M Leung
- School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong, 7 Sassoon Road, Hong Kong, China
|
46
|
Stausberg J, Waldenburger A, Borgs C, Schnell R. Combining Different Privacy-Preserving Record Linkage Methods for Hospital Admission Data. Stud Health Technol Inform 2017; 235:161-165. [PMID: 28423775] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
Record linkage (RL) is the process of identifying pairs of records that correspond to the same entity, for example the same patient. The basic approach assigns a similarity weight to each pair of records and then sets a threshold above which the two records are considered to be a match. Three different RL methods were applied under privacy-preserving conditions to hospital admission data: deterministic RL (DRL), probabilistic RL (PRL), and Bloom filters. Patient characteristics such as names were one-way encrypted (DRL, PRL) or transformed to a cryptographic long-term key (Bloom filters). Based on one year of hospital admissions, the data set was split randomly into 30 thousand new and 1.5 million known patients. With the combination of the three RL methods, a positive predictive value of 83% (95% confidence interval 65%-94%) was attained. Thus, the presented combination of RL methods seems to be suited for other applications in population-based research.
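The similarity-weight-plus-threshold idea can be sketched for the Bloom filter variant. This is a generic illustration of Bloom-filter encoding of name bigrams compared with the Dice coefficient, not the authors' implementation; the filter length, number of hash functions, secret key, threshold and all function names are hypothetical choices.

```python
import hashlib
import hmac


def bigrams(name: str) -> set:
    """Split a padded, lower-cased name into character bigrams."""
    padded = f"_{name.strip().lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(name: str, secret: bytes, size: int = 1000, num_hashes: int = 10) -> set:
    """Return the set of bit positions set to 1 for this name.

    Uses keyed double hashing: position_i = (h1 + i * h2) mod size.
    """
    bits = set()
    for gram in bigrams(name):
        data = gram.encode("utf-8")
        h1 = int.from_bytes(hmac.new(secret, data, hashlib.sha256).digest(), "big")
        h2 = int.from_bytes(hmac.new(secret, data, hashlib.sha512).digest(), "big")
        for i in range(num_hashes):
            bits.add((h1 + i * h2) % size)
    return bits


def dice(a: set, b: set) -> float:
    """Dice coefficient of two bit-position sets (1.0 means identical)."""
    return 2 * len(a & b) / (len(a) + len(b))


def is_match(a: set, b: set, threshold: float = 0.7) -> bool:
    """Classify a pair as a match; the threshold value is hypothetical."""
    return dice(a, b) >= threshold


SECRET = b"shared-secret-key"  # hypothetical key agreed by the data holders
schmidt = bloom_encode("Schmidt", SECRET)
schmitt = bloom_encode("Schmitt", SECRET)
mueller = bloom_encode("Mueller", SECRET)

similar = dice(schmidt, schmitt)     # names share most bigrams
dissimilar = dice(schmidt, mueller)  # names share no bigrams
```

Because similar names share most of their bigrams, their filters overlap heavily even though neither party sees the plain-text name, which is what lets this approach tolerate spelling variation where exact one-way encryption cannot.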
Affiliation(s)
- Jürgen Stausberg
- Institute for Medical Informatics, Biometry and Epidemiology, Faculty of Medicine, University Duisburg-Essen, Essen, Germany
- Andreas Waldenburger
- Institute for Medical Informatics, Biometry and Epidemiology, Faculty of Medicine, University Duisburg-Essen, Essen, Germany
- Christian Borgs
- German Record Linkage Center, University Duisburg-Essen, Duisburg, Germany
- Rainer Schnell
- German Record Linkage Center, University Duisburg-Essen, Duisburg, Germany
|
47
|
|
48
|
Holmgren AJ, Patel V, Charles D, Adler-Milstein J. US hospital engagement in core domains of interoperability. Am J Manag Care 2016; 22:e395-e402. [PMID: 27982673] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
OBJECTIVES To assess US hospital engagement in the 4 core domains of interoperability (find, send, receive, integrate) and whether engaging in these domains is associated with electronic availability of clinical data from outside providers. STUDY DESIGN Retrospective analysis of survey data. METHODS Analysis of the American Hospital Association (AHA) Annual Survey of Hospitals and the AHA Annual Survey of Hospitals - IT Supplement datasets for 2014. Respondents to the AHA Annual Survey - IT Supplement included 3307 US hospitals. We created measures of hospital engagement in 4 core domains of interoperability, as well as access to electronic clinical data from outside providers. Regression analysis was used to identify hospital characteristics associated with each measure. RESULTS Twenty-one percent of US hospitals engaged in all 4 interoperability domains, and 25% engaged in none. Hospitals engaged in all 4 domains were more likely to have a "basic" (odds ratio [OR], 3.53; P < .01) or "comprehensive" (OR, 5.04; P < .01) electronic health record (EHR) in comparison to a less than "basic" EHR, participate in a Regional Health Information Organization (OR, 4.29; P < .01), use a single EHR vendor (OR, 2.15; P < .01), and have a third-party health information exchange vendor (OR, 2.32; P < .01). They also differed by non-IT characteristics, such as medical home participation (OR, 1.77; P < .01). Hospitals that find (OR, 5.51; P < .01), receive (OR, 2.56; P < .01), or integrate (OR, 2.53; P < .01) information were more likely to report routine clinical information availability from outside providers. CONCLUSIONS The one-fifth of US hospitals engaged in key domains of interoperability were more likely to have certain information technology infrastructure and participate in delivery reform. Encouragingly, interoperability engagement was associated with routine clinical information availability. Our results point to the need for ongoing efforts to expand interoperability, with the potential benefit of better information availability for clinicians and better care.
|
49
|
Maguire A, Moriarty J, O'Reilly D, McCann M. Education as a predictor of antidepressant and anxiolytic medication use after bereavement: a population-based record linkage study. Qual Life Res 2016; 26:1251-1262. [PMID: 27770330 PMCID: PMC5376389 DOI: 10.1007/s11136-016-1440-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Accepted: 10/15/2016] [Indexed: 11/25/2022]
Abstract
Purpose Educational attainment has been shown to be positively associated with mental health and a potential buffer to stressful events. One stressful life event likely to affect everyone in their lifetime is bereavement. This paper assesses the effect of educational attainment on mental health post-bereavement. Methods By utilising large administrative datasets, linking Census returns to death records and prescribed medication data, we analysed the bereavement exposure of 208,332 individuals aged 25–74 years. Two-level multi-level logistic regression models were constructed to determine the likelihood of antidepressant medication use (a proxy of mental ill health) post-bereavement given level of educational attainment. Results Individuals who are bereaved have greater antidepressant use than those who are not bereaved, with over a quarter (26.5%) of those bereaved by suicide in receipt of antidepressant medication compared to just 12.4% of those not bereaved. Within individuals bereaved by a sudden death, those with a university degree or higher qualifications are 73% less likely to be in receipt of antidepressant medication compared to those with no qualifications, after full adjustment for demographic, socio-economic and area factors (OR 0.27, 95% CI 0.09, 0.75). Higher educational attainment and no qualifications have an equivalent effect for those bereaved by suicide. Conclusions Education may protect against poor mental health, as measured by the use of antidepressant medication, post-bereavement, except in those bereaved by suicide. This is likely due to the improved cognitive, personal and psychological skills gained from time spent in education. Electronic supplementary material The online version of this article (doi:10.1007/s11136-016-1440-1) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Aideen Maguire
  - Centre of Excellence for Public Health, Queen's University Belfast, Belfast, UK
- John Moriarty
  - Administrative Data Research Network, Queen's University Belfast, Belfast, UK
- Dermot O'Reilly
  - Centre of Excellence for Public Health, Queen's University Belfast, Belfast, UK
- Mark McCann
  - MRC/CSO Social and Public Health Sciences Unit, University of Glasgow, Glasgow, UK
50
Harron K, Gilbert R, Cromwell D, van der Meulen J. Linking Data for Mothers and Babies in De-Identified Electronic Health Data. PLoS One 2016; 11:e0164667. [PMID: 27764135 PMCID: PMC5072610 DOI: 10.1371/journal.pone.0164667] [Citation(s) in RCA: 64] [Impact Index Per Article: 8.0] [Received: 05/17/2016] [Accepted: 09/29/2016] [Indexed: 01/11/2023]
Abstract
OBJECTIVE: Linkage of longitudinal administrative data for mothers and babies supports research and service evaluation in several populations around the world. We established a linked mother-baby cohort using pseudonymised, population-level data for England. DESIGN AND SETTING: Retrospective linkage study using electronic hospital records of mothers and babies admitted to NHS hospitals in England, captured in Hospital Episode Statistics between April 2001 and March 2013. RESULTS: Of 672,955 baby records in 2012/13, 280,470 (42%) linked deterministically to a maternal record using hospital, GP practice, maternal age, birthweight, gestation, birth order and sex. A further 380,164 (56%) records linked using probabilistic methods incorporating additional variables that could differ between mother/baby records (admission dates, ethnicity, 3/4-character postcode district) or that include missing values (delivery variables). The false-match rate was estimated at 0.15% using synthetic data. Data quality improved over time: for 2001/02, 91% of baby records were linked (holding the estimated false-match rate at 0.15%). The linked cohort was representative of national distributions of gender, gestation, birth weight and maternal age, and captured approximately 97% of births in England. CONCLUSION: Probabilistic linkage of maternal and baby healthcare characteristics offers an efficient way to enrich maternity data, improve data quality, and create longitudinal cohorts for research and service evaluation. This approach could be extended to linkage of other datasets that have non-disclosive characteristics in common.
Affiliation(s)
- Katie Harron
  - Department of Health Services Research and Policy, London School of Hygiene and Tropical Medicine, 15-17 Tavistock Place, London, United Kingdom
- Ruth Gilbert
  - Institute of Child Health, University College London, 30 Guilford Street, London, United Kingdom
- David Cromwell
  - Department of Health Services Research and Policy, London School of Hygiene and Tropical Medicine, 15-17 Tavistock Place, London, United Kingdom
- Jan van der Meulen
  - Department of Health Services Research and Policy, London School of Hygiene and Tropical Medicine, 15-17 Tavistock Place, London, United Kingdom