1
Lotspeich SC, Shepherd BE, Kariuki MA, Wools-Kaloustian K, McGowan CC, Musick B, Semeere A, Crabtree Ramírez BE, Mkwashapi DM, Cesar C, Ssemakadde M, Machado DM, Ngeresa A, Ferreira FF, Lwali J, Marcelin A, Cardoso SW, Luque MT, Otero L, Cortés CP, Duda SN. Lessons learned from over a decade of data audits in international observational HIV cohorts in Latin America and East Africa. J Clin Transl Sci 2023; 7:e245. [PMID: 38033704 PMCID: PMC10685260 DOI: 10.1017/cts.2023.659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Revised: 10/13/2023] [Accepted: 10/16/2023] [Indexed: 12/02/2023]
Abstract
Introduction: Routine patient care data are increasingly used for biomedical research, but such "secondary use" data have known limitations, including their quality. When leveraging routine care data for observational research, developing audit protocols that can maximize informational return and minimize costs is paramount. Methods: For more than a decade, the Latin America and East Africa regions of the International epidemiology Databases to Evaluate AIDS (IeDEA) consortium have been auditing the observational data drawn from participating human immunodeficiency virus clinics. Since our earliest audits, where external auditors used paper forms to record audit findings from paper medical records, we have streamlined our protocols to obtain more efficient and informative audits that keep up with advancing technology while reducing travel obligations and associated costs. Results: We present five key lessons learned from conducting data audits of secondary-use data from resource-limited settings for more than 10 years and share eight recommendations for other consortia looking to implement data quality initiatives. Conclusion: After completing multiple audit cycles in both the Latin America and East Africa regions of the IeDEA consortium, we have established a rich reference for data quality in our cohorts, as well as large, audited analytical datasets that can be used to answer important clinical questions with confidence. By sharing our audit processes and how they have been adapted over time, we hope that others can develop protocols informed by our lessons learned from more than a decade of experience in these large, diverse cohorts.
Affiliation(s)
- Sarah C. Lotspeich
- Department of Statistical Sciences, Wake Forest University, Winston-Salem, NC, USA
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
- Bryan E. Shepherd
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
- Kara Wools-Kaloustian
- Department of Medicine, Indiana University School of Medicine, Indianapolis, IN, USA
- Catherine C. McGowan
- Division of Infectious Diseases, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Beverly Musick
- Department of Biostatistics, Indiana University School of Medicine, Indianapolis, IN, USA
- Aggrey Semeere
- Infectious Diseases Institute, Makerere University, Kampala, Uganda
- Brenda E. Crabtree Ramírez
- Department of Infectious Diseases, Instituto Nacional de Ciencias Médicas y Nutrición Salvador Zubirán, Mexico City, Mexico
- Denna M. Mkwashapi
- Sexual and Reproductive Health Program, National Institute for Medical Research Mwanza, United Republic of Tanzania, Mwanza, Tanzania
- Daisy Maria Machado
- Departamento de Pediatria, Universidade Federal de São Paulo, São Paulo, Brazil
- Antony Ngeresa
- Academic Model Providing Access to Health Care (AMPATH), Eldoret, Kenya
- Jerome Lwali
- Tumbi Hospital HIV Care and Treatment Clinic, United Republic of Tanzania, Kibaha, Tanzania
- Adias Marcelin
- Le Groupe Haïtien d’Etude du Sarcome de Kaposi et des Infections Opportunistes, Port-au-Prince, Haiti
- Marco Tulio Luque
- Instituto Hondureño de Seguridad Social and Hospital Escuela Universitario, Tegucigalpa, Honduras
- Larissa Otero
- Instituto de Medicina Tropical Alexander von Humboldt, Universidad Peruana Cayetano Heredia, Lima, Peru
- School of Medicine, Universidad Peruana Cayetano Heredia, Lima, Peru
- Stephany N. Duda
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
2
Lotspeich SC, Amorim GGC, Shaw PA, Tao R, Shepherd BE. Optimal multiwave validation of secondary use data with outcome and exposure misclassification. Can J Stat 2023. [DOI: 10.1002/cjs.11772] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/03/2023]
3
Lotspeich SC, Shepherd BE, Amorim GGC, Shaw PA, Tao R. Efficient odds ratio estimation under two-phase sampling using error-prone data from a multi-national HIV research cohort. Biometrics 2022; 78:1674-1685. [PMID: 34213008 PMCID: PMC8720323 DOI: 10.1111/biom.13512] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/08/2020] [Revised: 05/19/2021] [Accepted: 06/17/2021] [Indexed: 12/30/2022]
Abstract
Persons living with HIV engage in routine clinical care, generating large amounts of data in observational HIV cohorts. These data are often error-prone, and directly using them in biomedical research could bias estimation and give misleading results. A cost-effective solution is the two-phase design, under which the error-prone variables are observed for all patients during Phase I, and that information is used to select patients for data auditing during Phase II. For example, the Caribbean, Central, and South America network for HIV epidemiology (CCASAnet) selected a random sample from each site for data auditing. Herein, we consider efficient odds ratio estimation with partially audited, error-prone data. We propose a semiparametric approach that uses all information from both phases and accommodates a number of error mechanisms. We allow both the outcome and covariates to be error-prone and these errors to be correlated, and selection of the Phase II sample can depend on Phase I data in an arbitrary manner. We devise a computationally efficient, numerically stable EM algorithm to obtain estimators that are consistent, asymptotically normal, and asymptotically efficient. We demonstrate the advantages of the proposed methods over existing ones through extensive simulations. Finally, we provide applications to the CCASAnet cohort.
Affiliation(s)
- Sarah C. Lotspeich
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
- Bryan E. Shepherd
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
- Gustavo G. C. Amorim
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
- Pamela A. Shaw
- Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Ran Tao
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
4
Shepherd BE, Shaw PA. Errors in multiple variables in human immunodeficiency virus (HIV) cohort and electronic health record data: statistical challenges and opportunities. Statistical Communications in Infectious Diseases 2020; 12:20190015. [PMID: 35880997 PMCID: PMC9204761 DOI: 10.1515/scid-2019-0015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2019] [Accepted: 08/21/2020] [Indexed: 06/15/2023]
Abstract
Objectives: Observational data derived from patient electronic health record (EHR) data are increasingly used for human immunodeficiency virus/acquired immunodeficiency syndrome (HIV/AIDS) research. There are challenges to using these data, in particular with regard to data quality; some are recognized, some unrecognized, and some recognized but ignored. There are great opportunities for the statistical community to improve inference by incorporating validation subsampling into analyses of EHR data. Methods: Methods to address measurement error, misclassification, and missing data are relevant, as are sampling designs such as two-phase sampling. However, many of the existing statistical methods for measurement error, for example, only address relatively simple settings, whereas the errors seen in these datasets span multiple variables (both predictors and outcomes), are correlated, and even affect who is included in the study. Results/Conclusion: We will discuss some preliminary methods in this area with a particular focus on time-to-event outcomes and outline areas of future research.
Affiliation(s)
- Bryan E. Shepherd
- Biostatistics, Vanderbilt University, 2525 West End, Suite 11000, Nashville, Tennessee 37203, USA
- Pamela A. Shaw
- Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA