1
Bu F, Arshad F, Hripcsak G, Ryan PB, Schuemie MJ, Suchard MA. Authors' Response to Huang et al.'s Comment on "Serially Combining Epidemiological Designs Does Not Improve Overall Signal Detection in Vaccine Safety Surveillance". Drug Saf 2024; 47:403-404. [PMID: 38441750 DOI: 10.1007/s40264-024-01411-x] [Accepted: 02/12/2024] [Indexed: 03/21/2024]
Affiliation(s)
- Fan Bu
  - Observational Health Data Sciences and Informatics, New York, NY, USA
  - Department of Biostatistics, University of California, 695 Charles E. Young Dr., South, Los Angeles, CA, 90095, USA
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
- Faaizah Arshad
  - Observational Health Data Sciences and Informatics, New York, NY, USA
  - Department of Biostatistics, University of California, 695 Charles E. Young Dr., South, Los Angeles, CA, 90095, USA
- George Hripcsak
  - Observational Health Data Sciences and Informatics, New York, NY, USA
  - Department of Biomedical Informatics, Columbia University, New York, NY, USA
  - Medical Informatics Services, New York-Presbyterian Hospital, New York, NY, USA
- Patrick B Ryan
  - Observational Health Data Sciences and Informatics, New York, NY, USA
  - Observational Health Data Analytics, Janssen R&D, Titusville, NJ, USA
- Martijn J Schuemie
  - Observational Health Data Sciences and Informatics, New York, NY, USA
  - Department of Biostatistics, University of California, 695 Charles E. Young Dr., South, Los Angeles, CA, 90095, USA
  - Observational Health Data Analytics, Janssen R&D, Titusville, NJ, USA
- Marc A Suchard
  - Observational Health Data Sciences and Informatics, New York, NY, USA
  - Department of Biostatistics, University of California, 695 Charles E. Young Dr., South, Los Angeles, CA, 90095, USA
  - VA Informatics and Computing Infrastructure, US Department of Veterans Affairs, Washington, DC, USA
2
Gao J, Bonzel CL, Hong C, Varghese P, Zakir K, Gronsbell J. Semi-supervised ROC analysis for reliable and streamlined evaluation of phenotyping algorithms. J Am Med Inform Assoc 2024; 31:640-650. [PMID: 38128118 PMCID: PMC10873838 DOI: 10.1093/jamia/ocad226] [Received: 05/03/2023] [Revised: 09/22/2023] [Accepted: 11/20/2023] [Indexed: 12/23/2023]
Abstract
OBJECTIVE: High-throughput phenotyping will accelerate the use of electronic health records (EHRs) for translational research. A critical roadblock is the extensive medical supervision required for phenotyping algorithm (PA) estimation and evaluation. To address this challenge, numerous weakly-supervised learning methods have been proposed. However, there is a paucity of methods for reliably evaluating the predictive performance of PAs when a very small proportion of the data is labeled. To fill this gap, we introduce a semi-supervised approach (ssROC) for estimation of the receiver operating characteristic (ROC) parameters of PAs (eg, sensitivity, specificity).

MATERIALS AND METHODS: ssROC uses a small labeled dataset to nonparametrically impute missing labels. The imputations are then used for ROC parameter estimation to yield more precise estimates of PA performance relative to classical supervised ROC analysis (supROC) using only labeled data. We evaluated ssROC with synthetic, semi-synthetic, and EHR data from Mass General Brigham (MGB).

RESULTS: ssROC produced ROC parameter estimates with minimal bias and significantly lower variance than supROC in the simulated and semi-synthetic data. For the 5 PAs from MGB, the estimates from ssROC are 30% to 60% less variable than supROC on average.

DISCUSSION: ssROC enables precise evaluation of PA performance without demanding large volumes of labeled data. ssROC is also easily implementable in open-source R software.

CONCLUSION: When used in conjunction with weakly-supervised PAs, ssROC facilitates the reliable and streamlined phenotyping necessary for EHR-based research.
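The imputation idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation (the abstract points to R software): it assumes a Gaussian-kernel (Nadaraya-Watson) smoother as the nonparametric imputer, and the function names, the `bandwidth` parameter, and the toy data are all hypothetical.

```python
import math

def kernel_impute(labeled_scores, labels, query_scores, bandwidth=0.1):
    """Estimate P(label = 1 | score) at each query score by kernel smoothing.

    Assumption: a Gaussian kernel stands in for the unspecified
    nonparametric imputation step described in the abstract.
    """
    mean_label = sum(labels) / len(labels)
    imputed = []
    for q in query_scores:
        weights = [math.exp(-0.5 * ((q - s) / bandwidth) ** 2)
                   for s in labeled_scores]
        total = sum(weights)
        if total == 0.0:
            # All weights underflowed: fall back to the labeled prevalence.
            imputed.append(mean_label)
        else:
            imputed.append(sum(w * y for w, y in zip(weights, labels)) / total)
    return imputed

def imputed_tpr_fpr(scores, imputed, cutoff):
    """Sensitivity (TPR) and 1 - specificity (FPR) from imputed labels."""
    positives = sum(imputed)
    negatives = len(imputed) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need nonzero imputed positives and negatives")
    flagged = [(s, m) for s, m in zip(scores, imputed) if s >= cutoff]
    tpr = sum(m for _, m in flagged) / positives
    fpr = sum(1 - m for _, m in flagged) / negatives
    return tpr, fpr

# Toy data: four labeled PA scores, two unlabeled PA scores.
labeled_scores = [0.1, 0.2, 0.8, 0.9]
labels = [0, 0, 1, 1]
unlabeled_scores = [0.15, 0.85]

imputed = kernel_impute(labeled_scores, labels, unlabeled_scores)
tpr, fpr = imputed_tpr_fpr(unlabeled_scores, imputed, cutoff=0.5)
```

Sweeping `cutoff` over a grid of score values would trace out an ROC curve under these assumptions; choosing the bandwidth is left open here.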
Affiliation(s)
- Jianhui Gao
  - Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada
- Clara-Lea Bonzel
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
- Chuan Hong
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, United States
- Paul Varghese
  - Health Informatics, Verily Life Sciences, Cambridge, MA, United States
- Karim Zakir
  - Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada
- Jessica Gronsbell
  - Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada
  - Department of Family and Community Medicine, University of Toronto, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
3
Swerdel JN, Conover MM. Comparing broad and narrow phenotype algorithms: differences in performance characteristics and immortal time incurred. Journal of Pharmacy & Pharmaceutical Sciences 2024; 26:12095. [PMID: 38235322 PMCID: PMC10791821 DOI: 10.3389/jpps.2023.12095] [Received: 09/22/2023] [Accepted: 12/15/2023] [Indexed: 01/19/2024]
Abstract
Introduction: When developing phenotype algorithms for observational research, there is usually a trade-off between definitions that are sensitive and definitions that are specific. The objective of this study was to estimate the performance characteristics of phenotype algorithms designed for increasing specificity and to estimate the immortal time associated with each algorithm.

Materials and methods: We examined algorithms for 11 chronic health conditions. The analyses used data from five databases. For each health condition, we created five algorithms to examine differences in performance (sensitivity and positive predictive value (PPV)): one broad algorithm using a single code for the health condition and four narrow algorithms where a second diagnosis code was required 1-30 days, 1-90 days, 1-365 days, or 1 to all days in a subject's continuous observation period after the first code. We also examined the proportion of immortal time relative to time-at-risk (TAR) for four outcomes. The TARs were: 0-30 days after the first condition occurrence (the index date), 0-90 days post-index, 0-365 days post-index, and 0-1,095 days post-index. Performance of algorithms for chronic health conditions was estimated using PheValuator (V2.1.4) from the OHDSI toolstack. Immortal time was calculated as the time from the index date until the first of the following: 1) the outcome; 2) the end of the outcome TAR; 3) the occurrence of the second code for the chronic health condition.

Results: In the first analysis, the narrow phenotype algorithms, i.e., those requiring a second condition code, produced higher estimates for PPV and lower estimates for sensitivity compared to the single code algorithm. In all conditions, increasing the time to the required second code increased the sensitivity of the algorithm. In the second analysis, the amount of immortal time increased as the window used to identify the second diagnosis code increased. The proportion of TAR that was immortal was higher in the 30-day TAR analyses than in the 1,095-day TAR analyses.

Conclusion: Requiring a second code is a potentially valid way to increase the specificity of a health condition algorithm, albeit at the cost of incurring immortal time.
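The immortal-time rule stated in the methods can be sketched as follows. This is an illustrative reading only, not the study's code: days are counted from the index date (day 0), the function names and tuple layout are hypothetical, and the proportion assumes each subject's time-at-risk ends at the outcome or the end of the TAR window, whichever comes first.

```python
from typing import Iterable, Optional, Tuple

def immortal_days(tar_end_day: int,
                  outcome_day: Optional[int],
                  second_code_day: Optional[int]) -> int:
    """Days from index until the first of: outcome, end of TAR, second code."""
    candidates = [tar_end_day]
    if outcome_day is not None:
        candidates.append(outcome_day)
    if second_code_day is not None:
        candidates.append(second_code_day)
    return max(0, min(candidates))

def immortal_proportion(subjects: Iterable[Tuple[Optional[int], Optional[int]]],
                        tar_end_day: int) -> float:
    """Total immortal time divided by total time-at-risk.

    Each subject is (outcome_day, second_code_day); None means not observed.
    Assumption: time-at-risk stops at the outcome or at tar_end_day.
    """
    immortal_total = 0
    at_risk_total = 0
    for outcome_day, second_code_day in subjects:
        immortal_total += immortal_days(tar_end_day, outcome_day, second_code_day)
        if outcome_day is None:
            at_risk_total += tar_end_day
        else:
            at_risk_total += min(outcome_day, tar_end_day)
    return immortal_total / at_risk_total if at_risk_total else 0.0

# Two hypothetical subjects under a 0-30 day TAR.
subjects = [(None, 10), (5, 20)]
proportion = immortal_proportion(subjects, tar_end_day=30)
```

In this toy case the first subject contributes 10 immortal days out of 30 at risk, and the second contributes 5 out of 5, because the outcome precedes the second code.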
Affiliation(s)
- Joel N. Swerdel
  - Observational Health Data Analytics, Global Epidemiology, Janssen Research and Development, Titusville, NJ, United States
  - Observational Health Data Sciences and Informatics, New York, NY, United States
- Mitchell M. Conover
  - Observational Health Data Analytics, Global Epidemiology, Janssen Research and Development, Titusville, NJ, United States
  - Observational Health Data Sciences and Informatics, New York, NY, United States