1
Schaekermann M, Spitz T, Pyles M, Cole-Lewis H, Wulczyn E, Pfohl SR, Martin D, Jaroensri R, Keeling G, Liu Y, Farquhar S, Xue Q, Lester J, Hughes C, Strachan P, Tan F, Bui P, Mermel CH, Peng LH, Matias Y, Corrado GS, Webster DR, Virmani S, Semturs C, Liu Y, Horn I, Cameron Chen PH. Health equity assessment of machine learning performance (HEAL): a framework and dermatology AI model case study. EClinicalMedicine 2024; 70:102479. [PMID: 38685924] [PMCID: PMC11056401] [DOI: 10.1016/j.eclinm.2024.102479]
Abstract
Background Artificial intelligence (AI) has repeatedly been shown to encode historical inequities in healthcare. We aimed to develop a framework to quantitatively assess the performance equity of health AI technologies and to illustrate its utility via a case study. Methods Here, we propose a methodology, complementary to existing fairness metrics, to assess whether health AI technologies prioritise performance for patient populations experiencing worse outcomes. We developed the Health Equity Assessment of machine Learning performance (HEAL) framework, designed to quantitatively assess the performance equity of health AI technologies via a four-step interdisciplinary process to understand and quantify domain-specific criteria, and the resulting HEAL metric. As an illustrative case study (analysis conducted between October 2022 and January 2023), we applied the HEAL framework to a dermatology AI model. A set of 5420 teledermatology cases (store-and-forward cases from patients aged 20 years or older, submitted from primary care providers in the USA and skin cancer clinics in Australia), enriched for diversity in age, sex and race/ethnicity, was used to retrospectively evaluate the AI model's HEAL metric, defined as the likelihood that the AI model performs better for subpopulations with worse average health outcomes as compared to others. The likelihood that AI performance was anticorrelated with pre-existing health outcomes was estimated using bootstrap methods as the probability that the negated Spearman's rank correlation coefficient (i.e., "R") was greater than zero. Positive values of R suggest that subpopulations with poorer health outcomes have better AI model performance. Thus, the HEAL metric, defined as p(R > 0), measures how likely the AI technology is to prioritise performance for subpopulations with worse average health outcomes as compared to others (presented as a percentage below).
Health outcomes were quantified as disability-adjusted life years (DALYs) when grouping by sex and age, and years of life lost (YLLs) when grouping by race/ethnicity. AI performance was measured as top-3 agreement with the reference diagnosis from a panel of 3 dermatologists per case. Findings Across all dermatologic conditions, the HEAL metric was 80.5% for prioritising AI performance of racial/ethnic subpopulations based on YLLs, and 92.1% and 0.0% respectively for prioritising AI performance of sex and age subpopulations based on DALYs. Certain dermatologic conditions were significantly associated with greater AI model performance compared to a reference category of less common conditions. For skin cancer conditions, the HEAL metric was 73.8% for prioritising AI performance of age subpopulations based on DALYs. Interpretation Analysis using the proposed HEAL framework showed that the dermatology AI model prioritised performance for race/ethnicity, sex (all conditions) and age (cancer conditions) subpopulations with respect to pre-existing health disparities. More work is needed to investigate ways of promoting equitable AI performance across age for non-cancer conditions and to better understand how AI models can contribute towards improving equity in health outcomes. Funding Google LLC.
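The bootstrap construction of the HEAL metric can be illustrated with a small numerical sketch. All numbers below are synthetic (subgroup burdens, correctness rates and group sizes are invented, not taken from the study), and the sign convention is simplified: R is written directly as the rank correlation between subgroup burden and subgroup accuracy, so R > 0 means better performance for worse-off subgroups, matching the abstract's verbal definition.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data for four subgroups (all values invented for illustration):
# per-subgroup health burden (e.g., YLLs; higher = worse pre-existing outcomes)
# and per-case AI correctness (1 = top-3 agreement with the reference diagnosis).
burden = np.array([1200.0, 950.0, 800.0, 600.0])
cases = [rng.binomial(1, p, size=400) for p in (0.88, 0.84, 0.83, 0.80)]

def spearman_r(x, y):
    """Spearman rank correlation (no tie correction; fine for this sketch)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return np.corrcoef(rx, ry)[0, 1]

def heal_metric(burden, cases, n_boot=2000, rng=rng):
    """HEAL = p(R > 0): bootstrap probability that subgroup accuracy is
    rank-correlated with subgroup burden, i.e., that the model performs
    better where pre-existing outcomes are worse."""
    r = np.empty(n_boot)
    for b in range(n_boot):
        # Resample cases within each subgroup, recompute subgroup accuracies.
        acc = np.array([rng.choice(c, size=len(c)).mean() for c in cases])
        r[b] = spearman_r(burden, acc)
    return (r > 0).mean()

print(f"HEAL metric: {heal_metric(burden, cases):.1%}")
```

Because the synthetic accuracies decrease with burden, most bootstrap replicates yield R > 0 and the sketch reports a HEAL metric near 100%.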
Affiliation(s)
- Malcolm Pyles: Advanced Clinical, Deerfield, IL, USA; Department of Dermatology, Cleveland Clinic, Cleveland, OH, USA
- Yuan Liu: Google Health, Mountain View, CA, USA
- Jenna Lester: Advanced Clinical, Deerfield, IL, USA; Department of Dermatology, University of California, San Francisco, CA, USA
- Peggy Bui: Google Health, Mountain View, CA, USA
- Yun Liu: Google Health, Mountain View, CA, USA
- Ivor Horn: Google Health, Mountain View, CA, USA
2
Azizi S, Culp L, Freyberg J, Mustafa B, Baur S, Kornblith S, Chen T, Tomasev N, Mitrović J, Strachan P, Mahdavi SS, Wulczyn E, Babenko B, Walker M, Loh A, Chen PHC, Liu Y, Bavishi P, McKinney SM, Winkens J, Roy AG, Beaver Z, Ryan F, Krogue J, Etemadi M, Telang U, Liu Y, Peng L, Corrado GS, Webster DR, Fleet D, Hinton G, Houlsby N, Karthikesalingam A, Norouzi M, Natarajan V. Robust and data-efficient generalization of self-supervised machine learning for diagnostic imaging. Nat Biomed Eng 2023. [PMID: 37291435] [DOI: 10.1038/s41551-023-01049-7]
Abstract
Machine-learning models for medical tasks can match or surpass the performance of clinical experts. However, in settings differing from those of the training dataset, the performance of a model can deteriorate substantially. Here we report a representation-learning strategy for machine-learning models applied to medical-imaging tasks that mitigates such 'out-of-distribution' performance problems and that improves model robustness and training efficiency. The strategy, which we named REMEDIS (for 'Robust and Efficient Medical Imaging with Self-supervision'), combines large-scale supervised transfer learning on natural images with intermediate contrastive self-supervised learning on medical images and requires minimal task-specific customization. We show the utility of REMEDIS in a range of diagnostic-imaging tasks covering six imaging domains and 15 test datasets, and by simulating three realistic out-of-distribution scenarios. REMEDIS improved in-distribution diagnostic accuracies by up to 11.5% with respect to strong supervised baseline models, and in out-of-distribution settings required only 1-33% of the data for retraining to match the performance of supervised models retrained using all available data. REMEDIS may accelerate the development lifecycle of machine-learning models for medical imaging.
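The intermediate contrastive step is SimCLR-style self-supervision. As a rough illustration of that family of objectives (not the authors' implementation; all embeddings below are synthetic), the normalized temperature-scaled cross-entropy (NT-Xent) loss over two augmented views of a batch can be sketched as:

```python
import numpy as np

def nt_xent_loss(z1, z2, tau=0.5):
    """SimCLR-style NT-Xent loss: for each embedding, the positive is the
    other augmented view of the same image; every other embedding in the
    batch serves as a negative."""
    z = np.concatenate([z1, z2])                      # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine-similarity space
    sim = z @ z.T / tau                               # temperature-scaled logits
    np.fill_diagonal(sim, -np.inf)                    # exclude self-similarity
    n = len(z1)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])  # positive indices
    log_prob = sim[np.arange(2 * n), pos] - np.log(np.exp(sim).sum(axis=1))
    return -log_prob.mean()

rng = np.random.default_rng(0)
views1 = rng.normal(size=(8, 32))                  # embeddings of 8 images, view 1
views2 = views1 + 0.1 * rng.normal(size=(8, 32))   # mildly perturbed view 2
print(nt_xent_loss(views1, views2))
```

The loss is lowest when the two views of each image embed close together relative to the rest of the batch, which is what pulls representations of augmented medical images together during the intermediate pretraining stage.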
Affiliation(s)
- Ting Chen: Google Research, Mountain View, CA, USA
- Aaron Loh: Google Research, Mountain View, CA, USA
- Yuan Liu: Google Research, Mountain View, CA, USA
- Fiona Ryan: Georgia Institute of Technology, Computer Science, Atlanta, GA, USA
- Mozziyar Etemadi: School of Medicine/School of Engineering, Northwestern University, Chicago, IL, USA
- Yun Liu: Google Research, Mountain View, CA, USA
- Lily Peng: Google Research, Mountain View, CA, USA
3
Krogue JD, Azizi S, Tan F, Flament-Auvigne I, Brown T, Plass M, Reihs R, Müller H, Zatloukal K, Richeson P, Corrado GS, Peng LH, Mermel CH, Liu Y, Chen PHC, Gombar S, Montine T, Shen J, Steiner DF, Wulczyn E. Predicting lymph node metastasis from primary tumor histology and clinicopathologic factors in colorectal cancer using deep learning. Commun Med (Lond) 2023; 3:59. [PMID: 37095223] [PMCID: PMC10125969] [DOI: 10.1038/s43856-023-00282-0]
Abstract
BACKGROUND The presence of lymph node metastasis (LNM) influences prognosis and clinical decision-making in colorectal cancer. However, detection of LNM is variable and depends on a number of external factors. Deep learning has shown success in computational pathology, but has struggled to boost performance when combined with known predictors. METHODS Machine-learned features are created by clustering deep learning embeddings of small patches of tumor in colorectal cancer via k-means, and then selecting the top clusters that add predictive value to a logistic regression model when combined with known baseline clinicopathologic variables. We then analyze the performance of logistic regression models trained with and without these machine-learned features in combination with the baseline variables. RESULTS The machine-learned features provide independent signal for the presence of LNM (AUROC: 0.638, 95% CI: [0.590, 0.683]). Furthermore, the machine-learned features add predictive value to the set of 6 clinicopathologic variables in an external validation set (likelihood ratio test, p < 0.00032; AUROC: 0.740, 95% CI: [0.701, 0.780]). A model incorporating these features can also further risk-stratify patients with and without identified metastasis (p < 0.001 for both stage II and stage III). CONCLUSION This work demonstrates an effective approach to combining deep learning with established clinicopathologic factors to identify independently informative features associated with LNM. Further work building on these specific results may have important impact on prognostication and therapeutic decision-making for LNM. Additionally, this general computational approach may prove useful in other contexts.
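A minimal sketch of the pipeline the abstract describes (cluster patch embeddings with k-means, summarize each case as a cluster-frequency histogram, then compare logistic regression models with and without those features). All shapes, labels and values are synthetic placeholders, and scikit-learn is assumed:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic stand-ins (all invented): per-case patch embeddings from a deep
# model, baseline clinicopathologic variables, and a binary LNM label.
n_cases, n_patches, dim, k = 200, 50, 16, 8
embeddings = rng.normal(size=(n_cases, n_patches, dim))
baseline = rng.normal(size=(n_cases, 6))
y = rng.binomial(1, 0.3, size=n_cases)

# Step 1: cluster all patch embeddings with k-means.
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(embeddings.reshape(-1, dim))

# Step 2: summarize each case by its patch-cluster frequency histogram.
labels = km.labels_.reshape(n_cases, n_patches)
hist = np.stack([np.bincount(row, minlength=k) / n_patches for row in labels])

# Step 3: compare baseline-only vs. baseline + machine-learned features.
base_model = LogisticRegression(max_iter=1000).fit(baseline, y)
full_model = LogisticRegression(max_iter=1000).fit(np.hstack([baseline, hist]), y)
print("baseline AUROC:", roc_auc_score(y, base_model.predict_proba(baseline)[:, 1]))
print("combined AUROC:",
      roc_auc_score(y, full_model.predict_proba(np.hstack([baseline, hist]))[:, 1]))
```

With random synthetic data neither model carries real signal; the point of the sketch is the feature construction, with cluster selection and external validation omitted.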
Affiliation(s)
- Fraser Tan: Google Health, Palo Alto, California, USA
- Pema Richeson: Department of Pathology, Stanford University School of Medicine, Stanford, California, USA
- Yun Liu: Google Health, Palo Alto, California, USA
- Saurabh Gombar: Department of Pathology, Stanford University School of Medicine, Stanford, California, USA
- Thomas Montine: Department of Pathology, Stanford University School of Medicine, Stanford, California, USA
- Jeanne Shen: Department of Pathology, Stanford University School of Medicine, Stanford, California, USA
4
L’Imperio V, Wulczyn E, Plass M, Müller H, Tamini N, Gianotti L, Zucchini N, Reihs R, Corrado GS, Webster DR, Peng LH, Chen PHC, Lavitrano M, Liu Y, Steiner DF, Zatloukal K, Pagni F. Pathologist Validation of a Machine Learning-Derived Feature for Colon Cancer Risk Stratification. JAMA Netw Open 2023; 6:e2254891. [PMID: 36917112] [PMCID: PMC10015309] [DOI: 10.1001/jamanetworkopen.2022.54891]
Abstract
IMPORTANCE Identifying new prognostic features in colon cancer has the potential to refine histopathologic review and inform patient care. Although prognostic artificial intelligence systems have recently demonstrated significant risk stratification for several cancer types, studies have not yet shown that the machine learning-derived features associated with these prognostic artificial intelligence systems are both interpretable and usable by pathologists. OBJECTIVE To evaluate whether pathologist scoring of a histopathologic feature previously identified by machine learning is associated with survival among patients with colon cancer. DESIGN, SETTING, AND PARTICIPANTS This prognostic study used deidentified, archived colorectal cancer cases from January 2013 to December 2015 from the University of Milano-Bicocca. All available histologic slides from 258 consecutive colon adenocarcinoma cases were reviewed from December 2021 to February 2022 by 2 pathologists, who conducted semiquantitative scoring for tumor adipose feature (TAF), which was previously identified via a prognostic deep learning model developed with an independent colorectal cancer cohort. MAIN OUTCOMES AND MEASURES Prognostic value of TAF for overall survival and disease-specific survival as measured by univariable and multivariable regression analyses. Interpathologist agreement in TAF scoring was also evaluated. RESULTS A total of 258 colon adenocarcinoma histopathologic cases from 258 patients (138 men [53%]; median age, 67 years [IQR, 65-81 years]) with stage II (n = 119) or stage III (n = 139) cancer were included. Tumor adipose feature was identified in 120 cases (widespread in 63 cases, multifocal in 31, and unifocal in 26). 
For overall survival analysis after adjustment for tumor stage, TAF was independently prognostic in 2 ways: TAF as a binary feature (presence vs absence: hazard ratio [HR] for presence of TAF, 1.55 [95% CI, 1.07-2.25]; P = .02) and TAF as a semiquantitative categorical feature (HR for widespread TAF, 1.87 [95% CI, 1.23-2.85]; P = .004). Interpathologist agreement for widespread TAF vs lower categories (absent, unifocal, or multifocal) was 90%, corresponding to a κ metric at this threshold of 0.69 (95% CI, 0.58-0.80). CONCLUSIONS AND RELEVANCE In this prognostic study, pathologists were able to learn and reproducibly score for TAF, providing significant risk stratification on this independent data set. Although additional work is warranted to understand the biological significance of this feature and to establish broadly reproducible TAF scoring, this work represents the first validation to date of human expert learning from machine learning in pathology. Specifically, this validation demonstrates that a computationally identified histologic feature can represent a human-identifiable, prognostic feature with the potential for integration into pathology practice.
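The interpathologist agreement reported above (90% raw agreement, κ = 0.69 at the widespread-TAF threshold) combines observed agreement with a chance correction. A minimal sketch of Cohen's kappa for two raters' binary calls, on toy scores invented for illustration:

```python
import numpy as np

def cohens_kappa(a, b):
    """Cohen's kappa for two raters' binary calls: observed agreement
    corrected for the agreement expected by chance alone."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    p_obs = (a == b).mean()
    p_chance = a.mean() * b.mean() + (1 - a.mean()) * (1 - b.mean())
    return (p_obs - p_chance) / (1 - p_chance)

# Toy scores: two raters calling "widespread TAF" (1) vs lower categories (0).
rater1 = np.array([1, 1, 0, 0, 1])
rater2 = np.array([1, 0, 0, 0, 1])
print(cohens_kappa(rater1, rater2))  # ≈ 0.615
```

Kappa is deliberately lower than raw agreement whenever the raters' marginal rates make chance agreement likely, which is why the study reports both numbers.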
Affiliation(s)
- Vincenzo L’Imperio: Department of Medicine and Surgery, Pathology, University of Milano-Bicocca, IRCCS (Scientific Institute for Research, Hospitalization and Healthcare) Fondazione San Gerardo dei Tintori, Monza, Italy
- Markus Plass: Medical University of Graz, Diagnostic and Research Institute of Pathology, Graz, Austria
- Heimo Müller: Medical University of Graz, Diagnostic and Research Institute of Pathology, Graz, Austria
- Nicolò Tamini: Department of Surgery, San Gerardo Hospital, Monza, Italy
- Luca Gianotti: Department of Surgery, San Gerardo Hospital, Monza, Italy
- Nicola Zucchini: Department of Medicine and Surgery, Pathology, University of Milano-Bicocca, IRCCS (Scientific Institute for Research, Hospitalization and Healthcare) Fondazione San Gerardo dei Tintori, Monza, Italy
- Robert Reihs: Medical University of Graz, Diagnostic and Research Institute of Pathology, Graz, Austria
- Lily H. Peng: Google Health, Google LLC, Palo Alto, California
- Marialuisa Lavitrano: Department of Medicine and Surgery, Pathology, University of Milano-Bicocca, IRCCS (Scientific Institute for Research, Hospitalization and Healthcare) Fondazione San Gerardo dei Tintori, Monza, Italy
- Yun Liu: Google Health, Google LLC, Palo Alto, California
- Kurt Zatloukal: Medical University of Graz, Diagnostic and Research Institute of Pathology, Graz, Austria
- Fabio Pagni: Department of Medicine and Surgery, Pathology, University of Milano-Bicocca, IRCCS (Scientific Institute for Research, Hospitalization and Healthcare) Fondazione San Gerardo dei Tintori, Monza, Italy
5
Sadhwani A, Chang HW, Behrooz A, Brown T, Auvigne-Flament I, Patel H, Findlater R, Velez V, Tan F, Tekiela K, Wulczyn E, Yi ES, Mermel CH, Hanks D, Chen PHC, Kulig K, Batenchuk C, Steiner DF, Cimermancic P. Comparative analysis of machine learning approaches to classify tumor mutation burden in lung adenocarcinoma using histopathology images. Sci Rep 2021; 11:16605. [PMID: 34400666] [PMCID: PMC8368039] [DOI: 10.1038/s41598-021-95747-4]
Abstract
Both histologic subtypes and tumor mutation burden (TMB) represent important biomarkers in lung cancer, with implications for patient prognosis and treatment decisions. Typically, TMB is evaluated by comprehensive genomic profiling, but this requires the use of finite tissue specimens and costly, time-consuming laboratory processes. Histologic subtype classification represents an established component of lung adenocarcinoma histopathology, but can be challenging and is associated with substantial inter-pathologist variability. Here we developed a deep learning system to both classify histologic patterns in lung adenocarcinoma and predict TMB status using de-identified hematoxylin and eosin (H&E)-stained whole slide images. We first trained a convolutional neural network to map histologic features across whole slide images of lung cancer resection specimens. On evaluation using an external data source, this model achieved a patch-level area under the receiver operating characteristic curve (AUC) of 0.78–0.98 across nine histologic features. We then integrated the output of this model with clinico-demographic data to develop an interpretable model for TMB classification. The resulting end-to-end system was evaluated on 172 held-out cases from TCGA, achieving an AUC of 0.71 (95% CI 0.63–0.80). The benefit of using histologic features in predicting TMB is highlighted by the significant improvement this approach offers over using the clinical features alone (AUC of 0.63 [95% CI 0.53–0.72], p = 0.002). Furthermore, we found that our histologic subtype-based approach achieved performance similar to that of a weakly supervised approach (AUC of 0.72 [95% CI 0.64–0.80]). Together these results underscore that incorporating histologic patterns in biomarker prediction for lung cancer provides informative signals, and that interpretable approaches utilizing these patterns perform comparably with less interpretable, weakly supervised approaches.
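The patch-level AUCs reported above can be computed without any curve plotting: AUC equals the Mann-Whitney probability that a randomly chosen positive scores above a randomly chosen negative. A small self-contained sketch on invented scores:

```python
import numpy as np

def auc_mann_whitney(y_true, scores):
    """AUC via the Mann-Whitney statistic: the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative
    case, with score ties counted as half."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    wins = (pos[:, None] > neg[None, :]).sum()   # pairwise comparisons
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# Toy patch scores for one histologic feature (values invented).
print(auc_mann_whitney([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2]))  # 0.75
```

The pairwise form is O(n²) and fine for a sketch; rank-based implementations scale better on large patch sets.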
Affiliation(s)
- Ali Behrooz: Verily Life Sciences, South San Francisco, CA, USA
- Hardik Patel: Verily Life Sciences, South San Francisco, CA, USA
- Eunhee S Yi: Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, USA
- Debra Hanks: Verily Life Sciences, South San Francisco, CA, USA
- Kimary Kulig: Verily Life Sciences, South San Francisco, CA, USA; PathPresenter Corp., New York, NY, USA
6
Wilson M, Chopra R, Wilson MZ, Cooper C, MacWilliams P, Liu Y, Wulczyn E, Florea D, Hughes CO, Karthikesalingam A, Khalid H, Vermeirsch S, Nicholson L, Keane PA, Balaskas K, Kelly CJ. Validation and Clinical Applicability of Whole-Volume Automated Segmentation of Optical Coherence Tomography in Retinal Disease Using Deep Learning. JAMA Ophthalmol 2021; 139:964-973. [PMID: 34236406] [PMCID: PMC8444027] [DOI: 10.1001/jamaophthalmol.2021.2273]
Abstract
Question Is deep learning-based segmentation of macular disease in optical coherence tomography (OCT) suitable for clinical use? Findings In this diagnostic study of OCT data from 173 patients with age-related macular degeneration or diabetic macular edema, a panel of 3 retinal specialists ranked model segmentations as qualitatively better than or comparable to 1 or more expert grader segmentations for clinical applicability in 127 scans (73%). Scans with high quantitative accuracy scores were not reliably associated with higher rankings. Meaning These findings suggest that qualitative evaluation adds to quantitative approaches when assessing clinical applicability of segmentation tools and clinician satisfaction in practice. Importance Quantitative volumetric measures of retinal disease in optical coherence tomography (OCT) scans are infeasible to perform owing to the time required for manual grading. Expert-level deep learning systems for automatic OCT segmentation have recently been developed. However, the potential clinical applicability of these systems is largely unknown. Objective To evaluate a deep learning model for whole-volume segmentation of 4 clinically important pathological features and assess clinical applicability. Design, Setting, and Participants This diagnostic study used OCT data from 173 patients with a total of 15 558 B-scans, treated at Moorfields Eye Hospital. The data set included 2 common OCT devices and 2 macular conditions: wet age-related macular degeneration (107 scans) and diabetic macular edema (66 scans), covering the full range of severity, and from 3 points during treatment. Two expert graders performed pixel-level segmentations of intraretinal fluid, subretinal fluid, subretinal hyperreflective material, and pigment epithelial detachment, including all B-scans in each OCT volume, taking as long as 50 hours per scan. Quantitative evaluation of whole-volume model segmentations was performed.
Qualitative evaluation of clinical applicability by 3 retinal experts was also conducted. Data were collected from June 1, 2012, to January 31, 2017, for set 1 and from January 1 to December 31, 2017, for set 2; graded between November 2018 and January 2020; and analyzed from February 2020 to November 2020. Main Outcomes and Measures Rating and stack ranking for clinical applicability by retinal specialists, model-grader agreement for voxelwise segmentations, and total volume evaluated using Dice similarity coefficients, Bland-Altman plots, and intraclass correlation coefficients. Results Among the 173 patients included in the analysis (92 [53%] women), qualitative assessment found that automated whole-volume segmentation ranked better than or comparable to at least 1 expert grader in 127 scans (73%; 95% CI, 66%-79%). A neutral or positive rating was given to 135 model segmentations (78%; 95% CI, 71%-84%) and 309 expert gradings (2 per scan) (89%; 95% CI, 86%-92%). The model was rated neutrally or positively in 86% to 92% of diabetic macular edema scans and 53% to 87% of age-related macular degeneration scans. Intraclass correlations ranged from 0.33 (95% CI, 0.08-0.96) to 0.96 (95% CI, 0.90-0.99). Dice similarity coefficients ranged from 0.43 (95% CI, 0.29-0.66) to 0.78 (95% CI, 0.57-0.85). Conclusions and Relevance This deep learning-based segmentation tool provided clinically useful measures of retinal disease that would otherwise be infeasible to obtain. Qualitative evaluation was additionally important to reveal clinical applicability for both care management and research.
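The Dice similarity coefficients used above for voxelwise model-grader agreement reduce to a one-line overlap ratio on binary masks. A minimal sketch on toy masks (values invented for illustration):

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient between two binary masks:
    2|A∩B| / (|A| + |B|), taken as 1.0 when both masks are empty."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(a, b).sum() / denom

model_mask = np.array([[1, 1, 0], [0, 1, 0]])   # e.g., model's fluid segmentation
grader_mask = np.array([[1, 0, 0], [0, 1, 1]])  # e.g., expert grader's segmentation
print(dice(model_mask, grader_mask))  # 2*2 / (3+3) ≈ 0.667
```

The empty-mask convention matters in practice: many B-scans contain none of a given pathological feature, and an undefined 0/0 would otherwise skew per-volume averages.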
Affiliation(s)
- Reena Chopra: Google Health, London, United Kingdom; National Institute for Health Research Biomedical Research Centre for Ophthalmology, Moorfields Eye Hospital NHS (National Health Service) Foundation Trust, London, United Kingdom; University College London Institute of Ophthalmology, London, United Kingdom
- Yun Liu: Google Health, Palo Alto, California
- Daniela Florea: National Institute for Health Research Biomedical Research Centre for Ophthalmology, Moorfields Eye Hospital NHS (National Health Service) Foundation Trust, London, United Kingdom; University College London Institute of Ophthalmology, London, United Kingdom
- Hagar Khalid: National Institute for Health Research Biomedical Research Centre for Ophthalmology, Moorfields Eye Hospital NHS (National Health Service) Foundation Trust, London, United Kingdom; University College London Institute of Ophthalmology, London, United Kingdom
- Sandra Vermeirsch: National Institute for Health Research Biomedical Research Centre for Ophthalmology, Moorfields Eye Hospital NHS (National Health Service) Foundation Trust, London, United Kingdom; University College London Institute of Ophthalmology, London, United Kingdom
- Luke Nicholson: National Institute for Health Research Biomedical Research Centre for Ophthalmology, Moorfields Eye Hospital NHS (National Health Service) Foundation Trust, London, United Kingdom; University College London Institute of Ophthalmology, London, United Kingdom
- Pearse A Keane: National Institute for Health Research Biomedical Research Centre for Ophthalmology, Moorfields Eye Hospital NHS (National Health Service) Foundation Trust, London, United Kingdom; University College London Institute of Ophthalmology, London, United Kingdom
- Konstantinos Balaskas: National Institute for Health Research Biomedical Research Centre for Ophthalmology, Moorfields Eye Hospital NHS (National Health Service) Foundation Trust, London, United Kingdom; University College London Institute of Ophthalmology, London, United Kingdom
7
Wulczyn E, Steiner DF, Moran M, Plass M, Reihs R, Mueller H, Sadhwani A, Cai Y, Flament I, Chen PHC, Liu Y, Stumpe MC, Xu Z, Zatloukal K, Mermel CH. Abstract 2096: A deep learning system to predict disease-specific survival in stage II and stage III colorectal cancer. Cancer Res 2020. [DOI: 10.1158/1538-7445.am2020-2096]
Abstract
Accurate prognosis in colorectal cancer can have important implications for clinical management. Here, we develop a deep learning system (DLS) to first identify invasive cancer and then directly predict disease specific survival (DSS) for stage II and stage III colorectal cancer using only digitized histopathology whole-slide images. The DLS was trained using slides from 1173 stage II and 1266 stage III cases (18,304 total slides) and was evaluated on a held-out test set of 601 stage II and 638 stage III cases (9,340 total slides). The area under the receiver operating characteristic curve (AUC) for 5-year DSS prediction was 68.0 for stage II (95% CI 62.2-73.1) and 65.5 for stage III (95% CI 61.1-70.0). For stage II, 5-year DSS was 64% for DLS-predicted high-risk cases versus 89% for DLS-predicted low-risk cases (upper and lower risk quartiles; p<0.001, log rank test). For stage III, 5-year DSS was 35% for DLS-predicted high-risk cases versus 66% for DLS-predicted low-risk cases (upper and lower risk quartiles; p<0.001, log rank test). In a multivariable Cox model, the DLS prediction remained significantly associated with DSS after adjusting for T-category, N-category, age, gender, tumor grade, and lymphovascular invasion (stage II: adjusted hazard ratio 1.55, 95% CI 1.33-1.81, p<0.0001; stage III: adjusted hazard ratio 1.35, 95% CI (1.21-1.51), p<0.0001). Finally, a combined proportional-hazards model using the DLS along with baseline clinicopathologic information provided better risk prediction than the DLS or baseline information alone, increasing 5-year AUC over the baseline-only model by 8.9 points (95% CI 3.9-13.6) and 5.3 points (95% CI 2.3-8.4) for stages II and III, respectively. Taken together, these findings demonstrate that the DLS provides significant prognostic value and risk stratification in both stage II and stage III colorectal cancer, and can be combined with known risk features to further improve prognostic accuracy. 
This represents novel work to train a DLS to directly predict patient outcomes using whole-slide images and weakly supervised learning. The ability to use non-annotated slides as input has important implications for possible clinical applications and the features learned by the model may also help to identify new prognosis-associated morphologic factors in colorectal cancer. Additional work is ongoing to confirm the utility of these findings, such as validation in additional datasets and interpretability experiments to better understand the features learned by the DLS for these predictions.
Citation Format: Ellery Wulczyn, David F. Steiner, Melissa Moran, Markus Plass, Robert Reihs, Heimo Mueller, Apaar Sadhwani, Yuannan Cai, Isabelle Flament, Po-Hsuan Cameron Chen, Yun Liu, Martin C. Stumpe, Zhaoyang Xu, Kurt Zatloukal, Craig H. Mermel. A deep learning system to predict disease-specific survival in stage II and stage III colorectal cancer [abstract]. In: Proceedings of the Annual Meeting of the American Association for Cancer Research 2020; 2020 Apr 27-28 and Jun 22-24. Philadelphia (PA): AACR; Cancer Res 2020;80(16 Suppl):Abstract nr 2096.
8
Wulczyn E, Steiner DF, Xu Z, Sadhwani A, Wang H, Flament-Auvigne I, Mermel CH, Chen PHC, Liu Y, Stumpe MC. Deep learning-based survival prediction for multiple cancer types using histopathology images. PLoS One 2020; 15:e0233678. [PMID: 32555646] [PMCID: PMC7299324] [DOI: 10.1371/journal.pone.0233678]
Abstract
Providing prognostic information at the time of cancer diagnosis has important implications for treatment and monitoring. Although cancer staging, histopathological assessment, molecular features, and clinical variables can provide useful prognostic insights, improving risk stratification remains an active research area. We developed a deep learning system (DLS) to predict disease specific survival across 10 cancer types from The Cancer Genome Atlas (TCGA). We used a weakly-supervised approach without pixel-level annotations, and tested three different survival loss functions. The DLS was developed using 9,086 slides from 3,664 cases and evaluated using 3,009 slides from 1,216 cases. In multivariable Cox regression analysis of the combined cohort including all 10 cancers, the DLS was significantly associated with disease specific survival (hazard ratio of 1.58, 95% CI 1.28–1.70, p<0.0001) after adjusting for cancer type, stage, age, and sex. In a per-cancer adjusted subanalysis, the DLS remained a significant predictor of survival in 5 of 10 cancer types. Compared to a baseline model including stage, age, and sex, the c-index of the model demonstrated an absolute 3.7% improvement (95% CI 1.0–6.5) in the combined cohort. Additionally, our models stratified patients within individual cancer stages, particularly stage II (p = 0.025) and stage III (p<0.001). By developing and evaluating prognostic models across multiple cancer types, this work represents one of the most comprehensive studies exploring the direct prediction of clinical outcomes using deep learning and histopathology images. Our analysis demonstrates the potential for this approach to provide significant prognostic information in multiple cancer types, and even within specific pathologic stages. 
However, given the relatively small number of cases and observed clinical events for a deep learning task of this type, we observed wide confidence intervals for model performance, thus highlighting that future work will benefit from larger datasets assembled for the purposes of survival modeling.
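The c-index improvements reported above refer to Harrell's concordance index, which generalizes AUC to censored survival data: only pairs in which the earlier time is an observed event are comparable. A minimal sketch on toy data (an O(n²) loop, fine for illustration):

```python
import numpy as np

def c_index(time, event, risk):
    """Harrell's concordance index: among comparable pairs (the earlier
    time is an observed event), the fraction where the higher predicted
    risk belongs to the shorter survival time; risk ties count half."""
    time, event, risk = map(np.asarray, (time, event, risk))
    conc = comp = 0.0
    for i in range(len(time)):
        for j in range(len(time)):
            if event[i] and time[i] < time[j]:  # i fails first: comparable pair
                comp += 1
                if risk[i] > risk[j]:
                    conc += 1.0
                elif risk[i] == risk[j]:
                    conc += 0.5
    return conc / comp

# Toy data: predicted risks perfectly anti-ordered with survival time;
# the third case is censored (event = 0) and never anchors a pair.
print(c_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.5, 0.1]))  # 1.0
```

A c-index of 0.5 corresponds to random risk ordering and 1.0 to perfect concordance, so the reported absolute gains of a few points are measured on this 0.5-1.0 scale.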
Affiliation(s)
- Ellery Wulczyn, David F. Steiner, Zhaoyang Xu, Apaar Sadhwani, Hongwu Wang, Craig H. Mermel, Yun Liu, Martin C. Stumpe: Google Health, Google LLC, Palo Alto, California, United States of America

9
Nagpal K, Foote D, Liu Y, Chen PHC, Wulczyn E, Tan F, Olson N, Smith JL, Mohtashamian A, Wren JH, Corrado GS, MacDonald R, Peng LH, Amin MB, Evans AJ, Sangoi AR, Mermel CH, Hipp JD, Stumpe MC. Erratum: Publisher Correction: Development and validation of a deep learning algorithm for improving Gleason scoring of prostate cancer. NPJ Digit Med 2019; 2:113. [PMID: 31754638 PMCID: PMC6864046 DOI: 10.1038/s41746-019-0196-8] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open
Affiliation(s)
- Kunal Nagpal, Davis Foote, Yun Liu, Fraser Tan, Lily H Peng, Jason D Hipp: Google AI Healthcare, Google, Mountain View, CA, USA
- Niels Olson, Jenny L Smith, Arash Mohtashamian: Laboratory Department, Naval Medical Center San Diego, San Diego, CA, USA
- Mahul B Amin: Department of Pathology and Laboratory Medicine, University of Tennessee Health Science Center, Memphis, TN, USA
- Andrew J Evans: Department of Pathology, Laboratory Medicine and Pathology, University Health Network and University of Toronto, Toronto, ON, Canada
- Ankur R Sangoi: Department of Pathology, El Camino Hospital, Mountain View, CA, USA
- Martin C Stumpe: Google AI Healthcare, Google, Mountain View, CA, USA; present address: AI and Data Science, Tempus Labs Inc, Chicago, United States

10
Nagpal K, Foote D, Liu Y, Chen PHC, Wulczyn E, Tan F, Olson N, Smith JL, Mohtashamian A, Wren JH, Corrado GS, MacDonald R, Peng LH, Amin MB, Evans AJ, Sangoi AR, Mermel CH, Hipp JD, Stumpe MC. Development and validation of a deep learning algorithm for improving Gleason scoring of prostate cancer. NPJ Digit Med 2019; 2:48. [PMID: 31304394 PMCID: PMC6555810 DOI: 10.1038/s41746-019-0112-2] [Citation(s) in RCA: 167] [Impact Index Per Article: 33.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/30/2019] [Accepted: 04/15/2019] [Indexed: 12/20/2022] Open
Abstract
For prostate cancer patients, the Gleason score is one of the most important prognostic factors, potentially determining treatment independent of the stage. However, Gleason scoring is based on subjective microscopic examination of tumor morphology and suffers from poor reproducibility. Here we present a deep learning system (DLS) for Gleason scoring whole-slide images of prostatectomies. Our system was developed using 112 million pathologist-annotated image patches from 1226 slides, and evaluated on an independent validation dataset of 331 slides. Compared to a reference standard provided by genitourinary pathology experts, the mean accuracy among 29 general pathologists was 0.61 on the validation set. The DLS achieved a significantly higher diagnostic accuracy of 0.70 (p = 0.002) and trended towards better patient risk stratification in correlation with clinical follow-up data. Our approach could improve the accuracy of Gleason scoring and subsequent therapy decisions, particularly where specialist expertise is unavailable. The DLS also goes beyond the current Gleason system to more finely characterize and quantitate tumor morphology, providing opportunities for refinement of the Gleason system itself.
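A paired permutation test is one standard way to attach a p-value to an accuracy difference like the 0.70-vs-0.61 comparison in this abstract. The sketch below is a generic illustration under that assumption, not the study's actual statistical procedure:

```python
import random

def paired_permutation_pvalue(correct_a, correct_b, n_perm=10000, seed=0):
    """Two-sided paired permutation test for a difference in accuracy.
    correct_a / correct_b are per-case 0/1 correctness indicators for two
    raters (e.g. a model and a pathologist) graded on the same cases."""
    rng = random.Random(seed)
    observed = abs(sum(correct_a) - sum(correct_b))
    hits = 0
    for _ in range(n_perm):
        diff = 0
        for a, b in zip(correct_a, correct_b):
            if rng.random() < 0.5:
                a, b = b, a  # under H0, labels are exchangeable within a pair
            diff += a - b
        if abs(diff) >= observed:
            hits += 1
    return hits / n_perm

# identical correctness patterns: no evidence of a difference
print(paired_permutation_pvalue([1, 0, 1, 1], [1, 0, 1, 1]))  # -> 1.0
```

Pairing by case matters here because both the DLS and the pathologists graded the same validation slides, so per-case correctness indicators are correlated.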
Affiliation(s)
- Kunal Nagpal, Davis Foote, Yun Liu, Fraser Tan, Lily H. Peng: Google AI Healthcare, Google, Mountain View, CA, USA
- Niels Olson, Jenny L. Smith, Arash Mohtashamian: Laboratory Department, Naval Medical Center San Diego, San Diego, CA, USA
- Mahul B. Amin: Department of Pathology and Laboratory Medicine, University of Tennessee Health Science Center, Memphis, TN, USA
- Andrew J. Evans: Department of Pathology, Laboratory Medicine and Pathology, University Health Network and University of Toronto, Toronto, ON, Canada
- Ankur R. Sangoi: Department of Pathology, El Camino Hospital, Mountain View, CA, USA
- Martin C. Stumpe: Google AI Healthcare, Google, Mountain View, CA, USA; present address: AI and Data Science, Tempus Labs Inc, Chicago, United States

11
Abstract
The different Wikipedia language editions vary dramatically in how comprehensive they are. As a result, most language editions contain only a small fraction of the sum of information that exists across all Wikipedias. In this paper, we present an approach to filling gaps in article coverage across different Wikipedia editions. Our main contribution is an end-to-end system for recommending articles for creation that exist in one language but are missing in another. The system involves identifying missing articles, ranking the missing articles according to their importance, and recommending important missing articles to editors based on their interests. We empirically validate our models in a controlled experiment involving 12,000 French Wikipedia editors. We find that personalizing recommendations increases editor engagement by a factor of two. Moreover, recommending articles increases their chance of being created by a factor of 3.2. Finally, articles created as a result of our recommendations are of comparable quality to organically created articles. Overall, our system leads to more engaged editors and faster growth of Wikipedia with no effect on its quality.
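The pipeline this abstract describes (identify missing articles, rank by importance, match to editor interests) can be sketched in a few lines. Every title, score, and topic below is hypothetical; the paper's actual importance and interest models are more sophisticated:

```python
def recommend_missing_articles(candidates, interests, top_k=3):
    """candidates: list of (title, importance_score, topic) tuples for
    articles present in a source Wikipedia but missing in the target.
    Keep those matching the editor's interests, ranked by importance."""
    matching = [c for c in candidates if c[2] in interests]
    matching.sort(key=lambda c: c[1], reverse=True)
    return [title for title, _, _ in matching[:top_k]]

# hypothetical missing-article candidates for one editor
candidates = [
    ("Kolmogorov complexity", 0.91, "mathematics"),
    ("Baguette", 0.88, "food"),
    ("Hash table", 0.75, "computer science"),
    ("Sourdough", 0.60, "food"),
]
print(recommend_missing_articles(candidates, {"food"}))
# -> ['Baguette', 'Sourdough']
```

The paper's controlled experiment suggests the personalization step (the interest filter here) is what doubles editor engagement relative to ranking by importance alone.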