1
Larsen SE, Abdelaal HFM, Plumlee CR, Cohen SB, Kim HD, Barrett HW, Liu Q, Harband MH, Berube BJ, Baldwin SL, Fortune SM, Urdahl KB, Coler RN. The chosen few: Mycobacterium tuberculosis isolates for IMPAc-TB. Front Immunol 2024; 15:1427510. [PMID: 39530100 PMCID: PMC11551615 DOI: 10.3389/fimmu.2024.1427510] [Received: 05/03/2024] [Accepted: 09/06/2024] [Indexed: 11/16/2024]
Abstract
The three programs that make up the Immune Mechanisms of Protection Against Mycobacterium tuberculosis Centers (IMPAc-TB) had to prioritize and select strains to be leveraged for this work. The CASCADE team based at Seattle Children's Research Institute are leveraging M.tb H37Rv, M.tb CDC1551, and M.tb SA161. The HI-IMPACT team based at Harvard T.H. Chan School of Public Health, Boston, have selected M.tb Erdman as well as a novel clinical isolate recently characterized during a longitudinal study in Peru. The PHOENIX team also based at Seattle Children's Research Institute have selected M.tb HN878 and M.tb Erdman as their isolates of choice. Here, we describe original source isolation, genomic references, key virulence characteristics, and relevant tools that make these isolates attractive for use. The global context for M.tb lineage 2 and 4 selection is reviewed including what is known about their relative abundance and acquisition of drug resistance. Host-pathogen interactions seem driven by genomic differences on each side, and these play an important role in pathogenesis and immunity. The few M.tb strains chosen for this work do not reflect the vast genomic diversity within this species. They do, however, provide specific virulence, pathology, and growth kinetics of interest to the consortium. The strains selected should not be considered as "representative" of the growing available array of M.tb isolates, but rather tools that are being used to address key outstanding questions in the field.
Affiliation(s)
- Sasha E. Larsen
- Seattle Children’s Research Institute, Center for Global Infectious Disease Research, Seattle Children’s, Seattle, WA, United States
- Hazem F. M. Abdelaal
- Seattle Children’s Research Institute, Center for Global Infectious Disease Research, Seattle Children’s, Seattle, WA, United States
- Courtney R. Plumlee
- Seattle Children’s Research Institute, Center for Global Infectious Disease Research, Seattle Children’s, Seattle, WA, United States
- Sara B. Cohen
- Seattle Children’s Research Institute, Center for Global Infectious Disease Research, Seattle Children’s, Seattle, WA, United States
- Ho D. Kim
- Seattle Children’s Research Institute, Center for Global Infectious Disease Research, Seattle Children’s, Seattle, WA, United States
- Holly W. Barrett
- Seattle Children’s Research Institute, Center for Global Infectious Disease Research, Seattle Children’s, Seattle, WA, United States
- Department of Global Health, University of Washington, Seattle, WA, United States
- Qingyun Liu
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Matthew H. Harband
- Seattle Children’s Research Institute, Center for Global Infectious Disease Research, Seattle Children’s, Seattle, WA, United States
- Bryan J. Berube
- Seattle Children’s Research Institute, Center for Global Infectious Disease Research, Seattle Children’s, Seattle, WA, United States
- Susan L. Baldwin
- Seattle Children’s Research Institute, Center for Global Infectious Disease Research, Seattle Children’s, Seattle, WA, United States
- Sarah M. Fortune
- Department of Immunology and Infectious Diseases, Harvard T. H. Chan School of Public Health, Boston, MA, United States
- Broad Institute of Massachusetts Institute of Technology (MIT), and Harvard, Cambridge, MA, United States
- Kevin B. Urdahl
- Seattle Children’s Research Institute, Center for Global Infectious Disease Research, Seattle Children’s, Seattle, WA, United States
- Department of Immunology, University of Washington, Seattle, WA, United States
- Department of Pediatrics, University of Washington School of Medicine, Seattle, WA, United States
- Rhea N. Coler
- Seattle Children’s Research Institute, Center for Global Infectious Disease Research, Seattle Children’s, Seattle, WA, United States
- Department of Global Health, University of Washington, Seattle, WA, United States
- Department of Pediatrics, University of Washington School of Medicine, Seattle, WA, United States
2
Meade RK, Smith CM. Immunological roads diverged: mapping tuberculosis outcomes in mice. Trends Microbiol 2024:S0966-842X(24)00170-7. [PMID: 39034171 DOI: 10.1016/j.tim.2024.06.007] [Received: 05/06/2024] [Revised: 06/24/2024] [Accepted: 06/25/2024] [Indexed: 07/23/2024]
Abstract
The journey from phenotypic observation to causal genetic mechanism is a long and challenging road. For pathogens like Mycobacterium tuberculosis (Mtb), which causes tuberculosis (TB), host-pathogen coevolution has spanned millennia, costing millions of human lives. Mammalian models can systematically recapitulate host genetic variation, producing a spectrum of disease outcomes. Leveraging genome sequences and deep phenotyping data from infected mouse genetic reference populations (GRPs), quantitative trait locus (QTL) mapping approaches have successfully identified host genomic regions associated with TB phenotypes. Here, we review the ongoing optimization of QTL mapping study design alongside advances in mouse GRPs. These next-generation resources and approaches have enabled identification of novel host-pathogen interactions governing one of the most prevalent infectious diseases in the world today.
Affiliation(s)
- Rachel K Meade
- Department of Molecular Genetics and Microbiology, Duke University, Durham, NC, USA; University Program in Genetics and Genomics, Duke University, Durham, NC, USA
- Clare M Smith
- Department of Molecular Genetics and Microbiology, Duke University, Durham, NC, USA; University Program in Genetics and Genomics, Duke University, Durham, NC, USA
3
Cisternino F, Ometto S, Chatterjee S, Giacopuzzi E, Levine AP, Glastonbury CA. Self-supervised learning for characterising histomorphological diversity and spatial RNA expression prediction across 23 human tissue types. Nat Commun 2024; 15:5906. [PMID: 39003292 PMCID: PMC11246527 DOI: 10.1038/s41467-024-50317-w] [Received: 08/22/2023] [Accepted: 07/04/2024] [Indexed: 07/15/2024]
Abstract
As vast histological archives are digitised, there is a pressing need to be able to associate specific tissue substructures and incident pathology to disease outcomes without arduous annotation. Here, we learn self-supervised representations using a Vision Transformer, trained on 1.7 M histology images across 23 healthy tissues in 838 donors from the Genotype Tissue Expression consortium (GTEx). Using these representations, we can automatically segment tissues into their constituent tissue substructures and pathology proportions across thousands of whole slide images, outperforming other self-supervised methods (43% increase in silhouette score). Additionally, we can detect and quantify histological pathologies present, such as arterial calcification (AUROC = 0.93) and identify missing calcification diagnoses. Finally, to link gene expression to tissue morphology, we introduce RNAPath, a set of models trained on 23 tissue types that can predict and spatially localise individual RNA expression levels directly from H&E histology (mean genes significantly regressed = 5156, FDR 1%). We validate RNAPath spatial predictions with matched ground truth immunohistochemistry for several well characterised control genes, recapitulating their known spatial specificity. Together, these results demonstrate how self-supervised machine learning when applied to vast histological archives allows researchers to answer questions about tissue pathology, its spatial organisation and the interplay between morphological tissue variability and gene expression.
Affiliation(s)
- Sara Ometto
- Human Technopole, Viale Rita Levi-Montalcini 1, 20157, Milan, Italy
- Adam P Levine
- Research Department of Pathology, University College London, London, UK
4
Koyuncu D, Tavolara T, Gatti DM, Gower AC, Ginese ML, Kramnik I, Yener B, Sajjad U, Niazi MKK, Gurcan M, Alsharaydeh A, Beamer G. B cells in perivascular and peribronchiolar granuloma-associated lymphoid tissue and B-cell signatures identify asymptomatic Mycobacterium tuberculosis lung infection in Diversity Outbred mice. Infect Immun 2024; 92:e0026323. [PMID: 38899881 PMCID: PMC11238564 DOI: 10.1128/iai.00263-23] [Received: 07/25/2023] [Accepted: 04/09/2024] [Indexed: 06/21/2024]
Abstract
Because most humans resist Mycobacterium tuberculosis infection, there is a paucity of lung samples to study. To address this gap, we infected Diversity Outbred mice with M. tuberculosis and studied the lungs of mice in different disease states. After a low-dose aerosol infection, progressors succumbed to acute, inflammatory lung disease within 60 days, while controllers maintained asymptomatic infection for at least 60 days, and then developed chronic pulmonary tuberculosis (TB) lasting months to more than 1 year. Here, we identified features of asymptomatic M. tuberculosis infection by applying computational and statistical approaches to multimodal data sets. Cytokines and anti-M. tuberculosis cell wall antibodies discriminated progressors vs controllers with chronic pulmonary TB but could not classify mice with asymptomatic infection. However, a novel deep-learning neural network trained on lung granuloma images was able to accurately classify asymptomatically infected lungs vs acute pulmonary TB in progressors vs chronic pulmonary TB in controllers, and discrimination was based on perivascular and peribronchiolar lymphocytes. Because the discriminatory lesion was rich in lymphocytes and CD4 T cell-mediated immunity is required for resistance, we expected CD4 T-cell genes would be elevated in asymptomatic infection. However, the significantly different, highly expressed genes were from B-cell pathways (e.g., Bank1, Cd19, Cd79, Fcmr, Ms4a1, Pax5, and H2-Ob), and CD20+ B cells were enriched in the perivascular and peribronchiolar regions of mice with asymptomatic M. tuberculosis infection. Together, these results indicate that genetically controlled B-cell responses are important for establishing asymptomatic M. tuberculosis lung infection.
Affiliation(s)
- Deniz Koyuncu
- Rensselaer Polytechnic Institute, Troy, New York, USA
- Thomas Tavolara
- Wake Forest University, School of Medicine, Winston Salem, North Carolina, USA
- Adam C. Gower
- Boston University Clinical and Translational Science Institute, Boston, Massachusetts, USA
- Melanie L. Ginese
- Tufts University Cummings School of Veterinary Medicine, North Grafton, Massachusetts, USA
- Igor Kramnik
- NIEDL, Boston University, Boston, Massachusetts, USA
- Bülent Yener
- Rensselaer Polytechnic Institute, Troy, New York, USA
- Usama Sajjad
- Wake Forest University, School of Medicine, Winston Salem, North Carolina, USA
- Metin Gurcan
- Wake Forest University, School of Medicine, Winston Salem, North Carolina, USA
- Gillian Beamer
- Aiforia Inc., Cambridge, Massachusetts, USA
- Texas Biomedical Research Institute, San Antonio, Texas, USA
5
Gatti DM, Tyler AL, Mahoney JM, Churchill GA, Yener B, Koyuncu D, Gurcan MN, Niazi MKK, Tavolara T, Gower A, Dayao D, McGlone E, Ginese ML, Specht A, Alsharaydeh A, Tessier PA, Kurtz SL, Elkins KL, Kramnik I, Beamer G. Systems genetics uncover new loci containing functional gene candidates in Mycobacterium tuberculosis-infected Diversity Outbred mice. PLoS Pathog 2024; 20:e1011915. [PMID: 38861581 PMCID: PMC11195971 DOI: 10.1371/journal.ppat.1011915] [Received: 12/22/2023] [Revised: 06/24/2024] [Accepted: 04/17/2024] [Indexed: 06/13/2024]
Abstract
Mycobacterium tuberculosis infects two billion people across the globe, and results in 8-9 million new tuberculosis (TB) cases and 1-1.5 million deaths each year. Most patients have no known genetic basis that predisposes them to disease. Here, we investigate the complex genetic basis of pulmonary TB by modelling human genetic diversity with the Diversity Outbred mouse population. When infected with M. tuberculosis, one-third develop early onset, rapidly progressive, necrotizing granulomas and succumb within 60 days. The remaining develop non-necrotizing granulomas and survive longer than 60 days. Genetic mapping using immune and inflammatory mediators; and clinical, microbiological, and granuloma correlates of disease identified five new loci on mouse chromosomes 1, 2, 4, 16; and three known loci on chromosomes 3 and 17. Further, multiple positively correlated traits shared loci on chromosomes 1, 16, and 17 and had similar patterns of allele effects, suggesting these loci contain critical genetic regulators of inflammatory responses to M. tuberculosis. To narrow the list of candidate genes, we used a machine learning strategy that integrated gene expression signatures from lungs of M. tuberculosis-infected Diversity Outbred mice with gene interaction networks to generate scores representing functional relationships. The scores were used to rank candidates for each mapped trait, resulting in 11 candidate genes: Ncf2, Fam20b, S100a8, S100a9, Itgb5, Fstl1, Zbtb20, Ddr1, Ier3, Vegfa, and Zfp318. Although all candidates have roles in infection, inflammation, cell migration, extracellular matrix remodeling, or intracellular signaling, and all contain single nucleotide polymorphisms (SNPs), SNPs in only four genes (S100a8, Itgb5, Fstl1, Zfp318) are predicted to have deleterious effects on protein functions. 
We performed methodological and candidate validations to (i) assess biological relevance of predicted allele effects by showing that Diversity Outbred mice carrying PWK/PhJ alleles at the H-2 locus on chromosome 17 QTL have shorter survival; (ii) confirm accuracy of predicted allele effects by quantifying S100A8 protein in inbred founder strains; and (iii) test a candidate gene by infecting C57BL/6 mice deficient for the S100a8 gene. Overall, this body of work demonstrates that systems genetics using Diversity Outbred mice can identify new (and known) QTLs and functionally relevant gene candidates that may be major regulators of complex host-pathogen interactions contributing to granuloma necrosis and acute inflammation in pulmonary TB.
Affiliation(s)
- Daniel M. Gatti
- The Jackson Laboratory, Bar Harbor, Maine, United States of America
- Anna L. Tyler
- The Jackson Laboratory, Bar Harbor, Maine, United States of America
- Bulent Yener
- Rensselaer Polytechnic Institute, Troy, New York, United States of America
- Deniz Koyuncu
- Rensselaer Polytechnic Institute, Troy, New York, United States of America
- Metin N. Gurcan
- Wake Forest University School of Medicine, Winston Salem, North Carolina, United States of America
- MK Khalid Niazi
- Wake Forest University School of Medicine, Winston Salem, North Carolina, United States of America
- Thomas Tavolara
- Wake Forest University School of Medicine, Winston Salem, North Carolina, United States of America
- Adam Gower
- Clinical and Translational Science Institute, Boston University, Boston, Massachusetts, United States of America
- Denise Dayao
- Tufts University Cummings School of Veterinary Medicine, North Grafton, Massachusetts, United States of America
- Emily McGlone
- Tufts University Cummings School of Veterinary Medicine, North Grafton, Massachusetts, United States of America
- Melanie L. Ginese
- Tufts University Cummings School of Veterinary Medicine, North Grafton, Massachusetts, United States of America
- Aubrey Specht
- Tufts University Cummings School of Veterinary Medicine, North Grafton, Massachusetts, United States of America
- Anas Alsharaydeh
- Texas Biomedical Research Institute, San Antonio, Texas, United States of America
- Philipe A. Tessier
- Department of Microbiology and Immunology, Laval University School of Medicine, Quebec, Canada
- Sherry L. Kurtz
- Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, Maryland, United States of America
- Karen L. Elkins
- Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, Maryland, United States of America
- Igor Kramnik
- National Emerging Infectious Diseases Laboratories, Boston University, Boston, Massachusetts, United States of America
- Gillian Beamer
- Texas Biomedical Research Institute, San Antonio, Texas, United States of America
6
Sajjad U, Chen W, Rezapour M, Su Z, Tavolara T, Frankel WL, Gurcan MN, Niazi MKK. Enhancing Colorectal Cancer Tumor Bud Detection Using Deep Learning from Routine H&E-Stained Slides. Proceedings of SPIE--the International Society for Optical Engineering 2024; 12933:129330T. [PMID: 38752165 PMCID: PMC11095418 DOI: 10.1117/12.3006796] [Indexed: 05/18/2024]
Abstract
Tumor budding refers to a cluster of one to four tumor cells located at the tumor-invasive front. While tumor budding is a prognostic factor for colorectal cancer, counting and grading tumor budding are time consuming and not highly reproducible. Inter- and intra-reader disagreement on H&E evaluation can be high. This leads to noisy training (imperfect ground truth) of deep learning algorithms, resulting in high variability and a loss of ability to generalize to unseen datasets. Pan-cytokeratin staining is one of the potential solutions to enhance the agreement, but it is not routinely used to identify tumor buds and can lead to false positives. Therefore, we aim to develop a weakly-supervised deep learning method for tumor bud detection from routine H&E-stained images that does not require strict tissue-level annotations. We also propose Bayesian Multiple Instance Learning (BMIL) that combines multiple annotated regions during the training process to further enhance the generalizability and stability in tumor bud detection. Our dataset consists of 29 colorectal cancer H&E-stained images that contain 115 tumor buds per slide on average. In six-fold cross-validation, our method demonstrated an average precision and recall of 0.94 and 0.86, respectively. These results provide preliminary evidence of the feasibility of our approach in improving the generalizability in tumor budding detection using H&E images while avoiding the need for non-routine immunohistochemical staining methods.
Affiliation(s)
- Usama Sajjad
- Center for Artificial Intelligence Research, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- Wei Chen
- Department of Pathology, The Ohio State University, Columbus, OH, USA
- Mostafa Rezapour
- Center for Artificial Intelligence Research, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- Ziyu Su
- Center for Artificial Intelligence Research, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- Thomas Tavolara
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, USA
- Wendy L. Frankel
- Department of Pathology, The Ohio State University, Columbus, OH, USA
- Metin N. Gurcan
- Center for Artificial Intelligence Research, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- M. Khalid Khan Niazi
- Center for Artificial Intelligence Research, Wake Forest University School of Medicine, Winston-Salem, NC, USA
7
Acharya V, Choi D, Yener B, Beamer G. Prediction of Tuberculosis From Lung Tissue Images of Diversity Outbred Mice Using Jump Knowledge Based Cell Graph Neural Network. IEEE Access: Practical Innovations, Open Solutions 2024; 12:17164-17194. [PMID: 38515959 PMCID: PMC10956573 DOI: 10.1109/access.2024.3359989] [Indexed: 03/23/2024]
Abstract
Tuberculosis (TB), primarily affecting the lungs, is caused by the bacterium Mycobacterium tuberculosis and poses a significant health risk. Detecting acid-fast bacilli (AFB) in stained samples is critical for TB diagnosis. Whole Slide (WS) Imaging allows for digitally examining these stained samples. However, current deep-learning approaches to analyzing large-sized whole slide images (WSIs) often employ patch-wise analysis, potentially missing the complex spatial patterns observed in the granuloma essential for accurate TB classification. To address this limitation, we propose an approach that models cell characteristics and interactions as a graph, capturing both cell-level information and the overall tissue micro-architecture. This method differs from the strategies in related cell graph-based works that rely on edge thresholds based on sparsity/density in cell graph construction, emphasizing a biologically informed threshold determination instead. We introduce a cell graph-based jumping knowledge neural network (CG-JKNN) that operates on the cell graphs where the edge thresholds are selected based on the length of the mycobacteria's cords and the activated macrophage nucleus's size to reflect the actual biological interactions observed in the tissue. The primary process involves training a Convolutional Neural Network (CNN) to segment AFBs and macrophage nuclei, followed by converting large (42831 × 41159 pixels) lung histology images into cell graphs where an activated macrophage nucleus/AFB represents each node within the graph and their interactions are denoted as edges. To enhance the interpretability of our model, we employ Integrated Gradients and Shapley Additive Explanations (SHAP). Our analysis incorporated a combination of 33 graph metrics and 20 cell morphology features. 
In terms of traditional machine learning models, Extreme Gradient Boosting (XGBoost) was the best performer, achieving an F1 score of 0.9813 and an Area under the Precision-Recall Curve (AUPRC) of 0.9848 on the test set. Among graph-based models, our CG-JKNN was the top performer, attaining an F1 score of 0.9549 and an AUPRC of 0.9846 on the held-out test set. The integration of graph-based and morphological features proved highly effective, with CG-JKNN and XGBoost showing promising results in classifying instances into AFB and activated macrophage nucleus. The features identified as significant by our models closely align with the criteria used by pathologists in practice, highlighting the clinical applicability of our approach. Future work will explore knowledge distillation techniques and graph-level classification into distinct TB progression categories.
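The cell-graph construction described in this abstract can be illustrated with a minimal Python sketch. It assumes that each segmented object (an AFB or an activated macrophage nucleus) is reduced to a centroid in pixel coordinates and that two nodes are linked when their Euclidean distance does not exceed one fixed threshold. The function name `build_cell_graph`, the coordinates, and the threshold value are hypothetical; the abstract does not report the authors' actual thresholds or implementation.

```python
import math
from itertools import combinations


def build_cell_graph(centroids, edge_threshold):
    """Return an adjacency dict mapping node index -> set of neighbour indices.

    Two nodes are connected when the Euclidean distance between their
    centroids is at most `edge_threshold` (an assumed, single cut-off).
    """
    adjacency = {i: set() for i in range(len(centroids))}
    for i, j in combinations(range(len(centroids)), 2):
        if math.dist(centroids[i], centroids[j]) <= edge_threshold:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


# Hypothetical centroids (in pixels) of three segmented objects.
cells = [(0.0, 0.0), (3.0, 4.0), (30.0, 40.0)]

# Placeholder threshold; the paper derives its thresholds from biology.
graph = build_cell_graph(cells, edge_threshold=10.0)

# Node degree is one simple example of a per-node graph metric.
degrees = {node: len(neighbours) for node, neighbours in graph.items()}
```

In the paper the threshold is chosen from biological measurements (mycobacterial cord length and macrophage nucleus size) rather than from graph sparsity or density, so the value 10.0 above is only a stand-in. The all-pairs loop is quadratic in the number of nodes; a spatial index would be needed at whole-slide scale.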
Affiliation(s)
- Diana Choi
- Cummings School of Veterinary Medicine, Tufts University, North Grafton, MA 02155, USA
- Bülent Yener
- Rensselaer Polytechnic Institute, Troy, NY 12180, USA
- Gillian Beamer
- Research Pathology, Aiforia Technologies, Cambridge, MA 02142, USA
- Texas Biomedical Research Institute, San Antonio, TX 78227, USA
8
Tavolara TE, Niazi MKK, Feldman AL, Jaye DL, Flowers C, Cooper LAD, Gurcan MN. Translating prognostic quantification of c-MYC and BCL2 from tissue microarrays to whole slide images in diffuse large B-cell lymphoma using deep learning. Diagn Pathol 2024; 19:17. [PMID: 38243330 PMCID: PMC10797911 DOI: 10.1186/s13000-023-01425-6] [Received: 05/19/2023] [Accepted: 12/04/2023] [Indexed: 01/21/2024]
Abstract
BACKGROUND c-MYC and BCL2 positivity are important prognostic factors for diffuse large B-cell lymphoma. However, manual quantification is subject to significant intra- and inter-observer variability. We developed an automated method for quantification in whole-slide images of tissue sections where manual quantification requires evaluating large areas of tissue with possibly heterogeneous staining. We train this method using annotations of tumor positivity in smaller tissue microarray cores where expression and staining are more homogeneous and then translate this model to whole-slide images. METHODS Our method applies a technique called attention-based multiple instance learning to regress the proportion of c-MYC-positive and BCL2-positive tumor cells from pathologist-scored tissue microarray cores. This technique does not require annotation of individual cell nuclei and is trained instead on core-level annotations of percent tumor positivity. We translate this model to scoring of whole-slide images by tessellating the slide into smaller core-sized tissue regions and calculating an aggregate score. Our method was trained on a public tissue microarray dataset from Stanford and applied to whole-slide images from a geographically diverse multi-center cohort produced by the Lymphoma Epidemiology of Outcomes study. RESULTS In tissue microarrays, the automated method had Pearson correlations of 0.843 and 0.919 with pathologist scores for c-MYC and BCL2, respectively. When utilizing standard clinical thresholds, the sensitivity/specificity of our method was 0.743 / 0.963 for c-MYC and 0.938 / 0.951 for BCL2. For double-expressors, sensitivity and specificity were 0.720 and 0.974. 
When translated to the external WSI dataset scored by two pathologists, Pearson correlation was 0.753 & 0.883 for c-MYC and 0.749 & 0.765 for BCL2, and sensitivity/specificity was 0.857/0.991 & 0.706/0.930 for c-MYC, 0.856/0.719 & 0.855/0.690 for BCL2, and 0.890/1.00 & 0.598/0.952 for double-expressors. Survival analysis demonstrates that for progression-free survival, model-predicted TMA scores significantly stratify double-expressors and non double-expressors (p = 0.0345), whereas pathologist scores do not (p = 0.128). CONCLUSIONS We conclude that proportion of positive stains can be regressed using attention-based multiple instance learning, that these models generalize well to whole slide images, and that our models can provide non-inferior stratification of progression-free survival outcomes.
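Attention-based multiple instance learning, as named in the Methods of this abstract, can be sketched in a few lines of Python. The sketch assumes a commonly used attention pooling form (a tanh layer followed by a softmax over patches), a sigmoid output to keep the predicted proportion between 0 and 1, and a plain mean as the slide-level aggregate. All function names, the parameters `V`, `w`, `beta`, and `bias`, and every number are hypothetical; none of this is taken from the authors' code, and the abstract does not state which aggregate they used.

```python
import math


def attention_pool(instances, V, w):
    """Attention-weighted average of patch feature vectors.

    score_k = w . tanh(V h_k); weights are a softmax over the scores.
    """
    scores = []
    for h in instances:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wi * ui for wi, ui in zip(w, hidden)))
    # Subtract the maximum before exponentiating for numerical stability.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(instances[0])
    pooled = [sum(a * h[d] for a, h in zip(weights, instances)) for d in range(dim)]
    return pooled, weights


def predict_proportion(pooled, beta, bias):
    """Map the pooled feature vector to a proportion in (0, 1) via a sigmoid."""
    z = sum(b * p for b, p in zip(beta, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def aggregate_regions(region_scores):
    """Slide-level score as the mean of core-sized region scores (assumed)."""
    return sum(region_scores) / len(region_scores)


# Hypothetical two-patch bag with two features per patch.
patches = [[1.0, 0.0], [0.0, 1.0]]
# A zero-valued V gives every patch the same score, so attention is uniform.
V = [[0.0, 0.0]]
w = [1.0]
pooled, weights = attention_pool(patches, V, w)
core_score = predict_proportion(pooled, beta=[0.0, 0.0], bias=0.0)
# Combine this region with two other made-up region scores.
slide_score = aggregate_regions([core_score, 0.25, 0.75])
```

Because training needs only a bag-level label (the pathologist's percent positivity per core), no per-nucleus annotation enters the loss, which matches the abstract's description. In a trained model `V`, `w`, `beta`, and `bias` would be learned rather than set by hand.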
Affiliation(s)
- Thomas E Tavolara
- Center for Artificial Intelligence Research, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- M Khalid Khan Niazi
- Center for Artificial Intelligence Research, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- Andrew L Feldman
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, USA
- David L Jaye
- Department of Pathology and Laboratory Medicine, Emory University School of Medicine, Atlanta, GA, USA
- Christopher Flowers
- Department of Lymphoma/Myeloma, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Lee A D Cooper
- Department of Pathology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Metin N Gurcan
- Center for Artificial Intelligence Research, Wake Forest University School of Medicine, Winston-Salem, NC, USA
9
Gatti DM, Tyler AL, Mahoney JM, Churchill GA, Yener B, Koyuncu D, Gurcan MN, Niazi M, Tavolara T, Gower AC, Dayao D, McGlone E, Ginese ML, Specht A, Alsharaydeh A, Tessier PA, Kurtz SL, Elkins K, Kramnik I, Beamer G. Systems genetics uncover new loci containing functional gene candidates in Mycobacterium tuberculosis-infected Diversity Outbred mice. bioRxiv: the preprint server for biology 2023:2023.12.21.572738. [PMID: 38187647 PMCID: PMC10769337 DOI: 10.1101/2023.12.21.572738] [Indexed: 01/09/2024]
Abstract
Mycobacterium tuberculosis, the bacillus that causes tuberculosis (TB), infects 2 billion people across the globe, and results in 8-9 million new TB cases and 1-1.5 million deaths each year. Most patients have no known genetic basis that predisposes them to disease. We investigated the complex genetic basis of pulmonary TB by modelling human genetic diversity with the Diversity Outbred mouse population. When infected with M. tuberculosis, one-third develop early onset, rapidly progressive, necrotizing granulomas and succumb within 60 days. The remaining develop non-necrotizing granulomas and survive longer than 60 days. Genetic mapping using clinical indicators of disease, granuloma histopathological features, and immune response traits identified five new loci on mouse chromosomes 1, 2, 4, 16 and three previously identified loci on chromosomes 3 and 17. Quantitative trait loci (QTLs) on chromosomes 1, 16, and 17 were associated with multiple correlated traits and had similar patterns of allele effects, suggesting these QTLs contain important genetic regulators of responses to M. tuberculosis. To narrow the list of candidate genes in QTLs, we used a machine learning strategy that integrated gene expression signatures from lungs of M. tuberculosis-infected Diversity Outbred mice with gene interaction networks, generating functional scores. The scores were then used to rank candidates for each mapped trait in each locus, resulting in 11 candidates: Ncf2, Fam20b, S100a8, S100a9, Itgb5, Fstl1, Zbtb20, Ddr1, Ier3, Vegfa, and Zfp318. Importantly, all 11 candidates have roles in infection, inflammation, cell migration, extracellular matrix remodeling, or intracellular signaling. Further, all candidates contain single nucleotide polymorphisms (SNPs), and some but not all SNPs were predicted to have deleterious consequences on protein functions. 
Multiple methods were used for validation including (i) a statistical method that showed Diversity Outbred mice carrying PWK/PhJ alleles on chromosome 17 QTL have shorter survival; (ii) quantification of S100A8 protein levels, confirming predicted allele effects; and (iii) infection of C57BL/6 mice deficient for the S100a8 gene. Overall, this work demonstrates that systems genetics using Diversity Outbred mice can identify new (and known) QTLs and new functionally relevant gene candidates that may be major regulators of granuloma necrosis and acute inflammation in pulmonary TB.
Affiliation(s)
- D M Gatti
- The Jackson Laboratory, Bar Harbor, ME
- A L Tyler
- The Jackson Laboratory, Bar Harbor, ME
- B Yener
- Rensselaer Polytechnic Institute, Troy, NY
- D Koyuncu
- Rensselaer Polytechnic Institute, Troy, NY
- M N Gurcan
- Wake Forest University School of Medicine, Winston Salem, NC
- M K K Niazi
- Wake Forest University School of Medicine, Winston Salem, NC
- T Tavolara
- Wake Forest University School of Medicine, Winston Salem, NC
- A C Gower
- Clinical and Translational Science Institute, Boston University, Boston, MA
- D Dayao
- Tufts University Cummings School of Veterinary Medicine, North Grafton, MA
- E McGlone
- Tufts University Cummings School of Veterinary Medicine, North Grafton, MA
- M L Ginese
- Tufts University Cummings School of Veterinary Medicine, North Grafton, MA
- A Specht
- Tufts University Cummings School of Veterinary Medicine, North Grafton, MA
- A Alsharaydeh
- Texas Biomedical Research Institute, San Antonio, TX
- P A Tessier
- Department of Microbiology and Immunology, Laval University School of Medicine, Quebec, Canada
- S L Kurtz
- Center for Biologics, Food and Drug Administration, Bethesda, MD
- K Elkins
- Center for Biologics, Food and Drug Administration, Bethesda, MD
- I Kramnik
- NIEDL, Boston University, Boston, MA
- G Beamer
- Texas Biomedical Research Institute, San Antonio, TX
10
|
Mahdi-Esferizi R, Haji Molla Hoseyni B, Mehrpanah A, Golzade Y, Najafi A, Elahian F, Zadeh Shirazi A, Gomez GA, Tahmasebian S. DeeP4med: deep learning for P4 medicine to predict normal and cancer transcriptome in multiple human tissues. BMC Bioinformatics 2023; 24:275. [PMID: 37403016 DOI: 10.1186/s12859-023-05400-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2023] [Accepted: 06/25/2023] [Indexed: 07/06/2023]
Abstract
BACKGROUND: P4 medicine (predict, prevent, personalize, and participate) is a new approach to diagnosing and predicting diseases on a patient-by-patient basis. For the prevention and treatment of diseases, prediction plays a fundamental role. One of the intelligent strategies is the design of deep learning models that can predict the state of the disease using gene expression data. RESULTS: We create an autoencoder deep learning model called DeeP4med, including a Classifier and a Transferor, that predicts a cancer's gene expression (mRNA) matrix from its matched normal sample and vice versa. Depending on tissue type, the F1 score of the model ranges from 0.935 to 0.999 in the Classifier and from 0.944 to 0.999 in the Transferor. The accuracy of DeeP4med for tissue and disease classification was 0.986 and 0.992, respectively, which was better than that of seven classic machine learning models (Support Vector Classifier, Logistic Regression, Linear Discriminant Analysis, Naive Bayes, Decision Tree, Random Forest, K Nearest Neighbors). CONCLUSIONS: Based on the idea of DeeP4med, by having the gene expression matrix of a normal tissue, we can predict its tumor gene expression matrix and, in this way, find effective genes in transforming a normal tissue into a tumor tissue. Results of Differentially Expressed Genes (DEGs) and enrichment analysis on the predicted matrices for 13 types of cancer showed a good correlation with the literature and biological databases. Consequently, when trained on gene expression matrices carrying the features of each person in the normal and cancer states, this model could predict diagnosis based on gene expression data from healthy tissue and be used to identify possible therapeutic interventions for those patients.
Affiliation(s)
- Roohallah Mahdi-Esferizi
  - Department of Medical Biotechnology, School of Advanced Technologies, Shahrekord University of Medical Sciences, Shahrekord, Iran
- Amir Mehrpanah
  - Faculty of Mathematics, Shahid Beheshti University, Tehran, Iran
- Yazdan Golzade
  - Department of Mathematics, Faculty of Basic Sciences, Iran University of Science and Technology (IUST), Tehran, Iran
- Ali Najafi
  - Molecular Biology Research Center, Systems Biology and Poisonings Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran
- Fatemeh Elahian
  - Department of Medical Biotechnology, School of Advanced Technologies, Shahrekord University of Medical Sciences, Shahrekord, Iran
- Amin Zadeh Shirazi
  - Centre for Cancer Biology, SA Pathology and University of South Australia, Adelaide, SA, 5000, Australia
- Guillermo A Gomez
  - Centre for Cancer Biology, SA Pathology and University of South Australia, Adelaide, SA, 5000, Australia
- Shahram Tahmasebian
  - Cellular and Molecular Research Center, Basic Health Sciences Institute, Shahrekord University of Medical Sciences, Shahrekord, Iran
|
11
|
Mondol RK, Millar EKA, Graham PH, Browne L, Sowmya A, Meijering E. hist2RNA: An Efficient Deep Learning Architecture to Predict Gene Expression from Breast Cancer Histopathology Images. Cancers (Basel) 2023; 15:2569. [PMID: 37174035 PMCID: PMC10177559 DOI: 10.3390/cancers15092569] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 04/10/2023] [Revised: 04/23/2023] [Accepted: 04/28/2023] [Indexed: 05/15/2023]
Abstract
Gene expression can be used to subtype breast cancer with improved prediction of risk of recurrence and treatment responsiveness over that obtained using routine immunohistochemistry (IHC). However, in the clinic, molecular profiling is primarily used for ER+ breast cancer, which is costly, tissue destructive, requires specialised platforms, and takes several weeks to obtain a result. Deep learning algorithms can effectively extract morphological patterns in digital histopathology images to predict molecular phenotypes quickly and cost-effectively. We propose a new, computationally efficient approach called hist2RNA, inspired by bulk RNA sequencing techniques, to predict the expression of 138 genes (incorporated from 6 commercially available molecular profiling tests), including luminal PAM50 subtype, from hematoxylin and eosin (H&E)-stained whole slide images (WSIs). The training phase involves the aggregation of extracted features for each patient from a pretrained model to predict gene expression at the patient level using annotated H&E images from The Cancer Genome Atlas (TCGA, n = 335). We demonstrate successful gene prediction on a held-out test set (n = 160, corr = 0.82 across patients, corr = 0.29 across genes) and perform exploratory analysis on an external tissue microarray (TMA) dataset (n = 498) with known IHC and survival information. Our model is able to predict gene expression and luminal PAM50 subtype (Luminal A versus Luminal B) on the TMA dataset with prognostic significance for overall survival in univariate analysis (c-index = 0.56, hazard ratio = 2.16 (95% CI 1.12-3.06), p < 5 × 10⁻³), and independent significance in multivariate analysis incorporating standard clinicopathological variables (c-index = 0.65, hazard ratio = 1.87 (95% CI 1.30-2.68), p < 5 × 10⁻³). The proposed strategy achieves superior performance while requiring less training time, resulting in less energy consumption and computational cost compared to patch-based models. Additionally, hist2RNA predicts gene expression that has the potential to determine luminal molecular subtypes that correlate with overall survival, without the need for expensive molecular testing.
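The abstract above reports two quite different correlations on the same test set (0.82 and 0.29). A minimal sketch of how two such summaries can diverge is given below. It assumes a matrix with one row per patient and one column per gene, and computes a row-wise mean correlation (one value per patient) and a column-wise mean correlation (one value per gene). The function names and toy numbers are illustrative only, and which of the two the paper labels "across patients" is not specified in the abstract.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def correlation_summary(true, pred):
    """Return (mean row-wise r, mean column-wise r).

    true, pred: lists of rows, one row per patient and one column per gene
    (a hypothetical layout chosen for this sketch).
    """
    row_wise = [pearson(t, p) for t, p in zip(true, pred)]
    col_wise = [pearson(tc, pc) for tc, pc in zip(zip(*true), zip(*pred))]
    return mean(row_wise), mean(col_wise)


# Toy data: 3 patients x 3 genes.
true = [[1.0, 2.0, 3.0], [2.0, 4.0, 7.0], [3.0, 5.0, 9.0]]
# A predictor that outputs the same gene profile for every patient gets the
# ranking of genes within a patient right but carries no patient-to-patient signal.
flat = [[1.0, 2.0, 3.0]] * 3

row_r, col_r = correlation_summary(true, flat)
print(f"row-wise mean r = {row_r:.3f}, column-wise mean r = {col_r:.3f}")
```

In this toy case the row-wise value is high while the column-wise value is zero, which shows why the two numbers need to be read separately.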
Affiliation(s)
- Raktim Kumar Mondol
  - School of Computer Science and Engineering, UNSW Sydney, Kensington, NSW 2052, Australia
- Ewan K. A. Millar
  - Department of Anatomical Pathology, NSW Health Pathology, St. George Hospital, Kogarah, NSW 2217, Australia
  - St. George and Sutherland Clinical School, UNSW Sydney, Kensington, NSW 2052, Australia
  - Faculty of Medicine and Health Sciences, Sydney Western University, Campbelltown, NSW 2560, Australia
  - University of Technology Sydney, Ultimo, NSW 2007, Australia
- Peter H. Graham
  - St. George and Sutherland Clinical School, UNSW Sydney, Kensington, NSW 2052, Australia
  - Cancer Care Centre, St George Hospital, Sydney, NSW 2217, Australia
- Lois Browne
  - Cancer Care Centre, St George Hospital, Sydney, NSW 2217, Australia
- Arcot Sowmya
  - School of Computer Science and Engineering, UNSW Sydney, Kensington, NSW 2052, Australia
- Erik Meijering
  - School of Computer Science and Engineering, UNSW Sydney, Kensington, NSW 2052, Australia
|
12
|
Su Z, Niazi MKK, Tavolara TE, Niu S, Tozbikian GH, Wesolowski R, Gurcan MN. BCR-Net: A deep learning framework to predict breast cancer recurrence from histopathology images. PLoS One 2023; 18:e0283562. [PMID: 37014891 PMCID: PMC10072418 DOI: 10.1371/journal.pone.0283562] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/27/2022] [Accepted: 03/10/2023] [Indexed: 04/05/2023]
Abstract
Breast cancer is the most common malignancy in women, with over 40,000 deaths annually in the United States alone. Clinicians often rely on the breast cancer recurrence score, Oncotype DX (ODX), for risk stratification of breast cancer patients, using ODX as a guide for personalized therapy. However, ODX and similar gene assays are expensive, time-consuming, and tissue destructive. Therefore, an AI-based ODX prediction model that identifies patients who will benefit from chemotherapy in the same way that ODX does would provide a low-cost alternative to the genomic test. To address this need, we developed a deep learning framework, Breast Cancer Recurrence Network (BCR-Net), which automatically predicts ODX recurrence risk from histopathology slides. Our proposed framework has two steps. First, it intelligently samples discriminative features from whole-slide histopathology images of breast cancer patients. Then, it automatically weights all features through a multiple instance learning model to predict the recurrence score at the slide level. On a dataset of H&E and Ki67 breast cancer resection whole slide images (WSIs) from 99 anonymized patients, the proposed framework achieved an overall AUC of 0.775 (68.9% and 71.1% accuracies for low and high risk) on H&E WSIs and an overall AUC of 0.811 (80.8% and 79.2% accuracies for low and high risk) on Ki67 WSIs of breast cancer patients. Our findings provide strong evidence that patients can be automatically risk-stratified with a high degree of confidence. Our experiments reveal that BCR-Net outperforms the state-of-the-art WSI classification models. Moreover, BCR-Net is highly efficient with low computational needs, making it practical to deploy in limited computational settings.
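The second step described above, weighting tile-level features to obtain a slide-level prediction, is commonly implemented as attention-based pooling. The sketch below is a generic, simplified illustration of that idea and not BCR-Net's actual architecture: the scoring vector `w` is a hypothetical stand-in for learned parameters, and the scoring function (a dot product with the element-wise tanh of each feature vector) is an assumption made for brevity.

```python
from math import exp, tanh


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(tiles, w):
    """Pool tile feature vectors into one slide-level vector.

    tiles: list of equal-length feature vectors, one per sampled tile.
    w:     scoring vector (hypothetical; learned during training in practice).
    Returns (slide_vector, attention_weights).
    """
    # One scalar relevance score per tile.
    scores = [sum(wj * tanh(hj) for wj, hj in zip(w, h)) for h in tiles]
    weights = softmax(scores)
    dim = len(tiles[0])
    # The slide vector is the attention-weighted average of the tile vectors.
    slide = [sum(a * h[j] for a, h in zip(weights, tiles)) for j in range(dim)]
    return slide, weights


tiles = [[1.0, 0.0], [3.0, 2.0]]
slide_vec, attn = attention_pool(tiles, w=[1.0, 1.0])
print(slide_vec, attn)
```

A downstream classifier or regressor would then map the pooled vector to the slide-level score; with an all-zero scoring vector the pooling reduces to a plain average of the tiles.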
Affiliation(s)
- Ziyu Su
  - Center for Biomedical Informatics, Wake Forest School of Medicine, Winston-Salem, North Carolina, United States of America
- Muhammad Khalid Khan Niazi
  - Center for Biomedical Informatics, Wake Forest School of Medicine, Winston-Salem, North Carolina, United States of America
- Thomas E. Tavolara
  - Center for Biomedical Informatics, Wake Forest School of Medicine, Winston-Salem, North Carolina, United States of America
- Shuo Niu
  - Department of Pathology, Wake Forest University School of Medicine, Winston-Salem, North Carolina, United States of America
- Gary H. Tozbikian
  - Department of Pathology, The Ohio State University, Columbus, Ohio, United States of America
- Robert Wesolowski
  - Comprehensive Cancer Center, The Ohio State University College of Medicine, Columbus, Ohio, United States of America
- Metin N. Gurcan
  - Center for Biomedical Informatics, Wake Forest School of Medicine, Winston-Salem, North Carolina, United States of America
|
13
|
Alsaafin A, Safarpoor A, Sikaroudi M, Hipp JD, Tizhoosh HR. Learning to predict RNA sequence expressions from whole slide images with applications for search and classification. Commun Biol 2023; 6:304. [PMID: 36949169 PMCID: PMC10033650 DOI: 10.1038/s42003-023-04583-x] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 03/22/2022] [Accepted: 02/13/2023] [Indexed: 03/24/2023]
Abstract
Deep learning methods are widely applied in digital pathology to address clinical challenges such as prognosis and diagnosis. As one of the most recent applications, deep models have also been used to extract molecular features from whole slide images. Although molecular tests carry rich information, they are often expensive, time-consuming, and require additional tissue to sample. In this paper, we propose tRNAsformer, an attention-based topology that can simultaneously learn to predict bulk RNA-seq from an image and to represent the whole slide image of a glass slide. The tRNAsformer uses multiple instance learning to solve a weakly supervised problem in which pixel-level annotation is not available for an image. We conducted several experiments and achieved better performance and faster convergence in comparison to the state-of-the-art algorithms. The proposed tRNAsformer can serve as a computational pathology tool to facilitate a new generation of search and classification methods by combining the tissue morphology and the molecular fingerprint of the biopsy samples.
Affiliation(s)
- Areej Alsaafin
  - Rhazes Lab, Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, USA
  - Kimia Lab, University of Waterloo, Waterloo, ON, Canada
- Jason D Hipp
  - Division of Computational Pathology and AI, Mayo Clinic, Rochester, MN, USA
- H R Tizhoosh
  - Rhazes Lab, Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, USA
  - Kimia Lab, University of Waterloo, Waterloo, ON, Canada
|
14
|
Couture HD. Deep Learning-Based Prediction of Molecular Tumor Biomarkers from H&E: A Practical Review. J Pers Med 2022; 12:2022. [PMID: 36556243 PMCID: PMC9784641 DOI: 10.3390/jpm12122022] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 09/30/2022] [Revised: 10/26/2022] [Accepted: 12/05/2022] [Indexed: 12/12/2022]
Abstract
Molecular and genomic properties are critical in selecting cancer treatments to target individual tumors, particularly for immunotherapy. However, the methods to assess such properties are expensive, time-consuming, and often not routinely performed. Applying machine learning to H&E images can provide a more cost-effective screening method. Dozens of studies over the last few years have demonstrated that a variety of molecular biomarkers can be predicted from H&E alone using the advancements of deep learning: molecular alterations, genomic subtypes, protein biomarkers, and even the presence of viruses. This article reviews the diverse applications across cancer types and the methodology to train and validate these models on whole slide images. From bottom-up to pathologist-driven to hybrid approaches, the leading trends include a variety of weakly supervised deep learning-based approaches, as well as mechanisms for training strongly supervised models in select situations. While results of these algorithms look promising, some challenges still persist, including small training sets, rigorous validation, and model explainability. Biomarker prediction models may yield a screening method to determine when to run molecular tests or an alternative when molecular tests are not possible. They also create new opportunities in quantifying intratumoral heterogeneity and predicting patient outcomes.
|
15
|
Tavolara TE, Gurcan MN, Niazi MKK. Contrastive Multiple Instance Learning: An Unsupervised Framework for Learning Slide-Level Representations of Whole Slide Histopathology Images without Labels. Cancers (Basel) 2022; 14:5778. [PMID: 36497258 PMCID: PMC9738801 DOI: 10.3390/cancers14235778] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/03/2022] [Revised: 11/16/2022] [Accepted: 11/19/2022] [Indexed: 11/25/2022]
Abstract
Recent methods in computational pathology have trended towards semi- and weakly-supervised methods requiring only slide-level labels. Yet, even slide-level labels may be absent or irrelevant to the application of interest, such as in clinical trials. Hence, we present a fully unsupervised method to learn meaningful, compact representations of WSIs. Our method initially trains a tile-wise encoder using SimCLR, from which subsets of tile-wise embeddings are extracted and fused via an attention-based multiple-instance learning framework to yield slide-level representations. The resulting set of intra-slide-level and inter-slide-level embeddings are attracted and repelled via contrastive loss, respectively. This resulted in slide-level representations with self-supervision. We applied our method to two tasks: (1) non-small cell lung cancer (NSCLC) subtyping as a classification prototype and (2) breast cancer proliferation scoring (TUPAC16) as a regression prototype, achieving an AUC of 0.8641 ± 0.0115 and a correlation (R²) of 0.5740 ± 0.0970, respectively. Ablation experiments demonstrate that the resulting unsupervised slide-level feature space can be fine-tuned with small datasets for both tasks. Overall, our method approaches computational pathology in a novel manner, where meaningful features can be learned from whole-slide images without the need for annotations of slide-level labels. The proposed method stands to benefit computational pathology, as it theoretically enables researchers to benefit from completely unlabeled whole-slide images.
Affiliation(s)
- Thomas E. Tavolara
  - Center for Biomedical Informatics, Wake Forest School of Medicine, Winston-Salem, NC 27101, USA
|
16
|
Qiao Y, Zhao L, Luo C, Luo Y, Wu Y, Li S, Bu D, Zhao Y. Multi-modality artificial intelligence in digital pathology. Brief Bioinform 2022; 23:6702380. [PMID: 36124675 PMCID: PMC9677480 DOI: 10.1093/bib/bbac367] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/12/2022] [Revised: 07/27/2022] [Accepted: 08/05/2022] [Indexed: 12/14/2022]
Abstract
In common medical procedures, the time-consuming and expensive nature of obtaining test results plagues doctors and patients. Digital pathology research allows using computational technologies to manage data, presenting an opportunity to improve the efficiency of diagnosis and treatment. Artificial intelligence (AI) has a great advantage in the data analytics phase. Extensive research has shown that AI algorithms can produce more up-to-date and standardized conclusions for whole slide images. In conjunction with the development of high-throughput sequencing technologies, algorithms can integrate and analyze data from multiple modalities to explore the correspondence between morphological features and gene expression. This review investigates using the most popular image data, hematoxylin-eosin stained tissue slide images, to find a strategic solution for the imbalance of healthcare resources. The article focuses on the role that the development of deep learning technology has in assisting doctors' work and discusses the opportunities and challenges of AI.
Affiliation(s)
- Yixuan Qiao
  - Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Lianhe Zhao (corresponding author; Tel.: +86 18513983324)
  - Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  - University of Chinese Academy of Sciences
- Chunlong Luo
  - Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Yufan Luo
  - Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Yang Wu
  - Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Shengtong Li
  - Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Dechao Bu
  - Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Yi Zhao (corresponding author; Tel.: +86 10 6260 0822; Fax: +86 10 6260 1356)
  - Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  - University of Chinese Academy of Sciences
  - Shandong First Medical University & Shandong Academy of Medical Sciences
|
17
|
Hackett J, Gibson H, Frelinger J, Buntzman A. Using the Collaborative Cross and Diversity Outbred Mice in Immunology. Curr Protoc 2022; 2:e547. [PMID: 36066328 PMCID: PMC9612550 DOI: 10.1002/cpz1.547] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 05/30/2023]
Abstract
The Collaborative Cross (CC) and the Diversity Outbred (DO) stock mouse panels are the most powerful murine genetics tools available to the genetics community. Together, they combine the strength of inbred animal models with the diversity of outbred populations. Using the 63 CC strains or a panel of DO mice, each derived from the same 8 parental mouse strains, researchers can map genetic contributions to exceptionally complex immunological and infectious disease traits that would require far greater powering if performed by genome-wide association studies (GWAS) in human populations. These tools allow genes to be studied in heterozygous and homozygous states and provide a platform to study epistasis between interacting loci. Most importantly, once a quantitative phenotype is investigated and quantitative trait loci are identified, confirmatory genetic studies can be performed, which is often problematic using the GWAS approach. In addition, novel stable mouse models for immune phenotypes are often derived from studies utilizing the DO and CC mice that can serve as stronger model systems than existing ones in the field. The CC/DO systems have contributed to the fields of cancer immunology, autoimmunity, vaccinology, infectious disease, allergy, tissue rejection, and tolerance but have thus far been greatly underutilized. In this article, we present a recent review of the field and point out key areas of immunology that are ripe for further investigation and awaiting new CC/DO research projects. We also highlight some of the strong computational tools that have been developed for analyzing CC/DO genetic and phenotypic data. Additionally, we have formed a centralized community on the CyVerse infrastructure where immunogeneticists can utilize those software tools, collaborate with groups across the world, and expand the use of the CC and DO systems for investigating immunogenetic phenomena. © 2022 Wiley Periodicals LLC.
Affiliation(s)
- Justin Hackett
  - Barbara Ann Karmanos Cancer Institute, Hudson-Webber Cancer Research Center, Detroit, Michigan
- Heather Gibson
  - Barbara Ann Karmanos Cancer Institute, Hudson-Webber Cancer Research Center, Detroit, Michigan
- Jeffrey Frelinger
  - University of Arizona, Valley Fever Center for Excellence, Tucson, Arizona
  - Department of Microbiology and Immunology, University of North Carolina System, Chapel Hill, North Carolina
- Adam Buntzman
  - University of Arizona, BIO5 Institute, Valley Fever Center for Excellence, Tucson, Arizona
|
18
|
Arlova A, Jin C, Wong-Rolle A, Chen ES, Lisle C, Brown GT, Lay N, Choyke PL, Turkbey B, Harmon S, Zhao C. Artificial Intelligence-based Tumor Segmentation in Mouse Models of Lung Adenocarcinoma. J Pathol Inform 2022; 13:100007. [PMID: 35242446 PMCID: PMC8860735 DOI: 10.1016/j.jpi.2022.100007] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/09/2021] [Accepted: 12/14/2021] [Indexed: 02/07/2023]
Abstract
BACKGROUND: Mouse models are highly effective for studying the pathophysiology of lung adenocarcinoma and evaluating new treatment strategies. Treatment efficacy is primarily determined by the total tumor burden measured on excised tumor specimens. The measurement process is time-consuming and prone to human errors. To address this issue, we developed a novel deep learning model to segment lung tumor foci on digitally scanned hematoxylin and eosin (H&E) histology slides. METHODS: Digital slides of 239 mice from 9 experimental cohorts were split into training (n=137), validation (n=37), and testing cohorts (n=65). Image patches of 500×500 pixels were extracted from 5× and 10× magnifications, along with binary masks of expert annotations representing ground-truth tumor regions. Deep learning models utilizing DeepLabV3+ and UNet architectures were trained for binary segmentation of tumor foci under varying stain normalization conditions. The performance of algorithm segmentation was assessed by Dice Coefficient, and detection was evaluated by sensitivity and positive-predictive value (PPV). RESULTS: The best model on patch-based validation was DeepLabV3+ using a ResNet-50 backbone, which achieved Dice 0.890 and 0.873 on the validation and testing cohorts, respectively. This result corresponded to 91.3 Sensitivity and 51.0 PPV in the validation cohort and 93.7 Sensitivity and 51.4 PPV in the testing cohort. False positives could be reduced 10-fold by thresholding the artificial intelligence (AI)-predicted output by area, without negative impact on Dice Coefficient. Evaluation of various stain normalization strategies did not demonstrate improvement over the baseline model. CONCLUSIONS: A robust AI-based algorithm for detecting and segmenting lung tumor foci in pre-clinical mouse models was developed. The output of this algorithm is compatible with open-source software that researchers commonly use.
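The three metrics named above have standard definitions, sketched here in plain Python. The function names, the flattened 0/1 mask representation, and the toy counts are assumptions made for illustration; the paper's own evaluation code is not shown in the abstract.

```python
def dice(pred, truth):
    """Dice coefficient of two equal-length binary masks (flattened 0/1 lists)."""
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(pred) + sum(truth)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


def sensitivity(tp, fn):
    """Fraction of true tumor foci that were detected: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def ppv(tp, fp):
    """Fraction of detections that are true tumor foci: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


pred_mask = [1, 1, 0, 0]
true_mask = [1, 0, 1, 0]
print(dice(pred_mask, true_mask))   # overlap of 1 pixel out of 2 + 2 labelled pixels
print(sensitivity(tp=9, fn=1))      # hypothetical detection counts
print(ppv(tp=9, fp=9))              # as many false detections as true ones
```

By these definitions, a PPV near 50% alongside a sensitivity above 90% means roughly one false-positive detection for every true one, which is consistent with the abstract's note that area thresholding was used to cut false positives.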
Affiliation(s)
- Alena Arlova
  - Artificial Intelligence Resource, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Chengcheng Jin
  - Department of Cancer Biology, University of Pennsylvania, Philadelphia, PA, USA
- Abigail Wong-Rolle
  - Thoracic and GI Malignancies Branch, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Eric S. Chen
  - Department of Cancer Biology, University of Pennsylvania, Philadelphia, PA, USA
- G. Thomas Brown
  - Artificial Intelligence Resource, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Nathan Lay
  - Artificial Intelligence Resource, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Peter L. Choyke
  - Artificial Intelligence Resource, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Baris Turkbey
  - Artificial Intelligence Resource, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Stephanie Harmon
  - Artificial Intelligence Resource, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Chen Zhao
  - Thoracic and GI Malignancies Branch, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
|
19
|
Phan NN, Huang CC, Tseng LM, Chuang EY. Predicting Breast Cancer Gene Expression Signature by Applying Deep Convolutional Neural Networks From Unannotated Pathological Images. Front Oncol 2021; 11:769447. [PMID: 34926274 PMCID: PMC8673486 DOI: 10.3389/fonc.2021.769447] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/02/2021] [Accepted: 10/29/2021] [Indexed: 01/16/2023]
Abstract
We proposed a highly versatile two-step transfer learning pipeline for predicting the gene signature defining the intrinsic breast cancer subtypes using unannotated pathological images. Deciphering breast cancer molecular subtypes by deep learning approaches could provide a convenient and efficient method for the diagnosis of breast cancer patients. It could reduce costs associated with transcriptional profiling and subtyping discrepancy between IHC assays and mRNA expression. Four pretrained models (VGG16, ResNet50, ResNet101, and Xception) were trained with our in-house pathological images from breast cancer patients with recurrent status in the first transfer learning step and with the TCGA-BRCA dataset in the second transfer learning step. Furthermore, we also trained a ResNet101 model with weights from ImageNet for comparison to the aforementioned models. The two-step deep learning models showed promising classification results for the four breast cancer intrinsic subtypes, with accuracy ranging from 0.68 (ResNet50) to 0.78 (ResNet101) in both validation and testing sets. Additionally, slide-wise prediction showed an even higher average accuracy of 0.913 with the ResNet101 model. The micro- and macro-average area under the curve (AUC) for these models ranged from 0.88 (ResNet50) to 0.94 (ResNet101), whereas ResNet101_imgnet, initialized with ImageNet weights, achieved an AUC of 0.92. We also show that the deep learning model prediction performance is significantly improved relative to the common Genefu tool for breast cancer classification. Our study demonstrated the capability of deep learning models to classify breast cancer intrinsic subtypes without region of interest annotation, which will facilitate the clinical applicability of the proposed models.
Collapse
Affiliation(s)
- Nam Nhut Phan
  - Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan
  - Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan
  - Bioinformatics and Biostatistics Core, Centre of Genomic and Precision Medicine, National Taiwan University, Taipei, Taiwan
- Chi-Cheng Huang
  - Comprehensive Breast Health Center, Taipei Veterans General Hospital, Taipei, Taiwan
  - Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
- Ling-Ming Tseng
  - Comprehensive Breast Health Center, Taipei Veterans General Hospital, Taipei, Taiwan
  - School of Medicine, College of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Eric Y. Chuang
  - Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan
  - Bioinformatics and Biostatistics Core, Centre of Genomic and Precision Medicine, National Taiwan University, Taipei, Taiwan
  - Master Program for Biomedical Engineering, China Medical University, Taichung, Taiwan
|
20
|
Koyuncu D, Niazi MKK, Tavolara T, Abeijon C, Ginese ML, Liao Y, Mark C, Specht A, Gower AC, Restrepo BI, Gatti DM, Kramnik I, Gurcan M, Yener B, Beamer G. CXCL1: A new diagnostic biomarker for human tuberculosis discovered using Diversity Outbred mice. PLoS Pathog 2021; 17:e1009773. [PMID: 34403447 PMCID: PMC8423361 DOI: 10.1371/journal.ppat.1009773] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 03/11/2021] [Revised: 09/07/2021] [Accepted: 06/30/2021] [Indexed: 12/12/2022]
Abstract
More humans have died of tuberculosis (TB) than any other infectious disease and millions still die each year. Experts advocate for blood-based, serum protein biomarkers to help diagnose TB, which afflicts millions of people in high-burden countries. However, the protein biomarker pipeline is small. Here, we used the Diversity Outbred (DO) mouse population to address this gap, identifying five protein biomarker candidates. One protein biomarker, serum CXCL1, met the World Health Organization’s Targeted Product Profile for a triage test to distinguish active TB from latent M.tb infection (LTBI), non-TB lung disease, and normal sera in HIV-negative adults from South Africa and Vietnam. To find the biomarker candidates, we quantified seven immune cytokines and four inflammatory proteins corresponding to highly expressed genes unique to progressor DO mice. Next, we applied statistical and machine learning methods to the data, i.e., 11 proteins in lungs from 453 infected and 29 non-infected mice. After searching all combinations of five algorithms and 239 protein subsets, validating, and testing the findings on independent data, two combinations accurately diagnosed progressor DO mice: Logistic Regression using MMP8; and Gradient Tree Boosting using a panel of four proteins: CXCL1, CXCL2, TNF, and IL-10. Of those five protein biomarker candidates, two (MMP8 and CXCL1) were crucial for classifying DO mice; were above the limit of detection in most human serum samples; and had not been widely assessed for diagnostic performance in humans before. In patient sera, CXCL1 exceeded the triage diagnostic test criteria (>90% sensitivity; >70% specificity), while MMP8 did not. Using Area Under the Curve analyses, CXCL1 averaged 94.5% sensitivity and 88.8% specificity for active pulmonary TB (ATB) vs LTBI; 90.9% sensitivity and 71.4% specificity for ATB vs non-TB; and 100.0% sensitivity and 98.4% specificity for ATB vs normal sera. Our findings overall show that the DO mouse population can discover diagnostic-quality, serum protein biomarkers of human TB.
More humans die of tuberculosis (TB) than any other infectious disease, yet diagnostic tools remain limited. Here, we used the Diversity Outbred mouse population to discover candidate protein biomarkers of human TB. By applying statistical methods and machine learning to multidimensional data, we identified CXCL1 and MMP8 as the two most promising protein biomarker candidates. When evaluated in samples from human patients, CXCL1 achieved the World Health Organization’s targeted profile for a triage diagnostic test, discriminating active TB from important clinical differential diagnoses: latent M.tb infection and non-TB lung disease in HIV-negative adults. Overall, our studies show how a translationally relevant animal population model can accelerate TB biomarker discovery, validation, and testing for humans.
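The triage criteria quoted in the abstract (>90% sensitivity, >70% specificity) can be checked against the reported CXCL1 figures with a few lines of Python. This is a minimal sketch: the thresholds and the three sensitivity/specificity pairs come from the abstract, while the class, function, and constant names are hypothetical and the confusion-matrix helper is included only to show how such percentages are derived.

```python
from typing import NamedTuple


class Performance(NamedTuple):
    sensitivity: float  # percent
    specificity: float  # percent


# Thresholds as quoted in the abstract for a triage test.
MIN_SENSITIVITY = 90.0
MIN_SPECIFICITY = 70.0


def sensitivity_specificity(tp, fn, tn, fp):
    """Compute sensitivity and specificity (in percent) from confusion-matrix counts."""
    return Performance(100.0 * tp / (tp + fn), 100.0 * tn / (tn + fp))


def meets_triage_criteria(perf):
    """True when both values strictly exceed the quoted thresholds."""
    return perf.sensitivity > MIN_SENSITIVITY and perf.specificity > MIN_SPECIFICITY


# Average CXCL1 values reported in the abstract.
CXCL1_REPORTED = {
    "ATB vs LTBI": Performance(94.5, 88.8),
    "ATB vs non-TB": Performance(90.9, 71.4),
    "ATB vs normal sera": Performance(100.0, 98.4),
}

if __name__ == "__main__":
    for comparison, perf in CXCL1_REPORTED.items():
        print(comparison, meets_triage_criteria(perf))
```

All three reported comparisons clear both thresholds, with ATB vs non-TB lung disease doing so by the narrowest margin (90.9% and 71.4%).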
Affiliation(s)
- Deniz Koyuncu
  - Rensselaer Polytechnic Institute, Department of Electrical, Computer, and Systems Engineering, Troy, New York, United States of America
- Muhammad Khalid Khan Niazi
  - Wake Forest School of Medicine, Bowman Gray Center for Medical Education, Winston-Salem, North Carolina, United States of America
- Thomas Tavolara
  - Wake Forest School of Medicine, Bowman Gray Center for Medical Education, Winston-Salem, North Carolina, United States of America
- Claudia Abeijon
  - Tufts University, Cummings School of Veterinary Medicine, North Grafton, Massachusetts, United States of America
- Melanie L. Ginese
  - Tufts University, Cummings School of Veterinary Medicine, North Grafton, Massachusetts, United States of America
- Carolyn Mark
  - Kansas State University, College of Veterinary Medicine, Manhattan, Kansas, United States of America
- Aubrey Specht
  - Tufts University, Cummings School of Veterinary Medicine, North Grafton, Massachusetts, United States of America
- Adam C. Gower
  - Boston University Clinical and Translational Science Institute, Boston, Massachusetts, United States of America
- Blanca I. Restrepo
  - The University of Texas Health Science Center at Houston School of Public Health in Brownsville, Texas, United States of America
- Daniel M. Gatti
  - The Jackson Laboratory, Bar Harbor, Maine, United States of America
- Igor Kramnik
  - Boston University, National Emerging Infectious Diseases Laboratories, Boston, Massachusetts, United States of America
- Metin Gurcan
  - Wake Forest School of Medicine, Bowman Gray Center for Medical Education, Winston-Salem, North Carolina, United States of America
- Bülent Yener
  - Rensselaer Polytechnic Institute, Department of Computer Science, Troy, New York, United States of America
- Gillian Beamer
  - Tufts University, Cummings School of Veterinary Medicine, North Grafton, Massachusetts, United States of America
|