1
Tafavvoghi M, Bongo LA, Shvetsov N, Busund LTR, Møllersen K. Publicly available datasets of breast histopathology H&E whole-slide images: A scoping review. J Pathol Inform 2024; 15:100363. [PMID: 38405160 PMCID: PMC10884505 DOI: 10.1016/j.jpi.2024.100363] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 11/24/2023] [Accepted: 01/23/2024] [Indexed: 02/27/2024] Open
Abstract
Advancements in digital pathology and computing resources have made a significant impact in the field of computational pathology for breast cancer diagnosis and treatment. However, access to high-quality labeled histopathological images of breast cancer is a big challenge that limits the development of accurate and robust deep learning models. In this scoping review, we identified the publicly available datasets of breast H&E-stained whole-slide images (WSIs) that can be used to develop deep learning algorithms. We systematically searched 9 scientific literature databases and 9 research data repositories and found 17 publicly available datasets containing 10 385 H&E WSIs of breast cancer. Moreover, we reported image metadata and characteristics for each dataset to assist researchers in selecting proper datasets for specific tasks in breast cancer computational pathology. In addition, we compiled 2 lists of breast H&E patches and private datasets as supplementary resources for researchers. Notably, only 28% of the included articles utilized multiple datasets, and only 14% used an external validation set, suggesting that the performance of other developed models may be susceptible to overestimation. The TCGA-BRCA was used in 52% of the selected studies. This dataset has a considerable selection bias that can impact the robustness and generalizability of the trained algorithms. There is also a lack of consistent metadata reporting of breast WSI datasets that can be an issue in developing accurate deep learning models, indicating the necessity of establishing explicit guidelines for documenting breast WSI dataset characteristics and metadata.
Affiliation(s)
- Masoud Tafavvoghi
  - Department of Community Medicine, UiT The Arctic University of Norway, Tromsø, Norway
- Lars Ailo Bongo
  - Department of Computer Science, UiT The Arctic University of Norway, Tromsø, Norway
- Nikita Shvetsov
  - Department of Computer Science, UiT The Arctic University of Norway, Tromsø, Norway
- Kajsa Møllersen
  - Department of Community Medicine, UiT The Arctic University of Norway, Tromsø, Norway
2
Hosseini MS, Bejnordi BE, Trinh VQH, Chan L, Hasan D, Li X, Yang S, Kim T, Zhang H, Wu T, Chinniah K, Maghsoudlou S, Zhang R, Zhu J, Khaki S, Buin A, Chaji F, Salehi A, Nguyen BN, Samaras D, Plataniotis KN. Computational pathology: A survey review and the way forward. J Pathol Inform 2024; 15:100357. [PMID: 38420608 PMCID: PMC10900832 DOI: 10.1016/j.jpi.2023.100357] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/15/2023] [Revised: 12/21/2023] [Accepted: 12/23/2023] [Indexed: 03/02/2024] Open
Abstract
Computational Pathology (CPath) is an interdisciplinary science that augments developments of computational approaches to analyze and model medical histopathology images. The main objective for CPath is to develop infrastructure and workflows of digital diagnostics as an assistive CAD system for clinical pathology, facilitating transformational changes in the diagnosis and treatment of cancer that are mainly addressed by CPath tools. With ever-growing developments in deep learning and computer vision algorithms, and the ease of the data flow from digital pathology, CPath is currently witnessing a paradigm shift. Despite the sheer volume of engineering and scientific works being introduced for cancer image analysis, there is still a considerable gap in adopting and integrating these algorithms in clinical practice. This raises a significant question regarding the direction and trends that are undertaken in CPath. In this article we provide a comprehensive review of more than 800 papers to address the challenges faced from problem design all the way to the application and implementation viewpoints. We have catalogued each paper into a model-card by examining the key works and challenges faced to lay out the current landscape in CPath. We hope this helps the community to locate relevant works and facilitate understanding of the field's future directions. In a nutshell, we oversee the CPath developments in a cycle of stages which are required to be cohesively linked together to address the challenges associated with such multidisciplinary science. We overview this cycle from different perspectives of data-centric, model-centric, and application-centric problems. We finally sketch remaining challenges and provide directions for future technical developments and clinical integration of CPath. For updated information on this survey review paper and access to the original model cards repository, please refer to GitHub. An updated version of this draft can also be found on arXiv.
Affiliation(s)
- Mahdi S Hosseini
  - Department of Computer Science and Software Engineering (CSSE), Concordia University, Montreal, QC H3H 2R9, Canada
- Vincent Quoc-Huy Trinh
  - Institute for Research in Immunology and Cancer of the University of Montreal, Montreal, QC H3T 1J4, Canada
- Lyndon Chan
  - The Edward S. Rogers Sr. Department of Electrical & Computer Engineering (ECE), University of Toronto, Toronto, ON M5S 3G4, Canada
- Danial Hasan
  - The Edward S. Rogers Sr. Department of Electrical & Computer Engineering (ECE), University of Toronto, Toronto, ON M5S 3G4, Canada
- Xingwen Li
  - The Edward S. Rogers Sr. Department of Electrical & Computer Engineering (ECE), University of Toronto, Toronto, ON M5S 3G4, Canada
- Stephen Yang
  - The Edward S. Rogers Sr. Department of Electrical & Computer Engineering (ECE), University of Toronto, Toronto, ON M5S 3G4, Canada
- Taehyo Kim
  - The Edward S. Rogers Sr. Department of Electrical & Computer Engineering (ECE), University of Toronto, Toronto, ON M5S 3G4, Canada
- Haochen Zhang
  - The Edward S. Rogers Sr. Department of Electrical & Computer Engineering (ECE), University of Toronto, Toronto, ON M5S 3G4, Canada
- Theodore Wu
  - The Edward S. Rogers Sr. Department of Electrical & Computer Engineering (ECE), University of Toronto, Toronto, ON M5S 3G4, Canada
- Kajanan Chinniah
  - The Edward S. Rogers Sr. Department of Electrical & Computer Engineering (ECE), University of Toronto, Toronto, ON M5S 3G4, Canada
- Sina Maghsoudlou
  - Department of Computer Science and Software Engineering (CSSE), Concordia University, Montreal, QC H3H 2R9, Canada
- Ryan Zhang
  - The Edward S. Rogers Sr. Department of Electrical & Computer Engineering (ECE), University of Toronto, Toronto, ON M5S 3G4, Canada
- Jiadai Zhu
  - The Edward S. Rogers Sr. Department of Electrical & Computer Engineering (ECE), University of Toronto, Toronto, ON M5S 3G4, Canada
- Samir Khaki
  - The Edward S. Rogers Sr. Department of Electrical & Computer Engineering (ECE), University of Toronto, Toronto, ON M5S 3G4, Canada
- Andrei Buin
  - Huron Digital Pathology, St. Jacobs, ON N0B 2N0, Canada
- Fatemeh Chaji
  - Department of Computer Science and Software Engineering (CSSE), Concordia University, Montreal, QC H3H 2R9, Canada
- Ala Salehi
  - Department of Electrical and Computer Engineering, University of New Brunswick, Fredericton, NB E3B 5A3, Canada
- Bich Ngoc Nguyen
  - University of Montreal Hospital Center, Montreal, QC H2X 0C2, Canada
- Dimitris Samaras
  - Department of Computer Science, Stony Brook University, Stony Brook, NY 11794, United States
- Konstantinos N Plataniotis
  - The Edward S. Rogers Sr. Department of Electrical & Computer Engineering (ECE), University of Toronto, Toronto, ON M5S 3G4, Canada
3
Li Y, Shen Y, Zhang J, Song S, Li Z, Ke J, Shen D. A Hierarchical Graph V-Net With Semi-Supervised Pre-Training for Histological Image Based Breast Cancer Classification. IEEE Transactions on Medical Imaging 2023; 42:3907-3918. [PMID: 37725717 DOI: 10.1109/tmi.2023.3317132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/21/2023]
Abstract
Numerous patch-based methods have recently been proposed for histological image based breast cancer classification. However, their performance could be highly affected by ignoring spatial contextual information in the whole slide image (WSI). To address this issue, we propose a novel hierarchical Graph V-Net by integrating 1) patch-level pre-training and 2) context-based fine-tuning, with a hierarchical graph network. Specifically, a semi-supervised framework based on knowledge distillation is first developed to pre-train a patch encoder for extracting disease-relevant features. Then, a hierarchical Graph V-Net is designed to construct a hierarchical graph representation from neighboring/similar individual patches for coarse-to-fine classification, where each graph node (corresponding to one patch) is attached with extracted disease-relevant features and its target label during training is the average label of all pixels in the corresponding patch. To evaluate the performance of our proposed hierarchical Graph V-Net, we collect a large WSI dataset of 560 WSIs, with 30 labeled WSIs from the BACH dataset (through our further refinement), 30 labeled WSIs and 500 unlabeled WSIs from Yunnan Cancer Hospital. Those 500 unlabeled WSIs are employed for patch-level pre-training to improve feature representation, while 60 labeled WSIs are used to train and test our proposed hierarchical Graph V-Net. Both comparative assessment and ablation studies demonstrate the superiority of our proposed hierarchical Graph V-Net over state-of-the-art methods in classifying breast cancer from WSIs. The source code and our annotations for the BACH dataset have been released at https://github.com/lyhkevin/Graph-V-Net.
4
Ke J, Liu K, Sun Y, Xue Y, Huang J, Lu Y, Dai J, Chen Y, Han X, Shen Y, Shen D. Artifact Detection and Restoration in Histology Images With Stain-Style and Structural Preservation. IEEE Transactions on Medical Imaging 2023; 42:3487-3500. [PMID: 37352087 DOI: 10.1109/tmi.2023.3288940] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/25/2023]
Abstract
The artifacts in histology images may encumber the accurate interpretation of medical information and cause misdiagnosis. Accordingly, prepending manual quality control of artifacts considerably decreases the degree of automation. To close this gap, we propose a methodical pre-processing framework to detect and restore artifacts, which minimizes their impact on downstream AI diagnostic tasks. First, the artifact recognition network AR-Classifier differentiates common artifacts (e.g., tissue folds, marking dye, tattoo pigment, spots, and out-of-focus regions) from normal tissues, and also catalogs artifact patches by their restorability. Then, the succeeding artifact restoration network AR-CycleGAN performs de-artifact processing where stain styles and tissue structures can be maximally retained. We construct a benchmark for performance evaluation, curated from both clinically collected WSIs and public datasets of colorectal and breast cancer. The functional structures are compared with state-of-the-art methods, and also comprehensively evaluated by multiple metrics across multiple tasks, including artifact classification, artifact restoration, and the downstream diagnostic tasks of tumor classification and nuclei segmentation. The proposed system allows full automation of deep learning based histology image analysis without human intervention. Moreover, the structure-independent characteristic enables its processing with various artifact subtypes. The source code and data in this research are available at https://github.com/yunboer/AR-classifier-and-AR-CycleGAN.
5
Mohammed MA, Lakhan A, Abdulkareem KH, Garcia-Zapirain B. A hybrid cancer prediction based on multi-omics data and reinforcement learning state action reward state action (SARSA). Comput Biol Med 2023; 154:106617. [PMID: 36753981 DOI: 10.1016/j.compbiomed.2023.106617] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/08/2022] [Revised: 01/21/2023] [Accepted: 01/28/2023] [Indexed: 02/05/2023]
Abstract
The proportion of patients with cancer has been growing steadily, and many cancer cases have recently been reported in different clinical hospitals. Many machine learning algorithms have been suggested in the literature to predict cancer diseases with the same class types based on training and test data. However, there remains considerable room for further research. In this paper, the study looks into the different types of cancer by analyzing, classifying, and processing the multi-omics dataset in a fog cloud network. Based on SARSA on-policy and multi-omics workload learning, made possible by reinforcement learning, the study devises new hybrid cancer detection schemes. These consist of different layers, such as clinical data collection via laboratories and tool processes (biopsy, colonoscopy, and mammography) at the distributed omics-based clinics in the network. The study considers different cancer classes, such as carcinomas, sarcomas, leukemias, and lymphomas, together with their types, and processes them using the multi-omics distributed clinics. To solve the problem, the study presents omics cancer workload reinforcement learning state action reward state action "SARSA" (OCWLS) schemes, which are made up of an on-policy learning scheme on different parameters like states, actions, timestamps, reward, accuracy, and processing time constraints. The goal is to process multiple cancer classes and workload feature matching while reducing the processing time in clinical hospitals that are spread out. Simulation results show that OCWLS is better than other machine learning methods regarding processing time, extracting features from multiple classes of cancer, and matching in the system.
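For readers unfamiliar with the SARSA (state, action, reward, state, action) on-policy update the abstract builds on, the following is a minimal sketch of the standard tabular rule, Q(s, a) ← Q(s, a) + α [r + γ Q(s′, a′) − Q(s, a)]. It is not the OCWLS scheme itself: the state and action names, the helper functions `epsilon_greedy` and `sarsa_update`, and the values of `alpha` and `gamma` are all hypothetical choices for illustration.

```python
import random
from collections import defaultdict


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """Apply one tabular SARSA update in place and return the new Q(s, a).

    On-policy: the target uses the action a_next that the behaviour policy
    actually selected in s_next, not the maximum over actions.
    """
    td_target = r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (td_target - q[(s, a)])
    return q[(s, a)]


if __name__ == "__main__":
    rng = random.Random(0)
    q = defaultdict(float)                # unseen (state, action) pairs start at 0
    actions = ["local", "offload"]        # hypothetical workload actions
    a = epsilon_greedy(q, "s0", actions, epsilon=0.1, rng=rng)
    a_next = epsilon_greedy(q, "s1", actions, epsilon=0.1, rng=rng)
    print(sarsa_update(q, "s0", a, 1.0, "s1", a_next))
```

Using the action actually taken in the next state, rather than the greedy maximum, is what distinguishes SARSA from off-policy Q-learning.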
Affiliation(s)
- Mazin Abed Mohammed
  - College of Computer Science and Information Technology, University of Anbar, Anbar 31001, Iraq; eVIDA Lab, University of Deusto, 48007 Bilbao, Spain.
- Abdullah Lakhan
  - Department of Computer Science, Dawood University of Engineering and Technology, Pakistan.
- Karrar Hameed Abdulkareem
  - College of Agriculture, Al-Muthanna University, Samawah 66001, Iraq; College of Engineering, University of Warith Al-Anbiyaa, Karbala 56001, Iraq.