1
Tong L, Corrigan A, Kumar NR, Hallbrook K, Orme J, Wang Y, Zhou H. CLANet: A comprehensive framework for cross-batch cell line identification using brightfield images. Med Image Anal 2024; 94:103123. [PMID: 38430651 DOI: 10.1016/j.media.2024.103123] [Received: 06/26/2023] [Revised: 02/23/2024] [Accepted: 02/25/2024] [Indexed: 03/05/2024]
Abstract
Cell line authentication plays a crucial role in the biomedical field, ensuring researchers work with accurately identified cells. Supervised deep learning has made remarkable strides in cell line identification by studying cell morphological features through cell imaging. However, biological batch (bio-batch) effects, a significant issue stemming from the different times at which data are generated, lead to substantial shifts in the underlying data distribution, thus complicating reliable differentiation between cell lines from distinct batch cultures. To address this challenge, we introduce CLANet, a pioneering framework for cross-batch cell line identification using brightfield images, specifically designed to tackle three distinct bio-batch effects. We propose a cell cluster-level selection method to efficiently capture cell density variations, and a self-supervised learning strategy to manage image quality variations, thus producing reliable patch representations. Additionally, we adopt multiple instance learning (MIL) for effective aggregation of instance-level features for cell line identification. Our innovative time-series segment sampling module further enhances MIL's feature-learning capabilities, mitigating biases from varying incubation times across batches. We validate CLANet using data from 32 cell lines across 93 experimental bio-batches from the AstraZeneca Global Cell Bank. Our results show that CLANet outperforms related approaches (e.g., domain adaptation and MIL), demonstrating its effectiveness in addressing bio-batch effects in cell line identification.
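The abstract names two mechanisms without detailing them: sampling images across incubation time, and aggregating instance-level features with MIL. As a hedged illustration only, the sketch below shows a generic segment-based sampler (in the style of temporal segment sampling) and a softmax attention pooling of instance features. The names `segment_sample` and `attention_pool`, and the use of directly supplied per-instance scores in place of a learned attention network, are assumptions for illustration; this is not CLANet's published implementation.

```python
import math
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def segment_sample(items: Sequence[T], num_segments: int,
                   rng: Optional[random.Random] = None) -> List[T]:
    """Split a time-ordered sequence into num_segments contiguous segments
    and pick one item from each (hypothetical helper).

    With rng=None the middle item of each segment is chosen (deterministic);
    otherwise one item is drawn uniformly at random from each segment.
    """
    n = len(items)
    if num_segments <= 0 or n < num_segments:
        raise ValueError("need 1 <= num_segments <= len(items)")
    picks = []
    for k in range(num_segments):
        start = k * n // num_segments
        end = (k + 1) * n // num_segments  # exclusive bound; always > start here
        if rng is None:
            idx = (start + end - 1) // 2
        else:
            idx = rng.randrange(start, end)
        picks.append(items[idx])
    return picks


def attention_pool(features: List[List[float]], scores: List[float]) -> List[float]:
    """Combine instance feature vectors into one bag vector, using
    softmax-normalised scores as weights (generic attention-MIL pooling).

    In a trained model the scores would come from a learned attention
    network; here they are supplied directly (hypothetical simplification).
    """
    if not features or len(features) != len(scores):
        raise ValueError("features and scores must be non-empty and equal length")
    m = max(scores)  # subtract the max before exp for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]


if __name__ == "__main__":
    # Twelve time-ordered images (represented by their indices), four segments.
    print(segment_sample(list(range(12)), 4))  # [1, 4, 7, 10]
    # Equal scores reduce attention pooling to a plain mean.
    print(attention_pool([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0]))  # [2.0, 3.0]
```

Drawing one item per segment spreads every sampled bag across early and late time points instead of letting one incubation time dominate, which is one plausible reading of how such a module could reduce incubation-time bias.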
Affiliation(s)
- Lei Tong
  - School of Computing and Mathematical Sciences, University of Leicester, Leicester, UK
  - Data Sciences and Quantitative Biology, Discovery Sciences, AstraZeneca R&D, Cambridge, UK
- Adam Corrigan
  - Data Sciences and Quantitative Biology, Discovery Sciences, AstraZeneca R&D, Cambridge, UK
- Navin Rathna Kumar
  - UK Cell Culture and Banking, Discovery Sciences, AstraZeneca R&D, Alderley Park, UK
- Kerry Hallbrook
  - UK Cell Culture and Banking, Discovery Sciences, AstraZeneca R&D, Alderley Park, UK
- Jonathan Orme
  - UK Cell Culture and Banking, Discovery Sciences, AstraZeneca R&D, Cambridge, UK
- Yinhai Wang
  - Data Sciences and Quantitative Biology, Discovery Sciences, AstraZeneca R&D, Cambridge, UK
- Huiyu Zhou
  - School of Computing and Mathematical Sciences, University of Leicester, Leicester, UK
2
Zhou M, Ma Y, Chiang CC, Rock EC, Butler SC, Anne R, Yatsenko S, Gong Y, Chen YC. Single-cell morphological and transcriptome analysis unveil inhibitors of polyploid giant breast cancer cells in vitro. Commun Biol 2023; 6:1301. [PMID: 38129519 PMCID: PMC10739852 DOI: 10.1038/s42003-023-05674-5] [Received: 06/06/2023] [Accepted: 12/04/2023] [Indexed: 12/23/2023]
Abstract
Considerable evidence suggests that breast cancer therapeutic resistance and relapse can be driven by polyploid giant cancer cells (PGCCs). The number of PGCCs increases with disease stage and therapeutic stress. Despite their importance, PGCCs remain challenging to eradicate. To discover effective anti-PGCC compounds, there is an unmet need to rapidly distinguish compounds that kill non-PGCCs, PGCCs, or both. Here, we establish a high-throughput, high-precision single-cell morphological analysis pipeline to characterize the dynamics of individual cells. Using this pipeline, we screen a library to identify promising compounds that inhibit all cancer cells or only PGCCs (e.g., regulators of HDAC, proteasome, and ferroptosis). Additionally, we perform scRNA-Seq to reveal altered cell cycle, metabolism, and ferroptosis sensitivity in breast PGCCs. The combination of single-cell morphological and molecular investigation reveals promising anti-PGCC strategies for the treatment of breast cancer and other malignancies.
Affiliation(s)
- Mengli Zhou
  - UPMC Hillman Cancer Center, University of Pittsburgh, 5115 Centre Ave, Pittsburgh, PA, 15232, USA
  - Department of Computational and Systems Biology, University of Pittsburgh, 3420 Forbes Avenue, Pittsburgh, PA, 15260, USA
  - Xiangya Hospital, Central South University, Changsha, Hunan, 410008, China
- Yushu Ma
  - UPMC Hillman Cancer Center, University of Pittsburgh, 5115 Centre Ave, Pittsburgh, PA, 15232, USA
  - Department of Computational and Systems Biology, University of Pittsburgh, 3420 Forbes Avenue, Pittsburgh, PA, 15260, USA
- Chun-Cheng Chiang
  - UPMC Hillman Cancer Center, University of Pittsburgh, 5115 Centre Ave, Pittsburgh, PA, 15232, USA
  - Department of Computational and Systems Biology, University of Pittsburgh, 3420 Forbes Avenue, Pittsburgh, PA, 15260, USA
- Edwin C Rock
  - Department of Bioengineering, Swanson School of Engineering, University of Pittsburgh, 3700 O'Hara Street, Pittsburgh, PA, 15260, USA
- Samuel Charles Butler
  - UPMC Hillman Cancer Center, University of Pittsburgh, 5115 Centre Ave, Pittsburgh, PA, 15232, USA
- Rajiv Anne
  - Department of Bioengineering, Swanson School of Engineering, University of Pittsburgh, 3700 O'Hara Street, Pittsburgh, PA, 15260, USA
- Svetlana Yatsenko
  - Department of Pathology, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Obstetrics, Gynecology and Reproductive Sciences, University of Pittsburgh, Pittsburgh, PA, USA
  - Magee Womens Research Institute, Pittsburgh, PA, USA
- Yinan Gong
  - UPMC Hillman Cancer Center, University of Pittsburgh, 5115 Centre Ave, Pittsburgh, PA, 15232, USA
  - Department of Immunology, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15261, USA
- Yu-Chih Chen
  - UPMC Hillman Cancer Center, University of Pittsburgh, 5115 Centre Ave, Pittsburgh, PA, 15232, USA
  - Department of Computational and Systems Biology, University of Pittsburgh, 3420 Forbes Avenue, Pittsburgh, PA, 15260, USA
  - Department of Bioengineering, Swanson School of Engineering, University of Pittsburgh, 3700 O'Hara Street, Pittsburgh, PA, 15260, USA
  - CMU-Pitt Ph.D. Program in Computational Biology, University of Pittsburgh, 3420 Forbes Avenue, Pittsburgh, PA, 15260, USA
3
Menke J, Eckmann P, Ozyurt IB, Roelandse M, Anderson N, Grethe J, Gamst A, Bandrowski A. Establishing Institutional Scores With the Rigor and Transparency Index: Large-scale Analysis of Scientific Reporting Quality. J Med Internet Res 2022; 24:e37324. [PMID: 35759334 PMCID: PMC9274430 DOI: 10.2196/37324] [Received: 02/15/2022] [Revised: 05/10/2022] [Accepted: 05/23/2022] [Indexed: 12/11/2022]
Abstract
BACKGROUND Improving rigor and transparency measures should lead to improvements in reproducibility across the scientific literature; however, the assessment of measures of transparency tends to be very difficult if performed manually.
OBJECTIVE This study addresses the enhancement of the Rigor and Transparency Index (RTI, version 2.0), which attempts to automatically assess the rigor and transparency of journals, institutions, and countries using manuscripts scored on criteria found in reproducibility guidelines (eg, Materials Design, Analysis, and Reporting checklist criteria).
METHODS The RTI tracks 27 entity types using natural language processing techniques such as Bidirectional Long Short-term Memory Conditional Random Field-based models and regular expressions; this allowed us to assess over 2 million papers accessed through PubMed Central.
RESULTS Between 1997 and 2020 (where data were readily available in our data set), rigor and transparency measures showed general improvement (RTI 2.29 to 4.13), suggesting that authors are taking the need for improved reporting seriously. The top-scoring journals in 2020 were the Journal of Neurochemistry (6.23), British Journal of Pharmacology (6.07), and Nature Neuroscience (5.93). We extracted the institution and country of origin from the author affiliations to expand our analysis beyond journals. Among institutions publishing >1000 papers in 2020 (in the PubMed Central open access set), Capital Medical University (4.75), Yonsei University (4.58), and University of Copenhagen (4.53) were the top performers in terms of RTI. In country-level performance, we found that Ethiopia and Norway consistently topped the RTI charts of countries with 100 or more papers per year. In addition, we tested our assumption that the RTI may serve as a reliable proxy for scientific replicability (ie, a high RTI represents papers containing sufficient information for replication efforts). Using work by the Reproducibility Project: Cancer Biology, we determined that replication papers (RTI 7.61, SD 0.78) scored significantly higher (P<.001) than the original papers (RTI 3.39, SD 1.12), which according to the project required additional information from authors to begin replication efforts.
CONCLUSIONS These results align with our view that RTI may serve as a reliable proxy for scientific replicability. Unfortunately, RTI measures for journals, institutions, and countries fall short of the replicated paper average. If we consider the RTI of these replication studies as a target for future manuscripts, more work will be needed to ensure that the average manuscript contains sufficient information for replication attempts.
Affiliation(s)
- Joe Menke
  - Center for Research in Biological Systems, University of California, San Diego, La Jolla, CA, United States
  - SciCrunch Inc., San Diego, CA, United States
- Peter Eckmann
  - SciCrunch Inc., San Diego, CA, United States
  - Department of Neuroscience, University of California, San Diego, La Jolla, CA, United States
- Ibrahim Burak Ozyurt
  - SciCrunch Inc., San Diego, CA, United States
  - Department of Neuroscience, University of California, San Diego, La Jolla, CA, United States
- Jeffrey Grethe
  - SciCrunch Inc., San Diego, CA, United States
  - Department of Neuroscience, University of California, San Diego, La Jolla, CA, United States
- Anthony Gamst
  - Department of Mathematics, University of California, San Diego, CA, United States
- Anita Bandrowski
  - SciCrunch Inc., San Diego, CA, United States
  - Department of Neuroscience, University of California, San Diego, La Jolla, CA, United States
4
An automated cell line authentication method for AstraZeneca global cell bank using deep neural networks on brightfield images. Sci Rep 2022; 12:7894. [PMID: 35550583 PMCID: PMC9098893 DOI: 10.1038/s41598-022-12099-3] [Received: 10/05/2021] [Accepted: 05/05/2022] [Indexed: 11/09/2022]
Abstract
Cell line authentication is important in the biomedical field to ensure that researchers are not working with misidentified cells. Short tandem repeat (STR) profiling is the gold-standard method, but it has its own limitations, including being expensive and time-consuming. Deep neural networks have achieved great success in the cost-effective analysis of cellular images. However, because of the lack of centrally available datasets, whether cell image classification can replace or support cell line authentication remains an open question. Moreover, the relationship between incubation time and cellular images has not been explored in previous studies. In this study, we automated cell line authentication using deep learning analysis of brightfield cell line images. We proposed a novel multi-task framework that simultaneously identifies cell lines from cell images and predicts how long they have been incubated. Using data from thirty cell lines in the AstraZeneca Cell Bank, we demonstrated that our proposed method identifies cell lines from brightfield images with 99.8% accuracy and predicts the incubation durations of cell images with a coefficient of determination (R²) of 0.927. Because new cell lines are continually added to the AstraZeneca Cell Bank, we integrated transfer learning into the proposed system to handle data from new cell lines not included in the pre-trained model. Our method achieved a precision of 97.7% and a recall of 95.8% in the detection of 14 new cell lines. These results demonstrate that our proposed framework can effectively identify cell lines using brightfield images.
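The abstract reports results with two standard families of metrics. As a reminder of how they are defined, the sketch below computes the coefficient of determination for a regression target such as incubation duration, R² = 1 − SS_res / SS_tot, and per-class and macro-averaged precision and recall for classification. The abstract does not state how precision and recall were averaged over the 14 new cell lines; the unweighted (macro) average here is an assumption, and all function names are illustrative.

```python
from typing import Hashable, Sequence, Tuple


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


def precision_recall(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                     positive: Hashable) -> Tuple[float, float]:
    """Precision and recall for one class treated as positive (0.0 if undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def macro_precision_recall(y_true: Sequence[Hashable],
                           y_pred: Sequence[Hashable]) -> Tuple[float, float]:
    """Unweighted mean of per-class precision and recall over the classes in y_true."""
    classes = sorted(set(y_true), key=str)
    per_class = [precision_recall(y_true, y_pred, c) for c in classes]
    return (sum(p for p, _ in per_class) / len(per_class),
            sum(r for _, r in per_class) / len(per_class))


if __name__ == "__main__":
    # Predicting the mean everywhere gives R^2 = 0.0.
    print(r2_score([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # 0.0
    print(precision_recall(["a", "a", "b", "b"], ["a", "b", "b", "b"], "a"))  # (1.0, 0.5)
```

An R² of 0.927 therefore means the residual error is about 7% of the variance of the true incubation durations.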
5
Improving segmentation and classification of renal tumors in small sample 3D CT images using transfer learning with convolutional neural networks. Int J Comput Assist Radiol Surg 2022; 17:1303-1311. [PMID: 35290645 DOI: 10.1007/s11548-022-02587-2] [Received: 11/17/2021] [Accepted: 02/24/2022] [Indexed: 11/05/2022]
Abstract
PURPOSE Computed tomography (CT) images can display patients' internal organs and are particularly suitable for preoperative surgical diagnosis. Increasing demand for computer-aided systems in recent years has driven the development of many automated algorithms, especially deep convolutional neural networks, to segment organs and tumors or identify diseases from CT images. However, the performance of some systems is strongly affected by the amount of training data, while medical image data sets, especially three-dimensional (3D) data sets, are usually small. This condition limits the application of deep learning.
METHODS In this study, given a practical clinical data set containing 3D CT images of 20 patients with renal carcinoma, we designed a pipeline employing transfer learning to alleviate the detrimental effect of the small sample size. A dual-channel fine segmentation network (FS-Net) was constructed to segment kidney and tumor regions, with 210 publicly available 3D images from a competition employed during the training phase. We also built discriminative classifiers to classify benign and malignant tumors based on the segmented regions, testing both handcrafted and deep features.
RESULTS The Dice values of the segmented kidney and tumor regions were 0.9662 and 0.7685, respectively, which were better than those of state-of-the-art methods. The classification model using radiomics features classified most of the tumors correctly.
CONCLUSIONS The designed FS-Net was shown to be more effective than simply fine-tuning on the small practical data set, because the model can borrow knowledge from large auxiliary data without diluting the signal in the primary data. For the small data set, radiomics features outperformed deep features in the classification of benign and malignant tumors. This work highlights the importance of architecture design in transfer learning, and the proposed pipeline is anticipated to provide a reference and inspiration for small data analysis.
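The Dice values reported above measure voxel overlap between a predicted and a reference segmentation, Dice = 2|A ∩ B| / (|A| + |B|). A minimal sketch for binary 3D masks stored as nested lists follows; the nested-list representation and the function names are assumptions (real pipelines usually use arrays), and returning 1.0 when both masks are empty is a common convention that the abstract does not specify.

```python
from typing import List

Mask3D = List[List[List[int]]]  # assumed layout: [depth][height][width], values 0 or 1


def flatten(mask: Mask3D) -> List[int]:
    """Flatten a 3D mask into a single list of voxel values."""
    return [v for plane in mask for row in plane for v in row]


def dice(pred: Mask3D, target: Mask3D) -> float:
    """Dice similarity coefficient 2|A∩B| / (|A|+|B|) for binary masks."""
    p, t = flatten(pred), flatten(target)
    if len(p) != len(t):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for a, b in zip(p, t) if a and b)  # voxels positive in both
    total = sum(1 for a in p if a) + sum(1 for b in t if b)  # |A| + |B|
    return 1.0 if total == 0 else 2.0 * intersection / total


if __name__ == "__main__":
    pred = [[[1, 1], [0, 0]]]
    target = [[[1, 0], [1, 0]]]
    print(dice(pred, target))  # 0.5: one shared voxel, two positives in each mask
```

Separate kidney and tumor scores, as reported above, would come from calling the function once per structure, each on its own binary mask.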
6
Ayana G, Park J, Jeong JW, Choe SW. A Novel Multistage Transfer Learning for Ultrasound Breast Cancer Image Classification. Diagnostics (Basel) 2022; 12:135. [PMID: 35054303 PMCID: PMC8775102 DOI: 10.3390/diagnostics12010135] [Received: 11/24/2021] [Revised: 12/24/2021] [Accepted: 12/30/2021] [Indexed: 12/31/2022]
Abstract
Breast cancer diagnosis is one of the many areas that have taken advantage of artificial intelligence to achieve better performance, even though the availability of large medical image datasets remains a challenge. Transfer learning (TL) is a technique that enables deep learning algorithms to overcome the shortage of training data when constructing an efficient model, by transferring knowledge from a given source task to a target task. However, in most cases, models pre-trained on ImageNet (natural images), which contains no medical images, are used for transfer learning to medical images. Considering that microscopic cancer cell line images can be acquired in large amounts, we argue that learning from both natural and medical datasets improves performance in ultrasound breast cancer image classification. The proposed multistage transfer learning (MSTL) algorithm was implemented using three pre-trained models, EfficientNetB2, InceptionV3, and ResNet50, with three optimizers: Adam, Adagrad, and stochastic gradient descent (SGD). Datasets of 20,400 cancer cell images, 200 ultrasound images from Mendeley, and 400 ultrasound images from the MT-Small-Dataset were used. ResNet50-Adagrad-based MSTL achieved a test accuracy of 99 ± 0.612% on the Mendeley dataset and 98.7 ± 1.1% on the MT-Small-Dataset, averaged over 5-fold cross-validation. A p-value of 0.01191 was obtained when comparing MSTL against ImageNet-based TL for the Mendeley dataset. This represents a significant improvement in the performance of artificial intelligence methods for ultrasound breast cancer classification compared to state-of-the-art methods and could remarkably improve the early diagnosis of breast cancer in young women.
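The accuracies above are summarised as mean ± spread over 5-fold cross-validation. The sketch below shows one common way to produce such a summary: split sample indices into k disjoint test folds, then report the mean and the standard deviation of per-fold accuracy. The abstract does not say whether folds were shuffled or stratified, or whether ± denotes a sample or population standard deviation; the choices here (an optional seeded shuffle, sample standard deviation via `statistics.stdev`) and the function names are assumptions.

```python
import random
import statistics
from typing import List, Optional, Sequence, Tuple


def kfold_indices(n: int, k: int,
                  seed: Optional[int] = None) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index splits whose test folds are disjoint and cover range(n)."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    order = list(range(n))
    if seed is not None:
        random.Random(seed).shuffle(order)  # optional reproducible shuffle
    splits = []
    for i in range(k):
        start, end = i * n // k, (i + 1) * n // k
        test = order[start:end]
        train = order[:start] + order[end:]
        splits.append((train, test))
    return splits


def mean_pm_std(fold_scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of per-fold scores."""
    return statistics.mean(fold_scores), statistics.stdev(fold_scores)


if __name__ == "__main__":
    for train, test in kfold_indices(10, 5, seed=0):
        print(len(train), len(test))  # each fold: 8 training, 2 test indices
    print(mean_pm_std([98.0, 99.0, 100.0]))
```

In an MSTL-style experiment the per-fold score would be test accuracy on the ultrasound images; the model training on each fold's training indices is deliberately omitted here.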
Affiliation(s)
- Gelan Ayana
  - Department of Medical IT Convergence Engineering, Kumoh National Institute of Technology, Gumi 39253, Korea
- Jinhyung Park
  - Department of Medical IT Convergence Engineering, Kumoh National Institute of Technology, Gumi 39253, Korea
- Jin-Woo Jeong
  - Department of Data Science, Seoul National University of Science and Technology, Seoul 01811, Korea
- Se-Woon Choe
  - Department of Medical IT Convergence Engineering, Kumoh National Institute of Technology, Gumi 39253, Korea
  - Department of IT Convergence Engineering, Kumoh National Institute of Technology, Gumi 39253, Korea
7
Cheng M, Liu L, Zhang P, Xiong S, Dou H. Cell Coding Arrays Based on Fluorescent Glycan Nanoparticles for Cell Line Identification and Cell Contamination Evaluation. ACS Appl Mater Interfaces 2021; 13:44054-44064. [PMID: 34499479 DOI: 10.1021/acsami.1c12674] [Indexed: 06/13/2023]
Abstract
Cell lines are used on a large scale in biomedicine, but they are susceptible to problems such as misidentification and cross-contamination. This situation is worsening over time owing to the rapid growth of the biomedical field, so a more effective strategy to address the problem is urgently needed. As described herein, a cell coding method is established based on two types of uniform and stable glycan nanoparticles that are synthesized using the graft-copolymerization-induced self-assembly (GISA) method and that exhibit distinct fluorescent properties owing to elaborate modification with fluorescent labeling molecules. Differences in affinity between each nanoparticle and the various cell lines produce clearly distinguishable degrees of endocytosis and, thus, distinct characteristic fluorescence intensities. Through flow cytometry measurements, the specific signals of each cell sample can be recorded and, by statistical processing, turned into a map divided into different regions. Using this sensing array strategy, we have successfully identified six human cell lines: one normal and five tumor cell lines. Moreover, we achieved semiquantitative evaluation of the contamination of different cell lines by HeLa cells. Notably, the whole process of nanoparticle fabrication and fluorescence testing is facile, and the results are highly reliable.
Affiliation(s)
- Meng Cheng
  - State Key Laboratory of Metal Matrix Composites, School of Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, 200240, P. R. China
- Lingshan Liu
  - State Key Laboratory of Metal Matrix Composites, School of Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, 200240, P. R. China
- Peipei Zhang
  - State Key Laboratory of Metal Matrix Composites, School of Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, 200240, P. R. China
- Shuhan Xiong
  - State Key Laboratory of Metal Matrix Composites, School of Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, 200240, P. R. China
- Hongjing Dou
  - State Key Laboratory of Metal Matrix Composites, School of Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, 200240, P. R. China