1
Chiang CC, Anne R, Chawla P, Shaw RM, He S, Rock EC, Zhou M, Cheng J, Gong YN, Chen YC. Deep learning unlocks label-free viability assessment of cancer spheroids in microfluidics. Lab on a Chip 2024; 24:3169-3182. [PMID: 38804084 PMCID: PMC11165951 DOI: 10.1039/d4lc00197d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2024] [Accepted: 05/22/2024] [Indexed: 05/29/2024]
Abstract
Despite recent advances in cancer treatment, refining therapeutic agents remains a critical task for oncologists. Precise evaluation of drug effectiveness necessitates the use of 3D cell culture instead of traditional 2D monolayers. Microfluidic platforms have enabled high-throughput drug screening with 3D models, but current viability assays for 3D cancer spheroids have limitations in reliability and cytotoxicity. This study introduces a deep learning model for non-destructive, label-free viability estimation based on phase-contrast images, providing a cost-effective, high-throughput solution for continuous spheroid monitoring in microfluidics. Microfluidic technology facilitated the creation of a high-throughput cancer spheroid platform with approximately 12 000 spheroids per chip for drug screening. Validation involved tests with eight conventional chemotherapeutic drugs, revealing a strong correlation between viability assessed via LIVE/DEAD staining and phase-contrast morphology. Extending the model's application to novel compounds and cell lines not in the training dataset yielded promising results, implying the potential for a universal viability estimation model. Experiments with an alternative microscopy setup supported the model's transferability across different laboratories. Using this method, we also tracked the dynamic changes in spheroid viability during the course of drug administration. In summary, this research integrates a robust platform with high-throughput microfluidic cancer spheroid assays and deep learning-based viability estimation, with broad applicability to various cell lines, compounds, and research settings.
Affiliation(s)
- Chun-Cheng Chiang
  - UPMC Hillman Cancer Center, University of Pittsburgh, 5115 Centre Ave, Pittsburgh, PA 15232, USA.
  - Department of Computational and Systems Biology, University of Pittsburgh, 3420 Forbes Avenue, Pittsburgh, PA 15260, USA
- Rajiv Anne
  - UPMC Hillman Cancer Center, University of Pittsburgh, 5115 Centre Ave, Pittsburgh, PA 15232, USA.
  - Department of Bioengineering, Swanson School of Engineering, University of Pittsburgh, 3700 O'Hara Street, Pittsburgh, PA 15260, USA
- Pooja Chawla
  - Department of Bioengineering, Swanson School of Engineering, University of Pittsburgh, 3700 O'Hara Street, Pittsburgh, PA 15260, USA
- Rachel M Shaw
  - Department of Bioengineering, Swanson School of Engineering, University of Pittsburgh, 3700 O'Hara Street, Pittsburgh, PA 15260, USA
- Sarah He
  - UPMC Hillman Cancer Center, University of Pittsburgh, 5115 Centre Ave, Pittsburgh, PA 15232, USA.
  - Carnegie Mellon University, Department of Biological Sciences, 5000 Forbes Avenue, Pittsburgh, PA, 15213, USA
- Edwin C Rock
  - Department of Bioengineering, Swanson School of Engineering, University of Pittsburgh, 3700 O'Hara Street, Pittsburgh, PA 15260, USA
- Mengli Zhou
  - UPMC Hillman Cancer Center, University of Pittsburgh, 5115 Centre Ave, Pittsburgh, PA 15232, USA.
  - Department of Computational and Systems Biology, University of Pittsburgh, 3420 Forbes Avenue, Pittsburgh, PA 15260, USA
  - Xiangya Hospital, Central South University, Changsha, Hunan, 410008, China
- Jinxiong Cheng
  - UPMC Hillman Cancer Center, University of Pittsburgh, 5115 Centre Ave, Pittsburgh, PA 15232, USA.
  - Department of Bioengineering, Swanson School of Engineering, University of Pittsburgh, 3700 O'Hara Street, Pittsburgh, PA 15260, USA
- Yi-Nan Gong
  - UPMC Hillman Cancer Center, University of Pittsburgh, 5115 Centre Ave, Pittsburgh, PA 15232, USA.
  - Department of Immunology, University of Pittsburgh School of Medicine, 3420 Forbes Avenue, Pittsburgh, PA, 15260, USA
- Yu-Chih Chen
  - UPMC Hillman Cancer Center, University of Pittsburgh, 5115 Centre Ave, Pittsburgh, PA 15232, USA.
  - Department of Computational and Systems Biology, University of Pittsburgh, 3420 Forbes Avenue, Pittsburgh, PA 15260, USA
  - Department of Bioengineering, Swanson School of Engineering, University of Pittsburgh, 3700 O'Hara Street, Pittsburgh, PA 15260, USA
  - CMU-Pitt Ph.D. Program in Computational Biology, University of Pittsburgh, 3420 Forbes Avenue, Pittsburgh, PA 15260, USA
2
Ma K, Gauthier LO, Cheung F, Huang S, Lek M. High-throughput assays to assess variant effects on disease. Dis Model Mech 2024; 17:dmm050573. [PMID: 38940340 PMCID: PMC11225591 DOI: 10.1242/dmm.050573] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024]
Abstract
Interpreting the wealth of rare genetic variants discovered in population-scale sequencing efforts and deciphering their associations with human health and disease present a critical challenge due to the lack of sufficient clinical case reports. One promising avenue to overcome this problem is deep mutational scanning (DMS), a method of introducing and evaluating large-scale genetic variants in model cell lines. DMS allows unbiased investigation of variants, including those that are not found in clinical reports, thus improving rare disease diagnostics. Currently, the main obstacle limiting the full potential of DMS is the availability of functional assays that are specific to disease mechanisms. Thus, we explore high-throughput functional methodologies suitable to examine broad disease mechanisms. We specifically focus on methods that do not require robotics or automation but instead use well-designed molecular tools to transform biological mechanisms into easily detectable signals, such as cell survival rate, fluorescence or drug resistance. Here, we aim to bridge the gap between disease-relevant assays and their integration into the DMS framework.
Affiliation(s)
- Kaiyue Ma
  - Department of Genetics, Yale School of Medicine, New Haven, CT 06510, USA
- Logan O. Gauthier
  - Department of Genetics, Yale School of Medicine, New Haven, CT 06510, USA
- Frances Cheung
  - Department of Genetics, Yale School of Medicine, New Haven, CT 06510, USA
- Shushu Huang
  - Department of Genetics, Yale School of Medicine, New Haven, CT 06510, USA
- Monkol Lek
  - Department of Genetics, Yale School of Medicine, New Haven, CT 06510, USA
3
Wen JW, Zhang HL, Du PF. Vislocas: Vision transformers for identifying protein subcellular mis-localization signatures of different cancer subtypes from immunohistochemistry images. Comput Biol Med 2024; 174:108392. [PMID: 38608321 DOI: 10.1016/j.compbiomed.2024.108392] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Revised: 03/22/2024] [Accepted: 04/01/2024] [Indexed: 04/14/2024]
Abstract
Proteins must be sorted to specific subcellular compartments to perform their functions. Abnormal protein subcellular localizations are related to many diseases. Although many efforts have been made to predict protein subcellular localization from various kinds of static information, including sequences, structures and interactions, such static information cannot predict protein mis-localization events in diseases. In contrast, IHC (immunohistochemistry) images, which have been widely applied in clinical diagnosis, contain information that can be used to find protein mis-localization events in disease states. In this study, we create the Vislocas method, which is capable of finding mis-localized proteins from IHC images as markers of cancer subtypes. By combining CNNs and vision transformer encoders, Vislocas can automatically extract image features at both global and local levels. Vislocas can be trained with full-sized IHC images from scratch. It is the first attempt to create an end-to-end IHC image-based protein subcellular location predictor. Vislocas achieved comparable or better performance than state-of-the-art methods. We applied Vislocas to find significant protein mis-localization events in different subtypes of glioma, melanoma and skin cancer. The mis-localized proteins, which were found purely from IHC images by Vislocas, are consistent with clinical or experimental results in the literature. All code for Vislocas has been deposited in a GitHub repository (https://github.com/JingwenWen99/Vislocas). All datasets for Vislocas have been deposited in Zenodo (https://zenodo.org/records/10632698).
Affiliation(s)
- Jing-Wen Wen
  - College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China.
- Han-Lin Zhang
  - College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China.
- Pu-Feng Du
  - College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China.
4
Ferreira EKGD, Silveira GF. Classification and counting of cells in brightfield microscopy images: an application of convolutional neural networks. Sci Rep 2024; 14:9031. [PMID: 38641688 PMCID: PMC11031575 DOI: 10.1038/s41598-024-59625-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2024] [Accepted: 04/12/2024] [Indexed: 04/21/2024]
Abstract
Microscopy is integral to medical research, facilitating the exploration of various biological questions, notably cell quantification. However, this process's time-consuming and error-prone nature, attributed to human intervention or automated methods usually applied to fluorescent images, presents challenges. In response, machine learning algorithms have been integrated into microscopy, automating tasks and constructing predictive models from vast datasets. These models adeptly learn representations for object detection, image segmentation, and target classification. An advantageous strategy involves utilizing unstained images, preserving cell integrity and enabling morphology-based classification, something that is hindered when fluorescent markers are used. The aim is to introduce a model proficient in classifying distinct cell lineages in digital contrast microscopy images. Additionally, the goal is to create a predictive model identifying lineage and determining optimal quantification of cell numbers. Employing a CNN machine learning algorithm, a classification model predicting cellular lineage achieved a remarkable accuracy of 93%, with ROC curve results nearing 1.0, showcasing robust performance. However, some lineages, namely SH-SY5Y (78%), HUH7_mayv (85%), and A549 (88%), exhibited slightly lower accuracies. These outcomes not only underscore the model's quality but also emphasize CNNs' potential in addressing the inherent complexities of microscopic images.
Affiliation(s)
- G F Silveira
  - Carlos Chagas Institute, Curitiba, PR, CEP 81310-020, Brazil.
5
Xiao H, Zou Y, Wang J, Wan S. A Review for Artificial Intelligence Based Protein Subcellular Localization. Biomolecules 2024; 14:409. [PMID: 38672426 PMCID: PMC11048326 DOI: 10.3390/biom14040409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Revised: 03/21/2024] [Accepted: 03/25/2024] [Indexed: 04/28/2024]
Abstract
Proteins need to be located in appropriate spatiotemporal contexts to carry out their diverse biological functions. Mislocalized proteins may lead to a broad range of diseases, such as cancer and Alzheimer's disease. Knowing where a target protein resides within a cell will give insights into tailored drug design for a disease. As the gold standard for validation, conventional wet-lab work uses fluorescent microscopy imaging, immunoelectron microscopy, and fluorescent biomarker tags to identify protein subcellular location. However, the booming era of proteomics and high-throughput sequencing generates vast numbers of newly discovered proteins, making protein subcellular localization by wet-lab experiments an impossible mission. To tackle this concern, in the past decades, artificial intelligence (AI) and machine learning (ML), especially deep learning methods, have made significant progress in this research area. In this article, we review the latest advances in AI-based method development in three typical types of approaches: sequence-based, knowledge-based, and image-based methods. We also discuss in detail the existing challenges and future directions in AI-based method development in this research field.
Affiliation(s)
- Hanyu Xiao
  - Department of Genetics, Cell Biology and Anatomy, College of Medicine, University of Nebraska Medical Center, Omaha, NE 68198, USA
- Yijin Zou
  - College of Veterinary Medicine, China Agricultural University, Beijing 100193, China
- Jieqiong Wang
  - Department of Neurological Sciences, College of Medicine, University of Nebraska Medical Center, Omaha, NE 68198, USA
- Shibiao Wan
  - Department of Genetics, Cell Biology and Anatomy, College of Medicine, University of Nebraska Medical Center, Omaha, NE 68198, USA
6
Jan M, Spangaro A, Lenartowicz M, Mattiazzi Usaj M. From pixels to insights: Machine learning and deep learning for bioimage analysis. Bioessays 2024; 46:e2300114. [PMID: 38058114 DOI: 10.1002/bies.202300114] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2023] [Revised: 10/25/2023] [Accepted: 11/13/2023] [Indexed: 12/08/2023]
Abstract
Bioimage analysis plays a critical role in extracting information from biological images, enabling deeper insights into cellular structures and processes. The integration of machine learning and deep learning techniques has revolutionized the field, enabling the automated, reproducible, and accurate analysis of biological images. Here, we provide an overview of the history and principles of machine learning and deep learning in the context of bioimage analysis. We discuss the essential steps of the bioimage analysis workflow, emphasizing how machine learning and deep learning have improved preprocessing, segmentation, feature extraction, object tracking, and classification. We provide examples that showcase the application of machine learning and deep learning in bioimage analysis. We examine user-friendly software and tools that enable biologists to leverage these techniques without extensive computational expertise. This review is a resource for researchers seeking to incorporate machine learning and deep learning in their bioimage analysis workflows and enhance their research in this rapidly evolving field.
Affiliation(s)
- Mahta Jan
  - Department of Chemistry and Biology, Toronto Metropolitan University, Toronto, Canada
- Allie Spangaro
  - Department of Chemistry and Biology, Toronto Metropolitan University, Toronto, Canada
- Michelle Lenartowicz
  - Department of Chemistry and Biology, Toronto Metropolitan University, Toronto, Canada
- Mojca Mattiazzi Usaj
  - Department of Chemistry and Biology, Toronto Metropolitan University, Toronto, Canada
7
Xiao Q, Wang Y, Fan J, Yi Z, Hong H, Xie X, Huang QA, Fu J, Ouyang J, Zhao X, Wang Z, Zhu Z. A computer vision and residual neural network (ResNet) combined method for automated and accurate yeast replicative aging analysis of high-throughput microfluidic single-cell images. Biosens Bioelectron 2024; 244:115807. [PMID: 37948914 DOI: 10.1016/j.bios.2023.115807] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/06/2023] [Revised: 10/17/2023] [Accepted: 10/30/2023] [Indexed: 11/12/2023]
Abstract
With the rapid development of microfluidic platforms for high-throughput single-cell culturing, the laborious manipulation of massive numbers of budding yeast cells (Saccharomyces cerevisiae) in replicative aging studies has been greatly simplified and automated. As a result, large datasets of microscopy images make it challenging to determine the yeast replicative lifespan (RLS), the most important parameter in the study of cell aging, quickly and accurately. Based on our microfluidic diploid yeast long-term culturing (DYLC) chip, which features 1100 traps to immobilize single cells and record their proliferation and aging via time-lapse imaging, a dedicated algorithm combining computer vision and a residual neural network (ResNet) is presented here to process the large number of micrographs efficiently in a high-throughput and automated manner. The image-processing algorithm includes the following pivotal steps: (i) segmenting multi-trap micrographs into time-lapse single-trap sub-images, (ii) labeling 8 yeast budding features and training the 18-layer ResNet, (iii) converting the analog-valued ResNet predictions into digital signals, (iv) recognizing dynamic cell events, and (v) ultimately determining the yeast RLS and budding time interval (BTI). The ResNet algorithm achieved high F1 scores (over 92%), demonstrating its effectiveness and accuracy in recognizing yeast budding events such as bud appearance, daughter dissection and cell death. The results therefore indicate that similar deep learning algorithms could be tailored to analyze high-throughput microscopy images and extract multiple cell behaviors in microfluidic single-cell analysis.
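Steps (iii) to (v) of the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the 0.5 threshold, the use of a single per-frame "bud present" score, the 10-minute frame spacing, and all function names are assumptions chosen for illustration only.

```python
def digitize(scores, threshold=0.5):
    """Step (iii): turn per-frame analog scores into a 0/1 signal.

    The threshold value is an assumption, not taken from the paper.
    """
    return [1 if s >= threshold else 0 for s in scores]


def rising_edges(signal):
    """Step (iv): frames where the signal switches from 0 to 1 (one event each)."""
    return [i for i in range(1, len(signal))
            if signal[i - 1] == 0 and signal[i] == 1]


def lifespan_and_intervals(bud_scores, frame_minutes, threshold=0.5):
    """Step (v): count budding events (RLS) and the time between them (BTI)."""
    events = rising_edges(digitize(bud_scores, threshold))
    intervals = [(b - a) * frame_minutes for a, b in zip(events, events[1:])]
    return len(events), intervals


# Hypothetical scores for one trap, one value per time-lapse frame.
scores = [0.1, 0.2, 0.9, 0.8, 0.1, 0.3, 0.7, 0.95, 0.2, 0.1, 0.85, 0.1]
rls, bti = lifespan_and_intervals(scores, frame_minutes=10)
print(rls, bti)
```

Counting only 0-to-1 transitions means that a bud visible over several consecutive frames is counted once, which is the main reason to digitize before counting.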
Affiliation(s)
- Qin Xiao
  - Southeast University, School of Integrated Circuits, School of Electronic Science and Engineering, Key Laboratory of MEMS of Ministry of Education, Sipailou 2, Nanjing, 210096, China
- Yingying Wang
  - Southeast University, School of Integrated Circuits, School of Electronic Science and Engineering, Key Laboratory of MEMS of Ministry of Education, Sipailou 2, Nanjing, 210096, China
- Juncheng Fan
  - Southeast University, School of Integrated Circuits, School of Electronic Science and Engineering, Key Laboratory of MEMS of Ministry of Education, Sipailou 2, Nanjing, 210096, China
- Zhenxiang Yi
  - Southeast University, School of Integrated Circuits, School of Electronic Science and Engineering, Key Laboratory of MEMS of Ministry of Education, Sipailou 2, Nanjing, 210096, China
- Hua Hong
  - Southeast University, School of Integrated Circuits, School of Electronic Science and Engineering, Key Laboratory of MEMS of Ministry of Education, Sipailou 2, Nanjing, 210096, China
- Xiao Xie
  - Southeast University, School of Integrated Circuits, School of Electronic Science and Engineering, Key Laboratory of MEMS of Ministry of Education, Sipailou 2, Nanjing, 210096, China
- Qing-An Huang
  - Southeast University, School of Integrated Circuits, School of Electronic Science and Engineering, Key Laboratory of MEMS of Ministry of Education, Sipailou 2, Nanjing, 210096, China
- Jiaming Fu
  - Nanjing Forestry University, College of Chemical Engineering, Longpan Road 159, Nanjing, 210037, China
- Jia Ouyang
  - Nanjing Forestry University, College of Chemical Engineering, Longpan Road 159, Nanjing, 210037, China
- Xiangwei Zhao
  - Southeast University, School of Biological Science and Medical Engineering, State Key Laboratory of Digital Medical Engineering, Sipailou 2, Nanjing, 210096, China
- Zixin Wang
  - Sun Yat-Sen University, School of Electronics and Information Technology, Waihuan Dong Road 132, Guangzhou, 510006, China.
- Zhen Zhu
  - Southeast University, School of Integrated Circuits, School of Electronic Science and Engineering, Key Laboratory of MEMS of Ministry of Education, Sipailou 2, Nanjing, 210096, China.
8
Lau TA, Mair E, Rabbitts BM, Lohith A, Lokey RS. High-Content Image-Based Screening and Deep Learning for the Detection of Anti-Inflammatory Drug Leads. Chembiochem 2024; 25:e202300136. [PMID: 37815526 PMCID: PMC11126213 DOI: 10.1002/cbic.202300136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2023] [Revised: 10/02/2023] [Accepted: 10/10/2023] [Indexed: 10/11/2023]
Abstract
We developed a high-content image-based screen that utilizes the pro-inflammatory stimulus lipopolysaccharide (LPS) and murine macrophages (RAW264.7) with the goal of enabling the identification of novel anti-inflammatory lead compounds. We screened 2,259 bioactive compounds with annotated mechanisms of action (MOA) to identify compounds that block the LPS-induced phenotype in macrophages. We utilized a set of seven fluorescence microscopy probes to generate images that were used to train and optimize a deep neural network classifier to distinguish between unstimulated and LPS-stimulated macrophages. The top hits from the deep learning classifier were validated using a linear classifier trained on individual cells and subsequently investigated in a multiplexed cytokine secretion assay. All 12 hits significantly modulated the expression of at least one cytokine upon LPS stimulation. Seven of these were allosteric inhibitors of the mitogen-activated protein kinase kinase (MEK1/2) and showed similar effects on cytokine expression. This deep learning morphological assay identified compounds that modulate the innate immune response to LPS and may aid in identifying new anti-inflammatory drug leads.
Affiliation(s)
- Tannia A Lau
  - Department of Chemistry and Biochemistry, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Elmar Mair
  - No affiliation, Santa Cruz, CA 95060, USA
- Beverley M Rabbitts
  - Department of Chemistry and Biochemistry, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Akshar Lohith
  - Department of Chemistry and Biochemistry, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- R Scott Lokey
  - Department of Chemistry and Biochemistry, University of California Santa Cruz, Santa Cruz, CA 95064, USA
9
Corbe M, Boncompain G, Perez F, Del Nery E, Genovesio A. Transfer learning for versatile and training free high content screening analyses. Sci Rep 2023; 13:22599. [PMID: 38114550 PMCID: PMC10730630 DOI: 10.1038/s41598-023-49554-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Accepted: 12/09/2023] [Indexed: 12/21/2023]
Abstract
High content screening (HCS) is a technology that automates cell biology experiments at large scale. A high content screen produces a large number of microscopy images of cells under many conditions and requires that a dedicated image and data analysis workflow be designed for each assay to select hits. This heavy data analytic step remains challenging and has been recognized as one of the burdens hindering the adoption of HCS. In this work we propose a solution to hit selection by using transfer learning without additional training. A pretrained residual network is employed to encode each image of a screen into a discriminant representation. The deep features obtained are then corrected to account for well plate bias and misalignment. We then propose two training-free pipelines dedicated to the two main categories of HCS for compound selection: with or without positive control. When a positive control is available, it is used alongside the negative control to compute a linear discriminant axis, thus building a classifier without training. Once all samples are projected onto this axis, the conditions that best reproduce the positive control can be selected. When no positive control is available, the Mahalanobis distance is computed from each sample to the negative control distribution. The latter provides a metric to identify the conditions that alter the negative control's cell phenotype. This metric is subsequently used to categorize hits through a clustering step. Given the lack of available ground truth in HCS, we provide a qualitative comparison of the results obtained using this approach with results obtained with handcrafted image analysis features for compounds and siRNA screens with or without control. Our results suggest that the fully automated and generic pipeline we propose offers a good alternative to handcrafted dedicated image analysis approaches. Furthermore, we demonstrate that this solution selects conditions of interest that had not been identified using the primary dedicated analysis. Altogether, this approach provides a fully automated, reproducible, versatile and comprehensive alternative analysis solution for HCS encompassing compound-based or downregulation screens, with or without positive controls, without the need for training or cell detection, or the development of a dedicated image analysis workflow.
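The two scoring rules named in the abstract above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' pipeline: the linear discriminant axis is computed here as Fisher's discriminant with a pooled covariance, the 2-D toy vectors stand in for deep features from a pretrained network, and all function names are hypothetical.

```python
import math


def mean(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def covariance(rows):
    """Sample covariance matrix (n - 1 denominator)."""
    mu, n = mean(rows), len(rows)
    d = len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        diff = [x - m for x, m in zip(r, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j] / (n - 1)
    return cov


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def mahalanobis(x, mu, cov):
    """Distance from x to a distribution with mean mu and covariance cov."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    return math.sqrt(sum(d * z for d, z in zip(diff, solve(cov, diff))))


def discriminant_axis(neg, pos):
    """Fisher axis: pooled covariance inverse times the difference of means."""
    cn, cp = covariance(neg), covariance(pos)
    pooled = [[(a + b) / 2 for a, b in zip(rn, rp)] for rn, rp in zip(cn, cp)]
    delta = [p - n for p, n in zip(mean(pos), mean(neg))]
    return solve(pooled, delta)


def project(x, axis):
    """Score a sample by projecting it onto the discriminant axis."""
    return sum(xi * ai for xi, ai in zip(x, axis))


# Toy control wells (hypothetical 2-D features).
neg = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pos = [[4.0, 4.0], [5.0, 4.0], [4.0, 5.0], [5.0, 5.0]]

mu_neg, cov_neg = mean(neg), covariance(neg)
d_near = mahalanobis([0.5, 0.5], mu_neg, cov_neg)  # sample at the control mean
d_far = mahalanobis([4.5, 4.5], mu_neg, cov_neg)   # sample far from the controls
axis = discriminant_axis(neg, pos)
print(d_near, d_far, axis)
```

With a positive control, samples would be ranked by `project(sample, axis)`; without one, by their Mahalanobis distance to the negative controls. In real feature spaces with hundreds of dimensions the covariance usually needs regularization before it can be inverted, which this sketch leaves out.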
Affiliation(s)
- Maxime Corbe
  - Computational Bioimaging and Bioinformatics, Institut de Biologie de l'Ecole Normale Supérieure, PSL University, 46 Rue d'Ulm, 75005, Paris, France
  - Biophenics Laboratory, Department of Translational Research, Cell and Tissue Imaging Facility (PICT-IBiSA), Institut Curie, PSL Research University, 26 Rue d'Ulm, 75005, Paris, France
- Gaëlle Boncompain
  - Dynamics of Intra-Cellular Organization - UMR144, Institut Curie, PSL Research University, Paris, France
- Franck Perez
  - Biophenics Laboratory, Department of Translational Research, Cell and Tissue Imaging Facility (PICT-IBiSA), Institut Curie, PSL Research University, 26 Rue d'Ulm, 75005, Paris, France
  - Dynamics of Intra-Cellular Organization - UMR144, Institut Curie, PSL Research University, Paris, France
- Elaine Del Nery
  - Biophenics Laboratory, Department of Translational Research, Cell and Tissue Imaging Facility (PICT-IBiSA), Institut Curie, PSL Research University, 26 Rue d'Ulm, 75005, Paris, France.
- Auguste Genovesio
  - Computational Bioimaging and Bioinformatics, Institut de Biologie de l'Ecole Normale Supérieure, PSL University, 46 Rue d'Ulm, 75005, Paris, France.
10
Khwaja E, Song YS, Agarunov A, Huang B. CELL-E 2: Translating Proteins to Pictures and Back with a Bidirectional Text-to-Image Transformer. Advances in Neural Information Processing Systems 2023; 36:4899-4914. [PMID: 39021511 PMCID: PMC11254339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/20/2024]
Abstract
We present CELL-E 2, a novel bidirectional transformer that can generate images depicting protein subcellular localization from the amino acid sequences (and vice versa). Protein localization is a challenging problem that requires integrating sequence and image information, which most existing methods ignore. CELL-E 2 extends the work of CELL-E, not only capturing the spatial complexity of protein localization and producing probability estimates of localization atop a nucleus image, but also generating sequences from images, enabling de novo protein design. We train and finetune CELL-E 2 on two large-scale datasets of human proteins. We also demonstrate how to use CELL-E 2 to create hundreds of novel nuclear localization signals (NLS). Results and interactive demos are featured at https://bohuanglab.github.io/CELL-E_2/.
Affiliation(s)
- Emaad Khwaja
  - UC Berkeley - UCSF Joint Bioengineering Graduate Program
  - Computer Science Division, UC Berkeley, CA 94720
- Yun S Song
  - Department of Statistics, UC Berkeley, CA 94720
  - Computer Science Division, UC Berkeley, CA 94720
- Aaron Agarunov
  - Department of Pathology, Memorial Sloan Kettering Cancer Center, 10065
- Bo Huang
  - Department of Pharmaceutical Chemistry, UCSF, San Francisco, CA 94143
  - Department of Biochemistry and Biophysics, UCSF, San Francisco, CA 94143
  - Chan Zuckerberg Biohub - San Francisco, San Francisco, CA 94158
11
Xu L, Kan S, Yu X, Liu Y, Fu Y, Peng Y, Liang Y, Cen Y, Zhu C, Jiang W. Deep learning enables stochastic optical reconstruction microscopy-like superresolution image reconstruction from conventional microscopy. iScience 2023; 26:108145. [PMID: 37867953 PMCID: PMC10587619 DOI: 10.1016/j.isci.2023.108145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Revised: 08/05/2023] [Accepted: 10/02/2023] [Indexed: 10/24/2023]
Abstract
Despite its remarkable potential for transforming low-resolution images, deep learning faces significant challenges in achieving high-quality superresolution microscopy imaging from wide-field (conventional) microscopy. Here, we present X-Microscopy, a computational tool comprising two deep learning subnets, UR-Net-8 and X-Net, which enables STORM-like superresolution microscopy image reconstruction from wide-field images with input-size flexibility. X-Microscopy was trained using samples of various subcellular structures, including cytoskeletal filaments, dot-like, beehive-like, and nanocluster-like structures, to generate prediction models capable of producing images of comparable quality to STORM-like images. In addition to enabling multicolour superresolution image reconstructions, X-Microscopy also facilitates superresolution image reconstruction from different conventional microscopic systems. The capabilities of X-Microscopy offer promising prospects for making superresolution microscopy accessible to a broader range of users, going beyond the confines of well-equipped laboratories.
Affiliation(s)
- Lei Xu
  - Department of Etiology and Carcinogenesis and State Key Laboratory of Molecular Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, China
  - Key Laboratory of Molecular and Cellular Systems Biology, College of Life Sciences, Tianjin Normal University, Tianjin 300387, China
- Shichao Kan
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Xiying Yu
  - Department of Etiology and Carcinogenesis and State Key Laboratory of Molecular Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, China
- Ye Liu
  - HAMD (Ningbo) Intelligent Medical Technology Co., Ltd, Ningbo 315194, China
- Yuxia Fu
  - Department of Etiology and Carcinogenesis and State Key Laboratory of Molecular Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, China
- Yiqiang Peng
  - HAMD (Ningbo) Intelligent Medical Technology Co., Ltd, Ningbo 315194, China
- Yanhui Liang
  - HAMD (Ningbo) Intelligent Medical Technology Co., Ltd, Ningbo 315194, China
- Yigang Cen
  - Institute of Information Science, Beijing Jiaotong University, Beijing 100044, China
- Changjun Zhu
  - Key Laboratory of Molecular and Cellular Systems Biology, College of Life Sciences, Tianjin Normal University, Tianjin 300387, China
- Wei Jiang
  - Department of Etiology and Carcinogenesis and State Key Laboratory of Molecular Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, China
| |
Collapse
|
12
|
Zou K, Wang S, Wang Z, Zou H, Yang F. Dual-Signal Feature Spaces Map Protein Subcellular Locations Based on Immunohistochemistry Image and Protein Sequence. SENSORS (BASEL, SWITZERLAND) 2023; 23:9014. [PMID: 38005402 PMCID: PMC10675401 DOI: 10.3390/s23229014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2023] [Revised: 10/29/2023] [Accepted: 11/01/2023] [Indexed: 11/26/2023]
Abstract
Proteins are among the primary biochemical macromolecular regulators in the compartmental cellular structure, and the subcellular locations of proteins can therefore provide information on the function of subcellular structures and physiological environments. Recently, data-driven systems have been developed to predict the subcellular location of proteins based on protein sequence, immunohistochemistry (IHC) images, or immunofluorescence (IF) images. However, the fusion of multiple protein signals has received little attention. In this study, we developed a dual-signal computational protocol by incorporating IHC images into protein sequences to learn protein subcellular localization. The protocol can be summarized in three major steps: first, a benchmark database was assembled that includes 281 proteins sorted out from 4722 proteins of the Human Protein Atlas (HPA) and Swiss-Prot databases, which are involved in the endoplasmic reticulum (ER), Golgi apparatus, cytosol, and nucleoplasm; second, discriminative feature operators were employed to quantitate protein image-sequence samples that include IHC images and protein sequences; finally, the feature subspaces of the different protein signals were used to construct multiple sub-classifiers via dimensionality reduction and binary relevance (BR), and the multiple confidences derived from these sub-classifiers were combined to decide the subcellular location through a centralized voting mechanism at the decision layer. The experimental results indicated that the dual-signal model combining IHC images and protein sequences outperformed the single-signal models, with accuracy, precision, and recall of 75.41%, 80.38%, and 74.38%, respectively. These results are instructive for further research on protein subcellular location prediction under multi-signal fusion.
Affiliation(s)
- Kai Zou
  - School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang 330038, China
  - School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Simeng Wang
  - School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang 330038, China
- Ziqian Wang
  - School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang 330038, China
- Hongliang Zou
  - School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang 330038, China
- Fan Yang
  - School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang 330038, China
  - Artificial Intelligence and Bioinformation Cognition Laboratory, Jiangxi Science and Technology Normal University, Nanchang 330038, China
|
13
|
Yue T, Wang Y, Zhang L, Gu C, Xue H, Wang W, Lyu Q, Dun Y. Deep Learning for Genomics: From Early Neural Nets to Modern Large Language Models. Int J Mol Sci 2023; 24:15858. [PMID: 37958843 PMCID: PMC10649223 DOI: 10.3390/ijms242115858] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/30/2023] [Revised: 10/24/2023] [Accepted: 10/30/2023] [Indexed: 11/15/2023] Open
Abstract
The data explosion driven by advancements in genomic research, such as high-throughput sequencing techniques, is constantly challenging conventional methods used in genomics. In parallel with the urgent demand for robust algorithms, deep learning has succeeded in various fields such as vision, speech, and text processing. Yet genomics entails unique challenges to deep learning, since we expect a superhuman intelligence that explores beyond our knowledge to interpret the genome from deep learning. A powerful deep learning model should rely on the insightful utilization of task-specific knowledge. In this paper, we briefly discuss the strengths of different deep learning models from a genomic perspective so as to fit each particular task with proper deep learning-based architecture, and we remark on practical considerations of developing deep learning architectures for genomics. We also provide a concise review of deep learning applications in various aspects of genomic research and point out current challenges and potential research directions for future genomics applications. We believe the collaborative use of ever-growing diverse data and the fast iteration of deep learning models will continue to contribute to the future of genomics.
Affiliation(s)
- Tianwei Yue
  - School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Yuanxin Wang
  - School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Longxiang Zhang
  - School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Chunming Gu
  - Department of Biomedical Engineering, School of Medicine, Johns Hopkins University, Baltimore, MD 21218, USA
- Haoru Xue
  - The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Wenping Wang
  - School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Qi Lyu
  - Department of Computational Mathematics, Science, and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Yujie Dun
  - School of Information and Communications Engineering, Xi’an Jiaotong University, Xi’an 710049, China
|
14
|
Zhou S, Chen B, Fu ES, Yan H. Computer vision meets microfluidics: a label-free method for high-throughput cell analysis. MICROSYSTEMS & NANOENGINEERING 2023; 9:116. [PMID: 37744264 PMCID: PMC10511704 DOI: 10.1038/s41378-023-00562-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 09/19/2022] [Revised: 03/21/2023] [Accepted: 04/10/2023] [Indexed: 09/26/2023]
Abstract
In this paper, we review the integration of microfluidic chips and computer vision, which has great potential to advance research in the life sciences and biology, particularly in the analysis of cell imaging data. Microfluidic chips enable the generation of large amounts of visual data at the single-cell level, while computer vision techniques can rapidly process and analyze these data to extract valuable information about cellular health and function. One of the key advantages of this integrative approach is that it allows for noninvasive and low-damage cellular characterization, which is important for studying delicate or fragile microbial cells. The use of microfluidic chips provides a highly controlled environment for cell growth and manipulation, minimizes experimental variability and improves the accuracy of data analysis. Computer vision can be used to recognize and analyze target species within heterogeneous microbial populations, which is important for understanding the physiological status of cells in complex biological systems. As hardware and artificial intelligence algorithms continue to improve, computer vision is expected to become an increasingly powerful tool for in situ cell analysis. The use of microelectromechanical devices in combination with microfluidic chips and computer vision could enable the development of label-free, automatic, low-cost, and fast cellular information recognition and the high-throughput analysis of cellular responses to different compounds, for broad applications in fields such as drug discovery, diagnostics, and personalized medicine.
Affiliation(s)
- Shizheng Zhou
  - State Key Laboratory of Marine Resource Utilization in South China Sea, Hainan University, Haikou, 570228 China
- Bingbing Chen
  - State Key Laboratory of Marine Resource Utilization in South China Sea, Hainan University, Haikou, 570228 China
- Edgar S. Fu
  - Graduate School of Computing and Information Science, University of Pittsburgh, Pittsburgh, PA 15260 USA
- Hong Yan
  - State Key Laboratory of Marine Resource Utilization in South China Sea, Hainan University, Haikou, 570228 China
|
15
|
Zou K, Wang S, Wang Z, Zhang Z, Yang F. HAR_Locator: a novel protein subcellular location prediction model of immunohistochemistry images based on hybrid attention modules and residual units. Front Mol Biosci 2023; 10:1171429. [PMID: 37664182 PMCID: PMC10470064 DOI: 10.3389/fmolb.2023.1171429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2023] [Accepted: 08/04/2023] [Indexed: 09/05/2023] Open
Abstract
Introduction: Proteins located in subcellular compartments have played an indispensable role in the physiological function of eukaryotic organisms. The pattern of protein subcellular localization is conducive to understanding the mechanism and function of proteins, contributing to investigating pathological changes of cells, and providing technical support for targeted drug research on human diseases. Automated systems based on featurization or representation learning and classifier design have attracted interest in predicting the subcellular location of proteins due to a considerable rise in proteins. However, large-scale, fine-grained protein microscopic images are prone to trapping and losing feature information in the general deep learning models, and the shallow features derived from statistical methods have weak supervision abilities. Methods: In this work, a novel model called HAR_Locator was developed to predict the subcellular location of proteins by concatenating multi-view abstract features and shallow features, whose advanced advantages are summarized in the following three protocols. Firstly, to get discriminative abstract feature information on protein subcellular location, an abstract feature extractor called HARnet based on Hybrid Attention modules and Residual units was proposed to relieve gradient dispersion and focus on protein-target regions. Secondly, it not only improves the supervision ability of image information but also enhances the generalization ability of the HAR_Locator through concatenating abstract features and shallow features. Finally, a multi-category multi-classifier decision system based on an Artificial Neural Network (ANN) was introduced to obtain the final output results of samples by fitting the most representative result from five subset predictors. 
Results: To evaluate the model, a collection of 6,778 immunohistochemistry (IHC) images from the Human Protein Atlas (HPA) database was used to present experimental results, and the accuracy, precision, and recall evaluation indicators were significantly increased to 84.73%, 84.77%, and 84.70%, respectively, compared with baseline predictors.
Affiliation(s)
- Kai Zou
  - School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
- Simeng Wang
  - School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
- Ziqian Wang
  - School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
- Zhihai Zhang
  - School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
- Fan Yang
  - School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
  - Artificial Intelligence and Bioinformation Cognition Laboratory, Jiangxi Science and Technology Normal University, Nanchang, China
|
16
|
Li J, Zou Q, Yuan L. A review from biological mapping to computation-based subcellular localization. MOLECULAR THERAPY. NUCLEIC ACIDS 2023; 32:507-521. [PMID: 37215152 PMCID: PMC10192651 DOI: 10.1016/j.omtn.2023.04.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
Subcellular localization is crucial to the study of viruses and diseases. Specifically, research on protein subcellular localization can help identify clues between virus and host cells that can aid in the design of targeted drugs. Research on RNA subcellular localization is significant for human diseases (such as Alzheimer's disease and colon cancer). To date, only reviews addressing subcellular localization of proteins have been published, which are outdated for reference, and reviews of RNA subcellular localization are not comprehensive. Therefore, we collated the most up-to-date literature on protein and RNA subcellular localization to help researchers understand changes in the field. Extensive and complete methods for constructing subcellular localization models have also been summarized, which can help readers understand the changes in the application of biotechnology and computer science in subcellular localization research and explore how to use biological data to construct improved subcellular localization models. This paper is the first review to cover both protein subcellular localization and RNA subcellular localization. We urge researchers from biology and computational biology to jointly pay attention to transformation patterns, interrelationships, differences, and causality of protein subcellular localization and RNA subcellular localization.
Affiliation(s)
- Jing Li
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, 1 Chengdian Road, Quzhou, Zhejiang 324000, China
  - School of Biomedical Sciences, University of Hong Kong, Hong Kong, China
- Quan Zou
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, 1 Chengdian Road, Quzhou, Zhejiang 324000, China
- Lei Yuan
  - Department of Hepatobiliary Surgery, Quzhou People's Hospital, 100 Minjiang Main Road, Quzhou, Zhejiang 324000, China
|
17
|
Zhu XL, Bao LX, Xue MQ, Xu YY. Automatic recognition of protein subcellular location patterns in single cells from immunofluorescence images based on deep learning. Brief Bioinform 2023; 24:6964519. [PMID: 36577448 DOI: 10.1093/bib/bbac609] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/07/2022] [Revised: 11/16/2022] [Accepted: 12/11/2022] [Indexed: 12/30/2022] Open
Abstract
With the improvement of single-cell measurement techniques, there is a growing awareness that individual differences exist among cells, and protein expression distribution can vary across cells in the same tissue or cell line. Pinpointing the protein subcellular locations in single cells is crucial for mapping functional specificity of proteins and studying related diseases. Currently, research about single-cell protein location is still in its infancy, and most studies and databases do not annotate proteins at the cell level. For example, in the human protein atlas database, an immunofluorescence image stained for a particular protein shows multiple cells, but the subcellular location annotation is for the whole image, ignoring intercellular difference. In this study, we used large-scale immunofluorescence images and image-level subcellular locations to develop a deep-learning-based pipeline that could accurately recognize protein localizations in single cells. The pipeline consisted of two deep learning models, i.e. an image-based model and a cell-based model. The former used a multi-instance learning framework to comprehensively model protein distribution in multiple cells in each image, and could give both image-level and cell-level predictions. The latter firstly used clustering and heuristics algorithms to assign pseudo-labels of subcellular locations to the segmented cell images, and then used the pseudo-labels to train a classification model. Finally, the image-based model was fused with the cell-based model at the decision level to obtain the final ensemble model for single-cell prediction. Our experimental results showed that the ensemble model could achieve higher accuracy and robustness on independent test sets than state-of-the-art methods.
Affiliation(s)
- Xi-Liang Zhu
  - School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou 510515, China
  - Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou 510515, China
- Lin-Xia Bao
  - School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou 510515, China
  - Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou 510515, China
- Min-Qi Xue
  - School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou 510515, China
  - Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou 510515, China
- Ying-Ying Xu
  - School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou 510515, China
  - Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou 510515, China
|
18
|
Mou M, Pan Z, Lu M, Sun H, Wang Y, Luo Y, Zhu F. Application of Machine Learning in Spatial Proteomics. J Chem Inf Model 2022; 62:5875-5895. [PMID: 36378082 DOI: 10.1021/acs.jcim.2c01161] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 11/16/2022]
Abstract
Spatial proteomics is an interdisciplinary field that investigates the localization and dynamics of proteins, and it has gained extensive attention in recent years, especially subcellular proteomics. Considerable evidence indicates that the subcellular localization of proteins is associated with various cellular processes and disease progression. Mass spectrometry (MS)-based and imaging-based experimental approaches have been developed to acquire large-scale spatial proteomic data. To allow the reliable analysis of increasingly complex spatial proteomics data, machine learning (ML) methods have been widely used in both MS-based and imaging-based spatial proteomic data analysis pipelines. Here, we comprehensively survey the applications of ML in spatial proteomics from the following aspects: (1) data resources for the spatial proteome are comprehensively introduced; (2) the roles of different ML algorithms in data analysis pipelines are elaborated; (3) successful applications of spatial proteomics and several analytical tools integrating ML methods are presented; (4) challenges existing in modern ML-based spatial proteomics studies are discussed. This review provides guidelines for researchers seeking to apply ML methods to analyze spatial proteomic data and can facilitate an insightful understanding of cell biology as well as future research in the medical and drug discovery communities.
Affiliation(s)
- Minjie Mou
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Ziqi Pan
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Mingkun Lu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Huaicheng Sun
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Yunxia Wang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Yongchao Luo
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Feng Zhu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
|
19
|
Khan AI, Kim MJ, Dutta P. Fine-tuning-based Transfer Learning for Characterization of Adeno-Associated Virus. JOURNAL OF SIGNAL PROCESSING SYSTEMS 2022; 94:1515-1529. [PMID: 36742147 PMCID: PMC9897492 DOI: 10.1007/s11265-022-01758-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/01/2022] [Indexed: 06/18/2023]
Abstract
Accurate and precise identification of adeno-associated virus (AAV) vectors plays an important role in dose-dependent gene therapy. Although solid-state nanopore techniques can potentially be used to characterize AAV vectors by capturing ionic current, the existing data analysis techniques fall short of identifying them from their ionic current profiles. Recently introduced machine learning methods such as deep convolutional neural networks (CNNs), developed for image identification tasks, can be applied for such classification. However, with the small data set available for the problem at hand, it is not possible to train a deep neural network from scratch for accurate classification of AAV vectors. To circumvent this, we applied a pre-trained deep CNN (GoogleNet) model to capture the basic features from ionic current signals and subsequently used fine-tuning-based transfer learning to classify AAV vectors. The proposed method is very generic, as it requires minimal preprocessing and does not require any handcrafted features. Our results indicate that fine-tuning-based transfer learning can achieve an average classification accuracy between 90 and 99% in three realizations with a very small standard deviation. Results also indicate that the classification accuracy depends on the applied electric field (across the nanopore) and the time frame used for data segmentation. We also found that fine-tuning of the deep network outperforms feature extraction-based classification for the resistive pulse dataset. To expand the usefulness of fine-tuning-based transfer learning, we tested two other pre-trained deep networks (ResNet50 and InceptionV3) for the classification of AAVs. Overall, fine-tuning-based transfer learning from pre-trained deep networks is very effective for classification, though deep networks such as ResNet50 and InceptionV3 require significantly longer training time than GoogleNet.
Affiliation(s)
- Aminul Islam Khan
  - School of Mechanical and Materials Engineering, Washington State University, Pullman, WA, 99164, USA
- Min Jun Kim
  - Department of Mechanical Engineering, Southern Methodist University, Dallas, TX, 75275, USA
- Prashanta Dutta
  - School of Mechanical and Materials Engineering, Washington State University, Pullman, WA, 99164, USA
|
20
|
Hardo G, Noka M, Bakshi S. Synthetic Micrographs of Bacteria (SyMBac) allows accurate segmentation of bacterial cells using deep neural networks. BMC Biol 2022; 20:263. [PMID: 36447211 PMCID: PMC9710168 DOI: 10.1186/s12915-022-01453-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2022] [Accepted: 10/31/2022] [Indexed: 12/02/2022] Open
Abstract
BACKGROUND Deep-learning-based image segmentation models are required for accurate processing of high-throughput timelapse imaging data of bacterial cells. However, the performance of any such model strictly depends on the quality and quantity of training data, which is difficult to generate for bacterial cell images. Here, we present a novel method of bacterial image segmentation using machine learning models trained with Synthetic Micrographs of Bacteria (SyMBac). RESULTS We have developed SyMBac, a tool that allows for rapid, automatic creation of arbitrary amounts of training data, combining detailed models of cell growth, physical interactions, and microscope optics to create synthetic images which closely resemble real micrographs, and is capable of training accurate image segmentation models. The major advantages of our approach are as follows: (1) synthetic training data can be generated virtually instantly and on demand; (2) these synthetic images are accompanied by perfect ground truth positions of cells, meaning no data curation is required; (3) different biological conditions, imaging platforms, and imaging modalities can be rapidly simulated, meaning any change in one's experimental setup no longer requires the laborious process of manually generating new training data for each change. Deep-learning models trained with SyMBac data are capable of analysing data from various imaging platforms and are robust to drastic changes in cell size and morphology. Our benchmarking results demonstrate that models trained on SyMBac data generate more accurate cell identifications and precise cell masks than those trained on human-annotated data, because the model learns the true position of the cell irrespective of imaging artefacts. We illustrate the approach by analysing the growth and size regulation of bacterial cells during entry and exit from dormancy, which revealed novel insights about the physiological dynamics of cells under various growth conditions. 
CONCLUSIONS The SyMBac approach will help to adapt and improve the performance of deep-learning-based image segmentation models for accurate processing of high-throughput timelapse image data.
Affiliation(s)
- Georgeos Hardo
  - Department of Engineering, University of Cambridge, Trumpington Street, Cambridge, UK
- Maximilian Noka
  - Department of Engineering, University of Cambridge, Trumpington Street, Cambridge, UK
- Somenath Bakshi
  - Department of Engineering, University of Cambridge, Trumpington Street, Cambridge, UK
|
21
|
Severin Y, Hale BD, Mena J, Goslings D, Frey BM, Snijder B. Multiplexed high-throughput immune cell imaging reveals molecular health-associated phenotypes. SCIENCE ADVANCES 2022; 8:eabn5631. [PMID: 36322666 PMCID: PMC9629716 DOI: 10.1126/sciadv.abn5631] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 05/05/2023]
Abstract
Phenotypic plasticity is essential to the immune system, yet the factors that shape it are not fully understood. Here, we comprehensively analyze immune cell phenotypes including morphology across human cohorts by single-round multiplexed immunofluorescence, automated microscopy, and deep learning. Using the uncertainty of convolutional neural networks to cluster the phenotypes of eight distinct immune cell subsets, we find that the resulting maps are influenced by donor age, gender, and blood pressure, revealing distinct polarization and activation-associated phenotypes across immune cell classes. We further associate T cell morphology to transcriptional state based on their joint donor variability and validate an inflammation-associated polarized T cell morphology and an age-associated loss of mitochondria in CD4+ T cells. Together, we show that immune cell phenotypes reflect both molecular and personal health information, opening new perspectives into the deep immune phenotyping of individual people in health and disease.
Affiliation(s)
- Yannik Severin
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, 8049 Zürich, Switzerland
- Benjamin D. Hale
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, 8049 Zürich, Switzerland
- Julien Mena
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, 8049 Zürich, Switzerland
- David Goslings
  - Blood Transfusion Service Zürich, SRC, 8952 Schlieren, Switzerland
- Beat M. Frey
  - Blood Transfusion Service Zürich, SRC, 8952 Schlieren, Switzerland
- Berend Snijder
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, 8049 Zürich, Switzerland
  - Corresponding author.
|
22
|
Imaging and analysis for simultaneous tracking of fluorescent biosensors in barcoded cells. STAR Protoc 2022; 3:101611. [PMID: 36042884 PMCID: PMC9420398 DOI: 10.1016/j.xpro.2022.101611] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022] Open
Abstract
We recently developed a biosensor barcoding approach for highly multiplexed tracking of molecular activities in live cells. In this protocol, we detail the labeling of cells expressing different genetically encoded fluorescent biosensors with a pair of barcoding proteins and parallel imaging. Signals from cells with the same barcodes are then pooled together to obtain the dynamics of the corresponding biosensor activity. We describe the steps involved in cell barcoding, image acquisition, and analysis by deep learning models. For complete details on the use and execution of this protocol, please refer to Yang et al. (2021).
- Cells expressing different biosensors can be barcoded for simultaneous imaging
- Spectral imaging is used to resolve multiple red barcoding proteins
- Deep learning models accelerate barcode reading during image analysis
Publisher’s note: Undertaking any experimental protocol requires adherence to local institutional guidelines for laboratory safety and ethics.
|
23
|
Tirinato L, Onesto V, Garcia-Calderon D, Pagliari F, Spadea MF, Seco J, Gentile F. Human lung-cancer-cell radioresistance investigated through 2D network topology. Sci Rep 2022; 12:12980. [PMID: 35902618 PMCID: PMC9334295 DOI: 10.1038/s41598-022-17018-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2022] [Accepted: 07/19/2022] [Indexed: 11/22/2022] Open
Abstract
Radiation therapy (RT) is now considered to be a main component of cancer therapy, alongside surgery, chemotherapy and monoclonal antibody-based immunotherapy. In RT, cancer tissues are exposed to ionizing radiation, causing the death of malignant cells and favoring cancer regression. However, the efficiency of RT may be hampered by cell radioresistance (RR), that is, the ability of tumor cells to withstand RT. To improve RT performance, it is crucial to develop methods that can help to quantify cell sensitivity to radiation. Because none of the existing methods to assess RR are based on cell-graph topology, in this work we have examined how 2D cell networks, within a single colony, from different human lung cancer lines (H460, A549 and Calu-1) behave in response to doses of ionizing radiation ranging from 0 to 8 Gy. We measured the structure of the resulting cell graphs using well-established network-analysis metrics, such as the clustering coefficient (cc), the characteristic path length (cpl), and the small world coefficient (SW). The findings illustrate that the clustering characteristics of cell networks show a marked sensitivity to the dose and cell line. Values of the SW coefficient higher than one, indicative of a discontinuous and inhomogeneous cell spatial layout, are associated with elevated levels of radiation and with a lower radioresistance of the treated cell line. The results suggest that topology could be used as a quantitative parameter to assess cell radioresistance and measure the performance of cancer radiotherapy.
Affiliation(s)
- Luca Tirinato
- Department of Experimental and Clinical Medicine, Nanotechnology Research Center, University of Magna Graecia, 88100, Catanzaro, Italy; Division of Biomedical Physics in Radiation Oncology, DKFZ - German Cancer Research Center, Heidelberg, Germany
- Valentina Onesto
- Department of Experimental and Clinical Medicine, Nanotechnology Research Center, University of Magna Graecia, 88100, Catanzaro, Italy
- Daniel Garcia-Calderon
- Division of Biomedical Physics in Radiation Oncology, DKFZ - German Cancer Research Center, Heidelberg, Germany; Department of Physics and Astronomy, Heidelberg University, Heidelberg, Germany
- Francesca Pagliari
- Division of Biomedical Physics in Radiation Oncology, DKFZ - German Cancer Research Center, Heidelberg, Germany
- Maria-Francesca Spadea
- Department of Experimental and Clinical Medicine, University of Magna Graecia, 88100, Catanzaro, Italy
- Joao Seco
- Division of Biomedical Physics in Radiation Oncology, DKFZ - German Cancer Research Center, Heidelberg, Germany; Department of Physics and Astronomy, Heidelberg University, Heidelberg, Germany
- Francesco Gentile
- Department of Experimental and Clinical Medicine, Nanotechnology Research Center, University of Magna Graecia, 88100, Catanzaro, Italy
|
24
|
Cuny AP, Schlottmann FP, Ewald JC, Pelet S, Schmoller KM. Live cell microscopy: From image to insight. BIOPHYSICS REVIEWS 2022; 3:021302. [PMID: 38505412 PMCID: PMC10903399 DOI: 10.1063/5.0082799] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 12/18/2021] [Accepted: 03/18/2022] [Indexed: 03/21/2024]
Abstract
Live-cell microscopy is a powerful tool that can reveal cellular behavior as well as the underlying molecular processes. A key advantage of microscopy is that by visualizing biological processes, it can provide direct insights. Nevertheless, live-cell imaging can be technically challenging and prone to artifacts. For a successful experiment, many careful decisions are required at all steps from hardware selection to downstream image analysis. Facing these questions can be particularly intimidating due to the requirement for expertise in multiple disciplines, ranging from optics, biophysics, and programming to cell biology. In this review, we aim to summarize the key points that need to be considered when setting up and analyzing a live-cell imaging experiment. While we put a particular focus on yeast, many of the concepts discussed are applicable also to other organisms. In addition, we discuss reporting and data sharing strategies that we think are critical to improve reproducibility in the field.
Affiliation(s)
- Fabian P. Schlottmann
- Interfaculty Institute of Cell Biology, University of Tuebingen, 72076 Tuebingen, Germany
- Jennifer C. Ewald
- Interfaculty Institute of Cell Biology, University of Tuebingen, 72076 Tuebingen, Germany
- Serge Pelet
- Department of Fundamental Microbiology, University of Lausanne, 1015 Lausanne, Switzerland
|
25
|
Nakai K, Wei L. Recent Advances in the Prediction of Subcellular Localization of Proteins and Related Topics. FRONTIERS IN BIOINFORMATICS 2022; 2:910531. [PMID: 36304291 PMCID: PMC9580943 DOI: 10.3389/fbinf.2022.910531] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2022] [Accepted: 04/25/2022] [Indexed: 11/13/2022]
Abstract
Prediction of subcellular localization of proteins from their amino acid sequences has a long history in bioinformatics and is still actively developing, incorporating the latest advances in machine learning and proteomics. Notably, deep learning-based methods for natural language processing have made great contributions. Here, we review recent advances in the field as well as its related fields, such as subcellular proteomics and the prediction/recognition of subcellular localization from image data.
Affiliation(s)
- Kenta Nakai
- Institute of Medical Science, The University of Tokyo, Minato-Ku, Japan
- *Correspondence: Kenta Nakai
- Leyi Wei
- School of Software, Shandong University, Jinan, China
|
26
|
Phenomics approaches to understand genetic networks and gene function in yeast. Biochem Soc Trans 2022; 50:713-721. [PMID: 35285506 PMCID: PMC9162466 DOI: 10.1042/bst20210285] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/10/2021] [Revised: 02/14/2022] [Accepted: 02/18/2022] [Indexed: 01/03/2023]
Abstract
Over the past decade, major efforts have been made to systematically survey the characteristics or phenotypes associated with genetic variation in a variety of model systems. These so-called phenomics projects involve the measurement of 'phenomes', or the set of phenotypic information that describes an organism or cell, in various genetic contexts or states, and in response to external factors, such as environmental signals. Our understanding of the phenome of an organism depends on the availability of reagents that enable systematic evaluation of the spectrum of possible phenotypic variation and the types of measurements that can be taken. Here, we highlight phenomics studies that use the budding yeast, a pioneer model organism for functional genomics research. We focus on genetic perturbation screens designed to explore genetic interactions, using a variety of phenotypic read-outs, from cell growth to subcellular morphology.
|
27
|
Guo Y, Shen D, Zhou Y, Yang Y, Liang J, Zhou Y, Li N, Liu Y, Yang G, Li W. Deep Learning-Based Morphological Classification of Endoplasmic Reticulum Under Stress. Front Cell Dev Biol 2022; 9:767866. [PMID: 35223863 PMCID: PMC8865080 DOI: 10.3389/fcell.2021.767866] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/31/2021] [Accepted: 12/31/2021] [Indexed: 12/28/2022]
Abstract
Endoplasmic reticulum stress (ER stress) is a condition that is defined by abnormal accumulation of unfolded proteins. It plays an important role in maintaining cellular protein, lipid, and ion homeostasis. By triggering the unfolded protein response (UPR) under ER stress, cells restore homeostasis or undergo apoptosis. Chronic ER stress is implicated in many human diseases. Despite extensive studies on related signaling mechanisms, reliable image biomarkers for ER stress remain lacking. To address this deficiency, we have validated a morphological image biomarker for ER stress and have developed a deep learning-based assay to enable automated detection and analysis of this marker for screening studies. Specifically, ER under stress exhibits abnormal morphological patterns that feature ring-shaped structures called whorls (WHs). Using a highly specific chemical probe for unfolded and aggregated proteins, we find that formation of ER whorls is specifically associated with the accumulation of the unfolded and aggregated proteins. This confirms that ER whorls can be used as an image biomarker for ER stress. To this end, we have developed ER-WHs-Analyzer, a deep learning-based image analysis assay that automatically recognizes and localizes ER whorls much as human experts do. It does not require laborious manual annotation of ER whorls for training of deep learning models. Importantly, it reliably classifies different patterns of ER whorls induced by different ER stress drugs. Overall, our study provides mechanistic insights into morphological patterns of ER under stress as well as an image biomarker assay for screening studies to dissect related disease mechanisms and to accelerate related drug discoveries. It demonstrates the effectiveness of deep learning in recognizing and understanding complex morphological phenotypes of ER.
Affiliation(s)
- Yuanhao Guo
- Laboratory of Computational Biology and Machine Intelligence, National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
- Di Shen
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, China
- Yanfeng Zhou
- Laboratory of Computational Biology and Machine Intelligence, National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
- Yutong Yang
- Laboratory of Computational Biology and Machine Intelligence, National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- Jinzhao Liang
- Laboratory of Computational Biology and Machine Intelligence, National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- Yating Zhou
- Laboratory of Computational Biology and Machine Intelligence, National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
- Ningning Li
- Tomas Lindahl Laboratory, The Seventh Affiliated Hospital, Sun Yat-Sen University, Shenzhen, China
- Yu Liu
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, China
- Ge Yang
- Laboratory of Computational Biology and Machine Intelligence, National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
- *Correspondence: Ge Yang; Wenjing Li
- Wenjing Li
- Laboratory of Computational Biology and Machine Intelligence, National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
- *Correspondence: Ge Yang; Wenjing Li
|
28
|
Watson ER, Taherian Fard A, Mar JC. Computational Methods for Single-Cell Imaging and Omics Data Integration. Front Mol Biosci 2022; 8:768106. [PMID: 35111809 PMCID: PMC8801747 DOI: 10.3389/fmolb.2021.768106] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/14/2021] [Accepted: 11/29/2021] [Indexed: 12/12/2022]
Abstract
Integrating single cell omics and single cell imaging allows for a more effective characterisation of the underlying mechanisms that drive a phenotype at the tissue level, creating a comprehensive profile at the cellular level. Although the use of imaging data is well established in biomedical research, its primary application has been to observe phenotypes at the tissue or organ level, often using medical imaging techniques such as MRI, CT, and PET. These imaging technologies complement omics-based data in biomedical research because they are helpful for identifying associations between genotype and phenotype, along with functional changes occurring at the tissue level. Single cell imaging can act as an intermediary between these levels. Meanwhile new technologies continue to arrive that can be used to interrogate the genome of single cells and its related omics datasets. As these two areas, single cell imaging and single cell omics, each advance independently with the development of novel techniques, the opportunity to integrate these data types becomes more and more attractive. This review outlines some of the technologies and methods currently available for generating, processing, and analysing single-cell omics- and imaging data, and how they could be integrated to further our understanding of complex biological phenomena like ageing. We include an emphasis on machine learning algorithms because of their ability to identify complex patterns in large multidimensional data.
Affiliation(s)
- Atefeh Taherian Fard
- Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia
- Jessica Cara Mar
- Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia
|
29
|
Wang G, Xue MQ, Shen HB, Xu YY. Learning protein subcellular localization multi-view patterns from heterogeneous data of imaging, sequence and networks. Brief Bioinform 2022; 23:6499983. [PMID: 35018423 DOI: 10.1093/bib/bbab539] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 09/08/2021] [Revised: 11/03/2021] [Accepted: 11/20/2021] [Indexed: 11/13/2022]
Abstract
Location proteomics seeks to provide automated high-resolution descriptions of protein location patterns within cells. Many efforts have been undertaken in location proteomics over the past decades, producing many automated predictors for protein subcellular localization. However, most of these predictors are trained solely on high-throughput microscopic images or on protein amino acid sequences. Unifying heterogeneous protein data sources has yet to be exploited. In this paper, we present a pipeline called sequence, image, network-based protein subcellular locator (SIN-Locator) that constructs a multi-view description of proteins by integrating multiple data types, including images of protein expression in cells or tissues, amino acid sequences and protein-protein interaction networks, to classify the patterns of protein subcellular locations. Proteins were encoded by both handcrafted features and deep learning features, and multiple combining methods were implemented. Our experimental results indicated that optimal integrations can considerably enhance the classification accuracy, and the utility of SIN-Locator has been demonstrated by applying it to newly released proteins in the Human Protein Atlas. Furthermore, we also investigate the contribution of different data sources and the influence of partial absence of data. This work is anticipated to provide clues for reconciliation and combination of multi-source data for protein location analysis.
Affiliation(s)
- Ge Wang
- School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou 510515, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou 510515, China
- Min-Qi Xue
- School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou 510515, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou 510515, China
- Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200240, China; School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Ying-Ying Xu
- School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou 510515, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou 510515, China
|
30
|
Götz T, Göb S, Sawant S, Erick X, Wittenberg T, Schmidkonz C, Tomé A, Lang E, Ramming A. Number of necessary training examples for Neural Networks with different number of trainable parameters. J Pathol Inform 2022; 13:100114. [PMID: 36268092 PMCID: PMC9577052 DOI: 10.1016/j.jpi.2022.100114] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/12/2021] [Indexed: 11/03/2022]
Abstract
This work aimed to reduce network complexity together with the number of necessary training examples. The focus was thus on how suitable evaluation metrics depend on the number of adjustable parameters of the deep neural network considered. The data set used encompassed Hematoxylin and Eosin (H&E) stained cell images provided by various clinics. We used a deep convolutional neural network to determine the relation between a model’s complexity, its concomitant set of parameters, and the size of the training sample necessary to achieve a certain classification accuracy. The complexity of the deep neural networks was reduced by pruning a certain number of filters in the network. As expected, the unpruned neural network showed the best performance. The network with the highest number of trainable parameters achieved, within the estimated standard error of the optimized cross-entropy loss, the best results up to 30% pruning. Strongly pruned networks are highly viable and the classification accuracy declines quickly with decreasing number of training patterns. However, up to a pruning ratio of 40%, we found a comparable performance of pruned and unpruned deep convolutional neural networks (DCNN) and densely connected convolutional networks (DCCN).
|
31
|
Liao Z, Pan G, Sun C, Tang J. Predicting subcellular location of protein with evolution information and sequence-based deep learning. BMC Bioinformatics 2021; 22:515. [PMID: 34686152 PMCID: PMC8539821 DOI: 10.1186/s12859-021-04404-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 09/20/2021] [Accepted: 09/24/2021] [Indexed: 12/31/2022]
Abstract
BACKGROUND: Protein subcellular localization prediction plays an important role in biology research. Since traditional methods are laborious and time-consuming, many machine learning-based prediction methods have been proposed. However, most of the proposed methods ignore the evolution information of proteins. In order to improve the prediction accuracy, we present a deep learning-based method to predict protein subcellular locations. RESULTS: Our method utilizes not only the amino acid sequence composition but also evolution matrices of proteins. Our method uses a bidirectional long short-term memory network that processes the entire protein sequence and a convolutional neural network that extracts features from protein sequences. The position specific scoring matrix is used as a supplement to protein sequences. Our method was trained and tested on two benchmark datasets. The experiment results show that our method yields accurate results on the two datasets with an average precision of 0.7901, ranking loss of 0.0758 and coverage of 1.2848. CONCLUSION: The experiment results show that our method outperforms five methods currently available. According to those experiments, our method is an acceptable alternative for predicting protein subcellular location.
Affiliation(s)
- Zhijun Liao
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Fujian Medical University, 1 Xuefu North Road, University Town, Fuzhou, 350122 FJ China
- Department of Computer Science and Engineering, University of South Carolina, 550 Assembly St, Columbia, SC 29208 USA
- Gaofeng Pan
- Department of Computer Science and Engineering, University of South Carolina, 550 Assembly St, Columbia, SC 29208 USA
- Chao Sun
- Department of Computer Science and Engineering, University of South Carolina, 550 Assembly St, Columbia, SC 29208 USA
- Jijun Tang
- Department of Computer Science and Engineering, University of South Carolina, 550 Assembly St, Columbia, SC 29208 USA
- College of Electrical and Power Engineering, Taiyuan University of Technology, No. 79 Yinze West Street, Taiyuan, 030024 SX China
|
32
|
Hu JX, Yang Y, Xu YY, Shen HB. Incorporating label correlations into deep neural networks to classify protein subcellular location patterns in immunohistochemistry images. Proteins 2021; 90:493-503. [PMID: 34546597 DOI: 10.1002/prot.26244] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/16/2020] [Revised: 03/16/2021] [Accepted: 09/13/2021] [Indexed: 12/17/2022]
Abstract
Analysis of protein subcellular localization is a critical part of proteomics. In recent years, as both the number and quality of microscopic images are increasing rapidly, many automated methods, especially convolutional neural networks (CNN), have been developed to predict protein subcellular location(s) based on bioimages, but their performance always suffers from some inherent properties of the problem. First, many microscopic images have non-informative or noisy sections, like unstained stroma and unspecific background, which affect the extraction of protein expression information. Second, the patterns of protein subcellular localization are very complex, as a lot of proteins locate in more than one compartment. In this study, we propose a new label-correlation enhanced deep neural network, laceDNN, to classify the subcellular locations of multi-label proteins from immunohistochemistry images. The model uses small representative patches as input to alleviate the image noise issue, and its backbone is a hybrid architecture of CNN and recurrent neural network, where the former network extracts representative image features and the latter learns the organelle dependency relationships. Our experimental results indicate that the proposed model can improve the performance of multi-label protein subcellular classification.
Affiliation(s)
- Jin-Xian Hu
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
- Yang Yang
- Department of Computer Science and Engineering, Center for Brain-Like Computing and Machine Intelligence, Shanghai Jiao Tong University, Shanghai, China
- Ying-Ying Xu
- School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, China
- Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
|
33
|
Shigene K, Hiasa Y, Otake Y, Soufi M, Janewanthanakul S, Nishimura T, Sato Y, Suetsugu S. Translation of Cellular Protein Localization Using Convolutional Networks. Front Cell Dev Biol 2021; 9:635231. [PMID: 34422790 PMCID: PMC8375474 DOI: 10.3389/fcell.2021.635231] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/30/2020] [Accepted: 07/15/2021] [Indexed: 12/15/2022]
Abstract
Protein localization in cells has been analyzed by fluorescent labeling using indirect immunofluorescence and fluorescent protein tagging. However, the relationships between the localization of different proteins had not been analyzed using artificial intelligence. Here, we applied convolutional networks for the prediction of localization of the cytoskeletal proteins from the localization of the other proteins. Lamellipodia are one of the actin-dependent subcellular structures involved in cell migration and are mainly generated by the Wiskott-Aldrich syndrome protein (WASP)-family verprolin homologous protein 2 (WAVE2) and the membrane remodeling I-BAR domain protein IRSp53. Focal adhesion is another actin-based structure that contains vinculin protein and promotes lamellipodia formation and cell migration. In contrast, microtubules are not directly related to actin filaments. The convolutional network was trained using images of actin filaments paired with WAVE2, IRSp53, vinculin, and microtubules. The generated images of WAVE2, IRSp53, and vinculin were highly similar to their real images. In contrast, the microtubule images generated from actin filament images were inferior without the generation of filamentous structures, suggesting that microscopic images of actin filaments provide more information about actin-related protein localization. Collectively, this study suggests that image translation by the convolutional network can predict the localization of functionally related proteins, and the convolutional network might be used to describe the relationships between the proteins by their localization.
Affiliation(s)
- Kei Shigene
- Division of Biological Science, Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, Japan
- Yuta Hiasa
- Division of Information Science, Nara Institute of Science and Technology, Ikoma, Japan
- Yoshito Otake
- Division of Information Science, Nara Institute of Science and Technology, Ikoma, Japan
- Mazen Soufi
- Division of Information Science, Nara Institute of Science and Technology, Ikoma, Japan
- Suphamon Janewanthanakul
- Division of Biological Science, Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, Japan
- Tamako Nishimura
- Division of Biological Science, Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, Japan
- Yoshinobu Sato
- Division of Information Science, Nara Institute of Science and Technology, Ikoma, Japan; Data Science Center, Nara Institute of Science and Technology, Ikoma, Japan
- Shiro Suetsugu
- Division of Biological Science, Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, Japan; Data Science Center, Nara Institute of Science and Technology, Ikoma, Japan; Center for Digital Green-Innovation, Nara Institute of Science and Technology, Ikoma, Japan
|
34
|
Chen Y, Liang D, Bai X, Xu Y, Yang X. Cell Localization and Counting Using Direction Field Map. IEEE J Biomed Health Inform 2021; 26:359-368. [PMID: 34406952 DOI: 10.1109/jbhi.2021.3105545] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/09/2022]
Abstract
Automatic cell counting in pathology images is challenging due to blurred boundaries, low contrast, and overlap between cells. In this paper, we train a convolutional neural network (CNN) to predict a two-dimensional direction field map and then use it to localize individual cells for counting. Specifically, we define a direction field on each pixel in the cell regions (obtained by dilating the original annotation in terms of cell centers) as a two-dimensional unit vector pointing from the pixel to its corresponding cell center. Direction fields for adjacent pixels in different cells point in opposite directions, away from each other, while those in the same cell region point to the same center. This property is used to partition overlapping cells for localization and counting. To deal with blurred boundaries or low-contrast cells, we set the direction field of the background pixels to zero in the ground-truth generation. Thus, adjacent pixels belonging to cells and background will have an obvious difference in the predicted direction field. To further deal with cells of varying density and overlapping issues, we adopt a geometry-adaptive (varying) radius for cells of different densities in the generation of the ground-truth direction field map, which guides the CNN model to separate cells of different densities and overlapping cells. Extensive experimental results on three widely used datasets (i.e., Cell, CRCHistoPhenotype2016, and MBM datasets) demonstrate the effectiveness of the proposed approach.
|
35
|
Fisch D, Evans R, Clough B, Byrne SK, Channell WM, Dockterman J, Frickel EM. HRMAn 2.0: Next-generation artificial intelligence-driven analysis for broad host-pathogen interactions. Cell Microbiol 2021; 23:e13349. [PMID: 33930228 DOI: 10.1111/cmi.13349] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 03/06/2021] [Revised: 04/21/2021] [Accepted: 04/26/2021] [Indexed: 12/15/2022]
Abstract
To study the dynamics of infection processes, it is common to manually enumerate imaging-based infection assays. However, manual counting of events from imaging data is biased, error-prone and a laborious task. We recently presented HRMAn (Host Response to Microbe Analysis), an automated image analysis program using state-of-the-art machine learning and artificial intelligence algorithms to analyse pathogen growth and host defence behaviour. With HRMAn, we can quantify intracellular infection by pathogens such as Toxoplasma gondii and Salmonella in a variety of cell types in an unbiased and highly reproducible manner, measuring multiple parameters including pathogen growth, pathogen killing and activation of host cell defences. Since HRMAn is based on the KNIME Analytics platform, it can easily be adapted to work with other pathogens and produce more readouts from quantitative imaging data. Here we showcase improvements to HRMAn resulting in the release of HRMAn 2.0 and new applications of HRMAn 2.0 for the analysis of host-pathogen interactions using the established pathogen T. gondii and further extend it for use with the bacterial pathogen Chlamydia trachomatis and the fungal pathogen Cryptococcus neoformans.
Affiliation(s)
- Daniel Fisch
- Institute of Microbiology and Infection, School of Biosciences, University of Birmingham, Edgbaston, UK
- Host-Toxoplasma Interaction Laboratory, The Francis Crick Institute, London, UK
- Robert Evans
- Institute of Microbiology and Infection, School of Biosciences, University of Birmingham, Edgbaston, UK
- Host-Toxoplasma Interaction Laboratory, The Francis Crick Institute, London, UK
- Barbara Clough
- Institute of Microbiology and Infection, School of Biosciences, University of Birmingham, Edgbaston, UK
- Sophie K Byrne
- Institute of Microbiology and Infection, School of Biosciences, University of Birmingham, Edgbaston, UK
- Will M Channell
- Institute of Microbiology and Infection, School of Biosciences, University of Birmingham, Edgbaston, UK
- Jacob Dockterman
- Department of Immunology, Duke University Medical Center, Durham, North Carolina, USA
- Eva-Maria Frickel
- Institute of Microbiology and Infection, School of Biosciences, University of Birmingham, Edgbaston, UK
|
36
|
Mattiazzi Usaj M, Yeung CHL, Friesen H, Boone C, Andrews BJ. Single-cell image analysis to explore cell-to-cell heterogeneity in isogenic populations. Cell Syst 2021; 12:608-621. [PMID: 34139168 PMCID: PMC9112900 DOI: 10.1016/j.cels.2021.05.010] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/21/2020] [Revised: 04/26/2021] [Accepted: 05/12/2021] [Indexed: 12/26/2022]
Abstract
Single-cell image analysis provides a powerful approach for studying cell-to-cell heterogeneity, which is an important attribute of isogenic cell populations, from microbial cultures to individual cells in multicellular organisms. This phenotypic variability must be explained at a mechanistic level if biologists are to fully understand cellular function and address the genotype-to-phenotype relationship. Variability in single-cell phenotypes is obscured by bulk readouts or averaging of phenotypes from individual cells in a sample; thus, single-cell image analysis enables a higher resolution view of cellular function. Here, we consider examples of both small- and large-scale studies carried out with isogenic cell populations assessed by fluorescence microscopy, and we illustrate the advantages, challenges, and the promise of quantitative single-cell image analysis.
Affiliation(s)
- Mojca Mattiazzi Usaj
- Department of Chemistry and Biology, Ryerson University, Toronto, ON M5B 2K3, Canada
| | - Clarence Hue Lok Yeung
- The Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 3E1, Canada
| | - Helena Friesen
- The Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
| | - Charles Boone
- The Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 3E1, Canada; RIKEN Centre for Sustainable Resource Science, Wako, Saitama 351-0198, Japan
| | - Brenda J Andrews
- The Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 3E1, Canada.
| |
|
37
|
Fishman D, Salumaa SO, Majoral D, Laasfeld T, Peel S, Wildenhain J, Schreiner A, Palo K, Parts L. Practical segmentation of nuclei in brightfield cell images with neural networks trained on fluorescently labelled samples. J Microsc 2021; 284:12-24. [PMID: 34081320 DOI: 10.1111/jmi.13038] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/20/2019] [Revised: 05/27/2021] [Accepted: 05/27/2021] [Indexed: 11/28/2022]
Abstract
Identifying nuclei is a standard first step when analysing cells in microscopy images. The traditional approach relies on signal from a DNA stain, or fluorescent transgene expression localised to the nucleus. However, imaging techniques that do not use fluorescence can also carry useful information. Here, we used brightfield and fluorescence images of fixed cells with fluorescently labelled DNA, and confirmed that three convolutional neural network architectures can be adapted to segment nuclei from the brightfield channel, relying on fluorescence signal to extract the ground truth for training. We found that U-Net achieved the best overall performance, Mask R-CNN provided an additional benefit of instance segmentation, and that DeepCell proved too slow for practical application. We trained the U-Net architecture on over 200 dataset variations, established that accurate segmentation is possible using as few as 16 training images, and that models trained on images from similar cell lines can extrapolate well. Acquiring data from multiple focal planes further helps distinguish nuclei in the samples. Overall, our work helps to liberate a fluorescence channel reserved for nuclear staining, thus providing more information from the specimen, and reducing reagents and time required for preparing imaging experiments.
Affiliation(s)
- Dmytro Fishman
- Department of Computer Science, University of Tartu, Narva Str 20, Tartu, 51009, Estonia
| | - Sten-Oliver Salumaa
- Department of Computer Science, University of Tartu, Narva Str 20, Tartu, 51009, Estonia
| | - Daniel Majoral
- Department of Computer Science, University of Tartu, Narva Str 20, Tartu, 51009, Estonia
| | - Tõnis Laasfeld
- Department of Computer Science, University of Tartu, Narva Str 20, Tartu, 51009, Estonia.,Chair of Bioorganic Chemistry, Institute of Chemistry, University of Tartu, Ravila, Estonia
| | | | | | | | - Kaupo Palo
- PerkinElmer Cellular Technologies, Germany GmbH, Hamburg, Germany
| | - Leopold Parts
- Department of Computer Science, University of Tartu, Narva Str 20, Tartu, 51009, Estonia.,Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, UK
| |
|
38
|
GPCR_LigandClassify.py; a rigorous machine learning classifier for GPCR targeting compounds. Sci Rep 2021; 11:9510. [PMID: 33947911 PMCID: PMC8097070 DOI: 10.1038/s41598-021-88939-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/05/2020] [Accepted: 04/12/2021] [Indexed: 02/02/2023] Open
Abstract
The current study describes the construction of various ligand-based machine learning models to be used for drug-repurposing against the family of G-Protein Coupled Receptors (GPCRs). In building these models, we collected > 500,000 data points, encompassing experimentally measured molecular association data of > 160,000 unique ligands against > 250 GPCRs. These data points were retrieved from the GPCR-Ligand Association (GLASS) database. We have used diverse molecular featurization methods to describe the input molecules. Multiple supervised ML algorithms were developed, tested and compared for their accuracy, F scores, as well as for their Matthews correlation coefficient (MCC) scores. Our data suggest that, combined with molecular fingerprinting, ensemble decision trees and gradient boosted trees come close to the accuracy of the more sophisticated deep neural net (DNN)-based algorithms. On a test dataset, these models displayed an excellent performance, reaching a ~ 90% classification accuracy. Additionally, we showcase a few examples where our models were able to identify interesting connections between known drugs from the Drug-Bank database and members of the GPCR family of receptors. Our findings are in excellent agreement with previously reported experimental observations in the literature. We hope the models presented in this paper synergize with the currently ongoing interest of applying machine learning modeling in the field of drug repurposing and computational drug discovery in general.
|
39
|
Chao JT, Roskelley CD, Loewen CJR. MAPS: machine-assisted phenotype scoring enables rapid functional assessment of genetic variants by high-content microscopy. BMC Bioinformatics 2021; 22:202. [PMID: 33879063 PMCID: PMC8056608 DOI: 10.1186/s12859-021-04117-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/12/2020] [Accepted: 04/02/2021] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND Genetic testing is widely used in evaluating a patient's predisposition to hereditary diseases. In the case of cancer, when a functionally impactful mutation (i.e. genetic variant) is identified in a disease-relevant gene, the patient is at elevated risk of developing a lesion in their lifetime. Unfortunately, as the rate and coverage of genetic testing has accelerated, our ability to assess the functional status of new variants has fallen behind. Therefore, there is an urgent need for more practical, streamlined and cost-effective methods for classifying variants. RESULTS To directly address this issue, we designed a new approach that uses alterations in protein subcellular localization as a key indicator of loss of function. Thus, new variants can be rapidly functionalized using high-content microscopy (HCM). To facilitate the analysis of the large amounts of imaging data, we developed a new software toolkit, named MAPS for machine-assisted phenotype scoring, that utilizes deep learning to extract and classify cell-level features. MAPS helps users leverage cloud-based deep learning services that are easy to train and deploy to fit their specific experimental conditions. Model training is code-free and can be done with limited training images. Thus, MAPS allows cell biologists to easily incorporate deep learning into their image analysis pipeline. We demonstrated an effective variant functionalization workflow that integrates HCM and MAPS to assess missense variants of PTEN, a tumor suppressor that is frequently mutated in hereditary and somatic cancers. CONCLUSIONS This paper presents a new way to rapidly assess variant function using cloud deep learning. Since most tumor suppressors have well-defined subcellular localizations, our approach could be widely applied to functionalize variants of uncertain significance and help improve the utility of genetic testing.
Affiliation(s)
- Jesse T Chao
- Department of Cellular and Physiological Sciences, Life Sciences Institute, University of British Columbia, Vancouver, V6T1Z3, Canada.
| | - Calvin D Roskelley
- Department of Cellular and Physiological Sciences, Life Sciences Institute, University of British Columbia, Vancouver, V6T1Z3, Canada
| | - Christopher J R Loewen
- Department of Cellular and Physiological Sciences, Life Sciences Institute, University of British Columbia, Vancouver, V6T1Z3, Canada
| |
|
40
|
Pond AJR, Hwang S, Verd B, Steventon B. A deep learning approach for staging embryonic tissue isolates with small data. PLoS One 2021; 16:e0244151. [PMID: 33417603 PMCID: PMC7793293 DOI: 10.1371/journal.pone.0244151] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/14/2020] [Accepted: 12/03/2020] [Indexed: 12/12/2022] Open
Abstract
Machine learning approaches are becoming increasingly widespread and are now present in most areas of research. Their recent surge can be explained in part due to our ability to generate and store enormous amounts of data with which to train these models. The requirement for large training sets is also responsible for limiting further potential applications of machine learning, particularly in fields where data tend to be scarce such as developmental biology. However, recent research seems to indicate that machine learning and Big Data can sometimes be decoupled to train models with modest amounts of data. In this work we set out to train a CNN-based classifier to stage zebrafish tail buds at four different stages of development using small information-rich data sets. Our results show that two and three dimensional convolutional neural networks can be trained to stage developing zebrafish tail buds based on both morphological and gene expression confocal microscopy images, achieving in each case up to 100% test accuracy scores. Importantly, we show that high accuracy can be achieved with data set sizes of under 100 images, much smaller than the typical training set size for a convolutional neural net. Furthermore, our classifier shows that it is possible to stage isolated embryonic structures without the need to refer to classic developmental landmarks in the whole embryo, which will be particularly useful to stage 3D culture in vitro systems such as organoids. We hope that this work will provide a proof of principle that will help dispel the myth that large data set sizes are always required to train CNNs, and encourage researchers in fields where data are scarce to also apply ML approaches.
Affiliation(s)
| | - Seongwon Hwang
- Department of Genetics, University of Cambridge, Cambridge, United Kingdom
| | - Berta Verd
- Department of Genetics, University of Cambridge, Cambridge, United Kingdom
- Department of Zoology, University of Oxford, Oxford, United Kingdom
| | - Benjamin Steventon
- Department of Genetics, University of Cambridge, Cambridge, United Kingdom
| |
|
41
|
Jadhav S, Acuña S, Opstad IS, Singh Ahluwalia B, Agarwal K, Prasad DK. Artefact removal in ground truth deficient fluctuations-based nanoscopy images using deep learning. BIOMEDICAL OPTICS EXPRESS 2021; 12:191-210. [PMID: 33659075 PMCID: PMC7899514 DOI: 10.1364/boe.410617] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2020] [Revised: 11/08/2020] [Accepted: 11/17/2020] [Indexed: 05/04/2023]
Abstract
Image denoising or artefact removal using deep learning is possible when a supervised training dataset is available, either acquired in real experiments or synthesized using known noise models. Neither of the conditions can be fulfilled for nanoscopy (super-resolution optical microscopy) images that are generated from microscopy videos through statistical analysis techniques. Due to several physical constraints, a supervised dataset cannot be measured. Further, the non-linear spatio-temporal mixing of data and the valuable statistics of fluctuations from fluorescent molecules compete with noise statistics. Therefore, noise or artefact models in nanoscopy images cannot be explicitly learned. Here, we propose a robust and versatile simulation-supervised training approach of deep learning auto-encoder architectures for the highly challenging nanoscopy images of sub-cellular structures inside biological samples. We show the proof of concept for one nanoscopy method and investigate the scope of generalizability across structures and nanoscopy algorithms not included during simulation-supervised training. We also investigate a variety of loss functions and learning models and discuss the limitation of existing performance metrics for nanoscopy images. We generate valuable insights for this highly challenging and unsolved problem in nanoscopy, and set the foundation for the application of deep learning problems in nanoscopy for life sciences.
Affiliation(s)
- Suyog Jadhav
- Indian Institute of Technology (Indian School of Mines), Dhanbad 826004, India
| | - Sebastian Acuña
- Department of Physics and Technology, UiT The Arctic University of Norway, Tromsø, Norway
| | - Ida S. Opstad
- Department of Physics and Technology, UiT The Arctic University of Norway, Tromsø, Norway
| | | | - Krishna Agarwal
- Department of Physics and Technology, UiT The Arctic University of Norway, Tromsø, Norway
| | - Dilip K. Prasad
- Department of Computer Science, UiT The Arctic University of Norway, Tromsø, Norway
| |
|
42
|
Su R, He L, Liu T, Liu X, Wei L. Protein subcellular localization based on deep image features and criterion learning strategy. Brief Bioinform 2020; 22:6035269. [PMID: 33320936 DOI: 10.1093/bib/bbaa313] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 08/22/2020] [Revised: 09/26/2020] [Accepted: 10/14/2020] [Indexed: 01/05/2023] Open
Abstract
The spatial distribution of the proteome at subcellular levels provides clues for protein functions, and is thus important to human biology and medicine. Imaging-based methods are among the most important approaches for predicting protein subcellular location. Although deep neural networks have shown impressive performance in a number of imaging tasks, their application to protein subcellular localization has not been sufficiently explored. In this study, we developed a deep imaging-based approach to localize proteins at subcellular levels. Based on deep image features extracted from convolutional neural networks (CNNs), both single-label and multi-label locations can be accurately predicted. The multi-label prediction is a particularly challenging task. Here we developed a criterion learning strategy to exploit the label-attribute relevancy and label-label relevancy. A criterion that was used to determine the final label set was automatically obtained during the learning procedure. We identified an optimal CNN architecture that gave the best results. In addition, experiments show that, compared with hand-crafted features, the deep features give more accurate predictions with fewer features. The implementation for the proposed method is available at https://github.com/RanSuLab/ProteinSubcellularLocation.
Affiliation(s)
- Ran Su
- School of Computer Software, College of Intelligence and Computing, Tianjin University, China
| | - Linlin He
- School of Computer Software, College of Intelligence and Computing, Tianjin University, China
| | - Tianling Liu
- School of Computer Software, College of Intelligence and Computing, Tianjin University, China
| | - Xiaofeng Liu
- Key Laboratory of Breast Cancer Prevention and Therapy, Ministry of Education, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center of Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin, China
| | - Leyi Wei
- School of Software, Shandong University, China
| |
|
43
|
Cong H, Liu H, Chen Y, Cao Y. Self-evoluting framework of deep convolutional neural network for multilocus protein subcellular localization. Med Biol Eng Comput 2020; 58:3017-3038. [PMID: 33078303 DOI: 10.1007/s11517-020-02275-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/02/2019] [Accepted: 10/14/2020] [Indexed: 12/12/2022]
Abstract
In the present paper, a deep convolutional neural network (DCNN) is applied to multilocus protein subcellular localization as it is more suitable for multi-class classification. There are two main problems with this application. First, the appropriate features for correlation between multiple sites are hard to find. Second, the classifier structure is difficult to determine as it is greatly affected by the distribution of classified data. To solve these problems, a self-evoluting framework using DCNNs for multilocus protein subcellular localization is proposed. It has three characteristics that the previous algorithms do not. The first is that it combines the ant colony algorithm with the DCNN to form a self-evoluting algorithm for multilocus protein subcellular localization. The second is that it randomly groups subcellular sites using a limited random k-labelsets multi-label classification method. It also solves complex problems in a divide-and-conquer approach and proposes a flexible expansion model. The third is that it realizes the random selection feature extraction method in the positioning process and avoids the defects in individual feature extraction methods. The algorithm in the present paper is tested on the human database, and the overall correct rate is 67.17%, which is higher than that for the stacked self-encoder (SAE), support vector machine (SVM), random forest classifier (RF), or single deep convolutional neural network.
Graphical abstract: The algorithm described in the present paper mainly includes four parts: protein sequence data preprocessing, integrated DCNN model construction, finding the optimal DCNN combination by ant colony optimization, and protein subcellular localization for sequences. These parts are sequential, and the data obtained in each part is the basis for the next part. In the data preprocessing part, the limited RAkEL multi-label classification method is used to randomly group subcellular sites. At the same time, the feature fusion of protein sequences is carried out by using multiple feature extraction methods. Each combination of features and site information corresponds to a DCNN model. In the part that finds the optimal DCNN combination by ant colony optimization, the main purpose is to find the best combination of DCNN models through the global optimization ability of the ant colony algorithm. The positioning of sequences mainly serves to obtain multilocus subcellular localization from the optimal model combination.
Affiliation(s)
- Hanhan Cong
- School of Information Science and Engineering, Shandong Normal University, No. 88, Wenhua East Road, Jinan City, China.,Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Shandong Normal University, Jinan, China
| | - Hong Liu
- School of Information Science and Engineering, Shandong Normal University, No. 88, Wenhua East Road, Jinan City, China. .,Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Shandong Normal University, Jinan, China.
| | - Yuehui Chen
- School of Information Science and Engineering, University of Jinan, Jinan, China.,Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, China
| | - Yi Cao
- School of Information Science and Engineering, University of Jinan, Jinan, China.,Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, China
| |
|
44
|
Schormann W, Hariharan S, Andrews DW. A reference library for assigning protein subcellular localizations by image-based machine learning. J Cell Biol 2020; 219:133635. [PMID: 31968357 PMCID: PMC7055006 DOI: 10.1083/jcb.201904090] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/16/2019] [Revised: 09/30/2019] [Accepted: 12/15/2019] [Indexed: 12/11/2022] Open
Abstract
Confocal micrographs of EGFP fusion proteins localized at key cell organelles in murine and human cells were acquired for use as subcellular localization landmarks. For each of the respective 789,011 and 523,319 optically validated cell images, morphology and statistical features were measured. Machine learning algorithms using these features permit automated assignment of the localization of other proteins and dyes in both cell types with very high accuracy. Automated assignment of subcellular localizations for model tail-anchored proteins with randomly mutated C-terminal targeting sequences allowed the discovery of motifs responsible for targeting to mitochondria, endoplasmic reticulum, and the late secretory pathway. Analysis of directed mutants enabled refinement of these motifs and characterization of protein distributions within cellular subcompartments.
Affiliation(s)
- Wiebke Schormann
- Biological Sciences, Sunnybrook Research Institute, Toronto, Canada
| | | | - David W Andrews
- Biological Sciences, Sunnybrook Research Institute, Toronto, Canada.,Department of Biochemistry, University of Toronto, Toronto, Canada.,Department of Medical Biophysics, University of Toronto, Toronto, Canada
| |
|
45
|
He Y, Shen Z, Zhang Q, Wang S, Huang DS. A survey on deep learning in DNA/RNA motif mining. Brief Bioinform 2020; 22:5916939. [PMID: 33005921 PMCID: PMC8293829 DOI: 10.1093/bib/bbaa229] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 07/18/2020] [Revised: 08/19/2020] [Accepted: 08/24/2020] [Indexed: 01/18/2023] Open
Abstract
DNA/RNA motif mining is the foundation of gene function research. It plays an extremely important role in identifying DNA- or RNA-protein binding sites, which helps to understand the mechanism of gene regulation and management. For the past few decades, researchers have been working on designing new efficient and accurate algorithms for motif mining. These algorithms can be roughly divided into two categories: the enumeration approach and the probabilistic method. In recent years, machine learning methods have made great progress, and deep learning algorithms in particular have achieved good performance. Existing deep learning methods in motif mining can be roughly divided into three types of models: convolutional neural network (CNN) based models, recurrent neural network (RNN) based models, and hybrid CNN–RNN based models. We introduce the application of deep learning in the field of motif mining in terms of data preprocessing, the features of existing deep learning architectures, and the differences between the basic deep learning models. Through the analysis and comparison of existing deep learning methods, we found that more complex models tend to perform better than simple ones when data are sufficient, and that the current methods are relatively simple compared with those in other fields such as computer vision, language processing (NLP), computer games, etc. Therefore, it is necessary to summarize motif mining by deep learning, which can help researchers understand this field.
Affiliation(s)
- Ying He
- computer science and technology at Tongji University, China
| | - Zhen Shen
- computer science and technology at Tongji University, China
| | - Qinhu Zhang
- computer science and technology at Tongji University, China
| | - Siguo Wang
- computer science and technology at Tongji University, China
| | - De-Shuang Huang
- Institute of Machines Learning and Systems Biology, Tongji University
| |
|
46
|
Xu YY, Zhou H, Murphy RF, Shen HB. Consistency and variation of protein subcellular location annotations. Proteins 2020; 89:242-250. [PMID: 32935893 DOI: 10.1002/prot.26010] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/16/2020] [Revised: 07/09/2020] [Accepted: 09/13/2020] [Indexed: 11/09/2022]
Abstract
A major challenge for protein databases is reconciling information from diverse sources. This is especially difficult when some information consists of secondary, human-interpreted rather than primary data. For example, the Swiss-Prot database contains curated annotations of subcellular location that are based on predictions from protein sequence, statements in scientific articles, and published experimental evidence. The Human Protein Atlas (HPA) consists of millions of high-resolution microscopic images that show protein spatial distribution on a cellular and subcellular level. These images are manually annotated with protein subcellular locations by trained experts. The image annotations in HPA can capture the variation of subcellular location across different cell lines, tissues, or tissue states. Systematic investigation of the consistency between HPA and Swiss-Prot assignments of subcellular location, which is important for understanding and utilizing protein location data from the two databases, has not been described previously. In this paper, we quantitatively evaluate the consistency of subcellular location annotations between HPA and Swiss-Prot at multiple levels, as well as variation of protein locations across cell lines and tissues. Our results show that annotations of these two databases differ significantly in many cases, leading to proposed procedures for deriving and integrating the protein subcellular location data. We also find that proteins having highly variable locations are more likely to be biomarkers of diseases, providing support for incorporating analysis of subcellular location in protein biomarker identification and screening.
Affiliation(s)
- Ying-Ying Xu
- School of Biomedical Engineering, Southern Medical University, Guangzhou, China.,Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China.,Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
| | - Hang Zhou
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China
| | - Robert F Murphy
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
| | - Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China
| |
|
47
|
Asgharzadeh P, Birkhold AI, Trivedi Z, Özdemir B, Reski R, Röhrle O. A NanoFE simulation-based surrogate machine learning model to predict mechanical functionality of protein networks from live confocal imaging. Comput Struct Biotechnol J 2020; 18:2774-2788. [PMID: 33101614 PMCID: PMC7559262 DOI: 10.1016/j.csbj.2020.09.024] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/06/2020] [Revised: 09/12/2020] [Accepted: 09/13/2020] [Indexed: 02/07/2023] Open
Abstract
Sub-cellular mechanics plays a crucial role in a variety of biological functions and dysfunctions. Due to the strong structure-function relationship in cytoskeletal protein networks, light can be shed on their mechanical functionality by investigating their structures. Here, we present a data-driven approach employing a combination of confocal live imaging of fluorescent tagged protein networks, in silico mechanical experiments and machine learning to investigate this relationship. Our image processing and nanoFE mechanical simulation framework resolves the structure and mechanical behaviour of cytoskeletal networks, and the developed gradient boosting surrogate models link network structure to its functionality. In this study, for the first time, we perform mechanical simulations of Filamentous Temperature Sensitive Z (FtsZ) complex protein networks with realistic network geometry depicting its skeletal functionality inside organelles (here, chloroplasts) of the moss Physcomitrella patens. Training on synthetically produced simulation data enables predicting the mechanical characteristics of the FtsZ network purely based on its structural features (R² ≥ 0.93), therefore allowing the extraction of structural principles enabling specific mechanical traits of FtsZ, such as load bearing and resistance to buckling failure in case of large network deformation.
Affiliation(s)
- Pouyan Asgharzadeh
- Institute for Modelling and Simulation of Biomechanical Systems, University of Stuttgart, Stuttgart, Germany.,Stuttgart Center for Simulation Science (SC SimTech), Stuttgart, Germany
| | - Annette I Birkhold
- Institute for Modelling and Simulation of Biomechanical Systems, University of Stuttgart, Stuttgart, Germany.,Stuttgart Center for Simulation Science (SC SimTech), Stuttgart, Germany
| | - Zubin Trivedi
- Institute for Modelling and Simulation of Biomechanical Systems, University of Stuttgart, Stuttgart, Germany
| | - Bugra Özdemir
- Plant Biotechnology, Faculty of Biology, University of Freiburg, Freiburg, Germany.,Signalling Research Centres BIOSS and CIBSS, Freiburg, Germany
| | - Ralf Reski
- Plant Biotechnology, Faculty of Biology, University of Freiburg, Freiburg, Germany.,Signalling Research Centres BIOSS and CIBSS, Freiburg, Germany.,Cluster of Excellence livMatS @ FIT - Freiburg Centre for Interactive Materials and Bioinspired Technologies, Freiburg, Germany
| | - Oliver Röhrle
- Institute for Modelling and Simulation of Biomechanical Systems, University of Stuttgart, Stuttgart, Germany.,Stuttgart Center for Simulation Science (SC SimTech), Stuttgart, Germany
| |
|
48
|
Automated classification of protein subcellular localization in immunohistochemistry images to reveal biomarkers in colon cancer. BMC Bioinformatics 2020; 21:398. [PMID: 32907537 PMCID: PMC7487883 DOI: 10.1186/s12859-020-03731-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 04/19/2020] [Accepted: 08/31/2020] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND Protein biomarkers play important roles in cancer diagnosis. Many efforts have been made to measure abnormal expression intensity in biological samples to identify cancer types and stages. However, changes in the subcellular location of proteins, which are also critical for understanding and detecting diseases, have rarely been studied. RESULTS In this work, we developed a machine learning model to classify protein subcellular locations based on immunohistochemistry images of human colon tissues, and validated the ability of the model to detect subcellular location changes of biomarker proteins related to colon cancer. The model uses representative image patches as inputs, and integrates feature engineering and deep learning methods. It achieves 92.69% accuracy in classification of new proteins. Two validation datasets of colon cancer biomarkers, derived from the published literature and the Human Protein Atlas database respectively, are employed. In these two datasets, 81.82% and 65.66% of the biomarker proteins, respectively, can be identified as changing locations. CONCLUSIONS Our results demonstrate that using image patches and combining predefined and deep features can improve the performance of protein subcellular localization, and our model can effectively detect biomarkers based on protein subcellular translocations. This study is anticipated to be useful in annotating unknown subcellular localization for proteins and discovering new potential location biomarkers.
49
Verma R, Mehrotra R, Rane C, Tiwari R, Agariya AK. Synthetic image augmentation with generative adversarial network for enhanced performance in protein classification. Biomed Eng Lett 2020; 10:443-452. [PMID: 32850179 DOI: 10.1007/s13534-020-00162-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/12/2019] [Revised: 05/02/2020] [Accepted: 06/29/2020] [Indexed: 01/22/2023] Open
Abstract
Proteins are complex macromolecules responsible for the biological processes in the cell. In biomedical research, images of proteins are used extensively. The rate at which these images are produced makes it difficult to evaluate them manually, and hence there is a need to automate the process. The quality of the images is still a major issue. In this paper, we present the use of different image enhancement techniques that improve the contrast of these images. Besides image quality, the challenge of gathering such datasets in the field of medicine persists. We use generative adversarial networks to generate synthetic samples that improve the results of the CNN. The performance of the synthetic data augmentation was compared with classic data augmentation on the classification task; an increase of 2.7% in macro F1 and 2.64% in micro F1 score was observed. Our best results were obtained with the pretrained Inception V4 model, which gave a fivefold cross-validated macro F1 of 0.603. The achieved results are contrasted with existing work, and the comparison shows that the proposed method outperforms it.
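As a reading aid for the two metrics reported above: macro F1 averages the per-class F1 scores with equal weight, while micro F1 pools true positives, false positives and false negatives over all classes before computing a single F1. The following is a minimal sketch of that distinction. The function names and the per-class counts are hypothetical illustrations, not data or code from the paper.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(counts):
    """counts: list of (tp, fp, fn) tuples, one per class (hypothetical input format)."""
    # Macro F1: unweighted mean of the per-class F1 scores.
    per_class = [f1_from_counts(tp, fp, fn) for tp, fp, fn in counts]
    macro = sum(per_class) / len(per_class)
    # Micro F1: pool the counts over all classes, then compute one F1.
    total_tp = sum(c[0] for c in counts)
    total_fp = sum(c[1] for c in counts)
    total_fn = sum(c[2] for c in counts)
    micro = f1_from_counts(total_tp, total_fp, total_fn)
    return macro, micro


# Made-up counts: one frequent, well-classified class and one rare, poorly classified class.
example_counts = [(90, 10, 10), (5, 5, 15)]
macro, micro = macro_micro_f1(example_counts)
print(f"macro F1 = {macro:.3f}, micro F1 = {micro:.3f}")
```

With imbalanced classes such as these, the rare class pulls macro F1 down more than micro F1, which is one reason a study might report both.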
Affiliation(s)
- Rohit Verma
  - Soft Computing and Expert Systems Laboratory, ABV-IIITM, Gwalior, M.P. 474015 India
- Raj Mehrotra
  - Soft Computing and Expert Systems Laboratory, ABV-IIITM, Gwalior, M.P. 474015 India
- Chinmay Rane
  - Soft Computing and Expert Systems Laboratory, ABV-IIITM, Gwalior, M.P. 474015 India
- Ritu Tiwari
  - Soft Computing and Expert Systems Laboratory, ABV-IIITM, Gwalior, M.P. 474015 India
- Arun Kumar Agariya
  - Soft Computing and Expert Systems Laboratory, ABV-IIITM, Gwalior, M.P. 474015 India
50
Steigele S, Siegismund D, Fassler M, Kustec M, Kappler B, Hasaka T, Yee A, Brodte A, Heyse S. Deep Learning-Based HCS Image Analysis for the Enterprise. SLAS DISCOVERY 2020; 25:812-821. [PMID: 32432952 PMCID: PMC7372584 DOI: 10.1177/2472555220918837] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/17/2022]
Abstract
Drug discovery programs are moving increasingly toward phenotypic imaging assays to model disease-relevant pathways and phenotypes in vitro. These assays offer richer information than target-optimized assays by investigating multiple cellular pathways simultaneously and producing multiplexed readouts. However, extracting the desired information from complex image data poses significant challenges, preventing broad adoption of more sophisticated phenotypic assays. Deep learning-based image analysis can address these challenges by reducing the effort required to analyze large volumes of complex image data at a quality and speed adequate for routine phenotypic screening in pharmaceutical research. However, while general purpose deep learning frameworks are readily available, they are not readily applicable to images from automated microscopy. During the past 3 years, we have optimized deep learning networks for this type of data and validated the approach across diverse assays with several industry partners. From this work, we have extracted five essential design principles that we believe should guide deep learning-based analysis of high-content images and multiparameter data: (1) insightful data representation, (2) automation of training, (3) multilevel quality control, (4) knowledge embedding and transfer to new assays, and (5) enterprise integration. We report a new deep learning-based software that embodies these principles, Genedata Imagence, which allows screening scientists to reliably detect stable endpoints for primary drug response, assess toxicity and safety-relevant effects, and discover new phenotypes and compound classes. Furthermore, we show how the software retains expert knowledge from its training on a particular assay and successfully reapplies it to different, novel assays in an automated fashion.
Affiliation(s)
- Ada Yee
  - Genedata AG, Basel, Switzerland